<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GAN-based Image Generation Techniques Exploiting Latent Vector Distribution and Edge Loss Methods on Limited Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yun-Gyeong Song</string-name>
          <email>songyg1020@gnu.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gun-Woo Kim</string-name>
          <email>gunwoo.kim@gnu.ac.kr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of AI Convergence Engineering, Gyeongsang National University</institution>
          ,
          <addr-line>Jinju</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISE 2023: 2nd International Workshop on Intelligent Software Engineering</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Computer Science Gyeongsang National University</institution>
          ,
          <addr-line>Jinju</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Generative Adversarial Networks (GANs) have demonstrated remarkable capabilities in image generation, surpassing the performance of previous image generation models. However, GANs require large training datasets to facilitate proper learning. GANs have inherent problems such as the mode collapse problem, where identical images are generated, and the instability problem, where the generator and the discriminator fail to form a successful adversarial relationship. These problems are particularly common when the availability of training data is limited. In this paper, we propose three techniques to address these challenges. Firstly, Common Feature Training (CFT) is introduced to enhance performance by training the Generator to recognize common features, thereby mitigating instability problems. Secondly, Mean Rescaling (MR) is employed to mitigate the mode collapse problem arising from sampling latent vectors with identical means and variances. Thirdly, an edge loss method is implemented, where the edge difference values between real and generated images are added to the GAN loss. This contributes to the classification of shapes, thereby mitigating the mode collapse problem and the instability problem. Comparative experimental results illustrate improvements in the highlighted issues, and the performance enhancement is validated by metrics, namely Fréchet Inception Distance (FID) and Inception Score (IS).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Generative models have demonstrated remarkable performance improvements with the advent
of large datasets and deep neural networks. They are used for a variety of tasks, including inpainting
and image translation tasks [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ]. Generative Adversarial Networks (GANs) are a framework for
estimating generative models through adversarial learning of generators and discriminators. The
discriminator estimates the probability that the input data is real, while the generator generates
fake data that mimics the distribution of real data to deceive the discriminator. Due to these
characteristics, the discriminator is trained to distinguish real data from fake data, and the
generator is trained to produce fake data that closely approximates real data. Training GANs
inherently requires a large dataset, as insufficient data can lead to an instability problem, such as a
lack of adversarial relationship between the generator and discriminator, and a mode collapse
problem, where the generator primarily produces an identical image [
        <xref ref-type="bibr" rid="ref1 ref11">1,11</xref>
        ].
      </p>
      <p>
        Extensive research has been conducted to mitigate the mode collapse problem and instability
problem in traditional GANs by improving the model’s architecture and loss function [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
However, most of these studies rely heavily on the use of large datasets. The basic solution to
the mode collapse and instability problems is to augment the images or add more data
[
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ]. Otherwise, without such large datasets, producing diverse and high-quality data becomes
a challenge. In this paper, we propose the following three techniques to address the problems of
instability and mode collapse. Figure 1 shows the learning process and architecture of our
proposed GAN model.
      </p>
      <p>ORCID: 0009-0008-6692-1925 (Y. G. Song); 0000-0001-5643-4797 (G. W. Kim). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).</p>
      <p>1. Common Feature Training (CFT): We propose Common Feature Training (CFT) to
facilitate the Generator's training and establish an adversarial relationship between the
Generator and Discriminator, addressing the instability problem.</p>
      <p>2. Mean Rescaling (MR): We propose Mean Rescaling (MR), which rescales the mean of
the sampled latent vector z by replacing it with a random mean, to mitigate the mode collapse
problem.</p>
      <p>3. Edge loss: We propose an edge loss to inhibit the fast learning of the Discriminator; the
combined GAN + edge loss objective classifies not only the authenticity of an image but also its
shape, which mitigates the problems of mode collapse and instability.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        GANs have a different approach to generating high-quality images compared to traditional
generative models, and they have demonstrated excellent performance in image generation since
the release of Deep Convolutional Generative Adversarial Networks (DCGAN), which replaced
fully connected layers with convolutional layers [
        <xref ref-type="bibr" rid="ref1 ref3">1,3</xref>
        ]. However, when generating an image with limited data,
the generator and discriminator may not form an adversarial relationship due to lack of data,
which leads to problems of instability and mode collapse.
      </p>
      <p>
        To address the mode collapse problem, Diverse and Limited Data Generative Adversarial
Networks (DeLiGAN) were proposed [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. DeLiGAN models the latent space as a mixture of Gaussians and reparametrizes the
latent vector through random
sampling. This research follows the approach of modifying the latent distribution to obtain
samples in the high-probability region [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, this approach involves training the mean
and variance components, which can be time-consuming, and it does not address the instability
problem.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>
        Latent vectors are randomly sampled from a normal distribution but have entangled features
because one feature is related to another [16]. When we sample from a random normal distribution,
the learning nature of the discriminator will cause the generator to synthesize an identical image
that will best fool the discriminator [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The mode collapse problem can be mitigated to an extent
by reparametrizing the distribution of the latent vector [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. GANs can also be trained with additional
information to direct the generative model and to slow down the learning of the discriminator [15].
This mitigates the instability problem.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Common Feature Training</title>
        <p>Common Feature Training (CFT) is a technique to enhance the stability of learning. It consists of
reconstructing a latent vector z into a distribution that incorporates the common features of the
dataset, allowing for the learning of these common features. In this technique, we adopt a
hyperparameter called Feature Differences (FD). This hyperparameter reflects the extent to which
common features are considered and is initially set to 1. It is incremented by 1 until no instability
problem occurs beyond epoch 30, at which point we adopt that FD value. This helps the
generator learn to create an adversarial relationship. The equation of the reconstruction process
is as follows:
z′ = z + w · c
(1)
where z is sampled from a normal distribution, w is the weight of the common feature, and c
denotes each common feature. We take an input of a given sampling size from a normal
distribution and run it through a linear layer to obtain a trained w. In this procedure, each
feature's value is learned in such a way that it reduces the loss value, and a vector z′ that carries
the common features is obtained.</p>
        <p>FD is a quantification of the degree to which a feature is included. Choosing a high FD value
can result in less diverse data because the distribution learns detailed features. Conversely,
selecting a low FD value may result in a lack of common features being incorporated. Therefore, it
is important to set the correct FD value. Additionally, if the data does not have common features
between individual data points, the technique may not be very effective, so prior data analysis is
required.</p>
      </sec>
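      <p>As a minimal sketch, the CFT reconstruction of Equation (1), followed by the Mean Rescaling step of Equation (2), can be written as follows. This is an illustrative NumPy version, not the authors' implementation: the common-feature vector c, the weight w, and the batch size are placeholders, and in the paper w is produced by a trained linear layer rather than fixed by hand.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def common_feature_training(z, c, w):
    # Equation (1): z' = z + w * c, reconstructing each latent vector so it
    # carries the dataset's common features. In the paper, w is learned by a
    # linear layer and tuned via the Feature Differences (FD) hyperparameter;
    # here it is a fixed placeholder scalar.
    return z + w * c

def mean_rescaling(z_cft, rng):
    # Equation (2): z'' = z' + mu, with mu drawn uniformly from [-1, 1] per
    # latent vector, so that no two vectors share the same mean.
    mu = rng.uniform(-1.0, 1.0, size=(z_cft.shape[0], 1))
    return z_cft + mu

z = rng.standard_normal((4, 8))   # 4 latent vectors of dimension 8
c = rng.standard_normal(8)        # placeholder common-feature vector
z_cft = common_feature_training(z, c, w=0.5)
z_mr = mean_rescaling(z_cft, rng)
print(z_mr.shape)                 # (4, 8)
```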
      <sec id="sec-3-2">
        <title>3.2. Mean rescaling</title>
        <p>Mean Rescaling (MR) is a technique that involves adding a random mean value, ranging from
-1 to 1, to the latent vector z′ after it has undergone CFT. This technique ensures that each latent
vector has a unique mean, and is devised to mitigate the mode collapse problem occurring when
the initial latent vectors have the same mean and variance. The equation of this technique is as
follows:
z″ = z′ + μ
(2)
where z′ is the latent vector generated by the CFT and μ is the mean value added to z′, an
arbitrary value between -1 and 1.</p>
        <p>MR enables the latent vector z″ to have a mean in a user-specified range. This technique
mitigates the mode collapse problem and overcomes the problem of generating only specific
classes of data, enabling the generation of a variety of data categories.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Edge loss</title>
        <p>Most of the instability problem is caused by the faster training of the Discriminator compared to
the Generator. To address this problem, we incorporate the similarity of the edges between the
generated image and the real image into the loss function. The equation of the loss function is as
follows:
V(D, G) = E_{x∼p_data(x)}[ log D(x) + ∑(Edge(x) − Edge(G(z)))² ] + E_{z∼p_z(z)}[ log(1 − D(G(z))) ]
(3)
where Edge(·) denotes the edge map, so the added squared term measures the edge
difference between the image G(z) generated by the Generator and the real image x, which also
allows the Discriminator to learn the shape of the image. The Generator generates images that
look like the real image through 1 − D(G(z)).</p>
        <p>Since the Discriminator is responsible for distinguishing between real and generated data, it
does not otherwise recognize the shape of the generated data. However, with Equation (3), the
Discriminator takes shape into consideration. This technique not only ensures the reliability of
the data but also allows for a more accurate understanding of the image shape information. This
mitigates the mode collapse problem and mitigates the instability problem by slowing down the
learning speed of the Discriminator to compete with the Generator.</p>
      </sec>
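      <p>The edge-difference term added in Equation (3) can be sketched as follows. The paper does not pin down the edge operator, so the finite-difference gradient magnitude below is an assumption; only the added squared edge term is shown, with the standard GAN terms omitted.</p>

```python
import numpy as np

def edge_map(img):
    # Assumed edge operator: absolute finite differences along each axis
    # (a crude gradient magnitude); the paper's exact operator is unspecified.
    gy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    gx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    return gx + gy

def edge_loss(real, fake):
    # The added term in Equation (3): sum of squared differences between the
    # edge maps of a real image and a generated image.
    return float(np.sum((edge_map(real) - edge_map(fake)) ** 2))

real = np.zeros((8, 8))
real[2:6, 2:6] = 1.0              # a square "shape" in the real image
fake_same = real.copy()           # generated image with the same shape
fake_blank = np.zeros((8, 8))     # generated image with no shape at all
print(edge_loss(real, fake_same))         # 0.0
print(edge_loss(real, fake_blank) > 0.0)  # True
```

A generated image that reproduces the real image's shape incurs no extra penalty, while a shapeless one does, which is exactly what lets the Discriminator attend to shape.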
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Metrics</title>
        <p>
          The experimental environment in this paper is as follows: We used PyTorch version 1.12.1, the
optimization algorithm is Adam [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the learning rate is 1e-2, and the batch size is 32. Figure 2
shows the training loss graph for each dataset and model, and we can see that our proposed GAN
is more stable in all datasets compared to other models.
        </p>
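        <p>The training setup described above can be reproduced with a configuration along these lines. This is a sketch only: the Generator and Discriminator modules are placeholders, since the paper does not list its layer configuration here.</p>

```python
import torch

# Placeholder modules; the paper's actual Generator/Discriminator
# architectures are not reproduced here -- any nn.Module pair would slot in.
G = torch.nn.Linear(100, 784)
D = torch.nn.Linear(784, 1)

# Settings stated in the paper: Adam optimizer, learning rate 1e-2,
# batch size 32 (PyTorch 1.12.1).
opt_G = torch.optim.Adam(G.parameters(), lr=1e-2)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-2)
batch_size = 32
```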
        <p>
          For the evaluation, we used Fréchet Inception Distance (FID) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and Inception Score (IS) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. FID
is a metric that estimates the similarity between the distributions of real and generated data by
calculating the distance between them, indicating how similar the two sets of data are. The
equation for FID is as follows:
FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2))
where ||μ_r − μ_g||² is the squared distance between the feature-vector means of the true image
distribution and the generated image distribution, Σ_r + Σ_g is the sum of the true and generated
image covariance matrices, and (Σ_r Σ_g)^(1/2) is the matrix square root of the product of the two
covariances. Adding ||μ_r − μ_g||² to the trace term gives the FID. The lower the FID, the more
similar the generated data is to the real data.
        </p>
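        <p>The FID formula can be checked with a small sketch. In practice FID is computed on Inception-v3 features with full covariance matrices (requiring a matrix square root, e.g. scipy.linalg.sqrtm); the version below assumes diagonal covariances, for which the matrix square root reduces to an elementwise square root.</p>

```python
import numpy as np

def fid_diagonal(mu_r, mu_g, var_r, var_g):
    # FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)),
    # specialized to diagonal covariances S = diag(var).
    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    trace_term = float(np.sum(var_r + var_g - 2.0 * np.sqrt(var_r * var_g)))
    return mean_term + trace_term

mu = np.zeros(4)
var = np.ones(4)
print(fid_diagonal(mu, mu, var, var))        # 0.0 (identical distributions)
print(fid_diagonal(mu, mu + 1.0, var, var))  # 4.0 (means shifted by 1 in 4 dims)
```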
        <p>IS is another metric used to evaluate the quality and diversity of generated data, predicting the
class of generated data using the Inception Network. The equation for IS is as follows:
IS = exp( E_x [ KL( p(y|x) || p(y) ) ] )
where KL( p(y|x) || p(y) ) represents the Kullback-Leibler divergence between the predicted class
distribution for image x and the marginal class distribution over all generated images. This
quantifies the difference between these two probability distributions. The higher the IS, the better
the performance in terms of quality and variety of the generated data.</p>
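        <p>The IS computation can likewise be sketched directly from the formula; the class probabilities p(y|x), which would come from the Inception network, are supplied by hand here as an assumption.</p>

```python
import numpy as np

def inception_score(p_yx):
    # IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal
    # class distribution over the generated images.
    p_y = p_yx.mean(axis=0)
    kl = np.sum(p_yx * (np.log(p_yx) - np.log(p_y)), axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse predictions score higher than collapsed ones.
diverse = np.array([[0.999, 0.001], [0.001, 0.999]])
collapsed = np.array([[0.999, 0.001], [0.999, 0.001]])
print(inception_score(diverse) > inception_score(collapsed))  # True
print(inception_score(collapsed))                             # 1.0 (KL is zero)
```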
        <p>The above metrics were used to compare the performance of each model and to
demonstrate the performance of the proposed GAN.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Dataset</title>
        <p>In this paper, we conducted experiments using the Emoji Dataset and the CIFAR Dataset. The Emoji
Dataset comprises 402 images in a positive class and 402 in a negative class, totaling 804 images. The
CIFAR-10 dataset consists of 10 classes (airplane, car, bird, cat, deer, dog, frog, horse, ship, truck), with
6,000 images per class, resulting in a total of 60,000 images. However, for this research, 1,000
images were used.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results and Performance</title>
        <p>In Figure 3, as well as in Table 1, the IS and FID values for each dataset and model are displayed.
Figure 3 reveals that our proposed GAN achieves the best performance, with the lowest FID
values and the highest IS. In the CIFAR-10 dataset of Figure 3, the FID of DCGAN is
overwhelmingly large in Figure 3-(a), so we exclude it and show the FID of DeLiGAN and our
proposed GAN in Figure 3-(b).</p>
        <p>When compared to DCGAN, our proposed GAN exhibits a 62.90% reduction in FID and a 30.77%
increase in IS on the emoji dataset, as well as a 99.57% reduction in FID and a 31.82% increase
in IS on the CIFAR-10 dataset. In comparison to DeLiGAN, our proposed GAN shows a 32.35%
reduction in FID and a 13.33% increase in IS on the emoji dataset, and a 66.67% reduction in FID
and an 11.54% increase in IS on the CIFAR-10 dataset.</p>
        <p>Examining the generated images in Figure 4, it becomes visually evident that our proposed
GAN outperforms others by generating images that are not only clearer but also more diverse.
This underlines that the model, enhanced with the proposed technique and compared against
DCGAN and DeLiGAN in experiments, produces exceptionally high-quality images that bear a
striking resemblance to real images.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we propose Common Feature Training (CFT), Mean Rescaling (MR), and edge loss to
resolve the learning instability problem and the mode collapse problem.</p>
      <p>Common Feature Training (CFT) is intended to train the latent vector z on the overall shape of
the data. This causes the generator to learn common features to compete with the discriminator,
thereby mitigating the instability problem.</p>
      <p>Mean Rescaling (MR) is a technique to mitigate the mode collapse problem caused by latent
vectors z sampled from a distribution with the same mean. It mitigates the mode collapse problem
by sampling latent vectors z with different means.</p>
      <p>Edge loss is a loss function that adds the difference between the edges of real data and the edges
of generated data to the GAN loss; it does not simply classify whether an image is real or generated,
but also learns image shape information to generate diverse images and slows down the
Discriminator's learning speed, thereby mitigating the mode collapse problem and the instability
problem. As a result, Figure 2 shows that the proposed GAN trains reliably compared to other
models, and Figure 3 shows that our proposed GAN produces higher-quality and more diverse
images compared to other models.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was supported by Basic Science Research Program through the National Research
Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology
(NRF2021R1G1A1006381).
[15] MehdiMirza, SimonOsindero, "ConditionalGenerativeAdversarialNets", arXiv preprint
arXiv:1411.1784, 2014
[16] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel, "InfoGAN:
Interpretable Representation Learning by Information Maximizing Generative Adversarial
Nets", 30th Conference on Neural Information Processing Systems, 2016, pp 2180-2188</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Jean Pouget Abadie, Mehdi Mirza, Bing Xu, David Warde Farley, Sherjil Ozair, Aaron Courville and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>"Generative adversarial nets"</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          , volume
          <volume>27</volume>
          ,
          <year>2014</year>
          , pp
          <fpage>139</fpage>
          -
          <lpage>144</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Xudong</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qing</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Haoran</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <surname>Raymond Y.K. Lau</surname>
            ,
            <given-names>Zhen</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          and Stephen Paul Smolley Stephen,
          <article-title>"Least squares generative adversarial networks"</article-title>
          ,
          <source>Proceedings of the IEEE international conference on computer vision (ICCV)</source>
          ,
          <year>2017</year>
          , pp
          <fpage>2794</fpage>
          -
          <lpage>2802</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Luke Metz and
          <string-name>
            <given-names>Soumith</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          <article-title>"Unsupervised representation learning with deep convolutional generative adversarial networks"</article-title>
          ,
          <source>in Proc. Int. Conf. Learn. Representations</source>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Swaminathan</given-names>
            <surname>Gurumurthy</surname>
          </string-name>
          , Ravi Kiran Sarvadevabhatla and
          <string-name>
            <surname>R.</surname>
          </string-name>
          <article-title>Venkatesh Babu, "DeLiGAN: Generative adversarial networks for diverse and limited data"</article-title>
          ,
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp
          <fpage>166</fpage>
          -
          <lpage>174</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Tommi</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Jaakkola</surname>
            and
            <given-names>Michael I. Jordan</given-names>
          </string-name>
          ,
          <article-title>"Improving the mean field approximation via the use of mixture distributions"</article-title>
          ,
          <source>NATO ASI Series D Behavioural Social Sci.,</source>
          volume
          <volume>89</volume>
          ,
          <year>1998</year>
          , pp
          <fpage>528</fpage>
          -
          <lpage>534</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Heusel</surname>
          </string-name>
          , Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler and
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <article-title>"Gans trained by a two time-scale update rule converge to a local nash equilibrium"</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          , volume
          <volume>30</volume>
          ,
          <year>2017</year>
          , pp
          <fpage>6626</fpage>
          -
          <lpage>6637</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Salimans</surname>
          </string-name>
          , Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford and Xi Chen, “
          <article-title>Improved techniques for training gans”</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          , volume
          <volume>29</volume>
          ,
          <year>2016</year>
          , pp
          <fpage>2226</fpage>
          -
          <lpage>2234</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Diederik</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kingma</surname>
          </string-name>
          and Jimmy Lei Ba, ”
          <article-title>Adam: A method for stochastic optimization”</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Phillip</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jun-Yan</surname>
            <given-names>Zhu</given-names>
          </string-name>
          , Tinghui Zhou, Alexei A.
          <string-name>
            <surname>Efros</surname>
          </string-name>
          ,
          <article-title>"Image-to-Image Translation with Conditional Adversarial Networks"</article-title>
          ,
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)</source>
          ,
          <year>2017</year>
          , pp
          <fpage>1125</fpage>
          -
          <lpage>1134</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Jun-Yan</surname>
            <given-names>Zhu</given-names>
          </string-name>
          , Taesung Park, Phillip Isola,
          <article-title>Alexei A. Efros, "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks"</article-title>
          ,
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          (ICCV),
          <year>2017</year>
          , pp
          <fpage>2223</fpage>
          -
          <lpage>2232</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Martin</surname>
            <given-names>Arjovsky</given-names>
          </string-name>
          , Léon Bottou,
          <article-title>"Towards Principled Methods for Training Generative Adversarial Networks"</article-title>
          ,
          <source>In Proc. ICLR</source>
          ,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Connor</surname>
            <given-names>Shorten</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taghi M. Khoshgoftaar</surname>
          </string-name>
          ,
          <article-title>"A survey on image data augmentation for deep learning"</article-title>
          ,
          <source>Journal of Big Data</source>
          , volume
          <volume>6</volume>
          ,
          <year>2019</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Dan</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Anna Khoreva,
          <article-title>"PA-GAN: Improving GAN Training by Progressive Augmentation"</article-title>
          ,
          <source>In Proc. ICLR</source>
          ,
          <year>2018</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Tim</surname>
            <given-names>Salimans</given-names>
          </string-name>
          , Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford and Xi Chen,
          <article-title>"Improved techniques for training gans"</article-title>
          ,
          <source>Advances in neural information processing systems 29</source>
          ,
          <year>2016</year>
          , pp
          <fpage>2234</fpage>
          -
          <lpage>2242</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>