<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lifelong VAEGAN for inpainting of damaged image regions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Sineglazov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yehor Khomik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>Beresteiskyi Ave., 37, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The work is devoted to the automatic restoration (inpainting) of hidden or damaged areas of images. To solve the problem, a hybrid generative-adversarial neural network Lifelong VAEGAN with a buffer of previous samples and a U-Net generator with skip connections and a self-attention mechanism is proposed. A review of modern methods of image inpainting and lifelong learning is conducted, and for the first time a modular Lifelong VAEGAN is proposed, which is capable of effectively restoring images thanks to a dual-encoder architecture and a self-attention block.</p>
      </abstract>
      <kwd-group>
        <kwd>Lifelong VAEGAN</kwd>
        <kwd>self-attention</kwd>
        <kwd>inpainting</kwd>
        <kwd>latent space</kwd>
        <kwd>replay</kwd>
        <kwd>U-Net generator</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Artificial intelligence has become indispensable in scientific and engineering disciplines by improving
the accuracy of UAV visual navigation and suppressing sensor noise [1, 2], by providing hybrid ensemble
neural network architectures for advanced analytics [3], and by forming systematic taxonomies of
multicriteria optimization methods and neural network topologies that guide both fundamental research and
practical development [4, 5].</p>
      <p>In particular, image inpainting has a wide range of applications. For example, in satellite images,
clouds often cover the surface of the image [6, 7]; there is also the problem of restoring old archival
photographs that have been damaged over time [8] and restoring masked human faces [9]. Previously,
it was necessary to retake pictures or redraw images manually; now all this can be automated thanks to
artificial intelligence.</p>
      <p>All these tasks share a common underlying principle. Any image restoration task requires embedding
prior assumptions about the structure of the restored area. It is necessary to take into account the
structural, spectral, and semantic properties, as well as global consistency with the context.</p>
      <p>Previously, images were restored in two main ways: by diffusing pixels from the edges of the intact
area inward [10] and by copying similar fragments from intact areas [11, 12]. Such methods were
good at taking local textures into account, but ignored the overall semantics of the images, which
could lead to inconsistencies [13, 14]. Modern models have significantly improved the quality of
automatic reconstruction. While generative adversarial networks (GANs) produce realistic structures,
variational autoencoders (VAEs) provide stable learning and interpretable latent representations [15, 16].
Combining these approaches preserves good quality and semantic consistency of the reconstructed
regions [17, 18].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Lifelong VAEGAN architecture and features</title>
      <p>In the literature, most image restoration models are trained on only one set of images, without
subsequent training on other sets. With the popularization of artificial intelligence, it is very important that
the user can restore any images, so the ability to perform lifelong learning is necessary. However,
sequential training on new samples leads to catastrophic forgetting: when training on new data, the
network forgets knowledge from previous domains [19, 20, 21].</p>
      <p>Lifelong VAEGAN (L-VAEGAN) is a generative model that combines VAE and GAN architectures
and provides the ability to train sequentially on multiple datasets [22]. L-VAEGAN, unlike previous
generative replay methods that did not have a separate autoencoder, is able to build a common latent
space for all domains.</p>
      <p>The basic variables in the architecture of the L-VAEGAN model are:
• x is an input image from a specific domain. In continual learning, there is a sequence of image
sets {X_1, X_2, . . . , X_N}, each of which corresponds to a certain domain;
• z is a continuous latent variable that encodes the style or content of the image;
• c is a discrete latent variable that reflects the categorical content of the image. We do not use
it in our implementation, but in many datasets, for example with animals, it is necessary for
high-quality restoration;
• a is a discrete domain variable (dataset). In lifelong learning, the encoder estimates which domain
x belongs to, and the generator reproduces the image, having received a certain a.</p>
      <p>These variables build a probabilistic model of images from continuous factors such as style and object
variations and discrete factors such as class and domain.</p>
      <p>The main components of Lifelong VAEGAN are an inference encoder [23], a generator, a discriminator,
and a generative replay mechanism. The inference encoder encodes the input image into a set of latent
variables. For each input image, a vector of means \mu(x) and variances \sigma^2(x) is generated, which
define the distribution q(z | x). Next, a reparameterisation trick is used to sample the value z for the
generator. In general, the L-VAEGAN generator is a decoder that builds images from latent variables.
In the context of VAE, the generator reconstructs the original image from the code; in the context of
GAN, it generates images that are so realistic that the discriminator cannot distinguish them from
the real ones. However, in our implementation, the generator has an additional encoder that extracts
the usual spatial features. The discriminator receives real or generated images as input and learns to
distinguish them; it is a critic of the generator. Thus, the generator's task is to produce an image that the
discriminator cannot distinguish from real ones, and the discriminator's task is to distinguish between
real and generated images.</p>
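      <p>As a brief illustration, the reparameterisation trick can be sketched as follows (a minimal sketch
assuming PyTorch, not the authors' exact code):</p>
      <preformat>
import torch

# Reparameterisation trick: z = mu(x) + sigma(x) * eps, with eps ~ N(0, I),
# so that sampling z ~ q(z | x) stays differentiable with respect to mu, sigma.
def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    std = torch.exp(0.5 * logvar)   # sigma(x) recovered from the log-variance
    eps = torch.randn_like(std)     # standard normal noise
    return mu + eps * std           # differentiable sample for the generator
      </preformat>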
      <p>Another important mechanism of L-VAEGAN is generative replay. During replay, the generator
reproduces samples from previous domains, thus the model does not forget them (Figure 1).</p>
      <p>However, in our implementation, generative replay was abandoned, and a buffer of previous samples
was used [24, 25].</p>
      <p>Although the original Lifelong VAEGAN documentation stated the inefficiency of the buffer [22], in
the inpainting task it showed better results. Instead of restoring damaged generated images, a small
sample of real ones was used. When training on multiple datasets, no memory overrun problems were
observed, and the GPU overhead was lower. One of the reasons for abandoning generative replay is
its tendency to produce less diverse images due to the shift toward the new domain. With generative
replay, the more training epochs there are, the poorer the samples the model can reproduce, which
impairs the retention of prior knowledge. However, when training on a large number of different
domains, generative replay may give better results. Figure 2 shows the architecture of the Lifelong
VAEGAN model.</p>
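      <p>A minimal sketch of such a real-sample buffer is shown below; reservoir sampling keeps a bounded,
diverse memory of past domains (the class and its names are illustrative, not the authors' code):</p>
      <preformat>
import random

class ReplayBuffer:
    """Bounded memory of real (image, domain_label) pairs from past domains."""

    def __init__(self, capacity: int = 2000):
        self.capacity = capacity
        self.data = []     # stored samples
        self.seen = 0      # total number of samples offered so far

    def add(self, sample):
        self.seen += 1
        if self.capacity > len(self.data):
            self.data.append(sample)
        else:
            # Reservoir sampling: every seen sample is kept with equal probability
            j = random.randrange(self.seen)
            if self.capacity > j:
                self.data[j] = sample

    def sample(self, k: int):
        return random.sample(self.data, min(k, len(self.data)))
      </preformat>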
      <p>The left part (Figure 2a) shows the generator and discriminator, which are trained adversarially.
The right part (Figure 2b) shows the inference network [22]. The encoder estimates the distribution
q(z | x), and a separate domain classifier estimates the domain variable a.</p>
      <p>However, in our implementation, the domain variable is estimated directly from the input image,
q(a | x). The generator produces a restored image x', which is then compared by the discriminator
with the original images.</p>
      <p>The Lifelong VAEGAN model has two learning phases, Wake and Dreaming [26]. In the first, the
generator and discriminator are trained adversarially; in the second, the variational autoencoder part is
updated on real and generated data. In the Wake phase, the network is trained using the Wasserstein
GAN loss with gradient penalty (WGAN-GP). The loss function is shown in equation (1):</p>
      <p>\min_G \max_D \; \mathcal{L}_{\mathrm{GAN}}(G, D) =
\mathbb{E}_{z \sim p(z),\, a \sim p(a),\, c \sim p(c)} \big[ D(G(z, a, c)) \big]
- \mathbb{E}_{x \sim p(x)} \big[ D(x) \big]
+ \lambda \, \mathbb{E}_{\tilde{x} \sim p(\tilde{x})} \big[ (\lVert \nabla_{\tilde{x}} D(\tilde{x}) \rVert_2 - 1)^2 \big], \quad (1)</p>
      <p>where:
• G(z, a, c) is the generated image;
• D(·) is the discriminator output;
• p(z) is the prior distribution of the latent vector z;
• p(a), p(c) are the prior distributions of the discrete variables a and c;
• the gradient penalty limits the discriminator's gradient and prevents mode collapse.</p>
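      <p>The gradient penalty term of equation (1) can be sketched as follows (a minimal sketch assuming
PyTorch; disc is a hypothetical discriminator module):</p>
      <preformat>
import torch

def gradient_penalty(disc, real, fake, lam=10.0):
    # Interpolate between real and generated images to obtain x~
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_out = disc(x_hat)
    # Gradient of D(x~) with respect to x~, kept in the graph for backprop
    grads = torch.autograd.grad(outputs=d_out.sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    # lambda * E[(||grad||_2 - 1)^2] from equation (1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
      </preformat>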
      <p>In the Dreaming phase, the model learns to maximize the log-likelihood of the training data by
optimizing the ELBO, which is calculated by formula (2):</p>
      <p>\mathcal{L}_{\mathrm{VAE}}(x; \theta, \omega) =
\mathbb{E}_{q(z, a, c \mid x)} \big[ \log p(x \mid z, a, c) \big]
- D_{\mathrm{KL}} \big[ q(z, a, c \mid x) \,\Vert\, p(z, a, c) \big], \quad (2)</p>
      <p>where:
• \mathbb{E}_{q(z, a, c \mid x)}[\log p(x \mid z, a, c)] estimates the reconstruction accuracy;
• the second term (KL divergence) acts as a regularization, preventing overfitting and ensuring a
smooth latent space.</p>
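      <p>For the continuous latent z alone, the Dreaming-phase objective (2) reduces to a familiar VAE loss;
a minimal sketch assuming PyTorch and a Gaussian posterior, with the discrete terms omitted for brevity:</p>
      <preformat>
import torch
import torch.nn.functional as F

def vae_loss(x, x_rec, mu, logvar):
    # Reconstruction term E_q[log p(x | z)], approximated by a pixel-wise MSE
    rec = F.mse_loss(x_rec, x, reduction="mean")
    # Closed-form KL divergence between N(mu, sigma^2) and the prior N(0, I)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
      </preformat>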
    </sec>
    <sec id="sec-3">
      <title>3. Proposed model architecture</title>
      <p>The visual results show that the basic L-VAEGAN model can restore the damaged area. Figure 3 confirms
that the model correctly captures the domain and main facial features.</p>
      <p>However, factors such as hair, skin, and eye color do not quite correspond to the original area. It can
be concluded that the basic model has limitations in capturing various small details. To eliminate the
shortcomings, the following modifications were implemented.</p>
      <p>A U-Net generator was implemented, which forms a dual-encoder architecture. The model has both
a separate inference encoder and a generator with the U-Net architecture, which also has an encoder
that “extracts” spatial features. Figure 4 presents a general diagram of the updated architecture.</p>
      <p>The model has two processing paths. The inference encoder compresses the image to the latent vector
z and determines the domain label a. The U-Net generator encoder compresses the image, extracting
multi-level features. At the bottleneck level, the concatenation of [z, a] occurs, their transformation
through a fully connected layer into a tensor of the desired size, and the addition of spatial features. Also
at the bottleneck level, a self-attention block is added. After that, the decoder performs upsampling to
the original size taking into account skip connections.</p>
      <sec id="sec-3-1">
        <title>3.1. U-Net generator with two inputs</title>
        <p>The generator based on the U-Net architecture has a symmetric encoder-decoder path with skip
connections [27]. The encoder part of the generator consists of a cascade of Conv→Norm→ReLU
layers, which reduce the image dimension to 4 × 4 × 512. In our implementation, the generator has
five encoder and decoder blocks. Since the input images had a size of 128 × 128, at the fifth level the
feature size becomes 4 × 4; reducing it further is inefficient, since such actions can lead to the loss of
information about the spatial location of objects. If, on the contrary, the network depth is made smaller,
the model generates more generalized and inconsistent objects.</p>
        <p>The inference encoder specifies the vectors z and a. Both encoders converge at the bottleneck level.
The following approach was used to combine information: the continuous vector z is concatenated with
the vector of the domain variable a (a one-hot representation of the domain [28]). Then the resulting vector
passes through a fully connected layer, which expands it into a 4 × 4 × 512 tensor with subsequent
ReLU activation. The resulting tensor is treated as additional noise and is added element-wise to the
output bottleneck representation of the U-Net encoder. Such actions are a form of information
fusion, where the latent vector brings semantic features, and spatial features provide local context. As a
result, the enriched tensor is fed to the decoding part of the generator.</p>
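        <p>A minimal sketch of this bottleneck fusion, assuming PyTorch (sizes and names are illustrative):</p>
        <preformat>
import torch
import torch.nn as nn

class BottleneckFusion(nn.Module):
    def __init__(self, z_dim=128, n_domains=2, ch=512, hw=4):
        super().__init__()
        self.ch, self.hw = ch, hw
        # Expand the concatenated [z, a] into a tensor matching the bottleneck
        self.fc = nn.Linear(z_dim + n_domains, ch * hw * hw)

    def forward(self, z, a_onehot, feat):
        za = torch.cat([z, a_onehot], dim=1)       # concatenate [z, a]
        t = torch.relu(self.fc(za))                # fully connected layer + ReLU
        t = t.view(-1, self.ch, self.hw, self.hw)  # reshape to 4 x 4 x 512
        return feat + t                            # element-wise fusion with spatial features
        </preformat>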
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Decoder and skip connections</title>
        <p>The decoder performs sequential upsampling. Transposed convolutions (ConvTranspose2d) restore the
spatial size of the layer first to 8 × 8, then to 16 × 16, and so on up to the original size of 128 × 128.
At each level, the decoder receives a skip connection from the corresponding encoder layer of the
generator. Skip connections are very important for the image reconstruction task. Without their use,
the generator would reconstruct the image based only on the low-dimensional representation in the
bottleneck, which would lead to the loss of many structures and blurring of the result. Thanks to skip
connections, the decoder directly receives high-resolution features from the encoder that would be lost
during downsampling. In our implementation, skip connections are performed as a concatenation of
tensors. The output activations of the encoder blocks are combined by channels with the corresponding
decoder tensor after upsampling to the appropriate size.</p>
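        <p>One decoder level with its skip connection can be sketched as follows (a minimal sketch assuming
PyTorch; channel counts are illustrative):</p>
        <preformat>
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Transposed convolution doubles the spatial size (e.g. 8x8 -> 16x16)
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        # After concatenating the skip tensor, fuse channels back to out_ch
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # channel-wise concatenation with the encoder feature
        return self.conv(x)
        </preformat>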
        <p>As a result, even considering that the latent vector z carries only the global information of the image,
local details are restored thanks to skip connections, which makes the reconstruction more consistent.</p>
        <p>At the output of the decoder, a filled image x' is obtained, which realistically fills the masked areas.
As a result, thanks to the variational encoder, the images have high semantic consistency, and thanks to
U-Net+GAN they are reproduced with high reliability.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Self-attention module</title>
        <p>To improve the restoration of damaged areas, the self-attention mechanism was added to the generator
in the bottleneck zone. At this stage, the spatial size is very small, but the channel depth is high, so each
of the feature map elements contains information about a large area of the original sample. This makes
it possible to take into account long-range relationships between different parts of the image and to
coordinate them with each other. This is quite important for the continuation of the background and the
symmetry of different parts of the objects. After adding self-attention, brightness mismatches across a
face were greatly reduced.</p>
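        <p>A SAGAN-style block is a common way to implement such a bottleneck self-attention; a minimal
sketch assuming PyTorch (an illustration, not necessarily the exact module used here):</p>
        <preformat>
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)          # query projection
        self.k = nn.Conv2d(ch, ch // 8, 1)          # key projection
        self.v = nn.Conv2d(ch, ch, 1)               # value projection
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (b, hw, c//8)
        k = self.k(x).flatten(2)                    # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)         # (b, hw, hw) long-range weights
        v = self.v(x).flatten(2)                    # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                 # residual connection
        </preformat>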
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Wake and dreaming phases</title>
        <p>During the Wake phase, the model receives real data from the current domain. The inference encoder
calculates z and a, while the generator restores damaged areas, acquiring new knowledge. The parameters
of the inference encoder and generator are updated based on the gradients from the reconstruction and
adversarial losses (the discriminator is also updated). During each iteration, examples of old domains
from the image buffer are mixed in [29], so we can say that the model "dreams" of past experiences
within the main learning phase. In our implementation, both encoders and the generator are involved
in the Wake phase, but are trained on a combination of new and buffer data, performing the role of
consolidation (dreaming) [30]. This approach is similar to the idea of off-line sleep in neural networks,
where during learning the model switches between repeating external cues and replaying internal
memories. Our result demonstrates that even with the buffering approach the model prevents the
forgetting of old skills. Figure 5 illustrates this, and a minimal sketch of the batch mixing follows.</p>
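        <p>The batch mixing can be sketched as follows, assuming PyTorch tensors and the ReplayBuffer
sketched in Section 2 (names are illustrative):</p>
        <preformat>
import torch

def make_training_batch(current_batch, buffer, replay_k=16):
    # Draw a few stored examples from earlier domains ("dreaming" within Wake)
    old = buffer.sample(replay_k)
    if old:
        old_imgs = torch.stack([img for img, _ in old])
        batch = torch.cat([current_batch, old_imgs], dim=0)
    else:
        batch = current_batch
    # Shuffle so old and new samples are interleaved in each gradient step
    return batch[torch.randperm(batch.size(0))]
        </preformat>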
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Adding perceptual loss</title>
        <p>During training, an additional perceptual loss was used, which is calculated not at the pixel level, but at
the level of features extracted by the VGG16 model pre-trained on ImageNet. Minimizing only MSE
tends to blur fine textures by averaging them. For human perception, images can be similar if they
have the same structure, even if they are different pixel by pixel. Perceptual loss compares the outputs
of certain internal layers, and not pixel values. Layers were selected that cover both low-level and
high-level features. The original and generated images are passed through VGG16 and for each layer l,
the feature tensors \phi_l(x_{orig}) and \phi_l(x_{gen}) are calculated. Perceptual loss is calculated as
the average L1 norm of the difference of these features, summed over all selected layers, formula (3):</p>
        <p>\mathcal{L}_{\mathrm{perc}}(x_{orig}, x_{gen}) =
\sum_{l \in \mathcal{L}} \lambda_l \, \lVert \phi_l(x_{gen}) - \phi_l(x_{orig}) \rVert_1, \quad (3)</p>
        <p>where:
• \mathcal{L} is the set of indices of the selected layers;
• \lambda_l are the weighting factors for each layer.
As a result, thanks to the perceptual loss, the restored images are of higher quality for human
perception, without artifacts and with correct textures.</p>
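        <p>A minimal sketch of this loss, assuming torchvision; the layer indices and unit weights are
illustrative choices, not necessarily the exact configuration used here:</p>
        <preformat>
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(nn.Module):
    def __init__(self, layers=(3, 8, 15, 22), weights=(1.0, 1.0, 1.0, 1.0)):
        super().__init__()
        # Frozen VGG16 feature extractor pre-trained on ImageNet
        self.vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.layers = set(layers)
        self.lam = dict(zip(layers, weights))

    def forward(self, gen, orig):
        loss, x, y = 0.0, gen, orig
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layers:
                # L1 distance between feature maps phi_l(gen) and phi_l(orig)
                loss = loss + self.lam[i] * torch.mean(torch.abs(x - y))
        return loss
        </preformat>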
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Image-quality assessment metrics</title>
      <p>The quality of the restored images was described through the following indicators: MSE, PSNR, SSIM
and FID. None of the metrics covers all aspects of perception; therefore, for better objectivity, one
metric is not enough.</p>
      <p>The mean square error (MSE) is defined as the arithmetic mean of the squares of the difference
between the pixels of the original and reconstructed image, formula (4):</p>
      <p>\mathrm{MSE}(x, \hat{x}) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2, \quad (4)</p>
      <p>where:
• x_i is the value of the i-th pixel of the original image;
• \hat{x}_i is the value of the corresponding i-th pixel of the reconstructed image;
• N is the total number of pixels in the image.</p>
      <p>However, MSE does not take into account the peculiarities of human perception. For this problem,
the peak signal-to-noise ratio (PSNR) was used. This metric shows how much the maximum signal
exceeds the existing error level, formula (5):</p>
      <p>\mathrm{PSNR}(x, \hat{x}) = 10 \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}(x, \hat{x})} \right) \text{ dB}, \quad (5)</p>
      <p>where:
• \mathrm{MAX}_I is the maximum possible signal value;
• \log_{10} is a decimal logarithm that converts the linear scale to a scale close to human perception.</p>
      <p>However, MSE and PSNR only take into account pixel differences. To assess structural similarity, the
SSIM metric was used. This metric effectively takes into account changes that humans see (contrast,
texture). It is based on comparing the means, variances and covariances of local blocks of the original
(x) and reconstructed (y) images, formula (6):</p>
      <p>\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \quad (6)</p>
      <p>where:
• \mu_x, \mu_y are the mean values of pixels in the local window for images x and y;
• \sigma_x^2, \sigma_y^2 are the variances of pixel values in the window for x and y;
• \sigma_{xy} is the covariance between corresponding pixels of the two images;
• C_1, C_2 are small positive constants that avoid division by 0.</p>
      <p>Frechet Inception Distance (FID) was also used, which compares statistical properties. The Inception v3
neural network was used to obtain feature vectors, which are compared as two Gaussian distributions,
formula (7):</p>
      <p>\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
+ \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \right), \quad (7)</p>
      <p>where:
• \mu_r, \mu_g are the average values of feature vectors for real and generated images;
• \Sigma_r, \Sigma_g are the covariance matrices of these feature vectors;
• \lVert \cdot \rVert_2 is the Euclidean norm;
• \mathrm{Tr} is the trace of the matrix (sum of elements on the main diagonal).</p>
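      <p>Given Inception v3 feature vectors for the real and generated sets, formula (7) can be computed
directly; a minimal sketch assuming NumPy and SciPy:</p>
      <preformat>
import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    mu_r, mu_g = real_feats.mean(0), gen_feats.mean(0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)    # matrix square root (Sigma_r Sigma_g)^(1/2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real        # drop tiny imaginary parts from sqrtm
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
      </preformat>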
    </sec>
    <sec id="sec-5">
      <title>5. Results and discussion</title>
      <p>The model was trained on two datasets: first on CelebA, then on Facade.</p>
      <p>The following results were obtained on CelebA: MSE = 0.205, PSNR = 25.65 dB, SSIM = 0.9139, FID = 6.81.
According to the SSIM indicator, it is clear that the model perfectly preserves the key structural features
of the faces. FID indicates good generation quality: the generated images almost do not differ from the
real ones in terms of their features. The visual results of CelebA restoration are shown in Figure 6.</p>
      <p>On the Facade dataset, which contains a relatively small number of images, the results obtained are:
MSE = 0.303, PSNR = 23.94 dB, SSIM = 0.8643, FID = 44.24.</p>
      <p>Given the small amount of training data, these results are quite high.</p>
      <p>The visual results of Facade restoration are shown in Figure 7.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>An intelligent system for restoring damaged parts of images based on the Lifelong VAEGAN model has
been developed. The basic architecture has been extended to a dual-encoder architecture. A separate
inference encoder forms the latent vector, and a generator encoder with a U-Net structure preserves
spatial features. Then both encoders are combined into a bottleneck with a self-attention mechanism.
Thanks to skip connections, high-resolution details are transmitted to the decoder, while self-attention
better matches distant image areas. For lifelong training, a real-sample bufer was used, which mixes
data from previous domains into the current ones. Due to this, catastrophic forgetting was avoided.
Perceptual loss on VGG16 features was also added, which eliminates blurring and improves textural
plausibility. Results on the CelebA and Facade sets confirmed the effectiveness of the model. The results
(MSE = 0.205, PSNR = 25.65, SSIM = 0.9139, FID = 6.81) after training on the new domain are quite
convincing and show that catastrophic forgetting was avoided.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Sineglazov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Ishchenko</surname>
          </string-name>
          ,
          <article-title>Intelligent visual navigation system of high accuracy</article-title>
          ,
          <source>in: 2019 IEEE 5th International Conference Actual Problems of Unmanned Aerial Vehicles Developments (APUAVD)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>127</lpage>
          . doi:
          <volume>10</volume>
          .1109/APUAVD47061.
          <year>2019</year>
          .
          <volume>8943916</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Lutsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Sineglazov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Ishchenko</surname>
          </string-name>
          ,
          <article-title>Suppression of noise in visual navigation systems</article-title>
          ,
          <source>in: IEEE 6th International Conference on Actual Problems of Unmanned Aerial Vehicles Development (APUAVD)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
          . doi:
          <volume>10</volume>
          .1109/APUAVD53804.
          <year>2021</year>
          .
          <volume>9615405</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sineglazov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kot</surname>
          </string-name>
          ,
          <article-title>Design of hybrid neural networks of the ensemble structure</article-title>
          ,
          <source>EasternEuropean Journal of Enterprise Technologies</source>
          (
          <year>2021</year>
          )
          <fpage>31</fpage>
          -
          <lpage>45</lpage>
          . doi:
          <volume>10</volume>
          .15587/
          <fpage>1729</fpage>
          -
          <lpage>4061</lpage>
          .
          <year>2021</year>
          .
          <volume>225301</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zgurovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sineglazov</surname>
          </string-name>
          , E. Chumachenko,
          <article-title>Classification and analysis of multicriteria optimization methods</article-title>
          ,
          <source>in: Studies in Computational Intelligence</source>
          , volume
          <volume>904</volume>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>174</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -48453-
          <issue>8</issue>
          _
          <fpage>2</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zgurovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sineglazov</surname>
          </string-name>
          , E. Chumachenko,
          <article-title>Classification and analysis topologies known</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>