<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Workshop of IT-professionals on Artificial Intelligence, October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Intelligent GAN-Based Framework for High-Resolution Satellite Imagery Enhancement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Sineglazov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Holovachov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>37 Prospect Beresteiskyi, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State university "Kyiv aviation institute"</institution>
          ,
          <addr-line>1 Lubomyra Huzara ave., Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>We propose a novel design that incorporates a dual-discriminator strategy: one focused on ensuring global image consistency, and the other on preserving local texture accuracy. Experiments conducted on benchmark datasets demonstrate that our approach surpasses prior solutions both quantitatively and in terms of perceptual quality. The proposed architecture is particularly effective at recovering fine textures and sharp edges while maintaining a natural appearance. We investigate the influence of different loss function combinations and training strategies on the fidelity-perceptual quality trade-off. Our findings contribute to the growing field of deep learning-based super-resolution, offering practical insights into architectural refinements that balance computational efficiency with output quality.</p>
      </abstract>
      <kwd-group>
        <kwd>super-resolution</kwd>
        <kwd>generative adversarial networks</kwd>
        <kwd>satellite imagery</kwd>
        <kwd>deep learning</kwd>
        <kwd>dual discriminator</kwd>
        <kwd>aerial imagery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The super-resolution (SR) problem refers to the task of recovering high-resolution (HR) images from their
low-resolution (LR) counterparts. The technology has been used in a wide spectrum of applications:
medical imaging, satellite photography, and even surveillance systems. An especially important domain
is unmanned aerial vehicle (UAV) navigation, where high-quality visual data directly influences the
accuracy and reliability of flight control and decision-making. Previous research highlights the role of
intelligent visual navigation systems of high accuracy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], suppression of noise in visual navigation
systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], landmarks-based navigation software [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and integrated complexes for UAV detection
and identification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These studies emphasize the demand for enhanced image clarity and robustness,
making SR techniques highly relevant for UAV applications. Despite its recent emergence, this field has
already accumulated a variety of methods.
      </p>
      <p>In this paper, we address limitations of existing methods by proposing a novel architecture that
extends recent generative adversarial network (GAN) based image synthesis methods and incorporates
extensions for super-resolution tasks. Our approach attempts to overcome the present limitations
through a dual-discriminator architecture that significantly improves the quantitative and qualitative
attributes of super-resolution outputs.</p>
      <p>Overall, our main contributions are as follows: (1) We propose a dual-discriminator architecture which simultaneously
addresses two different aspects of image reconstruction: local feature precision and global image
coherence. With this design, our model is able to generate high-resolution outputs with both fine-grained
texture details and general image consistency. (2) In addition, we conduct an investigation
of different loss functions to determine which has the most impact on GAN performance. This
analysis provides insightful information on the inherent trade-offs between conflicting optimization
objectives and has practical relevance to architecture design decisions.</p>
      <sec id="sec-1-1">
        <title>1.1. Traditional Super-Resolution Techniques</title>
        <p>
          Traditional techniques for super-resolution mostly consist of different interpolation methods
such as bicubic, bilinear, and nearest-neighbor algorithms [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ]. While these techniques are
computationally efficient, they tend to generate blurry images with insufficient texture detail. More advanced
techniques employed example-based methods [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ] and sparse coding models [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ], which performed
better but struggled to reconstruct complex patterns.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Deep Learning for Super-Resolution</title>
        <p>
          The adoption of deep learning has led to significant advancements in the field of image
super-resolution [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The initial work was carried out by Dong et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] with SRCNN, which demonstrated
that a moderately deep convolutional neural network could outperform traditional methods. This
was followed by a huge number of CNN-based approaches. Kim et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] presented VDSR, which
employed residual learning and much deeper architectures to further improve the performance. Shi et al.
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] proposed ESPCN with an efficient sub-pixel convolution layer for upscaling with less computational
complexity by extracting features directly from low-resolution inputs.
        </p>
        <p>
          With improvements in network architectures, Lai et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] introduced LapSRN, a progressive
reconstruction approach based on a Laplacian pyramid architecture. Lim et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] created EDSR, which
removed the redundant modules from base ResNet architectures and significantly expanded network
capacity. Zhang et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] created RDN, which combined residual and dense connections to allow
feature extraction at multiple levels. These works demonstrated that neural network architectures could
achieve significant improvements in reconstruction quality, particularly with fine-tuning for pixel-wise
accuracy metrics such as PSNR and SSIM.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. GAN-based Super-Resolution</title>
        <p>Generative Adversarial Networks resulted in a huge leap in super-resolution research. Ledig et al. [18]
introduced SRGAN, the first GAN-based approach to photo-realistic super-resolution, combining perceptual
loss functions with adversarial training. ESRGAN [19] further refined this approach with residual-in-residual
dense blocks and a relativistic adversarial loss. Zhang et al. [20] proposed SFTGAN, which
incorporated semantic segmentation information to promote structural consistency. Despite these
refinements, GAN-based methods still tend to suffer from degraded fine detail and
are hard to train.</p>
      </sec>
      <sec id="sec-1-4">
        <title>1.4. Dual-Discriminator Approaches</title>
        <p>Recent research has shown that the utilization of more than one discriminator can enhance GAN
performance on various tasks. For example, in the field of image-to-image translation, Isola et al. [21]
employed a PatchGAN discriminator to successfully model local textures. Taking this concept further,
Wang et al. [22] introduced multi-scale discriminators for high-resolution image synthesis. In the
context of super-resolution, Sajjadi et al. [23] utilized a texture discriminator alongside a standard
discriminator, but focused primarily on texture transfer rather than structural coherence.</p>
        <p>Our contribution consists in the implementation of a dual discriminator, designed to capture both local
texture coherence and global image consistency. The local discriminator operates on randomly sampled
patches, ensuring realistic textures and preserving fine details, while the global discriminator evaluates
the overall composition and perceptual quality of the entire image. By combining these complementary
perspectives, the approach effectively overcomes the typical trade-off between sharp, realistic textures
and coherent global structure.</p>
      </sec>
      <sec id="sec-1-5">
        <title>1.5. Loss Functions</title>
        <p>The choice of loss function plays a crucial role in determining model performance. Conventional
pixel-wise loss functions like L1 and L2 are effective for optimizing PSNR but often generate overly smooth and
less detailed outputs. Johnson et al. [24] introduced perceptual loss, which leverages feature activations
from a pre-trained network to capture semantic information beyond pixel-level accuracy. Wang et al.
[25] proposed the multiscale SSIM (MS-SSIM) metric, which correlates well with human judgment and has been
employed both as an evaluation metric and as a training objective.</p>
        <p>Adversarial losses have evolved from the initial GAN formulation to more stable formulations, such as
WGAN [26], WGAN-GP [27], and relativistic GAN [28]. These formulations mitigate training instability
often encountered in GAN-based super-resolution. Additionally, various approaches combine different
content losses, adversarial losses, and regularization terms to achieve an optimal balance between
reconstruction fidelity and perceptual quality.</p>
        <p>Our proposed method advances these baseline approaches by addressing their limitations through
a novel dual-discriminator architecture, tailored specifically for the super-resolution task. We conduct
a comprehensive analysis of various loss function combinations to identify training strategies that
optimally balance reconstruction accuracy with perceptual quality.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <p>The primary objective is to take a low-resolution input image I_LR and generate its super-resolved
counterpart I_SR. During training, the high-resolution image I_HR serves as the ground-truth reference. Bicubic
interpolation is used to downsample images and create the low-resolution inputs.</p>
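<p>As a concrete illustration of how the low-resolution inputs can be produced from high-resolution crops, the sketch below downsamples by a factor of 4 with a simple box filter (block averaging). Note this is a stand-in for illustration only: the paper uses bicubic interpolation, which weights neighboring pixels differently.</p>

```python
import numpy as np

def downsample(img: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample an HxWxC image by averaging non-overlapping factor x factor
    blocks (a box filter). The paper uses bicubic interpolation; the box
    filter here is a simpler stand-in for illustration."""
    h, w, c = img.shape
    h, w = h - h % factor, w - w % factor          # trim to a multiple of factor
    img = img[:h, :w]
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

# A 1024x1024 RGB high-resolution crop becomes a 256x256 low-resolution input.
hr = np.random.rand(1024, 1024, 3)
lr = downsample(hr, factor=4)
print(lr.shape)  # (256, 256, 3)
```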
      <sec id="sec-2-1">
        <title>2.1. Super Resolution Generative Adversarial Network</title>
        <p>Following Goodfellow et al. [29], the GAN framework can be defined as a generator G and discriminator
D. The primary goal is to optimize the generator G together with the discriminator D to address the adversarial
min-max problem (1).</p>
        <p>min_G max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))] (1)</p>
        <p>Here p_data(x) represents the distribution of real data samples, p_z(z) is the prior distribution of input
noise variables, G(z) generates fake samples, and D(x) outputs the probability that x came from the
real data distribution rather than the generator’s distribution. In the context of super-resolution, the
generator G maps low-resolution images to high-resolution counterparts, while the discriminator D
attempts to distinguish between real high-resolution images and the generator’s super-resolved outputs.</p>
        <p>SRGAN has a conventional GAN structure: a discriminator and a generator, without auxiliary models,
but their structures are modified to address the super-resolution task (Fig. 1 and 2).</p>
        <p>For our research, we will adopt SRGAN as the foundational model. This choice is driven by its proven
ability to generate high-resolution images. By leveraging SRGAN’s architecture, which combines
adversarial training with perceptual loss, we can ensure that the resulting images not only have
improved pixel density but also maintain visual realism. Its established success in super-resolution tasks
provides a reliable starting point for further modifications and optimizations tailored to our specific
objectives.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dual Discriminator Structure</title>
        <p>Building upon SRGAN’s architecture, the modified model for our task retains the original generator
to preserve its high-quality image reconstruction capabilities. The primary change lies in the
discriminator, which is replaced by a dual-discriminator setup designed to evaluate different aspects of the
generated images (Fig. 3). This modification enables us to investigate how employing two complementary
discriminators influences both the training dynamics and the final image quality.</p>
        <p>Generator G takes a low-resolution image I_LR as an input and upscales it to produce the super-resolution
image I_SR. Two discriminators, D1 and D2, then receive both the generated super-resolution image I_SR and the
original high-resolution image I_HR as inputs. Their task is to distinguish between real high-resolution
and synthetically generated images.</p>
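<p>The data flow above can be sketched with simple numpy stand-ins: a generator that upscales 4x, a global discriminator that scores the whole image, and a local discriminator that scores patches. The three functions here are hypothetical toy surrogates (nearest-neighbour upscaling, sigmoid of the mean) used only to show the shapes each network consumes and produces, not the actual architectures.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three networks: the generator upscales 4x,
# the global discriminator scores the whole image, the local one scores patches.
def generator(lr):                       # I_LR -> I_SR (nearest-neighbour stand-in)
    return lr.repeat(4, axis=0).repeat(4, axis=1)

def d_global(img):                       # one realness score for the full image
    return 1.0 / (1.0 + np.exp(-img.mean()))

def d_local(img, patch=32):              # one realness score per patch
    h, w, _ = img.shape
    scores = [1.0 / (1.0 + np.exp(-img[i:i+patch, j:j+patch].mean()))
              for i in range(0, h, patch) for j in range(0, w, patch)]
    return np.array(scores)

lr = rng.random((64, 64, 3))
sr = generator(lr)                       # 256x256x3 super-resolved output
print(d_global(sr), d_local(sr).shape)   # scalar score, (64,) patch scores
```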
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Structure of the global discriminator</title>
        <p>The structure of the global discriminator consists of three main parts: a downsampling block, a
bottleneck and a residual block. Their interaction allows for effective classification of the input images.
The architecture of the global discriminator is illustrated in Fig. 4.</p>
        <p>The network takes as input an image that passes through an initial convolutional layer with a kernel
size of 4x4 and 64 feature maps, and then a Leaky ReLU layer. Next, let’s take a closer look at the
structure of each block and the task they perform when processing the input image.</p>
        <p>The first part is the downsampling block. It consists of 5 blocks of similar structure, each including a
convolutional layer, batch normalization, and Leaky ReLU, but in each subsequent block the number of
feature maps doubles. This structure gradually reduces the spatial resolution of the feature maps, which allows
the network to obtain image features at different scales. Initial blocks capture basic patterns such as
edges and textures, while deeper blocks can recognize higher-level semantic features.</p>
        <p>The next part of the network is the bottleneck block. It also has a standard structure: a convolutional
layer with a kernel dimension of 1x1 and 1024 feature maps, batch normalization, Leaky ReLU, a
convolutional layer with a kernel dimension of 1x1 and 512 feature maps, and another batch normalization.
Its task is to compress information and create cross-channel integration. It also reduces the number of
parameters and thus the cost of computation.</p>
        <p>The last part is the residual block. It consists of a convolutional layer with 1x1 kernels and 128 feature
maps, followed by batch normalization and Leaky ReLU, which are repeated twice with the only
difference being the 3x3 kernel size of the second convolutional layer. After that, there is a convolutional layer
with 3x3 kernels and 512 feature maps, batch normalization, and an element-wise sum. Its implementation
helps prevent the problem of vanishing gradients.</p>
        <p>In the final stage of the global discriminator, the output from the preceding blocks is passed through
a LeakyReLU activation, followed by a flattening layer and a linear activation function. This produces a
final prediction, indicating whether the analyzed input image is classified as real or fake.</p>
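<p>The feature-map bookkeeping through the five downsampling blocks can be sketched as follows. The channel doubling is stated in the text; the stride-2 convolutions (halving the spatial size) and the 96x96 input are assumptions for illustration, as the paper does not state them explicitly.</p>

```python
def downsampling_block_shapes(h=96, w=96, c0=64, n_blocks=5):
    """Feature-map sizes through the 5 downsampling blocks: the channel count
    doubles in each block while (assuming stride-2 convolutions, which the
    paper does not state explicitly) the spatial size halves."""
    shapes, c = [], c0
    for _ in range(n_blocks):
        c *= 2
        h, w = h // 2, w // 2
        shapes.append((h, w, c))
    return shapes

print(downsampling_block_shapes())
# [(48, 48, 128), (24, 24, 256), (12, 12, 512), (6, 6, 1024), (3, 3, 2048)]
```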
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Structure of the local discriminator</title>
        <p>Unlike the global discriminator, which evaluates the image as a whole, the local discriminator focuses
on analyzing smaller regions. This approach enables the model to examine fine-grained features in
greater detail, such as textures, surface patterns, object edges, and other local structures.</p>
        <p>By assessing these localized areas, the discriminator can provide the generator with precise feedback
on the quality of each region. This, in turn, helps enhance local detail—particularly important when
restoring images containing numerous small objects and intricate features that might otherwise be lost
during resolution enhancement.</p>
        <p>The local discriminator has a more compact architecture compared to the global discriminator,
consisting primarily of a single feature extraction block. It processes several cropped patches of fixed size, as
specified in the network parameters. The overall structure of the local discriminator is illustrated in
Fig. 5.</p>
        <p>Similarly to the global discriminator, the 4x4 kernel size allows us to maintain a balance between
computational complexity and the capture of local features. This convolution size also prevents
loss of features.</p>
        <p>A sliding window, configured with predefined parameters, extracts image patches that serve as inputs
for the local discriminator. These patches first pass through an initial processing stage consisting of a
convolutional layer with a 4x4 kernel, 64 feature maps, and a Leaky ReLU activation.</p>
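<p>The sliding-window cropping stage can be sketched in numpy as below. The 32-pixel patch size and stride are illustrative placeholders; the paper leaves the exact window parameters to the network configuration.</p>

```python
import numpy as np

def extract_patches(img, patch=32, stride=32):
    """Crop fixed-size patches with a sliding window, as the local
    discriminator's input stage does (patch size and stride are illustrative;
    the paper leaves the exact window parameters to the network config)."""
    h, w, _ = img.shape
    return np.stack([img[i:i+patch, j:j+patch]
                     for i in range(0, h - patch + 1, stride)
                     for j in range(0, w - patch + 1, stride)])

img = np.zeros((128, 128, 3))
patches = extract_patches(img)
print(patches.shape)  # (16, 32, 32, 3): a 4x4 grid of 32x32 RGB patches
```

With a stride smaller than the patch size the windows overlap, which would give the discriminator denser coverage at extra compute cost.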
        <p>The main feature extraction stage follows, comprising three sequential blocks of identical structure.
Each block contains a convolutional layer with a 4x4 kernel, batch normalization, and a Leaky ReLU
activation. The primary difference between these blocks lies in the number of feature maps, which
increase with network depth: 128, 256, and 512, respectively. The extracted features are then flattened
and passed through a fully connected layer with a linear activation function.</p>
        <p>By combining the outputs of both the global and local discriminators, the network benefits from
complementary perspectives: global structure evaluation and fine-grained local analysis. This synergy
improves image assessment accuracy, providing the generator with richer and more precise feedback,
which leads to higher-quality results.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Loss function</title>
        <p>Defining the loss function is a crucial factor for the quality of the network. In many similar works, a
common option is to use the mean square error (MSE) loss function [30, 31]. In this work, we use a
modified version of MSE (2), which incorporates additional image analysis through a separate VGG19
network [32].</p>
        <p>ℒ = ℒ_MSE + ℒ_VGG + ℒ_adv (2)</p>
        <p>Here ℒ_MSE is the common MSE loss function, ℒ_VGG is the loss function associated with the VGG19
network, and ℒ_adv is the adversarial loss.</p>
        <sec id="sec-2-5-1">
          <title>2.5.1. MSE loss</title>
          <p>The pixel-by-pixel MSE loss (3) is a well-known function which measures the average squared difference
between corresponding pixels in the predicted G(I_LR) and target I_HR images of width W and height H.</p>
          <p>ℒ_MSE = (1 / (W·H)) · Σ_{x=1..W} Σ_{y=1..H} (I_HR(x,y) − G(I_LR)(x,y))² (3)</p>
          <p>Since this loss function cannot provide enough information to support the recovery of small image
details, resulting in smoothed textures and a poor appearance of the generated image, we introduce
an additional perceptual loss function based on VGG19 [33], which brings the results closer to
realistic images.</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>2.5.2. Perceptual loss function</title>
          <p>ℒ_VGG = 2 · 10⁻⁶ · ‖φ(I_HR) − φ(G(I_LR))‖² (4)</p>
          <p>As previously noted, the perceptual loss function (4) uses a pre-trained VGG19 neural network [32] to
extract high-level feature representations of images. It is defined as the Euclidean distance between the
feature maps produced by the VGG network φ(·) when applied to the real image I_HR and the generated image
G(I_LR).</p>
        </sec>
        <sec id="sec-2-5-3">
          <title>2.5.3. Adversarial loss</title>
          <p>In addition, we utilize the adversarial part of our GAN network. The associated loss function
drives the generator G toward producing more natural-looking images capable of deceiving
the network’s dual discriminator. Given that our model utilizes two discriminators, the loss function is
shown in (5), (6), and (7).</p>
          <p>Here D_global(I) and D_local(I) are the outputs of the global and local discriminators, respectively,
and ℒ_BCE is the Binary Cross-Entropy (BCE) loss [34].</p>
        </sec>
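<p>The three loss components can be sketched in numpy as below. The feature extractor is passed in as an arbitrary callable standing in for VGG19, and the adversarial term is written, under our reading of the dual-discriminator setup, as the sum of the generator's BCE terms against both discriminators; the exact form is given by the numbered equations.</p>

```python
import numpy as np

def mse_loss(hr, sr):
    # Pixel-wise MSE: mean squared difference over all pixels.
    return np.mean((hr - sr) ** 2)

def perceptual_loss(hr, sr, features):
    # VGG-style perceptual term with a caller-supplied feature extractor;
    # in the paper this is a pre-trained VGG19, replaced here by any callable.
    return 2e-6 * np.sum((features(hr) - features(sr)) ** 2)

def bce(pred, target):
    # Binary cross-entropy used for the adversarial terms.
    eps = 1e-12
    return -np.mean(target * np.log(pred + eps) + (1 - target) * np.log(1 - pred + eps))

def total_loss(hr, sr, features, d_global_score, d_local_scores):
    # Combined objective: MSE + VGG + adversarial, where the adversarial part
    # sums the generator's BCE against both discriminators (our assumption).
    l_adv = bce(d_global_score, np.ones_like(d_global_score)) \
          + bce(d_local_scores, np.ones_like(d_local_scores))
    return mse_loss(hr, sr) + perceptual_loss(hr, sr, features) + l_adv

# Perfect reconstruction with confident discriminators leaves only a tiny
# adversarial residual.
hr_img = np.random.rand(8, 8, 3)
feats = lambda x: x.mean(axis=(0, 1))   # toy stand-in for VGG19 features
loss = total_loss(hr_img, hr_img, feats, np.array([0.99]), np.full(4, 0.99))
print(loss)  # ~0.02: only the adversarial terms are nonzero
```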
        <p>The overall loss function efectively improves high-resolution image generation by integrating three
components: MSE, VGG, and adversarial loss. This integration achieves an optimal balance between
reconstruction accuracy, fine-detail preservation, and perceptual realism, making it a highly efective
approach for generating high-quality images.</p>
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Dataset</title>
        <p>The dataset used for training the network is the DOTA dataset [35]. This dataset contains 2806 aerial
images collected from various sensors and platforms, each of them is approximately 4000×4000 pixels
in size, although the dataset includes images ranging from 800×800 to 20,000×20,000 pixels.</p>
        <p>The images are taken from several sources, including Google Earth, GF-2 and JL-1 satellites, and
others. The dataset contains both RGB and grayscale images, providing a variety of spectral information
for analysis.</p>
        <p>DOTA covers a wide range of geographic locations, such as airports, agricultural fields, and urban areas,
and objects such as airplanes, ships, cars, and buildings. The wide range of scales, orientations, and shapes
enriches the training data, reducing overfitting and supporting effective image reconstruction across both
fine and coarse details.</p>
        <p>However, the original dataset does not fully meet the requirements for network training and must be
preprocessed. Each image is cropped to a fixed size of 1024x1024 pixels. Additionally, since some images
contain black borders, further steps are applied to remove these artifacts and avoid potential
training issues. Examples of processed images are illustrated in Fig. 6.</p>
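<p>A minimal sketch of this preprocessing step is shown below. The 5% intensity cutoff for "black" pixels and the 2% black-pixel threshold are illustrative choices of ours; the paper only states that border artifacts are removed.</p>

```python
import numpy as np

def preprocess(images, crop=1024, black_thresh=0.02):
    """Crop each image to crop x crop and drop crops dominated by black
    borders. The thresholds are illustrative; the paper only states that
    border artifacts are removed."""
    out = []
    for img in images:
        h, w = img.shape[:2]
        if h < crop or w < crop:
            continue                       # too small to yield a full crop
        tile = img[:crop, :crop]
        if (tile.max(axis=-1) < 0.05).mean() > black_thresh:
            continue                       # mostly-black region, skip it
        out.append(tile)
    return out

imgs = [np.ones((2048, 2048, 3)), np.zeros((1500, 1500, 3)), np.ones((800, 800, 3))]
print(len(preprocess(imgs)))  # 1: the all-black and undersized images are dropped
```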
        <p>After preprocessing, the dataset contains 1,900 images, which are split into training (70%), validation
(20%), and test (10%) subsets. This refined dataset supports efective model training and reliable
performance evaluation.</p>
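<p>The 70/20/10 split can be reproduced with a few lines of standard-library Python; the fixed seed and filename pattern here are illustrative, not taken from the paper.</p>

```python
import random

def split_dataset(paths, train=0.7, val=0.2, seed=42):
    """Shuffle and split into train/validation/test subsets, matching the
    70/20/10 proportions used in the paper (seed is an arbitrary choice)."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(n * train), int(n * val)
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]

tr, va, te = split_dataset([f"img_{i}.png" for i in range(1900)])
print(len(tr), len(va), len(te))  # 1330 380 190
```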
      </sec>
      <sec id="sec-2-7">
        <title>2.7. Results</title>
        <p>The initial training process for the generator lasted 50 epochs, after which adversarial training of the
dual-discriminator GAN was conducted for 1200 epochs with the following training parameters:
• Batch size: 16.
• Learning rate decay: 0.1.
• Optimizer: Adam with parameters lr = 10^−4; β1 = 0.9; β2 = 0.999.</p>
        <p>The model’s performance was evaluated every 5 epochs on a validation subset. According to visual
assessment, the model demonstrated quite good results. Fig. 7 demonstrates the results of increasing image
resolution.</p>
        <p>In this image, the left column shows the original low-resolution images from the dataset, the center
column presents the results produced by the modified generative adversarial network, and the right
column displays the original high-resolution images.</p>
        <p>To evaluate objective quality of super-resolution results, we computed numerical metrics including
Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure
(SSIM). Table 1 compares these values between initial and final training epochs.</p>
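<p>MSE and PSNR can be computed directly, as the sketch below shows for images scaled to [0, 1]; SSIM needs windowed local statistics and is omitted from this sketch.</p>

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two images of the same shape.
    return np.mean((a - b) ** 2)

def psnr(a, b, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_val].
    (SSIM, also reported in the paper, needs windowed statistics and is
    omitted here.)"""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * np.log10(max_val ** 2 / m)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)       # uniform error of 0.1 per pixel
print(round(mse(a, b), 4), round(psnr(a, b), 2))  # 0.01 20.0
```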
        <p>The results confirm the effectiveness of the developed modified generative adversarial network for
satellite image super-resolution. Similar metric values between training and test sets indicate good
model generalization without overfitting. Despite limitations in recovering fine details in complex
images, the model shows stable performance and is suitable for practical satellite image enhancement
with 4x resolution upscaling.</p>
      </sec>
      <sec id="sec-2-8">
        <title>2.8. Comparison with base model</title>
        <p>To further assess the effectiveness of the dual-discriminator model, we conducted an additional comparison
with a baseline version of the network containing only the global discriminator. For a fair evaluation, both
models were configured identically and trained for the same number of epochs. After training the
baseline model for 1200 epochs, the results presented in Table 3 were obtained.</p>
        <p>The results clearly indicate that the proposed dual-discriminator model outperforms the base model in all
metrics. Moreover, the difference in the obtained values is significant.</p>
      </sec>
      <sec id="sec-2-9">
        <title>2.9. Comparison with RealESRGAN</title>
        <p>To evaluate the performance of our dual discriminator approach against state-of-the-art super-resolution
methods, we conducted a comparative analysis with RealESRGAN, a widely recognized model for
real-world image super-resolution. RealESRGAN employs a U-Net generator with skip connections and
spectral normalization, trained with a combination of L1 loss, perceptual loss, and adversarial loss using
a single discriminator architecture.</p>
        <p>For fair comparison, both models were evaluated on the same test dataset containing satellite images
with 4x upscaling factor. The RealESRGAN model was fine-tuned on our satellite image dataset to
ensure optimal performance for this specific task.</p>
        <p>The quantitative results demonstrate that our dual discriminator architecture achieves superior
performance across all evaluated metrics. Our model consistently outperforms RealESRGAN across all
metrics.</p>
        <p>Visual comparison reveals that our approach produces sharper edges and better preserves
fine-grained details in satellite imagery, particularly in areas with complex textures such as urban regions
and vegetation boundaries. RealESRGAN produces smoothed results that lose important structural
information crucial for satellite image analysis applications. Fig. 8 presents a qualitative comparison
between the two methods.</p>
        <p>The improved performance can be attributed to the complementary roles of global and local
discriminators in our architecture. While the global discriminator ensures overall image coherence similar
to RealESRGAN’s approach, the local discriminator specifically focuses on enhancing fine details and
texture quality, preventing smoothing and leading to more realistic and detailed super-resolution results.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>We propose a novel approach for enhancing the resolution of satellite images using a GAN architecture
with dual discriminators. Each discriminator targets different quality features, addressing existing
limitations in image enhancement and enabling more precise visual improvements.</p>
      <p>The dual-discriminator approach improves training stability and mitigates mode collapse, while the
use of a diverse training dataset increases the model’s adaptability to different imaging conditions, such as
changing lighting, seasonal fluctuations, and geographical terrain.</p>
      <p>Experimental evaluations on both validation and test datasets show strong performance. For the
test set, the model achieved MSE = 0.0089, PSNR = 26.95 dB, and SSIM = 0.7663. The generated images
demonstrate high visual fidelity and photorealistic detail.</p>
      <p>This research holds significant potential for practical applications in environmental monitoring,
urban planning, emergency response, crop assessment, and other fields where high-resolution
imagery is important for decision-making.</p>
      <p>In conclusion, the proposed dual-discriminator GAN offers an effective and stable solution for satellite
image super-resolution, combining computational efficiency with substantial improvements in image
quality.</p>
    </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-5">
      <title>References</title>
      <p>[17] Zhang, Y., Tian, Y., Kong, Y., Zhong, B., &amp; Fu, Y. (2018). Residual dense network for image
super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp.
2472-2481).
[18] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... &amp; Shi, W. (2017).
Photorealistic single image super-resolution using a generative adversarial network. In Proceedings of
the IEEE conference on computer vision and pattern recognition (pp. 4681-4690).
[19] Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., ... &amp; Change Loy, C. (2018). Esrgan: Enhanced
super-resolution generative adversarial networks. In Proceedings of the European conference on
computer vision (ECCV) workshops (pp. 0-0).
[20] Zhang, Y., Li, X., &amp; Zhou, J. (2019). SFTGAN: a generative adversarial network for pan-sharpening
equipped with spatial feature transform layers. Journal of Applied Remote Sensing, 13(2),
026507.
[21] Isola, P., Zhu, J. Y., Zhou, T., &amp; Efros, A. A. (2017). Image-to-image translation with conditional
adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 1125-1134).
[22] Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J., &amp; Catanzaro, B. (2018). High-resolution image
synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE conference
on computer vision and pattern recognition (pp. 8798-8807).
[23] Sajjadi, M. S., Scholkopf, B., &amp; Hirsch, M. (2017). Enhancenet: Single image super-resolution
through automated texture synthesis. In Proceedings of the IEEE international conference on
computer vision (pp. 4491-4500).
[24] Johnson, J., Alahi, A., &amp; Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and
super-resolution. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The
Netherlands, October 11-14, 2016, Proceedings, Part II 14 (pp. 694-711). Springer International
Publishing.
[25] Wang, Z., Simoncelli, E. P., &amp; Bovik, A. C. (2003, November). Multiscale structural similarity for
image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems &amp;
Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE.
[26] Arjovsky, M., Chintala, S., &amp; Bottou, L. (2017, July). Wasserstein generative adversarial networks.
In International conference on machine learning (pp. 214-223). PMLR.
[27] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., &amp; Courville, A. C. (2017). Improved training
of wasserstein gans. Advances in neural information processing systems, 30.
[28] Jolicoeur-Martineau, A. (2018). The relativistic discriminator: a key element missing from standard
GAN. arXiv preprint arXiv:1807.00734.
[29] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... &amp; Bengio, Y.
(2014). Generative adversarial nets. Advances in neural information processing systems, 27.
[30] Dong, C., Loy, C. C., He, K., &amp; Tang, X. (2015). Image super-resolution using deep convolutional
networks. IEEE transactions on pattern analysis and machine intelligence, 38(2), 295-307.
[31] Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P., Bishop, R., ... &amp; Wang, Z. (2016). Real-time
single image and video super-resolution using an efficient sub-pixel convolutional neural network.
In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1874-1883).
[32] Simonyan, K., &amp; Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556.
[33] Gatys, L., Ecker, A. S., &amp; Bethge, M. (2015). Texture synthesis using convolutional neural networks.
Advances in neural information processing systems, 28.
[34] Ruby, U., &amp; Yendapalli, V. (2020). Binary cross entropy with deep learning technique for image
classification. Int. J. Adv. Trends Comput. Sci. Eng, 9(10).
[35] Xia, G. S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., ... &amp; Zhang, L. (2018). DOTA: A large-scale
dataset for object detection in aerial images. In Proceedings of the IEEE conference on computer
vision and pattern recognition (pp. 3974-3983).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sineglazov</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ischenko</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          <article-title>Intelligent Visual Navigation System of High Accuracy</article-title>
          .
          <source>2019 IEEE 5th International Conference Actual Problems of Unmanned Aerial Vehicles Developments (APUAVD), Proceedings</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>127</lpage>
          , Art. no. 8943916
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Lutsky</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sineglazov</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishchenko</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          <article-title>Suppression of Noise in Visual Navigation Systems</article-title>
          .
          <source>2021 IEEE 6th International Conference on Actual Problems of Unmanned Aerial Vehicles Development (APUAVD), Proceedings</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Sineglazov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Landmarks navigation system software</article-title>
          .
          <source>2014 IEEE 3rd International Conference on Methods and Systems of Navigation and Motion Control (MSNMC), Proceedings</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>62</fpage>
          -
          <lpage>65</lpage>
          , Art. no. 6979731
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Sineglazov</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          <article-title>Multi-functional integrated complex of detection and identification of UAVs</article-title>
          .
          <source>2015 IEEE 3rd International Conference Actual Problems of Unmanned Aerial Vehicles Developments (APUAVD), Proceedings</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>320</fpage>
          -
          <lpage>323</lpage>
          , Art. no. 7346631
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Keys</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>1981</year>
          ).
          <article-title>Cubic convolution interpolation for digital image processing</article-title>
          .
          <source>IEEE transactions on acoustics, speech, and signal processing</source>
          ,
          <volume>29</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1153</fpage>
          -
          <lpage>1160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Orchard</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>New edge-directed interpolation</article-title>
          .
          <source>IEEE transactions on image processing</source>
          ,
          <volume>10</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1521</fpage>
          -
          <lpage>1527</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>W. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>T. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pasztor</surname>
            ,
            <given-names>E. C.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Example-based super-resolution</article-title>
          .
          <source>IEEE Computer graphics and Applications</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ),
          <fpage>56</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Glasner</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Irani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2009</year>
          , September).
          <article-title>Super-resolution from a single image</article-title>
          .
          <source>In 2009 IEEE 12th international conference on computer vision</source>
          (pp.
          <fpage>349</fpage>
          -
          <lpage>356</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Image super-resolution via sparse representation</article-title>
          .
          <source>IEEE transactions on image processing</source>
          ,
          <volume>19</volume>
          (
          <issue>11</issue>
          ),
          <fpage>2861</fpage>
          -
          <lpage>2873</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Zeyde</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Protter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          , June).
          <article-title>On single image scale-up using sparse representations</article-title>
          .
          <source>In International conference on curves and surfaces</source>
          (pp.
          <fpage>711</fpage>
          -
          <lpage>730</lpage>
          ). Berlin, Heidelberg: Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Zgurovsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sineglazov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chumachenko</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <source>Classification and Analysis Topologies Known Artificial Neurons and Neural Networks, Studies in Computational Intelligence</source>
          ,
          <year>2021</year>
          ,
          <volume>904</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>58</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loy</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Learning a deep convolutional network for image super-resolution</article-title>
          .
          <source>In Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13</source>
          (pp.
          <fpage>184</fpage>
          -
          <lpage>199</lpage>
          ). Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J. K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Accurate image super-resolution using very deep convolutional networks</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          (pp.
          <fpage>1646</fpage>
          -
          <lpage>1654</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caballero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huszár</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Totz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aitken</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          (pp.
          <fpage>1874</fpage>
          -
          <lpage>1883</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>W. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahuja</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Deep laplacian pyramid networks for fast and accurate super-resolution</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          (pp.
          <fpage>624</fpage>
          -
          <lpage>632</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Son</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nah</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mu Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Enhanced deep residual networks for single image super-resolution</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition workshops</source>
          (pp.
          <fpage>136</fpage>
          -
          <lpage>144</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Residual dense network for image super-resolution</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          (pp.
          <fpage>2472</fpage>
          -
          <lpage>2481</lpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>