Efficiency Increasing of No-Reference Image Quality Assessment in UAV Applications

Oleg Ieremeiev (a), Vladimir Lukin (a), Krzysztof Okarma (b) and Karen Egiazarian (c)

(a) National Aerospace University, Chkalova 17, Kharkiv, 61070, Ukraine
(b) West Pomeranian University of Technology in Szczecin, al. Piastów 17, Szczecin, 70-310, Poland
(c) Tampere University of Technology, Kalevantie 4, Tampere, FIN 33101, Finland

Abstract
Unmanned aerial vehicle (UAV) imaging is a dynamically developing field, where the effectiveness of imaging applications highly depends on the quality of the acquired images. No-reference image quality assessment is widely used for quality control and image processing management. However, existing quality metrics often lack accuracy and adequacy with respect to human visual perception. In this paper, we demonstrate that this problem persists for typical applications of UAV images. We present a methodology to improve the efficiency of visual quality assessment by existing metrics for images obtained from UAVs, and introduce a method of combining quality metrics with an optimal selection of the elementary metrics used in this combination. The combined metric is designed based on a neural network trained on subjective assessments of visual quality. The metric was tested using the TID2013 image database and a set of real UAV images with embedded distortions. Verification results have demonstrated the robustness and accuracy of the proposed metric.

Keywords
image quality assessment, no-reference metric, visual quality, UAV images, correlation analysis, artificial neural network

The Sixth International Workshop on Computer Modeling and Intelligent Systems (CMIS-2023), May 3, 2023, Zaporizhzhia, Ukraine
EMAIL: o.ieremeiev@khai.edu (O. Ieremeiev); v.lukin@khai.edu (V. Lukin); okarma@zut.edu.pl (K. Okarma); karen.eguiazarian@tuni.fi (K. Egiazarian)
ORCID: 0000-0001-7865-0570 (O. Ieremeiev); 0000-0002-1443-9685 (V. Lukin); 0000-0002-6721-3241 (K. Okarma); 0000-0002-8135-1085 (K. Egiazarian)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

The scope of applications of drones and other unmanned aerial vehicles (UAVs) has expanded rapidly in recent decades. Since most UAVs carry cameras, there is a growing interest in the analysis and processing of visual data. UAVs mainly use optical band cameras; thus, the existing digital image processing solutions are applicable [1, 2]. However, the mobility and autonomy of these systems can impose significant restrictions, and all these factors must be taken into account. The key problems of applying digital image processing in UAV applications are as follows:
1. UAVs require an adaptive integrated approach to suppress noise, motion blur, and other typical distortions, which can be only partially compensated by camera stabilization.
2. Data are transmitted from drones over a wireless connection. The range and reliability of this transmission determine one of the key characteristics of UAVs, namely the flight range. In this sense, efficient processing and compression of high-resolution data for transmission over a radio channel with limited bandwidth is decisive.
3. The data received at the end device can be used, besides storage and more complex post-processing, in various applications. Among them, high-level vision tasks based on machine learning, such as detection and recognition, are becoming more widespread [1, 3, 4].
Common to all the above-mentioned challenges is the need to accurately assess image quality and estimate distortion parameters, which are then used by image reconstruction methods to enhance the visual quality of the acquired images.
In addition, an effective lossy compression is required. Certain results of UAV image processing have already been reported [5, 6, 7]. Nevertheless, robust methods are required that can accurately assess visual quality and determine the optimal parameters for subsequent image processing methods.

Image quality assessment (IQA) is usually performed by visual quality metrics. To improve their accuracy, some features of human perception are employed. There are two main classes of visual quality assessment methods. Full-reference (FR) visual quality metrics are widely used to verify image processing methods by evaluating the relative changes in image quality. No-reference (NR) metrics assess the quality based on the characteristics of the image itself and can be applied as a tool in many UAV applications [8, 9]. Many NR IQA methods have been developed, but their common problem is low accuracy, caused by the limited amount of information available for analysis and by these metrics' inability to accurately separate image elements (textures, edges, gradients, etc.) from distortions (noise, blur, etc.) [10, 11].

To design and verify visual quality metrics, special test image databases [11] are used. They contain images corrupted by certain types of distortions. For each image, a visual quality score (mean opinion score, MOS) is formed from the results of a large number of subjective experiments with volunteers. Correlation analysis between metric values and MOS serves as a quantitative indicator of a metric's compliance with human vision. On the largest and most universal test image databases with tens of distortion types, such as TID2013 [12], the efficiency of no-reference metrics usually does not exceed 0.5 in terms of the Spearman rank order correlation coefficient (SROCC).
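Throughout the paper, this correlation analysis reduces to a single computation. A minimal MATLAB sketch is given below; the variable names `metricValues` and `mos` are hypothetical placeholders for precomputed data:

```matlab
% metricValues: N-by-1 vector of metric outputs for N test images
% mos:          N-by-1 vector of mean opinion scores for the same images
srocc = corr(metricValues, mos, 'Type', 'Spearman');
```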
Fortunately, the accuracy of IQA can be increased by using existing metrics jointly, e.g., by the methods presented in [13, 14]. In this paper, we propose a method of combining no-reference visual quality metrics based on an artificial neural network (ANN) that is focused on solving various problems of processing UAV images. Since many UAV tasks require the mobility of computing devices, the priority of this work is to ensure high accuracy of visual quality estimation while maintaining acceptable performance.

2. The efficiency of metrics for UAV purposes

Drone imaging can introduce a significant number of various distortions during image acquisition, processing, compression, and transmission over a communication channel. In this regard, the design of a combined metric requires test image databases that allow simulating such situations. As a result of the analysis of many image databases [11], we have chosen TID2013. A distinctive feature of this database is that it contains 24 types of various distortions, including such unique ones as bit errors in the transmission of compressed images. TID2013 contains 25 reference images that have been distorted by 24 types of distortions at 5 levels of intensity, for a total of 3000 test images. A complete list of the distortions and their applicability to the current problem is given in Table 1.

Let us analyze the distortions listed in Table 1 and their relation to imaging from UAVs:
• Additive Gaussian noise (##1-2) is the basic model for representing most of the physical processes that cause noise. It is more pronounced in low light conditions.
• Spatially correlated noise (#3) is characteristic of optical images due to the use of the Bayer filter or its modifications on sensors. It significantly increases with digital zoom.
• Impulse noise (#6) may be a manifestation of dead pixels or of various other causes, such as coding/decoding artifacts.
• Quantization noise (#7) may occur during image acquisition and transformations.
• Blurring (#8) is one of the most relevant distortions due to the motion and vibrations of the UAV.
• Denoising (#9) is a manifestation of the noise reduction built into most cameras.
• Compression (##10-11) is a typical stage in the image processing chain to reduce data redundancy.
• Transmission errors (##12-13) are typical for wireless communication channels, especially over long distances.
• Changes in brightness, contrast, and saturation (##16-18) allow simulating changes in lighting conditions at different times of day and in different weather conditions.
• Multiplicative noise (#19) is relevant because sensor noise is mostly signal-dependent.
• Comfort noise (#20) allows simulating some artifacts of image processing and compression.
• Lossy compression of noisy images (#21) is a typical example of a real situation where an image with some noise is compressed.
• Chromatic aberration (#23) is a result of the refraction of light in the camera's optics.

Table 1
List of TID2013 distortions and their relevance for UAV purposes

##  Distortion type                                      Relevance for UAV imaging
1   Additive Gaussian noise                              +
2   Additive noise                                       + (more intensive in color components)
3   Spatially correlated noise                           +
4   Masked noise                                         –
5   High-frequency noise                                 –
6   Impulse noise                                        +
7   Quantization noise                                   +
8   Gaussian blur                                        +
9   Image denoising                                      +
10  JPEG compression                                     +
11  JPEG2000 compression                                 +
12  JPEG transmission errors                             +
13  JPEG2000 transmission errors                         +
14  Non-eccentricity pattern noise                       –
15  Local block-wise distortions of different intensity  –
16  Mean shift (intensity shift)                         +
17  Contrast change                                      +
18  Change of color saturation                           +
19  Multiplicative Gaussian noise                        +
20  Comfort noise                                        +
21  Lossy compression of noisy images                    +
22  Image color quantization with dither                 –
23  Chromatic aberrations                                +
24  Sparse sampling                                      –

The 18 selected distortion types cover the vast majority of noise sources and distortions that can occur in UAV images or result from weather conditions. Together, these distortion types yield 2250 test images from the TID2013 dataset, which are used in this paper.

Let us analyze the performance of the existing NR metrics on this subset of images. Since our task is to ensure high accuracy of estimation, the maximum possible number of different metrics is included. The SROCC values for the entire TID2013 database and for the selected subset are given in Table 2. As can be seen from these results, the best performance is demonstrated by the ILNIQE metric, but its SROCC values (0.492 for the full database and 0.529 for the 18 selected UAV-relevant distortions) are still inappropriately low.
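The subset evaluation reported in Table 2 can be reproduced along the following lines (a MATLAB sketch under the assumption that metric values for all 3000 TID2013 images are precomputed; `distType`, `metricValues`, and `mos` are placeholder variables):

```matlab
% Distortion types from Table 1 marked as relevant for UAV imaging
uavTypes = [1:3, 6:13, 16:21, 23];            % 18 of the 24 types

% distType: 3000-by-1 vector with the TID2013 distortion type of each
% image; metricValues and mos hold the metric outputs and MOS values
subset = ismember(distType, uavTypes);        % keeps 2250 images

% Absolute values are taken because some metrics grow with quality
% while others grow with degradation (see the note before Table 2)
sroccAll = abs(corr(metricValues, mos, 'Type', 'Spearman'));
sroccUAV = abs(corr(metricValues(subset), mos(subset), 'Type', 'Spearman'));
```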
It should be noted that Table 2 shows the absolute SROCC values, because the metrics have been developed using different image databases that express visual quality (MOS values) in two opposite ways: either a higher value corresponds to better quality or, vice versa, a higher value corresponds to a larger deviation from perfect quality.

Table 2
SROCC values of no-reference IQA metrics on the TID2013 subsets

##  Metric            SROCC (All)  SROCC (UAV)
1   ILNIQE [15]       0.492        0.529
2   CORNIA [16]       0.435        0.521
3   HOSA [17]         0.471        0.515
4   C-DIIVINE [18]    0.373        0.448
5   BLIINDS2 [19]     0.395        0.425
6   BRISQUE [20]      0.367        0.416
7   BIQI [21]         0.405        0.409
8   SSEQ [22]         0.341        0.406
9   NIQE [23]         0.313        0.403
10  QAC [24]          0.372        0.379
11  SISBLIM_SM [25]   0.318        0.360
12  LPSI [26]         0.395        0.357
13  LPC-SI [27]       0.323        0.354
14  SISBLIM_SFB [25]  0.336        0.348
15  DIIVINE [28]      0.344        0.343
16  BIBLE [29]        0.281        0.333
17  OG-IQA [30]       0.276        0.327
18  BPRI [31]         0.229        0.313
19  TCLT [32]         0.233        0.308
20  SISBLIM_WFB [25]  0.293        0.301
21  MSGF-PR [33]      0.244        0.274
22  SISBLIM_WM [25]   0.239        0.265
23  DIQU [34]         0.240        0.251
24  SDQI [35]         0.224        0.248
25  DIPIQ [36]        0.140        0.209
26  MLV [37]          0.201        0.195
27  FISHBB [38]       0.145        0.152
28  JNBM [39]         0.141        0.152
29  DESIQUE [40]      0.069        0.150
30  GMLOG [41]        0.109        0.139
31  NIQMC [42]        0.113        0.124
32  ARISM [43]        0.145        0.109
33  CPBDM [44]        0.112        0.109
34  LSSn [31]         0.168        0.105
35  PSS [31]          0.022        0.087
36  LSSs [31]         0.114        0.084
37  ARISMc [43]       0.138        0.081
38  PSI [45]          0.001        0.075
39  SMETRIC [46]      0.097        0.074
40  FISH [38]         0.052        0.041
41  NR-PWN [47]       0.016        0.039
42  NMC [48]          0.054        0.033
43  BLUR [49]         0.008        0.020
44  NJQA [50]         0.100        0.007

3. The problem of metrics selection

The accuracy of image quality assessment can be increased by combining several metrics. Successfully selected metrics complement each other and provide a comprehensive analysis of the image that takes into account various types of distortions. As shown in [13], the greatest efficiency is achieved through multi-parameter optimization using artificial neural networks.

Combining all 44 listed metrics can potentially give the best accuracy of visual quality assessment. However, most of these metrics may contribute little while still requiring significant computing resources. High mobility and minimal computing costs are among the key requirements for UAV applications. Therefore, it is necessary to reduce the number of metrics without a significant decrease in the accuracy of IQA. Several possible solutions can be employed for the correct choice of elementary metrics (listed in Table 2) as inputs of an ANN, but not all of them are feasible or effective:
1. A complete enumeration of options is infeasible in practice, since even for 5 or 10 incoming metrics it would be necessary to evaluate about 1.6×10⁸ and 2.7×10¹⁶ combinations, respectively.
2. Choosing the best metrics by high SROCC values, or excluding similar metrics with high cross-correlation, has shown insufficient efficiency in [51].
3. "Intelligent" selection of appropriate metrics. As a possible solution, the regularization approach was tested in [13] and proven to be effective.
Lasso (least absolute shrinkage and selection operator) regularization is widely used in machine learning to reduce model complexity and prevent overfitting. By penalizing the weights, it identifies the least important input features (here, the corresponding metrics) and excludes them by setting their weight coefficients to zero. This approach can be applied to reduce the number of metrics.
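A sketch of this selection step in MATLAB is given below; `X` is assumed to be an N-by-44 matrix of precomputed elementary metric values, `mos` the MOS vector, and the regularization strength is illustrative only (in practice it is tuned until the desired number of metrics survives):

```matlab
% X:   N-by-44 matrix, column j holds the values of elementary metric j
% mos: N-by-1 vector of subjective scores
% A larger Lambda forces more coefficients to exactly zero
[B, FitInfo] = lasso(X, mos, 'Lambda', 0.05);   % 0.05 is illustrative

selected = find(B ~= 0);                        % indices of retained metrics
fprintf('%d metrics retained\n', numel(selected));
```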
To show how the number of elementary metrics affects the accuracy of the trained ANN, we consider several metric combinations determined by Lasso, ranging from the minimal 3-5 up to all 44 metrics. The Lasso parameters were selected so as to obtain non-zero weights for a given number of metrics. In total, 10 dimensions are considered in the paper: 4, 5, 7, 10, 16, 20, 25, 30, 35, and 44. The combinations with 16 metrics or fewer, which are the main focus, are presented in Table 3.

Table 3
Lists of the metrics selected by Lasso

Number of metrics  Metrics
4                  ARISM, CORNIA, DIPIQ, ILNIQE
5                  the above 4 + LPCSI
7                  the above 5 + MLV, NIQMC
10                 the above 7 + MSGF-PR, NIQE, PSS
16                 the above 10 + C-DIIVINE, GMLOG, HOSA, JNBM, PSI, TCLT

4. Preliminary results

Despite the popularity of neural networks, their use in the field of image quality assessment has some limitations. First, the variety and size of suitable datasets are limited, because only image databases containing MOS values can be used. Due to the limited number of distortion levels and the variety of reference images, some test images can be assumed to have unique properties, so their assignment to the training or test subset may affect the accuracy of the trained neural networks. Therefore, it is impossible to specify in advance which images should belong to each of these sets. To ensure that the result approaches the optimal one, more than 100 repetitions with random splits of the images into training (70%) and testing (30%) subsets have been completed for each ANN configuration.

Second, the choice of the ANN type can have a significant impact on the final efficiency. Two types of networks are considered: feed-forward networks and cascade networks; in the latter, the output of each layer, including the input one, is connected to all subsequent layers, which introduces additional non-linear relationships between layers.

Figure 1: Generalized schemes of the used feed-forward (a) and cascade (b) networks

Further, the efficiency of an ANN is also determined by its structure (the number of hidden layers and the number of neurons in each of them). Since a significant number of factors affecting the efficiency of the final neural network have already been indicated, several basic configurations are used at the preliminary stage of the analysis. A more precise configuration of the ANN will be determined at the final stage of creating the combined metric. At this stage, network structures with 1-3 hidden layers are used. For each of them, there are two options for the number of neurons N per layer: 1) in all layers it is equal to the number of input metrics M (N = M), and 2) starting from the second layer, the number of neurons is halved in each subsequent layer (N1 = M, N2 = M/2, N3 = M/4). In total, there are only 5 options, because for a single-layer network the two variants coincide. A sigmoid activation function is used; regardless of the value ranges of the input metrics, it maps the outputs of the first hidden layer into the fixed range [0, 1], providing built-in function fitting and value normalization.

This stage involves the construction of 10,000 ANN variants (2 types × 5 configurations × 100 repetitions × 10 metric combinations). All calculations were performed in MATLAB.
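A single training run of this procedure can be sketched as follows (MATLAB Deep Learning Toolbox; `X` is an M-by-N matrix of the selected metric values for N images and `mos` a 1-by-N MOS vector, both assumed precomputed; the layer sizes follow configuration 2 from the list above):

```matlab
M = 10;                                 % number of input metrics
net = fitnet([M M]);                    % feed-forward net, config 2: [M, M]
% net = cascadeforwardnet([M M]);       % cascade counterpart

% Sigmoid activations map the differently scaled metric values
% into the fixed range [0, 1] after the first hidden layer
for k = 1:numel(net.layers) - 1         % all hidden layers
    net.layers{k}.transferFcn = 'logsig';
end

% Random 70/30 split into training and testing subsets
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.7;
net.divideParam.valRatio   = 0;
net.divideParam.testRatio  = 0.3;

% One of the 100 repetitions: train and evaluate on the test indices
[net, tr] = train(net, X, mos);
pred  = net(X(:, tr.testInd));          % combined metric output
srocc = abs(corr(pred', mos(tr.testInd)', 'Type', 'Spearman'));
```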
Let us analyze the results obtained after training all these ANNs. The main trend is that the accuracy of the combined metric grows with the number of elementary metrics used; the maximum is achieved for all 44 metrics. The dependence of SROCC on the number of metrics is shown in Fig. 2 for the ANNs with the maximum SROCC among the repetitions of each configuration of the feed-forward network.

Based on these results, several conclusions can be drawn. The use of an ANN for metric combination is an effective solution for UAV applications, since even the minimal number of metrics (4) yields accuracy that significantly exceeds the best result among the elementary metrics (SROCC = 0.53). The current 5 configurations of the ANN structures give similar indicators; their comparison is carried out in more detail later. The graph also allows making some recommendations for choosing the structure of an ANN depending on the requirements and constraints of the problem being solved. For example, if maximum computational performance must be ensured, the preferred choice would be a combined metric of 5 elementary ones; its result reaches SROCC = 0.74, which is much higher than for 4 metrics, while the further increase of accuracy with the number of input parameters is slow. If accuracy, or its balance with performance, is a priority, then the options of 10 or 16 elementary metrics can be useful; their accuracy reaches 0.82-0.84 in terms of SROCC. Beyond that, the accuracy stays at the level of about 0.85 and is practically independent of the number of metrics. Considering that one of the requirements of this study is to maintain acceptable performance along with high accuracy, we will use a combined metric consisting of 10 elementary metrics.

Figure 2: Dependence of SROCC on the number of elementary metrics selected by the Lasso criterion

To show the main statistical indicators and some problems, Fig. 3 presents box charts for 4 (the full range and, below it, the range above 0.5), 5 (likewise), 10, and all 44 metrics. A box chart simultaneously displays the median, the lower (0.25) and upper (0.75) quartiles, any outliers (determined from the interquartile range), and the minimum and maximum values that are not outliers. These charts show that with a small number of metrics (4 and 5), the complexity of the neural network (the number of neurons) is not sufficient for proper training, as a result of which anomalous results were obtained: incorrectly trained neural networks with indicators below those of individual metrics. The same problem affects multilayer networks with fewer neurons per layer. For 10 or more metrics, this problem is no longer observed. The highest values for each presented network configuration are already marked in Fig. 2. Quantitative indicators of the best feed-forward networks from Fig. 2 and Fig. 3 are given in Table 4, where M denotes the number of elementary metrics.

Figure 3: Box charts of the results of the obtained neural networks for 4, 5, 10, and 44 input metrics

Table 4
Results of the best feed-forward networks for different numbers of inputs (4, 5, 10, and 44)

NN config  NN layers      SROCC (M=4)  SROCC (M=5)  SROCC (M=10)  SROCC (M=44)
1          [M]            0.683        0.722        0.805         0.858
2          [M, M]         0.692        0.723        0.818         0.840
3          [M, M, M]      0.700        0.742        0.797         0.846
4          [M, M/2]       0.697        0.724        0.795         0.853
5          [M, M/2, M/4]  0.688        0.732        0.809         0.881
5. Final network modifications

In the first phase of the experiments, when forming a neural network for 10 input metrics, the following configurations with 1-3 hidden layers were used: [10], [10, 10], [10, 10, 10], [10, 5], and [10, 5, 2]. The general trend in Fig. 2 shows that fewer than 10 neurons per layer may not be enough. Therefore, additional configurations with up to 20 neurons per layer (twice the number of input metrics) were built. More than 30 configurations of both network types have been examined; the best ANN results for each number of hidden layers, together with some statistics, are partly shown in Table 5. It lists the neural network configurations, both the best 2 of the initial five and the additionally trained ones for 10 input metrics (50 repetitions). To evaluate the effectiveness of each configuration for both types of networks, several statistical indicators are given: the maximum (best neural network) and minimum values, the median, the skewness, and the 0.75 and 0.95 quantiles. Skewness is a measure of the asymmetry of the data around the sample mean: if skewness is positive, the data spread out more toward higher values, and the skewness of the normal distribution (or any perfectly symmetric distribution) is zero.

The maximum performance for both types of networks in Table 5 has been achieved by configuration #8. Despite the randomness of the learning process, for the feed-forward network an increase in the number of neurons to 20 generally leads to an increase in accuracy; this is also confirmed by the values of the 0.75 and 0.95 quantiles. A further increase in the number of neurons does not provide a significant improvement. The skewness values indicate a slight tendency toward obtaining neural networks with low performance; in the worst cases, they differ little from elementary metrics (SROCC can be below 0.6). Cascade neural networks do not provide any advantages, demonstrating somewhat lower performance for almost all configurations. They show an advantage in maximum SROCC only for configurations with a small number of neurons (#2 and #4); therefore, they are presumably most effective for solutions with a small amount of input data and simpler layer structures.

According to the results in Table 5, the ANN with the maximum Spearman correlation coefficient of 0.8307 was chosen as the combined metric for visual quality assessment tasks. The list of metrics used in it and a visual comparison of its effectiveness with the elementary metrics are shown in Fig. 4. This metric is available at https://github.com/OlegIeremeiev/CNNM-UAV.git.
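For reference, the skewness values reported in Table 5 below are assumed to follow the standard sample definition (the usual moment estimator, which is also MATLAB's default for skewness()):

$$ s = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{3}}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\right)^{3/2}}, $$

where the $x_i$ are the SROCC values obtained over the repeated training runs and $\bar{x}$ is their sample mean; a negative $s$ indicates a tail toward lower SROCC values.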
Table 5
Results of the best neural networks for 10 inputs

Feed-forward network
NN config  NN layers     Max     Min     Median  Skewness  0.75 quantile  0.95 quantile
1          [10, 10]      0.8178  0.6433  0.7351  -0.0261   0.7621         0.8033
2          [10, 5]       0.7949  0.6256  0.7487  -0.9202   0.7716         0.7899
3          [20]          0.8111  0.6475  0.7483  -0.5799   0.7704         0.8041
4          [20, 10]      0.8074  0.6312  0.7563  -0.7113   0.7862         0.8024
5          [20, 20]      0.8280  0.6787  0.7625  -0.2461   0.7906         0.8209
6          [15, 10, 10]  0.8214  0.6430  0.7452  -0.3190   0.7726         0.8116
7          [10, 15, 20]  0.8050  0.6383  0.7386  -0.3225   0.7608         0.8009
8          [20, 15, 10]  0.8307  0.6500  0.7607  -0.5689   0.7806         0.8161
9          [10, 20, 15]  0.8195  0.5945  0.7387  -0.6796   0.7630         0.8120

Cascade network
NN config  NN layers     Max     Min     Median  Skewness  0.75 quantile  0.95 quantile
1          [10, 10]      0.8010  0.6074  0.7507  -1.2935   0.7698         0.7902
2          [10, 5]       0.8213  0.5437  0.7435  -1.6415   0.7643         0.7844
3          [20]          0.7989  0.5954  0.7521  -1.0453   0.7685         0.7966
4          [20, 10]      0.8176  0.5606  0.7582  -1.5616   0.7811         0.8098
5          [20, 20]      0.8080  0.6108  0.7577  -1.5696   0.7784         0.8000
6          [15, 10, 10]  0.8126  0.6253  0.7487  -0.9623   0.7739         0.8040
7          [10, 15, 20]  0.8185  0.6326  0.7618  -0.9489   0.7821         0.8041
8          [20, 15, 10]  0.8214  0.6193  0.7726  -1.3170   0.7895         0.8053
9          [10, 20, 15]  0.8177  0.6681  0.7572  -0.2515   0.7797         0.8122

Figure 4: Scatter plot of the 10 elementary metrics and the combined metric (CNNM) including them

A visual representation of the effectiveness of quality assessment for particular types of distortions is given in Fig. 5. The distortion numbers there correspond to the serial numbers of the 18 distortion types selected for analysis (see Table 1). It can be seen that the combined metric provides consistently high results, with a decrease in accuracy for distortions #12 (mean shift) and #14 (change of color saturation); these distortions are problematic for all the metrics considered in the paper.

Figure 5: Dependence of the metrics' SROCC values on the type of distortion (absolute values)

6. Combined metric analysis

The purpose of creating the combined metric was to improve the accuracy of visual quality assessment of images in various UAV tasks. However, there is a limitation: the general-purpose color image database TID2013 with the corresponding MOS values was used to train the neural network. Therefore, it is necessary to analyze the effectiveness of the obtained metric in practice, on real images.

It should be noted that the application area and its inherent types of distortions significantly affect the results. Thus, in [14], a visual quality metric was proposed for the assessment of remote sensing images, and its SROCC reached the level of 0.8813. At the same time, verification on the UAV-related distortions from TID2013 showed significantly worse results: SROCC decreased to 0.7083. The reason lies in the different sets of distortions. In particular, transmission errors are rare in remote sensing practice, since those systems operate in more static and predictable conditions. Brightness and contrast distortions caused by changing weather and daylight conditions were also not taken into account in the design of [14]. This confirms that individual metrics are often insufficient for application areas with unique features, whereas the combined approach based on neural networks allows an accuracy increase of 50% or more. The practicality and applicability of the proposed solution can only be assessed on the basis of real images from UAVs.
At the same time, this approach has significant limitations: the absence of MOS values and the complexity of obtaining images with all the considered distortions and their needed combinations. Taking this into account, a number of assumptions and simplifications have been made, and the results obtained are mostly illustrative.
1. Verification of visual quality metrics requires MOS values, which can only be obtained from a significant number of time-consuming subjective experiments. The first simplification is that the missing MOS values can, to some extent, be replaced by objective indicators whose accuracy significantly exceeds that of the analyzed metrics; for a comparative analysis of the combined and individual metrics, this may be sufficient. Full-reference quality metrics can provide such a condition: the accuracy of the best of them reaches SROCC = 0.9 for the entire TID2013 and more than 0.96 for certain types of distortions, significantly exceeding the SROCC of existing no-reference metrics.
2. It is technically difficult to obtain real test images with all the considered distortions; therefore, it is proposed to simulate them artificially by adding the distortions of interest at different intensities to the selected images.
3. The distortion levels should preferably cover a wide range of intensities, from inconspicuous to significant.
To verify the metrics, real images from UAVs were used. As a basis, images from the UAVDT (Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking) dataset [52] were taken; examples are shown in Fig. 6. The dataset contains more than 40,000 images with a resolution of 1080 × 540 pixels. Of these, 16 images were selected with different terrain, daylight, and weather conditions.

Figure 6: Examples of reference images of the UAV test set

Creating test images with the necessary types of distortions requires care. The TID2013 distortions were generated according to a certain strategy, but their generation code is not available. Therefore, our own mechanisms for generating distortions are used in this paper (a simplified sketch is given below, before Table 6), and from the list of selected distortion types, 9 main ones are taken into account:
• Gaussian white noise;
• multiplicative noise;
• Gaussian blur;
• denoising (applying the BM3D filter to images with Gaussian white noise);
• JPEG and JPEG2000 compression;
• brightening, darkening, and mean shift (intensity shifts in both directions).
For a more accurate gradation of distortion intensity, 9 different levels were chosen, in contrast to the 5 levels of TID2013; their intensity varies from inconspicuous to significant. The distribution of the peak signal-to-noise ratio (PSNR) values is shown in Fig. 7.

Figure 7: Histogram of PSNR values of the test images

As a result, the verification test set based on real UAV images consists of 1296 images (16 images × 9 distortion types × 9 intensity levels). In the role of MOS values for no-reference metric verification, the best full-reference quality metrics are used. The SROCC values of some well-known FR IQA metrics for all TID2013 images and for the UAV-related distortion subset are given in Table 6. Since their problems and solutions are similar to those addressed in this paper, a combined full-reference metric was also formed to improve the accuracy. It uses the metrics listed in Table 6 as inputs and consists of a two-layer neural network (marked as C_MOS) with [16, 8] neurons and all other parameters as listed above. Since its SROCC for the task considered is almost 0.04 higher than that of the best elementary metric, this combined metric has been chosen as the analog of MOS for the UAV test images.
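The distortion generation described above can be sketched as follows (an illustrative MATLAB fragment, assuming an 8-bit RGB reference frame; the file name and all parameter values are placeholders, each parameter being swept over the nine intensity levels in the actual set):

```matlab
I = imread('uav_frame.png');            % a selected UAVDT reference image

In = imnoise(I, 'gaussian', 0, 0.005);  % additive white Gaussian noise
Im = imnoise(I, 'speckle', 0.01);       % multiplicative noise
Ib = imgaussfilt(I, 2.0);               % Gaussian blur, sigma = 2
Is = I + 40;                            % mean shift (uint8 saturates)

imwrite(I, 'frame_q30.jpg', 'Quality', 30);              % JPEG
imwrite(I, 'frame_cr40.jp2', 'CompressionRatio', 40);    % JPEG2000
% The denoised variant is produced by applying the external BM3D
% filter to the noisy image In (not shown here)
```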
Table 6
SROCC values of the full-reference visual metrics on the TID2013 image dataset

Metric       VSI [53]  PSIM [54]  MDSI [55]  HaarPSI [56]  UNIQUE [57]  CVSSI [58]  IQM2 [59]  ADM [60]  C_MOS
SROCC (All)  0.8967    0.8926     0.8897     0.8730        0.8599       0.8090      0.7955     0.7861    0.9107
SROCC (UAV)  0.8274    0.8519     0.8873     0.8811        0.8496       0.8478      0.8507     0.8075    0.9261

The results of the verification of the combined and elementary no-reference metrics are shown in Table 7. In addition to the overall assessment, the SROCC values for individual types of distortions are also given.

Table 7
The results of verification of the no-reference metrics on the UAV test set

Distortion            ARISM  PSS    CORNIA  DIPIQ  ILNIQE  NIQE   LPCSI  MLV    MSGF   NIQMC  CNNM
All                   0.069  0.307  0.581   0.109  0.548   0.616  0.330  0.024  0.499  0.074  0.659
AWGN                  0.804  0.202  0.861   0.470  0.890   0.886  0.511  0.239  0.794  0.180  0.874
Multiplicative noise  0.702  0.257  0.784   0.053  0.722   0.701  0.557  0.313  0.624  0.309  0.903
Blur                  0.942  0.898  0.904   0.254  0.869   0.900  0.966  0.949  0.784  0.200  0.956
Denoise               0.360  0.072  0.186   0.196  0.004   0.153  0.020  0.181  0.228  0.118  0.154
JPEG                  0.769  0.958  0.868   0.534  0.713   0.728  0.166  0.087  0.866  0.367  0.787
JP2k                  0.134  0.378  0.738   0.337  0.240   0.632  0.054  0.486  0.028  0.433  0.764
Brighten              0.524  0.070  0.027   0.027  0.166   0.035  0.183  0.372  0.197  0.378  0.474
Darken                0.359  0.044  0.057   0.575  0.341   0.301  0.342  0.460  0.021  0.052  0.281
Mean shift            0.515  0.328  0.027   0.429  0.335   0.442  0.169  0.391  0.253  0.054  0.520

From the obtained results, it can be seen that despite the limitations of the approximate MOS values, the combined metric provides the maximum overall accuracy, is among the best for most of the indicated types of distortions, and offers the best balance across the various distortions. It should be noted that these results have been obtained for the most common types of distortions, which are typically used in the design of elementary metrics. Considering the distortion types used in TID2013 but not modeled in this set (e.g., transmission errors), it can be expected that the combined metric has additional benefits, providing more stable visual quality estimation.

7. Conclusions

The paper is devoted to the visual quality assessment of UAV images, which is relevant for automating image processing and improving image quality in UAV applications. A list of more than 40 known no-reference visual quality metrics has been considered. To analyze their effectiveness, the TID2013 image database and its subset with the relevant types of distortions have been selected. The verification of the existing visual quality metrics has shown an accuracy (SROCC) below 0.53 for the best one and below 0.3 for most metrics. Therefore, a method of combining visual quality metrics using a neural network has been proposed to improve the accuracy of visual quality assessment. The problem of the optimal choice of elementary metrics, which reduces redundancy and rationalizes the use of computing resources, has been considered, and a solution based on the Lasso regularization method has been proposed: it assigns zero weight coefficients to the least important metrics and thereby excludes them.
The training of neural networks of different types and configurations has been carried out, taking into account the limitations of the test image database used in the experiments. The effectiveness of this approach, which reaches about 0.85 (SROCC) for 20 metrics or more, has been analyzed, and its dependence on the number of metrics, together with the main statistics, has been shown. For 10 metrics, as the optimal solution balancing high accuracy and performance, the results have been refined by training additional configurations of the neural network structure. It is shown that the accuracy of the final combined metric reaches SROCC = 0.83.

To evaluate the effectiveness of the metric on real images, a test image set of almost 1300 images has been formed. As an alternative to the missing MOS values, a combined full-reference metric has been created; its accuracy reaches 0.926 for the used TID2013 distortion set and is significantly higher than that of any no-reference metric, which is acceptable for their comparison. It is shown that, on this test set, the obtained metric provides the best result. In the future, research in this area can be expanded by adding new distortions typical for UAV images and new neural network models, including deep-learning models of limited complexity.

8. References

[1] R. C. Gonzalez, R. E. Woods, Digital Image Processing, 4th ed., Pearson, New York, NY, 2018.
[2] W. Burger, M. J. Burge, Principles of Digital Image Processing, Springer, New York, NY, 2009.
[3] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, New York, NY, 2009. doi: 10.1007/978-0-387-84858-7.
[4] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[5] R. Wang, X. Xiao, B. Guo, Q. Qin, R. Chen, An Effective Image Denoising Method for UAV Images via Improved Generative Adversarial Networks, Sensors 18 (2018) 1–23. doi: 10.3390/s18071985.
[6] T. Sieberth, R. Wackrow, J. H. Chandler, UAV image blur - its influence and ways to correct it, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XL-1/W4 (2015) 33–39. doi: 10.5194/isprsarchives-XL-1-W4-33-2015.
[7] V. S. Alfio, D. Costantino, M. Pepe, Influence of Image TIFF Format and JPEG Compression Level in the Accuracy of the 3D Model and Quality of the Orthophoto in UAV Photogrammetry, J. Imaging 6 (2020) 1–22. doi: 10.3390/jimaging6050030.
[8] M. Kedzierski, D. Wierzbicki, Radiometric quality assessment of images acquired by UAV's in various lighting and weather conditions, Measurement 76 (2015) 156–169. doi: 10.1016/j.measurement.2015.08.003.
[9] G. Koretsky, J. Nicoll, M. Taylor, A Tutorial on Electro-Optical/Infrared (EO/IR) Theory and Systems, IDA Document D-4642, 2013.
[10] W. Lin, C.-C. Jay Kuo, Perceptual Visual Quality Metrics: A Survey, Journal of Visual Communication and Image Representation 22 (2011) 297–312. doi: 10.1016/j.jvcir.2011.01.005.
[11] Y. Niu, Y. Zhong, W. Guo, Y. Shi, P. Chen, 2D and 3D Image Quality Assessment: A Survey of Metrics and Challenges, IEEE Access 7 (2018) 782–801. doi: 10.1109/ACCESS.2018.2885818.
[12] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, et al., Image database TID2013: Peculiarities, results and perspectives, Signal Processing: Image Communication 30 (2015) 57–77. doi: 10.1016/j.image.2014.10.009.
[13] O. Ieremeiev, V. Lukin, K. Okarma, K. Egiazarian, Full-Reference Quality Metric Based on Neural Network to Assess the Visual Quality of Remote Sensing Images, Remote Sensing 12 (2020) 1–31. doi: 10.3390/rs12152349.
[14] A. Rubel, O. Ieremeiev, V. Lukin, J. Fastowicz, K. Okarma, Combined No-Reference Image Quality Metrics for Visual Quality Assessment Optimized for Remote Sensing Images, Applied Sciences 12 (2022) 1–19. doi: 10.3390/app12041986.
[15] L. Zhang, L. Zhang, A. C. Bovik, A Feature-Enriched Completely Blind Image Quality Evaluator, IEEE Trans. Image Process. 24 (2015) 2579–2591. doi: 10.1109/TIP.2015.2426416.
[16] P. Ye, J. Kumar, L. Kang, D. Doermann, Unsupervised Feature Learning Framework for No-Reference Image Quality Assessment, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR '2012, Providence, RI, USA, 2012, pp. 1098–1105. doi: 10.1109/CVPR.2012.6247789.
[17] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, D. Doermann, Blind Image Quality Assessment Based on High Order Statistics Aggregation, IEEE Trans. Image Process. 25 (2016) 4444–4457. doi: 10.1109/TIP.2016.2585880.
[18] Y. Zhang, A. K. Moorthy, D. M. Chandler, A. C. Bovik, C-DIIVINE: No-Reference Image Quality Assessment Based on Local Magnitude and Phase Statistics of Natural Scenes, Signal Process. Image Commun. 29 (2014) 725–747. doi: 10.1016/j.image.2014.05.004.
[19] M. A. Saad, A. C. Bovik, C. Charrier, Blind Image Quality Assessment: A Natural Scene Statistics Approach in the DCT Domain, IEEE Trans. Image Process. 21 (2012) 3339–3352. doi: 10.1109/TIP.2012.2191563.
[20] A. Mittal, A. K. Moorthy, A. C. Bovik, No-Reference Image Quality Assessment in the Spatial Domain, IEEE Trans. Image Process. 21 (2012) 4695–4708. doi: 10.1109/TIP.2012.2214050.
[21] A. K. Moorthy, A. C. Bovik, A Two-Step Framework for Constructing Blind Image Quality Indices, IEEE Signal Process. Lett. 17 (2010) 513–516. doi: 10.1109/LSP.2010.2043888.
[22] L. Liu, B. Liu, H. Huang, A. C. Bovik, No-Reference Image Quality Assessment Based on Spatial and Spectral Entropies, Signal Process. Image Commun. 29 (2014) 856–863. doi: 10.1016/j.image.2014.06.006.
[23] A. Mittal, R. Soundararajan, A. C. Bovik, Making a "Completely Blind" Image Quality Analyzer, IEEE Signal Process. Lett. 20 (2013) 209–212. doi: 10.1109/LSP.2012.2227726.
[24] W. Xue, L. Zhang, X. Mou, Learning without Human Scores for Blind Image Quality Assessment, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR '2013, Portland, OR, USA, 2013, pp. 995–1002. doi: 10.1109/CVPR.2013.133.
[25] K. Gu, G. Zhai, X. Yang, W. Zhang, Hybrid No-Reference Quality Metric for Singly and Multiply Distorted Images, IEEE Trans. Broadcast. 60 (2014) 555–567. doi: 10.1109/TBC.2014.2344471.
[26] Q. Wu, Z. Wang, H. Li, A Highly Efficient Method for Blind Image Quality Assessment, in: Proceedings of the IEEE Int. Conf. on Image Processing, ICIP '2015, Quebec City, QC, Canada, 2015, pp. 339–343. doi: 10.1109/ICIP.2015.7350816.
[27] R. Hassen, Z. Wang, M. M. A. Salama, Image Sharpness Assessment Based on Local Phase Coherence, IEEE Trans. Image Process. 22 (2013) 2798–2810. doi: 10.1109/TIP.2013.2251643.
[28] A. K. Moorthy, A. C. Bovik, Blind Image Quality Assessment: From Natural Scene Statistics to Perceptual Quality, IEEE Trans. Image Process. 20 (2011) 3350–3364. doi: 10.1109/TIP.2011.2147325.
[29] L. Li, W. Lin, X. Wang, G. Yang, K. Bahrami, A. C. Kot, No-Reference Image Blur Assessment Based on Discrete Orthogonal Moments, IEEE Trans. Cybern. 46 (2016) 39–50. doi: 10.1109/TCYB.2015.2392129.
[30] L. Liu, Y. Hua, Q. Zhao, H. Huang, A. C. Bovik, Blind Image Quality Assessment by Relative Gradient Statistics and Adaboosting Neural Network, Signal Process. Image Commun. 40 (2016) 1–15. doi: 10.1016/j.image.2015.10.005.
[31] X. Min, K. Gu, G. Zhai, J. Liu, X. Yang, C. W. Chen, Blind Quality Assessment Based on Pseudo-Reference Image, IEEE Trans. Multimed. 20 (2018) 2049–2062. doi: 10.1109/TMM.2017.2788206.
[32] Q. Wu, H. Li, F. Meng, K. N. Ngan, B. Luo, C. Huang, B. Zeng, Blind Image Quality Assessment Based on Multichannel Feature Fusion and Label Transfer, IEEE Trans. Circuits Syst. Video Technol. 26 (2016) 425–440. doi: 10.1109/TCSVT.2015.2412773.
[33] Q. Wu, H. Li, F. Meng, K. N. Ngan, S. Zhu, No Reference Image Quality Assessment Metric via Multi-Domain Structural Information and Piecewise Regression, J. Vis. Commun. Image Represent. 32 (2015) 205–216. doi: 10.1016/j.jvcir.2015.08.009.
[34] L. Li, Y. Yan, Z. Lu, J. Wu, K. Gu, S. Wang, No-Reference Quality Assessment of Deblurred Images Based on Natural Scene Statistics, IEEE Access 5 (2017) 2163–2171. doi: 10.1109/ACCESS.2017.2661858.
[35] M. Rakhshanfar, M. A. Amer, Sparsity-Based No-Reference Image Quality Assessment for Automatic Denoising, Signal Image Video Process. 12 (2018) 739–747. doi: 10.1007/s11760-017-1215-3.
[36] K. Ma, W. Liu, T. Liu, Z. Wang, D. Tao, DipIQ: Blind Image Quality Assessment by Learning-to-Rank Discriminable Image Pairs, IEEE Trans. Image Process. 26 (2017) 3951–3964. doi: 10.1109/TIP.2017.2708503.
[37] K. Bahrami, A. C. Kot, A Fast Approach for No-Reference Image Sharpness Assessment Based on Maximum Local Variation, IEEE Signal Process. Lett. 21 (2014) 751–755. doi: 10.1109/LSP.2014.2314487.
[38] P. V. Vu, D. M. Chandler, A Fast Wavelet-Based Algorithm for Global and Local Image Sharpness Estimation, IEEE Signal Process. Lett. 19 (2012) 423–426. doi: 10.1109/LSP.2012.2199980.
[39] R. Ferzli, L. J. Karam, A No-Reference Objective Image Sharpness Metric Based on the Notion of Just Noticeable Blur (JNB), IEEE Trans. Image Process. 18 (2009) 717–728. doi: 10.1109/TIP.2008.2011760.
[40] Y. Zhang, D. M. Chandler, No-Reference Image Quality Assessment Based on Log-Derivative Statistics of Natural Scenes, J. Electron. Imaging 22 (2013) 043025. doi: 10.1117/1.JEI.22.4.043025.
[41] W. Xue, X. Mou, L. Zhang, A. C. Bovik, X. Feng, Blind Image Quality Assessment Using Joint Statistics of Gradient Magnitude and Laplacian Features, IEEE Trans. Image Process. 23 (2014) 4850–4862. doi: 10.1109/TIP.2014.2355716.
[42] K. Gu, W. Lin, G. Zhai, X. Yang, W. Zhang, C. W. Chen, No-Reference Quality Metric of Contrast-Distorted Images Based on Information Maximization, IEEE Trans. Cybern. 47 (2017) 4559–4565. doi: 10.1109/TCYB.2016.2575544.
[43] K. Gu, G. Zhai, W. Lin, X. Yang, W. Zhang, No-Reference Image Sharpness Assessment in Autoregressive Parameter Space, IEEE Trans. Image Process. 24 (2015) 3218–3231. doi: 10.1109/TIP.2015.2439035.
[44] N. D. Narvekar, L. J. Karam, A No-Reference Perceptual Image Sharpness Metric Based on a Cumulative Probability of Blur Detection, in: Proceedings of the International Workshop on Quality of Multimedia Experience, QoMEx '2009, San Diego, CA, USA, 2009, pp. 87–91. doi: 10.1109/QOMEX.2009.5246972.
[45] C. Feichtenhofer, H. Fassold, P. Schallauer, A Perceptual Image Sharpness Metric Based on Local Edge Gradient Analysis, IEEE Signal Process. Lett. 20 (2013) 379–382. doi: 10.1109/LSP.2013.2248711.
[46] N. N. Ponomarenko, V. V. Lukin, O. I. Eremeev, K. O. Egiazarian, J. T. Astola, Sharpness Metric for No-Reference Image Visual Quality Assessment, SPIE 8295 (2012) 829519. doi: 10.1117/12.906393.
[47] T. Zhu, L. Karam, A No-Reference Objective Image Quality Metric Based on Perceptually Weighted Local Noise, EURASIP J. Image Video Process. (2014) 1–5. doi: 10.1186/1687-5281-2014-5.
[48] Y. Gong, I. F. Sbalzarini, Image Enhancement by Gradient Distribution Specification, Lecture Notes in Computer Science 9009 (2015) 47–62. doi: 10.1007/978-3-319-16631-5_4.
[49] F. Crété-Roffet, T. Dolmiere, P. Ladret, M. Nicolas, The Blur Effect: Perception and Estimation with a New No-Reference Perceptual Blur Metric, in: Proceedings of the Human Vision and Electronic Imaging XII, HVEI '2007, San Jose, CA, USA, 2007, p. 649201. doi: 10.1117/12.702790.
[50] S. A. Golestaneh, D. M. Chandler, No-Reference Quality Assessment of JPEG Images via a Quality Relevance Map, IEEE Signal Process. Lett. 21 (2014) 155–158. doi: 10.1109/LSP.2013.2296038.
[51] O. Ieremeiev, V. Lukin, K. Okarma, K. Egiazarian, B. Vozel, On properties of visual quality metrics in remote sensing applications, in: Proceedings of the IS&T Int'l. Symp. on Electronic Imaging: Image Processing: Algorithms and Systems, IPAS '2022, San Francisco, CA, USA, 2022, pp. 354-1–354-6. doi: 10.2352/EI.2022.34.10.IPAS-354.
[52] D. Du, Y. Qi, H. Yu, Y. Yang, K. Duan, G. Li, W. Zhang, Q. Huang, Q. Tian, The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking, in: Proceedings of the European Conference on Computer Vision, ECCV '2018. doi: 10.48550/arXiv.1804.00518.
[53] L. Zhang, Y. Shen, H. Li, VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment, IEEE Trans. Image Process. 23 (2014) 4270–4281. doi: 10.1109/TIP.2014.2346028.
[54] K. Gu, L. Li, H. Lu, X. Min, W. Lin, A Fast Reliable Image Quality Predictor by Fusing Micro- and Macro-Structures, IEEE Trans. Ind. Electron. 64 (2017) 3903–3912. doi: 10.1109/TIE.2017.2652339.
[55] H. Ziaei Nafchi, A. Shahkolaei, R. Hedjam, M. Cheriet, Mean Deviation Similarity Index: Efficient and Reliable Full-Reference Image Quality Evaluator, IEEE Access 4 (2016) 5579–5590. doi: 10.1109/ACCESS.2016.2604042.
[56] R. Reisenhofer, S. Bosse, G. Kutyniok, T. Wiegand, A Haar wavelet-based perceptual similarity index for image quality assessment, Signal Process. Image Commun. 61 (2018) 33–43. doi: 10.1016/j.image.2017.11.001.
[57] D. Temel, M. Prabhushankar, G. AlRegib, UNIQUE: Unsupervised Image Quality Estimation, IEEE Signal Process. Lett. 23 (2016) 1414–1418. doi: 10.1109/LSP.2016.2601119.
[58] H. Jia, L. Zhang, T. Wang, Contrast and Visual Saliency Similarity-Induced Index for Assessing Image Quality, IEEE Access 6 (2018) 65885–65893. doi: 10.1109/ACCESS.2018.2878739.
[59] E. Dumic, S. Grgic, M. Grgic, IQM2: new image quality measure based on steerable pyramid wavelet transform and structural similarity index, Signal Image Video Process. 8 (2014) 1159–1168. doi: 10.1007/s11760-014-0654-3.
[60] S. Li, F. Zhang, L. Ma, K. N. Ngan, Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments, IEEE Trans. Multimed. 13 (2011) 935–949. doi: 10.1109/TMM.2011.2152382.