Barriers Towards No-reference Metrics Application to Compressed Video
      Quality Analysis: on the Example of No-reference Metric NIQE
                  A. Zvezdakova (1), D. Kulikov (1, 2), D. Kondranin (3), D. Vatolin (4)
     azvezdakova@graphics.cs.msu.ru | dkulikov@graphics.cs.msu.ru | denis.kondranin@graphics.cs.msu.ru |
                                      dmitriy@graphics.cs.msu.ru
                        (1) Lomonosov Moscow State University, Moscow, Russia;
                            (2) Dubna State University, Dubna, Russia
    This paper analyses the application of the no-reference metric NIQE to the task of video-codec comparison. A number
of issues in the metric's behavior on videos were detected and are described. The metric produces outlying scores on
black and solid-colored frames. The proposed averaging technique for per-frame scores improved the results in some
cases. In addition, NIQE assigns low quality scores to videos with detailed textures and better scores to lower-bit-rate
encodings of the same videos, because compression blurs these textures. Although NIQE showed natural results for many
of the tested videos, it is not universal and currently cannot be used for video-codec comparisons.
    Keywords: video quality, no-reference metric, quality measuring, video-codec comparison.
1. Introduction
    Today video content takes up the largest part of world Internet traffic (more than 70%). According to
forecasts [1], its share will grow to 82% in 2022. This trend leads to the creation of new encoding standards
and to improvements in existing encoders. A number of video-codec comparisons are conducted to find the best
codecs for different tasks and use cases and to help users and customers find appropriate encoders for their
needs. The goal of video encoding is to deliver high visual quality at a reduced file size, so the only
reliable way to compare the quality of encoded videos is to perform a subjective evaluation. It requires a
proven methodology and a large number of observers to achieve reasonable results. In general, subjective
comparisons are still very expensive to perform, although there are some services which help researchers to
perform qualitative subjective comparisons [2]. This cost increases the importance of objective metrics for
video quality comparison.
    Objective quality metrics can be divided into three general categories: full-reference metrics,
no-reference metrics and reduced-reference metrics. Full-reference metrics are easy to interpret and are
useful for estimating the quality of video compression. Unlike full-reference metrics, which require the
source video for comparison with the compressed one, no-reference metrics are useful when the source is
unavailable and the quality of the compressed video still has to be estimated. This is typical, for example,
of cloud encoding, where uploaded videos have already been compressed by the built-in encoder of a smartphone
or a non-professional camera. Reduced-reference metrics require only part of the information about the source
video and can also be used in some of the listed cases.

2. Related work
    There is a number of no-reference metrics which were created using databases with subjective quality
scores. Such quality assessment models were trained to estimate subjective quality, and so their scores depend
on the training and testing sets. For example, DIIVINE (2011) [10], LBIQ (2011) [12], BRISQUE (2012) [9] and
V-Bliinds (2012) [11] were trained on the LIVE data set. In 2015, a metric called IL-NIQE [15] was proposed.
It was based on the NIQE [8] metric, which is studied in this paper, but used a multivariate Gaussian (MVG)
model to predict the quality of individual image patches instead of a single global MVG model for the whole
image.
    Another group contains metrics which were not trained on any data sets and use only data from a source
image to estimate its quality. For example, CORNIA (2012) [14] combined feature and regression training.
Recently, several approaches which use neural network architectures have been developed. The authors of COME
(2018) [13] proposed an approach based on the convolutional neural network AlexNet and multi-regression, which
outperformed V-Bliinds on a number of video sets.
    No-reference metrics are created to approximate users' perception of video quality, but when the quality
of encoding and compression is estimated, they can be used only as an addition to reference metrics.
No-reference metrics cannot become the main criterion for encoder comparison, because otherwise an encoder
could win the comparison by producing a visually ideal result that has little in common with the input video.
The authors of this paper have organized worldwide video-codec comparisons for 16 years. Currently, the
full-reference metric SSIM is used in these comparisons as the main metric, supplemented with a number of
additional metrics (PSNR, VMAF). At the same time, several researchers and industry experts are considering
measuring no-reference metrics and taking them into account in video-codec comparisons. This paper describes
the authors' experience of using the no-reference metric NIQE (Natural Image Quality Evaluator) [8], created
by Anish Mittal, Rajiv Soundararajan and Alan C. Bovik, in a video-codec comparison. This metric is one of the
most popular nowadays and shows good results for image quality assessment.
    We used NIQE to assess the quality of encoded video sequences during the video-codec comparison.


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
The main idea of the NIQE metric is to construct a collection of quality-aware features and fit them to a
multivariate Gaussian (MVG) model. The NIQE score represents the degree of distortion in the frame: the lower
the score, the higher the quality of the frame. Accordingly, rate-distortion graphs for encoded videos look
unusually inverted, so on the plots in this paper NIQE scores are presented inverted to make the results more
familiar and easier to interpret.
    There is an open MATLAB implementation provided by the authors [5]. In order to increase computational
speed, we used the implementation from MSU Video Quality Measurement Tool (VQMT), which is currently faster.
The tool has a free version (it includes NIQE) and can be downloaded [6]. Speed was important in this case
because the metric was applied to video rather than to single images.

3. Experimental setup
    For the evaluation, 28 different FullHD video sequences were used, with frame rates from 24 to 60 frames
per second, all of them generated by real users. The videos were chosen from the MSU video collection, which
consists of 15,833 videos. The collection was divided into 28 clusters by spatiotemporal complexity [7], and
one sequence from each cluster, close to the cluster center, was chosen for the final testing set. Each video
was encoded by the x264 and x265 encoders. There were three encoding use cases ("fast", "universal" and
"ripping") based on different encoding speed/quality ratios, and 7 different bit rates from 1 Mbps to 12 Mbps.
The overall number of encoded streams evaluated by NIQE is 1176 (28 sequences × 2 encoders × 3 use cases ×
7 bit rates).
    The final video set was used in the 2018 Moscow State University (MSU) video-codec comparison [3]. The
comparison results are available at the link, but the results of NIQE were not published online because of
several issues found in the application of NIQE to video quality measurement. Some of them were noted in the
original article; others were resolved with our proposed averaging technique, which is described in this
article. Unfortunately, some issues cannot be fixed without improving the metric itself (completing the
training set or other fixes). In this article, we suggest a method of processing the metric results to solve
the detected problems of applying the metric to videos.

4. Metric behavior on videos
    For most of the encoded videos, NIQE showed results which reflected the usual perceptual video quality at
different bit rates. But there were some cases in which the NIQE results had issues; the following sections
describe the detected issues and their reasons.

4.1 Cases with relevant results
    According to the authors, NIQE is not applicable to unnatural distortions in scenes or to scenes from an
unnatural source (e.g. computer graphics), as such scenes were not used during the training. However, we
checked the metric scores on cartoons from our video set.
    On Sita (a part of a cartoon movie), the rate-distortion curve looks inverted (Fig. 1a): NIQE shows worse
quality scores for high bit rates than for low bit rates. This means that the metric is indeed not applicable
to this type of content. On Sintel (a part of a CGI movie trailer), NIQE showed non-monotonic scores for the
x265 encoder in the fast use case, but acceptable results for the universal and ripping use cases (Fig. 1b).
Thus, although the metric is said to be not applicable to cartoons, we found that it works for some types of
realistic animation, such as video gaming (sequences Witcher3, Rust).
    There were some examples where the rate-distortion curve looked unnatural, but the metric correctly
assigned worse visual quality to the higher bit rate. For example, on the Hera video sequence (a part of a
music clip with grain effects) NIQE showed a worse score for x264 encoding at 4000 kbps than at 2000 kbps in
the fast use case (Fig. 2). The metric had better scores for almost all frames of the lower bit rate. This is
illustrated by the example frame in Fig. 3, where x264 encoding of the video at 4000 kbps produced worse
visual quality and more compression artifacts than at 2000 kbps.

    Fig. 1. Rate-distortion graph for animation: (a) Sita video sequence, (b) Sintel video sequence.
    Fig. 2. Rate-distortion graph for Hera.
    (Plots: NIQE (inversed), Y versus Bitrate, Mbps; curves for x264 and x265; higher is better.)
    Fig. 3. Frame 208 from Hera video sequence, codec: x264, fast use case. Left: bitrate 2000 kbps,
    NIQE = 8.04; right: bitrate 4000 kbps, NIQE = 11.11. According to NIQE, left image is visually better.

4.2 Cases with irrelevant results

4.2.1 Dark scenes
    The metric was said to be not applicable to cartoons, but some other types of video content also had
inaccurate NIQE scores. One of the most frequent cases is video sequences with completely black frames (for
example, at the beginning). These frames, according to NIQE, are perceptually worse than the other frames and
have an extremely high metric score. This might happen because content of this kind was absent from the
training data used to create NIQE.
    For example, for x264 encoding NIQE showed a worse score at 2000 kbps than at 1000 kbps on the Fire video
sequence (Fig. 4). It contains a close-up shot of a fire in the dark. In this sequence, the metric showed
better scores on a group of frames where the camera started a slow movement.
    Another example which demonstrates this issue is presented in Fig. 5. The Music Clip video sequence was
quite complicated for many encoders in the MSU comparison. It consists of short scenes which switch quickly
and of a lot of special effects, such as red sparkles and grain. NIQE shows unnatural results on this sequence
for all use cases: the rate-distortion curve is not monotonic because of anomalously large values on dark
frames.

    Fig. 4. Rate-distortion graph for Fire.
    Fig. 5. Music clip video sequence: (a) rate-distortion graph, (b) per-frame NIQE scores for x265 (1 Mbps)
    and x265 (2 Mbps).
    (Rate-distortion plots: NIQE (inversed), Y versus Bitrate, Mbps; curves for x264 and x265. Per-frame plot:
    NIQE (inversed), Y versus frame number.)

    The videos described above contained completely black or dark frames. In these videos, NIQE had large
values mostly on these frames, which was the main reason for the wrong overall quality score for the entire
video. The following examples demonstrate another case in which NIQE was not applicable to video quality
estimation.

4.2.2 Noisy scenes/scenes with lots of details
    A number of cases where the metric took wrong values appear in videos with noise or with a lot of small,
textured details, like sand, water waves and grass. For the x265-encoded Bay time-lapse sequence, NIQE showed
a worse score at 2000 kbps than at 1000 kbps in the universal use case (Fig. 6). This video contains a scene
with water and grass, and the grass and the waves on the water are smoother in the lower-bit-rate video
stream.
    In another example, NIQE showed a worse score at 4000 kbps than at 2000 kbps in the ripping use case on
the Playground video sequence for both encoders. This video contains a lot of bright frames with highly
structured and detailed grass and sand. Such texture is quite complicated to compress, and at low bit rates
there were visible compression artifacts, but NIQE had a worse score at high bit rates (Fig. 7). This happened
because NIQE treats finely textured grass as noise and rates the blurred, compressed grass as visually better.
This is why the rate-distortion curve looks inverted at bit rates higher than 2000 kbps.
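
    The problem cases in Sections 4.1 and 4.2 show up as rate-distortion curves that are not monotonic: a
higher bit rate receives a worse NIQE score than a lower one. A minimal Python sketch of such a check is given
below. It is an illustration only, not part of the authors' tooling: the function name and the sample values
are hypothetical, and the scores are assumed to be raw per-stream NIQE values (lower is better), not the
inverted values shown on the plots.

```python
def find_inversions(bitrates_kbps, niqe_scores):
    """Return (lower_bitrate, higher_bitrate) pairs of adjacent points where
    the higher bit rate received a worse (larger) raw NIQE score."""
    if len(bitrates_kbps) != len(niqe_scores):
        raise ValueError("bitrates and scores must have the same length")
    # Sort the points by bit rate so that neighbours are adjacent on the curve.
    points = sorted(zip(bitrates_kbps, niqe_scores))
    inversions = []
    for (low_rate, low_score), (high_rate, high_score) in zip(points, points[1:]):
        if high_score > low_score:  # larger raw NIQE means worse predicted quality
            inversions.append((low_rate, high_rate))
    return inversions


# Hypothetical per-stream scores, not measurements from this paper.
rates = [1000, 2000, 4000, 6000]
scores = [6.1, 5.2, 5.9, 5.0]
print(find_inversions(rates, scores))
```

    An empty result means the curve is monotonic for that encoder and use case; any returned pair marks a bit
rate step that deserves a visual check of the corresponding frames.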
    Fig. 6. Rate-distortion graph for Bay time lapse.
    Fig. 7. Rate-distortion graph for Playground.
    Fig. 8. Forest dog video sequence: (a) original rate-distortion graph, (b) rate-distortion graph after
    smart averaging.
    Fig. 9. Rate-distortion graph for Music clip after smart averaging.
    (Plots: NIQE (inversed), Y versus Bitrate, Mbps; curves for x264 and x265; higher is better.)
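
    The smart averaging used for Fig. 8b and Fig. 9 is the weighted mean given by Eq. (1) in Section 4.3. The
following minimal Python sketch follows that formula; it is not the authors' implementation. The function
names, the rejection of negative scores and the error raised when every frame receives zero weight are
assumptions made for illustration, and the inputs are taken to be raw (non-inverted) per-frame NIQE scores.

```python
def frame_weight(m):
    """Weighting coefficient k_i of Eq. (1) for a raw per-frame NIQE score m."""
    if m < 0:
        # Assumption: Eq. (1) is only defined for non-negative scores.
        raise ValueError("NIQE scores are expected to be non-negative")
    if m < 15:
        return 1.0
    if m < 40:
        return -0.04 * m + 1.6  # falls linearly from 1 at m = 15 to 0 at m = 40
    return 0.0


def smart_average(frame_scores):
    """Weighted mean of per-frame NIQE scores with the weights of Eq. (1)."""
    weights = [frame_weight(m) for m in frame_scores]
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: the source does not say what to do when all frames are outliers.
        raise ValueError("all frames have zero weight; the score is undefined")
    weighted_sum = sum(m * k for m, k in zip(frame_scores, weights))
    return weighted_sum / total_weight


# Hypothetical per-frame scores: three ordinary frames and one black-frame outlier.
scores = [4.0, 6.0, 20.0, 650.0]
print(sum(scores) / len(scores))  # plain mean, dominated by the outlier
print(smart_average(scores))      # the outlier gets zero weight
```

    Because the weight falls continuously to zero at a score of 40, frames with moderately high scores still
contribute a little, while the extreme values typical of black or solid-colored frames are excluded entirely.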
4.3 Proposed processing technique

    Analysis of the per-frame NIQE results revealed that values greater
than 40 rarely appear in video frames. Such extreme values occur mostly
in solid-colored or dark frames. We proposed and applied a special
averaging technique to suppress these cases. The NIQE score for a
video V was computed as follows:

\[
\mathrm{Score}_V = \frac{\sum_i m_i k_i}{\sum_i k_i}, \quad i \in [0, N],
\]
\[
k_i =
\begin{cases}
1, & m_i \in [0, 15), \\
-0.04\, m_i + 1.6, & m_i \in [15, 40), \\
0, & m_i \in [40, +\infty),
\end{cases}
\tag{1}
\]

where
    m_i is the NIQE score for frame i,
    k_i is the weighting coefficient for the score m_i,
    N is the number of frames.

    The proposed averaging formula improved the NIQE scores for some of
the video sequences. The following results show the corrected
rate-distortion curves, which can be compared to the original results
presented above.
    With the proposed averaging technique, the rate-distortion curve for
Forest dog no longer contains outlying points (Fig. 8b). Another example
in which the proposed averaging corrected the results for both encoders
is the Music clip video sequence (Fig. 9). The non-monotonic curve of
the x264 encoding was caused by the high spatial complexity of this
video.

5. Correlation with subjective scores

    The obtained NIQE quality scores were compared to subjective scores
on a subset of the test videos. A pairwise subjective comparison was
conducted as part of the 2018 MSU Video-Codec Comparison, in which a
total of 22542 valid answers were received from 473 subjects. The
detailed description and methodology can be found in the report [4].
Five videos were used in this comparison, and none of them contained
animated scenes or black frames, for which NIQE could show inaccurate
results. In addition, several full-reference quality metrics were
measured (SSIM, PSNR, VMAF and their variations). The Pearson
correlation coefficient was calculated for the results on each video
separately (Fig. 10). The correlation scores averaged across all videos
show that NIQE has the lowest correlation with subjective scores (0.85),
while VMAF v0.6.1 for phones has the highest (0.99). It should also be
noted that NIQE currently correlates with subjective quality even worse
than PSNR (0.98), which has long been considered to match subjective
quality poorly in comparisons of compression algorithms.
    The lowest correlation of NIQE with subjective scores was obtained
for the Playground video sequence. As described above, NIQE gave worse
scores to the detailed textures (grass and sand) in this sequence, which
is illustrated in Fig. 11.
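
    The weighted averaging of Eq. (1) in Section 4.3 can be sketched in
Python as follows. This is only an illustrative sketch, not the
implementation used in MSU VQMT: the function names `frame_weight` and
`smart_average` are hypothetical, and raising an error when every frame
receives zero weight is our assumption, since the paper does not define
that case.

```python
from typing import Sequence


def frame_weight(m: float) -> float:
    """Weight k_i from Eq. (1) for a per-frame NIQE score m."""
    if m < 15.0:
        return 1.0                  # typical frames count fully
    if m < 40.0:
        return -0.04 * m + 1.6      # linear fade from 1 (m = 15) to 0 (m = 40)
    return 0.0                      # extreme scores are ignored


def smart_average(frame_scores: Sequence[float]) -> float:
    """Weighted mean of per-frame NIQE scores following Eq. (1)."""
    weights = [frame_weight(m) for m in frame_scores]
    total_weight = sum(weights)
    if total_weight == 0.0:
        # Assumption: the paper does not say what to do when every
        # frame scores 40 or more, so this sketch refuses to answer.
        raise ValueError("all frames have zero weight")
    weighted_sum = sum(m * k for m, k in zip(frame_scores, weights))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical per-frame scores: three ordinary frames and one
    # outlier, such as a black frame.
    scores = [4.0, 5.0, 6.0, 80.0]
    print("plain mean:   ", sum(scores) / len(scores))
    print("smart average:", smart_average(scores))
```

    For the hypothetical scores above, the plain mean is 23.75, while the
weighted average is 5.0, because the outlying frame receives zero
weight. The weight is continuous at both breakpoints (1 at m = 15 and 0
at m = 40), so a small change in a frame score never causes a jump in
the video score.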

Fig. 10. Correlation between objective quality metrics and subjective
scores (y-axis: Pearson correlation coefficient; metrics: NIQE, Y; PSNR, Y;
VMAF v0.6.1, Y; SSIM, Y; VMAF v0.6.1 Phone, Y; videos: Crowd Run, Ducks
Take Off, Mountain Mike, Playground, Red Kayak).

Fig. 11. Frame 58 from Playground video sequence, codec: x265, ripping use
case. Left: bitrate 2000 kbps, NIQE = 3.24. Right: bitrate 4000 kbps,
NIQE = 4.40. According to NIQE, the left image is visually better.

6. Conclusion

    During the experiments, NIQE showed good results for most of the
videos. Still, there are many cases in which the metric is not
applicable. For this reason NIQE is not universal and cannot currently
be used in video-codec comparisons. The results of this comparison
reveal NIQE deficiencies that need to be corrected: its behavior on
animated cartoons, on videos with completely black or solid-colored
frames, on noise, and on highly detailed or textured frames. For
example, an abundance of fine details (grass, sand, grain effects)
increases the NIQE values despite the high bit rate of the encoded
video, which leads to incorrect results. At the same time, although the
original paper stated that NIQE is not applicable to computer graphics,
our investigation found that the metric works for some types of
animation (particularly for screen captures of video games).

7. Acknowledgments

    Special thanks to Georgiy Osipov, who helped to analyze all detected
issues and improved the NIQE implementation in MSU VQMT. This work was
partially supported by the Russian Foundation for Basic Research under
Grant 19-01-00785a.

8. References

[1] Cisco Report VNI 2017-2022, 2018 update
    https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.html
[2] Crowd-sourced subjective quality evaluation platform subjectify.us
[3] HEVC Video Codec Comparison 2018 (Thirteen MSU Video Codec
    Comparison) http://compression.ru/video/codec_comparison/hevc_2018/
[4] HEVC Video Codec Comparison 2018 (Thirteen MSU Video Codec
    Comparison), Part II: FullHD Content, Subjective Evaluation
    http://compression.ru/video/codec_comparison/hevc_2018/#subjective_report
[5] MathWorks Documentation: Naturalness Image Quality Evaluator (NIQE)
    no-reference image quality score
    https://www.mathworks.com/help/images/ref/niqe.html
[6] MSU Quality Measurement Tool: Download Page
    http://compression.ru/video/quality_measure/vqmt_download.html
[7] C. Chen, S. Inguva, A. Rankin, and A. Kokaram, “A subjective study
    for the design of multi-resolution ABR video streams with the VP9
    codec,” in Electronic Imaging, 2016(2), pp. 1-5.
[8] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a «completely
    blind» image quality analyzer,” in IEEE Signal Processing Letters,
    2012, 20(3), pp. 209-212.
[9] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image
    quality assessment in the spatial domain,” in IEEE Transactions on
    Image Processing, 2012, 21(12), pp. 4695-4708.
[10] A. K. Moorthy and A. C. Bovik, “Blind image quality assessment:
    From natural scene statistics to perceptual quality,” in IEEE
    Transactions on Image Processing, 2011, 20(12), pp. 3350-3364.
[11] M. Saad, A. C. Bovik, and C. Charrier, “Blind image quality
    assessment: A natural scene statistics approach in the DCT domain,”
    in IEEE Transactions on Image Processing, 2012, 21(8), pp. 3339-3352.
[12] H. Tang, N. Joshi, and A. Kapoor, “Learning a blind measure of
    perceptual image quality,” in IEEE CVPR, 2011, pp. 305-312.
[13] C. Wang, S. Li, and W. Zhang, “COME for No-Reference Video Quality
    Assessment,” in 2018 IEEE Conference on Multimedia Information
    Processing and Retrieval (MIPR), 2018.
[14] P. Ye, J. Kumar, L. Kang, and D. Doermann, “Unsupervised feature
    learning framework for no-reference image quality assessment,” in
    2012 IEEE Conference on Computer Vision and Pattern Recognition,
    Jun. 2012, pp. 1098-1105.
[15] L. Zhang, L. Zhang, and A. C. Bovik, “A feature-enriched completely
    blind image quality evaluator,” in IEEE Transactions on Image
    Processing, 2015, 24(8), pp. 2579-2591.