Comprehensive Survey on Video Denoise Methods
Zhanming Bi
International College, Guangxi University, Nanning, China
261052@qq.com

                Abstract
                Today we have entered a smart information age on a variety of carriers, especially images and
                videos as carriers of information are widely used in our lives. There is a strong demand for
                clearer images and more visual enjoyable videos. However, in modern society, images and
                videos are inevitably affected by noise, which leads to a deterioration in image quality. We
                therefore need to reduce noise without losing as much of the image characteristics (edges,
                corners, resolution) as possible. So far, researchers have proposed a considerable number of
                classes of noise removal methods, each with its own advantages and disadvantages. In this
                paper, we present several traditional denoising methods as well as neural network-based
                denoising methods, supplemented by additional denoising methods based on human vision.
                We present the main formulations of each algorithm and give experimental results, as well as
                analyzing the advantages and disadvantages of them. We conclude with ideas for future trends
                in denoising methods finally.

                Keywords
                Video and Image Noise, Denoise Methods, CNN, HVS, Perceptual Quality

1. Introduction
    Video noise is the random variation in brightness or color of an image produced by a sensor, scanner
circuit or digital camera. Video noise also originates from point noise in film grain size and invariant
quantum detectors. Video noise is often seen as an unwanted component of image acquisition.
According to the different characteristics and distributions of different noise, video noise is divided into
the following categories: Gaussian noise, Poisson noise, multiplicative and additive noise and pretzel
noise [1]. For example, Gaussian noise is a class of noise whose probability density function follows a
Gaussian distribution, and Poisson noise is a noise model that follows a Poisson distribution. In general,
most video noise from natural signals satisfies the Gaussian distribution. Therefore, lots of research are
conducted on the image and video Gaussian type noise. In order to reduce the image noise, numerous
works are proposed.
    To address the problem of image denoising, by presenting a novel approach on using adaptive
principal components for image denoising, D. Darian Muresan et.al improved the image visual fidelity
[2]. Antoni Buades has proposed a new method, the non-local means (NL-means), which is an image
denoising method based on the non-local averaging of all pixels in the image. This method effectively
improves local smoothing filters [3]. Chunwei Tian uses machine learning methods in image denoising,
and he proposed a novel method called enhanced convolutional neural denoising network (ECNDNet).
This new approach uses residual learning and batch normalization techniques to address training
difficulties and accelerate network convergence. [4].
    In addition to solving the image denoising problem, video denoising is also a point of interest.
Michele Claus proposed ViDeNN (CNN for Video Denoising): a CNN (Convolutional neural network)
for video denoising that combines spatial and temporal filtering, learning to perform spatial denoising
for this method without prior knowledge of the noise distribution (blind denoising) [5]. It has been
shown through extensive experimental data that the algorithm has the same excellent results as other
state-of-the-art algorithms. Andrea Petreto also proposed a new real-time embedded video denoising
algorithm(RTE-VD), which can denoise the video in real-time [6], while its output image quality is
worse than the other advanced algorithms including the VBM3D [7]. Matias Tassano et al. developed
DVDnet (Deep Video Denoising Network): a fast network for deep video denoising based on a CNN


Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
architecture, an approach that significantly reduces computation time as well as memory footprint and
is able to handle a wide range of noise levels with a single network model [8].
    Although there are many noise reduction methods for images and videos above, all these methods
have their own application characteristics. On the one hand, in the traditional video image filtering
methods, some methods do not take into account the different content of different regions and use the
same filtering parameters in the whole image and video, which may lead to large differences in regional
filtering effects. Then, for example, some videos use time-domain filtering methods, but the methods
are too complex not easy to implement in real-time. On the other hand, the noise reduction methods
based on neural network learning are usually only applicable to limited data sets and are difficult to be
adaptive according to the content. In addition, most of the methods do not start from the perspective of
human eye vision, and most of them start from the perspective of statistical signals, and the filtering
results may not be the best noise reduction for human eyes.
    This paper investigates the video denoising methods proposed in the past three years by analyzing
the above methods and deeply analyzes the advantages and disadvantages as well as the applicability
of each video denoising method. And based on this research, we predict the future development trend
of video denoising algorithms and point out the possible future development direction of video
denoising to help solve the above problems now.
    The remainder of this paper is organized as follows. In Section Ⅱ, we present and analyze some
traditional video and image denoising methods, while showing experimental results. In Section Ⅲ,
neural network-based denoising methods and show experimental results are presented and analyzed. In
the IV Section, we show some human-eye vision-based denoising methods and demonstrate the
experimental results. Finally, we conclude with a summary of the above algorithms and give our
expectations for the future in the Section Ⅳ.

2. Traditional Video Denoise Methods Analysis
   In this section, we will analyze the traditional methods of image denoising, namely the NL-means,
and the deep analysis is given finally. Firstly, NL-means is defined by the formula,
                                                            (𝐺𝑎∗|𝑢(𝑥+.)−𝑢(𝑦+.)|2 )
                                        1    𝛺 −
                            𝑁𝐿[𝑢][𝑥] = 𝐶(𝑥) ∫−𝛺 𝑒                   ℎ2               𝑢(𝑦)𝑑𝑦           ()
                                   (𝐺𝑎∗|𝑢(𝑥+.)−𝑢(𝑦+.)|2 )
                         𝛺 −
   where 𝑥 𝜖𝛺and 𝐶(𝑥) = ∫−𝛺 𝑒              ℎ2        𝑑𝑧, respectively. 𝐶(𝑥) is the normalization function,
𝐺𝑎 is the Gaussian kernel, and h is the filtering parameter. This equation means that the denoised value
at 𝑥 is the average of all points in the Gaussian neighborhood of 𝑥. Experimental results have proven
that the NL-means method can denoise efficiently as shown in Figure 1. Figure 1(a) shows the original
image, and Figure 1 (b) shows the image after processing by NL-means algorithm. From the intuitive
perspective of the human eye, we can find that the NL-means algorithm is still at a high level of detail
retention and structure restoration, and handles noise very well.


                  (a) Original Image                         (b) NL-means Filtered Image
Figure 1: The filter results of NL-means method [3]
    Secondly, expect for the image denoise method above, we will introduce a video denoising algorithm
as the RTE-VD method. The RTE-VD is defined by the formula,
                                                                                      2
                                                                    −[𝐼𝑡 (𝑥𝑖 )−𝐼𝑡 (𝑥)]         −[𝑥𝑖 −𝑥]2
                                    1                                          2                 2𝜎2
                           𝐼𝑓 (𝑥) = 𝑊 ∑𝑥𝑖∈Ω 𝐼𝑡 (𝑥𝑖 )𝑒                      2𝜎𝑖
                                                                                          ∗𝑒         𝑑              ()
                                     𝑝

where,
                                                                                 2
                                                               −[𝐼𝑡 (𝑥𝑖 )−𝐼𝑡 (𝑥)]          −[𝑥𝑖 −𝑥]2
                                                                          2                  2𝜎2
                              𝑊𝑝 = ∑𝑥𝑖∈Ω 𝐼𝑡 (𝑥𝑖 )𝑒                    2𝜎𝑖
                                                                                     ∗𝑒          𝑑                  ()
and,
                                                               2                                            2
                                          −[𝐼𝑡 (𝑥𝑖 )−𝐼𝑡 (𝑥)]                              −[𝐼𝑡 (𝑥𝑖 )−𝐼𝑡 (𝑥)]
                                                 2𝜎2                                             2𝜎2
                           𝐼𝑡 = 𝐼𝑝 (𝑥)𝑒              𝑖             + 𝐼(𝑥)(1 − 𝑒                      𝑖          )   ()
   where 𝐼𝑓 is filtered image, 𝐼𝑝 is the previous image, 𝐼 is the current image, 𝑥 is the coordinates of the
image, 𝛺 is the filter kernel, 𝜎𝑖 , 𝜎𝑑 and 𝜎𝑡 are the smoothing parameters. Experimental results have
proven that the RTE-VD method can denoise efficiently as shown in Figure 2. Figure 2 (a) shows the
original image, and Figure 2 (b) shows the image after processing by RTE-VD algorithm. What we can
know from the experiments is that it is able to recover details and achieve performance improvements
on very noisy videos.


                 (a) Original Image                                                            (b) Filtered Image
Figure 2: The filter results of RTE-VD method [6]
    As a conclusion, among the image and video denoising methods above, although they achieve
acceptable results, there still exist weakness. On one hand, NL-means algorithm can play a good role
in resisting noise while processing details and displaying fine structures. But nonlocal algorithms work
only in similar windows and in restricted scales with special characteristics. On the other hand, RTE-
VD is not only able to recover details perfectly on very noisy videos in real-time denoising, but also has
70 times better performance than other algorithms. However, its denoising results are worse than other
advanced filters.

3. Learning Based Video Denoise Methods Analysis
   In this section, we will analyze the methods of image denoising which are based on machine learning
including ECNDNet and ViDeNN. Firstly, we will elaborate the ECNDNet method . The main
algorithm of ECNDNet,
                                                1                                               2
                                   𝑙(𝑝) = 𝑁 ∑‖𝑓(𝑦𝑗 ; 𝑝) − (𝑦𝑗 − 𝑥𝑗 )‖                                               ()
  where 𝑝 is the parameters, 𝑦𝑗 is the 𝑗 th noisy image patch and 𝑥𝑗 is the 𝑗 th label image patch.
Expanding the convolutional network at layers 2, 5, 9, and 12 not only captures the information but also
reduces the computational cost. The whole network architecture is shown as Figure 3. In addition, the
residual learning enables this neural network to perform better in denoising.


Figure 3: Architecture of ECNDNet [4]
   Experimental results have proven that the ECNDNet can denoise efficiently as shown in Figure 4.
Figure 4 (a) shows the original image, and Figure 4 (b) shows the image after processing by the
ECNDNet. The experimental results can show that the ECNDNet algorithm not only improves the
network performance and makes the network easier to train while reducing the computational cost, but
also obtains clean images from the noisy images.


                  (a) Original Image                           (b) ECNDNet Filtered Image
Figure 4: The filter results of ECNDNet method.
    Secondly, neural networks can also be applied to video denoising, for example the ViDeNN is a
good case study. The structure of the proposed ViDeNN network is shown as Figure 5. Each frame is
passed through a spatially denoised CNN. The temporal CNN takes the three spatially denoised frames
as input and outputs a final estimate of the central frame. These two CNNs first estimate the noise
residuals, and then reduce them from the noisy input signal. ViDeNN consists of only convolutional
layers. The number of feature maps is written at the bottom of each layer. The number of feature maps
is written at the bottom of each layer.


Figure 5: Architecture of ViDeNN [5]
   Experimental results have proven that the ViDeNN algorithm can denoise efficiently as shown in
Figure 6. Figure 6 (a) shows the original image, and Figure 6 (b) shows the video after processing by
the ViDeNN algorithm. The data show that videos using the ViDeNN algorithm can be processed for
noise reduction without pre-analysis of the signal. And with ViDeNN using a simplified automatic
framework, video denoising can reduce the time and cost required to obtain clean video images simply
and efficiently.


                  (a) Original Image                                (b) ViDeNN Filtered Image
Figure 6: The filter results of ViDeNN method
    Neural networks can be of great advantage in both video and image denoising. ECNDNet is able to
improve network performance for fast image processing, and ViDeNN is able to blindly denoise any
video almost unconditionally. In short, they both perform the denoising task very well and stand out
among similar algorithms. However, they have great data dependency, which means they can only work
efficient when the training and test data are available for the applications. When the situation is changed,
the network should be re-trained, which will limit their practical application.

4. HVS Based VIDEO Denoise Methods Analysis
  In this section, we will analyze HVS (Human Visual System) based video denoise methods including
Multi-Scale multiscale structural similarity for image quality assessment (MS-SSIM) [9] and JND (Just-
Noticeable-Difference) based perceptual filter methods. Firstly, MS-SSIM is defined by the formula as,
                                           𝐿 = 𝛼𝐿1 + 𝛽𝐿𝑀𝑆𝑆𝑆𝐼𝑀                                           ()
where,                                 𝐿1 (𝑝) = ∑𝑁
                                                 𝑝𝜖𝑃|𝑦(𝑝) − 𝑦
                                                            ̂(𝑝)|                                       ()
and,                           𝐿𝑀𝑆𝑆𝑆𝐼𝑀 (𝑃) = [1 − 𝑀𝑆𝑆𝑆𝐼𝑀(𝑦(𝑝), 𝑦̂(𝑝))]                                  ()
    where 𝑝 represents a pixel in the whole image patch 𝑃 of the image patch, there are 𝑃 pixels in that
patch. 𝑦 and 𝑦̂ represent the real image and the restored image, 𝛼 and 𝛽 are two scalars. The MS-SSIM
metric can be used evaluate the image structure similarity, which will help to reserve the structure and
edge well. Experimental results have proven that the MS-SSIM method can denoise efficiently as
shown in Figure 7. Figure 7 (a) shows the original image, and Figure 7 (b) shows the image after
processing by MS-SSIM algorithm. From the experimental results, the method can retain the contrast
in the high frequency region while denoising in the structures and edges.


                                                                (b) MS-SSIM Filtered Image [9]
                  (a) Original Image
Figure 7: The filter results of MS-SSIM method
   Secondly, we can present the JND model to conduct the perceptual denoise [10]. The JND means
the just noticeable difference which human eyes can perceive. We can use JND model to enhance the
perceptual noise with contrast masking. filter the un-noticeable perceptual noise to reduce the image
high frequency information and complexity. The algorithms based on JND model is defined by the
formula,
                             𝑑𝑜𝑢𝑡 (𝑎, 𝑏) = 𝑒 ∗ 𝑉𝑀(𝑎, 𝑏) ∗ 𝐿𝐴(𝑎, 𝑏) ∗ 𝑑(𝑎, 𝑏)                          ()
   where 𝑑𝑜𝑢𝑡 (𝑎, 𝑏) and 𝑑(𝑎, 𝑏) are the output of the noise reduction and the output of the contrast
enhancement of the noise-aware detail layer, respectively. 𝑒 is a control parameter for the degree of
noise reduction. 𝐿𝐴(𝑎, 𝑏) is the ratio of JND thresholds to calculate and 𝑉𝑀(𝑎, 𝑏) is the visual masking.
Experimental results have proven that the JND model method can denoise efficiently as shown in Figure
8. Figure 8 (a) shows the original image, and Figure 8 (b) shows the image after processing by JND
model. Experimental results show that the model enhances the contrast of low-light images while
successfully minimizing distortion and preserving detail. It means that we can get a low-filtered image
with almost the same visual quality.


                  (a) Original Image                             (b) JND model Filtered Image
Figure 8: The filter results of JND model method [10]
   Different from the filter methods above, the HVS base filter methods mainly focus on the human
perceptual noise rather than the statistic noise. Usually the human visual characteristics are considered
and explored to guide the filter model and procedure. There methods can help to obtain filtering results
for better subjective quality rather than the objective quality focused by the other methods.

5. Conclusion
    In this paper, we present the NL-means method, this algorithm is a non-local average image
denoising method based on all pixels in the image, which effectively improves the local smoother. Also
for image denoising, we can use the non-traditional neural network denoising method ECNDNet, which
uses neural networks as well as residual learning to solve difficulties and speed up the fusion of the
network.
    For research in video denoising, we present the RTE-VD algorithm, a method that can denoise video
in real-time, although its output image quality is slightly inferior to other advanced algorithms. However,
the use of neural networks in the field of video denoising can lead to even better results, and to this end
we present ViDeNN. The algorithm that uses CNNs for video denoising, which combines spatial and
temporal filtering and has excellent results in the field of blind denoising.
    We also present and analyze denoising methods in human vision, including the MS-SSIM algorithm
and the JND model algorithm. The algorithms that focus on human vision both achieve the visual
friendly low-filtered image quality, instead of only the objective filter performance.
    From the research in this paper, we can see that each class of noise reduction methods has its own
characteristics, but these methods still do not achieve enough generality and have their own limitations.
Combined with the current research results, we believe that in the future, video and image noise
reduction methods can be further studied in depth from the following aspects. On the one hand, for
videos and images viewed by the human eye, the filtering methods should take into account the visual
model as much as possible, so that the final filtering result obtains the best subjective quality. On the
other hand, for videos and images that are analyzed by machines, the filtering method should be
combined with the purpose of machine analysis such as face recognition and so on. In this way the best
balance of model complexity and machine analysis efficiency can be achieved.
References
[1] Sun P, Wang C, Li M, et al. Partial Differential Equations-Based Iterative Denoising Algorithm
     for Movie Images[J]. Advances in Mathematical Physics, 2021, 2021.
[2] Muresan D D, Parks T W. Adaptive principal components and image denoising[C]//Proceedings
     2003 International Conference on Image Processing (Cat. No. 03CH37429). IEEE, 2003, 1: I-101.
[3] Buades A, Coll B, Morel J M. A non-local algorithm for image denoising[C]//2005 IEEE Computer
     Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE, 2005, 2: 60-
     65.
[4] Tian C, Xu Y, Fei L, et al. Enhanced CNN for image denoising[J]. CAAI Transactions on
     Intelligence Technology, 2019, 4(1): 17-23.
[5] Claus M, van Gemert J. Videnn: Deep blind video denoising[C]//Proceedings of the IEEE/CVF
     Conference on Computer Vision and Pattern Recognition Workshops. 2019: 0-0.
[6] Petreto A, Romera T, Lemaitre F, et al. A new real-time embedded video denoising
     algorithm[C]//2019 Conference on Design and Architectures for Signal and Image Processing
     (DASIP). IEEE, 2019: 47-52.
[7] Makkar H, Lamba O S. An improved VBM3D filtering technique for removal noise in video
     signals[J]. European Journal of Advances in Engineering and Technology, 2017, 4(8): 584-
     591.Tassano M, Delon J, Veit T. Dvdnet: A fast network for deep video denoising[C]//2019 IEEE
     International Conference on Image Processing (ICIP). IEEE, 2019: 1805-1809.
[8] Tassano M, Delon J, Veit T. Dvdnet: A fast network for deep video denoising[C]//2019 IEEE
     International Conference on Image Processing (ICIP). IEEE, 2019: 1805-1809.
[9] Uddin A F M S, Chung T, Bae S H. A perceptually inspired new blind image denoising method
     using $ L_ {1} $ and perceptual loss[J]. IEEE Access, 2019, 7: 90538-90549.
[10] Su H, Jung C. Perceptual enhancement of low light images based on two-step noise suppression[J].
     IEEE Access, 2018, 6: 7005-7018.