=Paper=
{{Paper
|id=Vol-2304/00030081
|storemode=property
|title=Deep Learning Based Super-Resolution
|pdfUrl=https://ceur-ws.org/Vol-2304/00030081.pdf
|volume=Vol-2304
|authors=Fernando Zapata Barron,Jose Manuel Mejia Muñoz,Boris Jesus Mederos Madrazo,Leticia Ortega Maynez
}}
==Deep Learning Based Super-Resolution==
Fernando Zapata Barron, Jose Manuel Mejia Muñoz, Boris Jesus Mederos Madrazo, and Leticia Ortega Maynez

Universidad Autónoma de Ciudad Juárez, Departamento de Ingeniería Eléctrica y Computación, Avenida del Charro No. 450 Norte, Ciudad Juárez, Chihuahua, México
al164459@alumnos.uacj.mx, {jose.mejia,boris.mederos,lortega}@uacj.mx
http://www.uacj.mx

Abstract. We propose a method for image super-resolution based on deep learning. The method uses a convolutional neural network to find the similarities between low-resolution and high-resolution patches of an image and learn a mapping between them. The network takes a low-resolution image as input and outputs a high-resolution image, handles three color channels, and is efficient enough for use in real-time systems.

Keywords: Super-Resolution · Deep Learning · Convolutional Neural Network

1 Introduction

It is desirable to make use of high-resolution images in applications that rely on digital imaging, given that these can contain details critical for various applications[15] in comparison to their low-resolution equivalents, as seen in Figure 1. A doctor can rely on a high-resolution image to make a correct diagnosis, objects can be easily distinguished in a high-resolution satellite image[11], and a pattern recognition algorithm can achieve higher effectiveness when provided with high-resolution image samples.

The simplest ways to increase the resolution of a captured image are either reducing the pixel size or enlarging the image sensor itself. However, these approaches can prove prohibitively costly in large-scale applications, or in those requiring a high degree of precision. Because of this, it is preferable to apply an algorithmic approach that does not depend on the development of new sensors and allows the use of already existing image capture systems. For this purpose, there exists a family of methods known as super-resolution.

Although there are many super-resolution techniques, most are based on the same basic idea: use the information contained within a multitude of distinct images of the same scene to reconstruct a new, higher-resolution image[10]. This is unlike traditional image scaling methods, which synthesize details based only on the information within a single image[14].

Fig. 1: Example of a low-resolution image compared to a high-resolution sample of the same scene. (a) Low-resolution; (b) High-resolution.

In [6], one of the earliest spatial-domain super-resolution methods is proposed, that is, a method that works directly on the pixels of an image. Expanding upon this, [7] presents a computationally inexpensive method that makes super-resolution viable on current computational systems. In [8], a method is proposed that uses a pre-established dictionary of images as additional information to perform super-resolution, in what is now known as example-based super-resolution. Additionally, [9] proposes an evolution of this premise, constructing a dictionary of patches extracted from a single image.

In this paper, we propose an architecture to increase the resolution of an input image. This is achieved with a neural network of convolutional layers capable of implicitly learning the dictionary of image patches typically used in example-based super-resolution, in addition to performing the image reconstruction within the network itself.
The rest of the paper is organized as follows: section 2 reviews the theory of super-resolution and convolutional networks, section 3 describes the proposed architecture, section 4 shows the obtained results, and finally, section 5 offers a conclusion.

2 Theory

In this section we review super-resolution and convolutional network theory.

2.1 Image super-resolution

As mentioned previously, super-resolution is a process that generates one or more high-resolution images from a set of low-resolution images[10,7]. These images are subsampled and displaced with subpixel precision, which allows the information contained in each of them to be combined into a higher-resolution image, as shown in Figure 2. If the images were displaced by integer units, they would all contain the same information and would not be useful for the reconstruction of high-resolution images[15].

Fig. 2: The subpixel displacement provides the complementary information between multiple low-resolution images, making the reconstruction of a high-resolution image possible[3].

There exist multiple methods for super-resolution: from the earliest ones, based on the displacement and aliasing properties of the Fourier transform[18], to more modern ones, like the previously mentioned example-based super-resolution, which performs the process based on a preset database of image patches generated via the degradation of high-resolution images[9,8], or single-image super-resolution, a method based on the similarities between patches extracted from a single image[9].

2.2 Convolutional neural networks

A neural network is a system capable of learning to perform a task through the analysis of samples. For example, in the case of image classification, the samples could consist of images of different objects along with labels identifying the type of object they contain. The principal components of a neural network are nodes called neurons, which work based on the principles of the perceptron and are inspired by the workings of biological neurons[16]. These neurons are interconnected, allowing them to send signals, in the form of real numbers, between themselves. Each output signal is calculated as a function of the sum of the signals a neuron receives, multiplied by its weights: values which modify the strength of the input signals and are adjusted during the training phase, making learning possible.

The most common way of modelling a neural network is as an acyclic graph[17], that is, the neurons are connected in such a way that the outputs of certain neurons can act as the inputs of others without creating cycles between them. These connections are organized in layers: sets of neurons whose inputs are connected to the outputs of the previous layer, and whose outputs are connected to the inputs of the next layer[2].

A special type of network is the convolutional neural network, which assumes its input signals represent images; this assumption allows certain characteristics to be encoded within its architecture, making it more efficient for image processing purposes.

The principal component of the convolutional network is the convolutional layer. Unlike the layers of a traditional network, the convolutional layer's neurons are connected only to a small part of the previous layer. This allows these layers to process data of larger sizes, such as images, without needing to operate on large numbers of weights, reducing their computational requirements. Because each neuron connects only to a small area of its input, the operation of a layer is equivalent to the convolution of a filter, formed by the neuron's weights, over the input[13]. As such, convolutional layers are defined by four parameters: K, the number of filters or neurons in the layer; F, the size per side of said filters; S, the stride, or the number of pixels the filter moves at each step of the convolution; and P, the thickness of the zero padding added around the input.
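To make the four parameters concrete, the following is a minimal sketch of how a single convolutional layer with K = 32, F = 9, S = 1 and P = 4 (the first layer of the architecture in section 3) could be expressed in Keras; the explicit ZeroPadding2D layer is our choice for illustrating P, since Keras normally folds the padding into the convolution itself:

```python
import tensorflow as tf

# One convolutional layer described by the four parameters of section 2.2.
# Values match the first layer of Figure 3: K = 32, F = 9, S = 1, P = 4.
conv_layer = tf.keras.Sequential([
    tf.keras.layers.ZeroPadding2D(padding=4),  # P: zero padding around the input
    tf.keras.layers.Conv2D(
        filters=32,       # K: number of filters (neurons) in the layer
        kernel_size=9,    # F: size per side of each filter
        strides=1,        # S: pixels the filter moves per convolution step
        padding="valid",  # padding already applied explicitly above
    ),
])
```

Note that with S = 1, a padding of P = (F − 1)/2 keeps the spatial dimensions of the output equal to those of the input, which is the case for every layer in Figure 3.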
3 Methods

As previously mentioned, it is possible to perform super-resolution on a single image, through a technique similar to example-based super-resolution, making use of patches extracted from the image itself[8,9]. We propose a method capable of learning these patches through the use of a convolutional neural network.

3.1 Network architecture

As seen in Figure 3, the network architecture is based on four convolutional layers, in addition to a scaling layer which repeats its input data so that its output is twice the size of the input received. The figure shows the parameters used for each layer according to the notation described at the end of section 2. Once trained, this network is capable of generating a high-resolution image from a low-resolution one given as input.

Fig. 3: The proposed architecture contains four convolutional layers, split by a scaling layer in the middle of the network. Input: low-resolution, 32×32×3. Layer 1: K=32, F=9, S=1, P=4. Layer 2: K=16, F=5, S=1, P=2. Scaling layer (2×). Layer 3: K=16, F=5, S=1, P=2. Layer 4: K=3, F=5, S=1, P=2. Output: high-resolution, 64×64×3.

This network performs the following tasks[5]:

Patch extraction: This operation extracts patches from the low-resolution images and stores them as the weights of the neurons of the convolutional layer.

Low-resolution to high-resolution mapping: This operation makes use of the scaling layer to establish a relationship between the low-resolution patches extracted in the previous operation and the high-resolution patches of the target image.

High-resolution image reconstruction: This operation merges the high-resolution patches to reconstruct the high-resolution target image, which is expected to be similar to the image of the original scene.

The neural network was implemented using Keras, a high-level neural network library for the Python language[4], running on top of TensorFlow, a machine learning framework[1].
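As an illustration, the architecture of Figure 3 could be expressed in Keras roughly as follows. This is a sketch under assumptions the paper does not state explicitly: we take the scaling layer to be a 2× nearest-neighbour upsampling (since it "repeats its input data"), use "same" padding (equivalent to the stated P values at stride 1), and add ReLU activations on the hidden layers, which the paper does not specify:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_network():
    """Sketch of the four-layer architecture of Figure 3 (assumptions noted above)."""
    return keras.Sequential([
        # Accept any spatial size; Figure 3 illustrates a 32x32x3 input.
        keras.Input(shape=(None, None, 3)),
        layers.Conv2D(32, 9, strides=1, padding="same", activation="relu"),
        layers.Conv2D(16, 5, strides=1, padding="same", activation="relu"),
        # Scaling layer: repeats each pixel so the output is twice the input size.
        layers.UpSampling2D(size=2, interpolation="nearest"),
        layers.Conv2D(16, 5, strides=1, padding="same", activation="relu"),
        # Final layer produces the 3-channel high-resolution output.
        layers.Conv2D(3, 5, strides=1, padding="same"),
    ])
```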
3.2 Training

In order to learn the mapping between low resolution and high resolution, the training phase seeks to minimize the loss between the reconstructed high-resolution images and the original samples. The loss function used by this network is the Mean Squared Error, defined by:

MSE = \frac{1}{n} \sum_{i=1}^{n} \left( F(Y_i; \Theta) - X_i \right)^2   (1)

where n is the number of training samples, Y is the set of low-resolution images, X is the set of original high-resolution images, Θ is the set of network parameters, and F(Y_i; Θ) is the high-resolution image reconstructed from the i-th low-resolution sample. The loss between the images is minimized using Adam, a method for stochastic optimization with low computational and memory requirements[12].

The training data consists of a set of 2577 image patches extracted from a set of larger images, mostly captured from architectural or outdoor scenes. The set of low-resolution images is generated from these patches by decimating them, discarding 75% of their content. The high-resolution patches have dimensions of 256 × 256, while the generated low-resolution ones have half the size per side, at 128 × 128. Additionally, the images have 3 color channels, corresponding to red, green, and blue.
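A minimal sketch of this training setup, assuming the build_network function from the previous sketch and a hypothetical hr_patches array holding the 256 × 256 × 3 high-resolution training patches (the file name, batch size and epoch count below are placeholders, not values from the paper):

```python
import numpy as np
from tensorflow import keras

# Hypothetical training set: 2577 high-resolution patches of shape
# (256, 256, 3), normalized to [0, 1]. The file name is a placeholder.
hr_patches = np.load("hr_patches.npy")

# Decimation: keeping every second pixel per axis discards 75% of the
# content, producing the 128x128x3 low-resolution counterparts.
lr_patches = hr_patches[:, ::2, ::2, :]

model = build_network()
model.compile(
    optimizer=keras.optimizers.Adam(),  # stochastic optimization with Adam [12]
    loss="mean_squared_error",          # the MSE loss of Eq. (1)
)
# Batch size and epoch count are not stated in the paper; these are examples.
model.fit(lr_patches, hr_patches, batch_size=16, epochs=100)
```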
4 Results

4.1 Learned filters

The weights learned by a convolutional layer can be interpreted as a set of filters that, when convolved with the input data, activate due to the presence of specific visual characteristics, for example, edges, patterns or colors. An example of this can be seen in Figure 4, a visualization of a set of filters extracted from the first layer of the network.

Fig. 4: Filters obtained in the first convolutional layer after performing the training process.

4.2 Generated images

As stated, the purpose of this neural network is to generate a high-resolution image from a low-resolution input. Specifically, the network is capable of doubling the resolution of the input image, and while it was trained using pairs of 128 × 128 and 256 × 256 patches, in practice the network can apply the super-resolution process to an image of any size, limited only by the computational and memory capacity of the system it executes on.

Examples of the generated images can be seen in Figure 5 and Figure 6, each showing the original image, the low-resolution sample and the high-resolution image generated by the neural network. In comparison to the generated images, the low-resolution samples have a more pixelated appearance when scaled to double their size.

Fig. 5: Original image (a), low-resolution image (b) and generated high-resolution image (c) from an architectural scene.

Fig. 6: Original image (a), low-resolution image (b) and generated high-resolution image (c) from a picture of a keyboard containing text.

Furthermore, sample images were compared using PSNR and SSIM to measure the difference between them. These are methods used to approximate the human-perceived quality of an image in comparison to another[19], and are traditionally used to measure the quality of compressed images for data transmission.

PSNR is defined on a scale of decibels, with higher values indicating less noise; identical images yield an infinite PSNR, and around 25 dB is considered adequate quality for wireless transmission. On the other hand, SSIM is defined as a real number with a maximum of 1, which indicates that the compared images are identical.

The comparison can be seen in Figure 7 and Table 1, which show the compared images along with the measurements obtained from the comparison against the generated high-resolution images.

Fig. 7: Images (a)-(e) used for the comparison using PSNR and SSIM.

Image   MSE            PSNR (dB)     SSIM
(a)     226.39168294   24.58219892   0.86677955
(b)     218.77011108   24.73092373   0.83722339
(c)     301.35599772   23.34000521   0.87755187
(d)     383.97914632   22.28772722   0.80628970
(e)     259.54080200   23.98874718   0.62289394

Table 1: Measurements obtained from the comparison of a set of original samples against the images generated through the neural network.
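For reference, measurements like those in Table 1 can be computed with scikit-image; this is an illustrative sketch, not the authors' evaluation code, and the file names are placeholders:

```python
from skimage.io import imread
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

# Placeholder file names: an original sample and the network's output for it.
original = imread("original.png")
generated = imread("generated.png")

mse = mean_squared_error(original, generated)
psnr = peak_signal_noise_ratio(original, generated, data_range=255)
ssim = structural_similarity(original, generated, channel_axis=-1, data_range=255)
print(f"MSE: {mse:.2f}  PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
```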
5 Conclusion

We have proposed a method for image super-resolution based on deep convolutional networks. The proposed method is capable of learning a mapping between low-resolution and high-resolution images, while maintaining a simple architecture and achieving adequate performance for use in real-time applications.

This deep learning based method could be further refined through experimentation with different filter sizes, an expanded number of layers, or the use of a different set of training data.

In addition, the incorporation of other types of layers into the proposed network architecture could enable the network to perform other operations along with super-resolution, for example, image denoising, segmentation, or feature recognition; or to further augment the resolution of the input image by stacking the layers already present in this network.

References

1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015), https://www.tensorflow.org/
2. Burger, J.T.: A basic introduction to neural networks. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html (1996)
3. Choi, E., Choi, J., Kang, M.G.: Super-resolution approach to overcome physical limitations of imaging sensors: An overview. International Journal of Imaging Systems and Technology 14(2), 36–46 (2004). https://doi.org/10.1002/ima.20006
4. Chollet, F., et al.: Keras. https://keras.io (2015)
5. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision. pp. 184–199. Springer (2014)
6. Elad, M., Hel-Or, Y.: A fast super-resolution reconstruction algorithm for pure translational motion and common space-invariant blur. In: 21st IEEE Convention of the Electrical and Electronic Engineers in Israel. Proceedings (Cat. No.00EX377). pp. 402–405 (2000). https://doi.org/10.1109/EEEI.2000.924450
7. Farsiu, S., Robinson, M.D., Elad, M., Milanfar, P.: Fast and robust multiframe super resolution. IEEE Transactions on Image Processing 13(10), 1327–1344 (2004)
8. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Computer Graphics and Applications 22(2), 56–65 (2002)
9. Glasner, D., Bagon, S., Irani, M.: Super-resolution from a single image. In: 2009 IEEE 12th International Conference on Computer Vision. pp. 349–356 (2009). https://doi.org/10.1109/ICCV.2009.5459271
10. Infognition Co. Ltd.: What is super-resolution? http://www.infognition.com/articles/what_is_super_resolution.html (2009), accessed: 2017-01-31
11. Kim, S.P., Bose, N.K., Valenzuela, H.M.: Recursive reconstruction of high resolution image from noisy undersampled multiframes. IEEE Transactions on Acoustics, Speech, and Signal Processing 38(6), 1013–1027 (1990). https://doi.org/10.1109/29.56062
12. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014), http://arxiv.org/abs/1412.6980
13. LeCun, Y., Jackel, L., Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Muller, U., Sackinger, E., Simard, P., et al.: Learning algorithms for classification: A comparison on handwritten digit recognition. Neural Networks: The Statistical Mechanics Perspective 261, 276 (1995)
14. Nasrollahi, K., Moeslund, T.B.: Super-resolution: A comprehensive survey. Machine Vision and Applications 25(6), 1423–1468 (2014)
15. Park, S.C., Park, M.K., Kang, M.G.: Super-resolution image reconstruction: A technical overview. IEEE Signal Processing Magazine 20(3), 21–36 (2003)
16. Rosenblatt, F.: The perceptron, a perceiving and recognizing automaton. Cornell Aeronautical Laboratory (1957)
17. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural Networks 61, 85–117 (2015). https://doi.org/10.1016/j.neunet.2014.09.003
18. Tsai, R.Y., Huang, T.S.: Multi-frame image restoration and registration. Advances in Computer Vision and Image Processing 1, 317–339 (1984)
19. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861