Image Colorization

D. M. Mikhalina¹, A. A. Kuzmenko¹, K. V. Dergachev¹, V. A. Shkaberin¹
mikhalinadasha97@gmail.com | alex-rf-32@yandex.ru | kv.dergachev@gmail.com | vash@tu-bryansk.ru
¹ Bryansk State Technical University, Bryansk, Russian Federation

The article discusses one of the latest ways to colorize a black and white image using deep learning methods. For colorization, a deep convolutional neural network (one with a large number of layers) is used, the architecture of which includes a ResNet model. This model was pre-trained on images from the ImageNet dataset. The neural network receives a black and white image and returns a colorized one. Since, due to the characteristics of ResNet, the input size must be a multiple of 255, a program was written that enlarges the image to the required size by adding a frame. The neural network works in the CIE Lab color model, which makes it possible to separate the black and white component of the image from the color components. For training the neural network, the Places365 dataset was used, containing 365 different classes, such as animals, landscape elements, people, and so on. The training was carried out on an Nvidia GTX 1080 video card. The result is a trained neural network capable of colorizing images of any size and format. For example, an image of 256 by 256 pixels is processed in 0.08 seconds. Because of the dataset used for training, the resulting model is focused on the recognition of natural landscapes and urban areas.

Keywords: ResNet, convolutional neural network, CIE Lab, Places365, image colorizing.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Nowadays, the automation of data processing is a globally urgent task. One of its directions is the automatic colorization of monochrome (black and white) images. Most colorization is now done manually, which makes the process extremely time-consuming and expensive.

Image colorization is a fundamental problem of computer graphics and machine learning. In recent years, there have been many successful works in this area. For example, in 2011 the best classification error rate in ILSVRC was 25%. In 2012 AlexNet was developed [1]. It was the first leading model in this competition built as a convolutional neural network (CNN), with 8 layers. AlexNet achieved an error rate of 16% in the ImageNet challenge. In the next couple of years, VGG-19 [2] with 19 layers and GoogLeNet [3] with 22 layers reduced the error rate to a few percent.

Although CNNs brought a breakthrough in accuracy, they are difficult to train for a number of reasons:

1. The vanishing gradient problem. Computing gradients in an n-layer network multiplies n small numbers that come from the activation function, so the gradient (error signal) decreases exponentially with n and the front layers are trained very slowly.
2. CNNs usually have a great number of parameters, which increases model complexity, so training takes much more time.

For developing the image colorization software system we studied a number of libraries:

• OpenCV is a library of computer vision, image processing and numerical algorithms.
• NumPy is a library for the Python programming language with optimized computational algorithms for working with multidimensional data arrays.
• PyTorch is a machine learning library for Python that is used for tasks such as computer vision and natural language processing.

To conduct the study, a convolutional neural network implemented in Python was chosen; its result is an output image with segmented objects.

Fig. 1. Segmentation of image objects

The architecture of the neural network is based on the ResNet-18 structure. The main difference of ResNets is that they have connections parallel to the conventional convolutional layers. These connections are always active, and gradients can easily propagate through them, resulting in faster learning. ResNet with 152 layers achieves the best results with an error rate of 3% [4]. This type of deep convolutional network exceeds the human level of image classification. It allows low-, medium-, and high-level features to be extracted in an end-to-end multilayer manner, and an increase in the number of stacked layers enriches the feature "levels". The number of stacked layers (the depth) is crucial.

The main problem when training a deep network is a rapid degradation of training accuracy as the network depth increases. To overcome this problem, Microsoft introduced a deep "residual" learning structure. Instead of hoping that every few stacked layers directly fit the desired underlying mapping, they explicitly let these layers fit a "residual" mapping. The formula F(x) + x can be implemented using neural networks with shortcut connections.

Fig. 2. Layers

2. Approach to image restoration

Let us consider images of size H × W in the CIE L*a*b* color space [5]. Starting with the brightness component $X_L \in \mathbb{R}^{H \times W \times 1}$, the goal of our model is to estimate the remaining components to generate the full color version $\tilde{X} \in \mathbb{R}^{H \times W \times 3}$. In this paper we assume that there is a mapping F described by

$$F: X_L \to (\tilde{X}_a, \tilde{X}_b),$$

where $\tilde{X}_a$, $\tilde{X}_b$ are the a*, b* components of the restored image, which together with the input give the estimated color image $\tilde{X} = (X_L, \tilde{X}_a, \tilde{X}_b)$.

In order not to depend on the size of the input data, the architecture is entirely based on the CNN model [6]. In short, a convolutional layer is a set of small trainable filters that correspond to specific local patterns in the input image. Layers close to the input look for simple patterns, such as contours, and layers close to the output extract more complex elements [6].

As mentioned above, the system uses the CIE L*a*b* color space to represent the input images, where L is the brightness channel, with values from black to white (from 0 to 100), a is the spectrum in the range from green to red (values from -128 to +128), and b is the spectrum in the range from blue to yellow (values from -128 to +128) [7].

The CIE Lab color space was chosen to represent the input images because in it the color characteristics (a, b) are separated from the brightness (L). The brightness channel can be regarded as a black and white image, similar to the one fed to the input of the neural network. Thanks to this color model, the task of the neural network is reduced to selecting 2 numerical values for each pixel that describe its color.

The combination of brightness with the predicted color components provides a high level of detail in the final restored image.

3. Neural Network Architecture

Convolutional neural networks have partial resistance to distortions of two-dimensional images: change of angle, rotation, shift, and zooming.

Currently, convolutional neural networks are considered the best in speed and accuracy of finding objects in images. Since 2012, CNNs have held first place in ImageNet.

The neural network gets an image with 3 color channels, as well as parameters such as height (H) and width (W). The network is then prepared, using the ResNet-18 model (ResNet is a deep neural network, 18 is the number of layers), to accept images in shades of gray.

At the output of the ResNet layers, a matrix of reduced size is obtained; its size depends on the original image size. A deconvolution operation is then applied to this matrix. This procedure is performed by alternating convolutional and upsampling layers.

Upsampling of the matrix occurs as follows: the input matrix increases in size by duplicating its elements according to the size of the upsampling kernel, so that each matrix element is projected into a larger matrix.

As a result we get the ResNet-18-Gray model, capable of working with images in shades of gray.

The convolutional neural network has a multilayer structure and consists of convolutional, subsampling and upsampling layers.

The input is a black and white image. The ResNet architecture is designed in such a way that it can only process images whose dimensions are multiples of 255. To solve this problem, a program was written in Python that increases the size of the image to the required size (a multiple of 255 pixels) by extending it with a frame.

The input layer consists of maps; the number of maps depends on the type of image. If the image is in color, there are 3 maps according to the number of color channels (red, green, blue). In our case, the image is in shades of gray, so there is one map. Further, the input pixel values of the image are normalized to the range from 0 to 1.

Convolutional layer. It is a set of feature maps. Each map has a kernel (synaptic core) that "glides" over the entire image area, multiplying with the input data and then, based on the values obtained, finding certain features of the objects.

First, the values of the feature map of the convolutional layer are 0. The weights of the kernels are set randomly within [-0.5; 0.5]. The kernel glides over the map as follows: a window of the kernel size passes over the whole image with a given step; at each step the contents of the window are multiplied element by element by the kernel, and the result is summed and written into the result matrix.

Then the results are passed to the convolution and upsampling layers, where the matrix gradually grows with each layer back to the original size.

The output is a color image produced from the input grayscale image. After that, the result (Output) is compared with the original color image (Ground Truth).

Fig. 3. Neural network architecture

The neural network under study was trained on the Places365 dataset, which mainly consists of images of landscapes and cities. The training data are arranged as image pairs: one image is black and white and the other is in color. During learning the neural network gets this pair of images and, finding certain patterns, learns how to color other black and white images.

4. Neural Network Training

The optimal parameters of the model are determined by minimizing an objective function defined on the basis of the expected result. To quantify the loss of the model, we use the mean squared error (MSE) between the estimated pixel colors in a*b* space and their real values. For an image X, the MSE is defined as:

$$C(X, \theta) = \frac{1}{2HW} \sum_{k \in \{a, b\}} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X_{k_{i,j}} - \tilde{X}_{k_{i,j}} \right)^2,$$

where θ denotes all parameters of the model, and $X_{k_{i,j}}$ and $\tilde{X}_{k_{i,j}}$ denote the value of the ij-th pixel of the k-th component of the target and restored images, respectively. This can easily be extended to a batch by averaging the loss over all the images in the batch.

During training, this loss is backpropagated to update the model parameters θ using the Adam optimizer [7] with an initial learning rate η = 0.001. During training, the input image is set to a fixed size for batch processing.

Adam is an optimization algorithm for iteratively updating the weights of a neural network based on training data. It is an improved analogue of the classical procedure of stochastic gradient descent. Its advantage is that Adam is an adaptive algorithm, that is, it calculates individual learning rates for the various parameters of the neural network, which makes it possible to adjust the learning rate.

5. Results Achieved

After training, the neural network was given monochrome images for colorization. The results were quite good for most images. Fig. 4 illustrates the results for some examples. Images of nature were processed with high accuracy: the colors were not distorted and were as close as possible to the originals; the processing speed of the photo was __ seconds. Processing of images containing people had parameters similar to those of the environment. The speed of work and the accuracy of color rendition are primarily related to the selection of photographs on which the neural network was trained, their quantity and subject matter. Tested on a subset of the Places365 dataset, ResNet-Gray achieves 75.7% accuracy. The per-pixel mean squared error (MSE) on the Places365 validation set is 0.0025 after 10 epochs and 0.0019 after 40 epochs.

For training and testing the described architecture, scripts were written in Python using the PyTorch library. For training, a data loader was used to load a color image and convert it to the CIE Lab color model. The black and white channel of the image was sent to the network input. The result was compared with the original in order to adjust the weights of the neural network. For testing, color images were also used so that the results of the neural network could be compared visually.

Fig. 4. The results of operation of the trained neural network

6. References

1. Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision. Springer International Publishing, 2016.
2. Liang, Xiangguo, et al. "Deep patch-wise colorization model for grayscale images." SIGGRAPH ASIA 2016 Technical Briefs. ACM, 2016.
3. Cheng, Zezhou, Qingxiong Yang, and Bin Sheng. "Deep colorization." Proceedings of the IEEE International Conference on Computer Vision. 2015.
4. Dahl, Ryan. "Automatic colorization." (2016).
5. Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.
6. Medsker, L. R., and L. C. Jain. "Recurrent neural networks." Design and Applications 5 (2001).
7. Srivastava, Nitish, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research, 15(1): 1929–1958, 2014.
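
The frame extension mentioned in Section 3 can be illustrated with a minimal sketch. It is not the authors' program: the function name `pad_to_multiple`, the symmetric split of the frame, and the choice of repeating edge pixels are assumptions made for illustration, and the required multiple is left as a parameter.

```python
import numpy as np


def pad_to_multiple(image, multiple):
    """Add a frame so that height and width become multiples of `multiple`.

    Hypothetical helper: returns the padded image and the (top, left)
    offset of the original image inside it, so the colorized result
    can be cropped back to the original size afterwards.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % multiple  # rows missing to reach the next multiple
    pad_w = (-width) % multiple   # columns missing to reach the next multiple
    top, left = pad_h // 2, pad_w // 2
    # Pad only the two spatial axes; a channel axis, if present, is untouched.
    pad_spec = [(top, pad_h - top), (left, pad_w - left)]
    pad_spec += [(0, 0)] * (image.ndim - 2)
    # mode="edge" repeats the border pixels to form the frame (an assumption).
    return np.pad(image, pad_spec, mode="edge"), (top, left)


gray = np.random.default_rng(0).random((300, 500))
padded, (top, left) = pad_to_multiple(gray, 255)
print(padded.shape)  # (510, 510)
```

Keeping the offset makes it straightforward to cut the frame off again after colorization.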
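
The sliding-window operation of the convolutional layer described in Section 3 can be sketched in NumPy as follows. This is an illustrative single-channel version without padding; the function name `convolve2d` and the 3 × 3 kernel size are assumptions, while the zero-initialized result matrix and the weight range [-0.5; 0.5] follow the description in the text.

```python
import numpy as np


def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` with the given step (illustrative sketch)."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    result = np.zeros((out_h, out_w))  # the feature map starts at 0
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + k_h, left:left + k_w]
            # Element-by-element product of window and kernel, then summed.
            result[i, j] = np.sum(window * kernel)
    return result


rng = np.random.default_rng(0)
kernel = rng.uniform(-0.5, 0.5, size=(3, 3))  # random weights in [-0.5, 0.5]
feature_map = convolve2d(rng.random((8, 8)), kernel)
print(feature_map.shape)  # (6, 6)
```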
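
Upsampling by duplicating elements, as described in Section 3, can be sketched with NumPy. The function name `upsample_nearest` and the integer `factor` parameter (standing in for the size of the upsampling kernel) are assumptions made for illustration.

```python
import numpy as np


def upsample_nearest(matrix, factor):
    """Enlarge a 2-D matrix by repeating every element `factor` times per axis."""
    rows_repeated = np.repeat(matrix, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)


small = np.array([[1, 2],
                  [3, 4]])
print(upsample_nearest(small, 2))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Each element of the input is projected onto a `factor` × `factor` block of the larger matrix.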
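
The loss from Section 4 can be written out directly in NumPy. This is a sketch under stated assumptions: the function names are hypothetical, and the a and b components are assumed to be stored as an array of shape (2, H, W); the batch version averages the per-image loss, as the text describes.

```python
import numpy as np


def colorization_mse(target_ab, restored_ab):
    """C(X, theta): squared error summed over k in {a, b} and all pixels, divided by 2HW."""
    _, height, width = target_ab.shape
    squared_error = (target_ab - restored_ab) ** 2
    return np.sum(squared_error) / (2 * height * width)


def batch_colorization_mse(target_batch, restored_batch):
    """Average the per-image loss over a batch of shape (B, 2, H, W)."""
    losses = [colorization_mse(t, r) for t, r in zip(target_batch, restored_batch)]
    return float(np.mean(losses))


target = np.zeros((2, 4, 4))
restored = np.full((2, 4, 4), 0.5)
print(colorization_mse(target, restored))  # 0.25
```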