<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Bryansk State Technical University</institution>
          ,
          <addr-line>Bryansk, Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>D. M. Mikhalina</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The article discusses a recent approach to colorizing black and white images using deep learning methods. Colorization is performed by a deep convolutional neural network whose architecture includes a ResNet model pre-trained on images from the ImageNet dataset. The neural network receives a black and white image and returns a colorized one. Because, owing to the characteristics of ResNet, the network accepts input whose size is a multiple of 255, a program was written that pads the image with a frame to reach the required size. The neural network operates in the CIE Lab color model, which makes it possible to separate the black and white component of the image from the color component. The network was trained on the Places365 dataset, which contains 365 different classes, such as animals, landscape elements, people, and so on. Training was carried out on an Nvidia GTX 1080 video card. The result is a trained neural network capable of colorizing images of any size and format. For example, an image of 256 by 256 pixels is processed in 0.08 seconds. Owing to the nature of the dataset used for training, the resulting model is oriented toward natural landscapes and urban areas.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, the automation of data processing is a pressing
task worldwide. One direction is the automatic colorization of monochrome
(black and white) images. Most colorization is still done manually,
which makes the process extremely time-consuming and
expensive.</p>
      <p>
        Image colorization is a fundamental problem of computer
graphics and machine learning. In recent years, there have been
many successful works in this area. For example, in the 2011 ILSVRC
the best classification error rate was 25%. In 2012
AlexNet was developed [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It was the first such model built as a convolutional
neural network (CNN) with 8 layers. AlexNet achieved an error rate of 16% in the
ImageNet challenge. In the next couple of years, VGG-19 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with 19 layers
and GoogLeNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with 22 layers reduced the error rate to a few
percent.
      </p>
      <p>Although CNNs have achieved breakthroughs in accuracy, they are
difficult to train for a number of reasons.</p>
      <p>1. Vanishing gradients. Computing the gradients in an n-layer network multiplies
n small numbers that come from the derivative of the activation function, so the gradient (error
signal) decreases exponentially with n and the layers closest to the input are
trained very slowly.</p>
      <p>2. CNNs usually have a great number of parameters,
which increases model complexity, so training takes much more time.</p>
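      <p>The first effect can be illustrated numerically. The following minimal sketch assumes a sigmoid activation, whose derivative never exceeds 0.25, and ignores the weights entirely; the helper name gradient_scale is hypothetical and is not part of the described system. It only shows how quickly a product of n such factors shrinks.</p>
      <preformat>
```python
import math


def sigmoid_derivative(x: float) -> float:
    """Derivative of the logistic sigmoid at x (its maximum, 0.25, is at x = 0)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def gradient_scale(n_layers: int, pre_activation: float = 0.0) -> float:
    """Product of n identical activation derivatives (weights are ignored)."""
    return sigmoid_derivative(pre_activation) ** n_layers


if __name__ == "__main__":
    # Even in the best case for the sigmoid, the factor decays exponentially with depth.
    for n in (1, 5, 10, 20):
        print(n, gradient_scale(n))
```
      </preformat>
      <p>Residual connections, as used in ResNet, are one common way to soften this decay, because the identity path lets the error signal bypass some of the multiplications.</p>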
      <p>To develop the image colorization software we
studied a number of libraries:</p>
      <p>OpenCV is a library of computer vision, image
processing and numerical algorithms.</p>
      <p>NumPy is a library for the Python programming language with
optimized computational algorithms for working with
multidimensional data arrays.</p>
      <p>PyTorch is a machine learning library for Python that is used
for tasks such as computer vision and natural language processing.</p>
      <p>For the study, a convolutional neural network implemented in Python was
chosen; its output is an image with segmented
objects.</p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Let us consider images of size $H \times W$ in the CIE L*a*b* color space [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Starting with the brightness component $X_L \in \mathbb{R}^{H \times W \times 1}$,
the goal of our model is to estimate the remaining
components to generate the full color version $\tilde{X} \in \mathbb{R}^{H \times W \times 3}$.
      </p>
      <p>In this paper we assume that there is a mapping $\mathcal{F}$ such that</p>
      <p>$$\mathcal{F} : X_L \rightarrow (\tilde{X}_a, \tilde{X}_b),$$</p>
      <p>where $\tilde{X}_a$, $\tilde{X}_b$ are the a*, b* components of the restored
image, which together with the input give the estimated color image
$\tilde{X} = (X_L, \tilde{X}_a, \tilde{X}_b)$.</p>
      <p>
        So as not to depend on the size of the input data, the
architecture is based entirely on a CNN model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In short, a
convolutional layer is a set of small trainable filters that respond
to specific local patterns in the input image. Layers close to the
input look for simple patterns, such as contours, and layers close to
the output extract more complex elements [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
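      <p>To make the notion of a filter concrete, the sketch below slides a fixed 3 by 3 vertical-edge (Sobel-like) kernel over a tiny grayscale image in plain Python. The kernel values, the test image and the helper name conv2d_valid are illustrative assumptions; in the network itself the filter weights are learned during training rather than chosen by hand.</p>
      <preformat>
```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Cross-correlate image with kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hand-picked vertical-edge kernel (Sobel-like); a CNN would learn such weights.
VERTICAL_EDGE = [[-1.0, 0.0, 1.0],
                 [-2.0, 0.0, 2.0],
                 [-1.0, 0.0, 1.0]]

# 4 rows by 5 columns: two dark columns (0) followed by three bright columns (1).
IMAGE = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]

if __name__ == "__main__":
    # Large responses mark windows that contain the dark-to-bright contour.
    for row in conv2d_valid(IMAGE, VERTICAL_EDGE):
        print(row)
```
      </preformat>
      <p>The response is large only where the window covers the dark-to-bright boundary and zero over the uniform region, which is the sense in which an early layer "looks for contours".</p>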
      <p>
        As mentioned above, the system uses the CIE L*a*b* color space
to represent the input images, where L is the
brightness channel, ranging from black to white (from 0 to
100), a is the spectrum from green to red (values
from -128 to +128), and b is the spectrum from blue to
yellow (values from -128 to +128) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>The CIE Lab color space was chosen to represent the input
images because it separates the color characteristics (a, b)
from the brightness (L). The brightness channel can be treated as a black
and white image, similar to the one fed to the input of the
neural network. With this color space, the task of the
neural network reduces to predicting 2 numerical values for
each pixel that describe its color.</p>
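      <p>The separation of brightness from color can be seen on a single pixel. The sketch below converts an sRGB value to CIE Lab in plain Python using the standard formulas with a D65 reference white, which is an assumption since the paper does not state the illuminant. The function names are illustrative; a real pipeline would normally call a library routine on whole images.</p>
      <preformat>
```python
from typing import Tuple

# Reference white for the D65 illuminant (assumed; it is the usual choice for sRGB).
WHITE_D65 = (0.95047, 1.0, 1.08883)


def _linearize(c: float) -> float:
    """Undo the sRGB gamma curve for one channel value in [0, 1]."""
    if c > 0.04045:
        return ((c + 0.055) / 1.055) ** 2.4
    return c / 12.92


def _f(t: float) -> float:
    """Piecewise cube-root function from the CIE Lab definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one sRGB pixel (channels in [0, 1]) to CIE Lab."""
    rl, gl, bl = _linearize(r), _linearize(g), _linearize(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    lightness = 116.0 * fy - 16.0
    return lightness, 500.0 * (fx - fy), 200.0 * (fy - fz)


def split_for_network(r: float, g: float, b: float):
    """Return (network input L, target (a, b)) for one pixel."""
    lightness, a, b_star = rgb_to_lab(r, g, b)
    return lightness, (a, b_star)


if __name__ == "__main__":
    print(split_for_network(1.0, 0.0, 0.0))  # a saturated red pixel
```
      </preformat>
      <p>Gray pixels map to a and b values close to zero, so a black and white photograph carries information only in L, and the two remaining values are exactly what the network has to supply.</p>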
      <p>Combining the brightness with the predicted color
components provides a high level of detail in the final restored
image.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Approach to image restoration</title>
    </sec>
    <sec id="sec-3">
      <title>3. Neural Network Architecture</title>
      <p>Convolutional neural networks are partially robust to
distortions of two-dimensional images such as changes of viewing angle, rotation,
shift and scaling.</p>
      <p>Currently, convolutional neural networks are considered the
best in speed and accuracy at finding objects in images. Since 2012,
CNNs have held first place in the ImageNet competition.</p>
      <p>The neural network gets an image with 3 color channels, as well
as parameters such as height (H) and width (W). Then the</p>
      <p>Fig. 3. Neural network architecture</p>
      <p>The neural network under study was trained on the Places365 dataset,
which mainly consists of images of landscapes and cities.
For training, Places365 is arranged as image pairs: one image is black and white and the
other is in color. During training the neural network receives such a pair
and, by finding patterns in it, learns how to colorize other
black and white images.</p>
      <p>The advantage of the Adam optimizer used for training is that it is an adaptive
algorithm: it calculates individual learning rates for the various
parameters of the neural network, which allows the
learning rate to be adjusted.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Neural Network Training</title>
      <p>The optimal parameters of the model are determined by
minimizing an objective function defined on the basis of the
expected result. To quantify the loss of the model, we use the
mean squared error (MSE) between the predicted pixel colors in the a*b* space and
their real values. For an image $X$ the MSE is defined as</p>
      <p>$$\mathrm{MSE}(X, \theta) = \frac{1}{2HW} \sum_{k \in \{a, b\}} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X^{k}_{i,j} - \tilde{X}^{k}_{i,j} \right)^{2},$$</p>
      <p>where $\theta$ denotes all parameters of the model, and $X^{k}_{i,j}$ and $\tilde{X}^{k}_{i,j}$ denote the
value of the ij-th pixel of the k-th component of the target and restored images,
respectively. This is easily extended to a batch by averaging the cost
over all the images in the batch.</p>
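      <p>The cost function defined above can be computed directly on small nested lists. The following sketch assumes that the sum over both color channels is normalized by 2HW; the function name and the tiny 2 by 2 example are hypothetical and only illustrate the arithmetic.</p>
      <preformat>
```python
from typing import List

Channel = List[List[float]]


def colorization_mse(target_a: Channel, target_b: Channel,
                     pred_a: Channel, pred_b: Channel) -> float:
    """Sum of squared a/b errors over all pixels, divided by 2 * H * W."""
    height, width = len(target_a), len(target_a[0])
    total = 0.0
    for target, pred in ((target_a, pred_a), (target_b, pred_b)):
        for i in range(height):
            for j in range(width):
                total += (target[i][j] - pred[i][j]) ** 2
    return total / (2 * height * width)


if __name__ == "__main__":
    # Channel a is off by 0.5 at every pixel, channel b is predicted exactly.
    target_a = [[0.5, 0.5], [0.5, 0.5]]
    pred_a = [[0.0, 0.0], [0.0, 0.0]]
    target_b = [[0.1, 0.2], [0.3, 0.4]]
    pred_b = [[0.1, 0.2], [0.3, 0.4]]
    print(colorization_mse(target_a, target_b, pred_a, pred_b))
```
      </preformat>
      <p>For a batch, the same value would be computed per image and then averaged.</p>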
      <p>
        During training, this loss is backpropagated to update the
model parameters θ using the Adam optimizer [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with an initial
learning rate η = 0.001. During training, the input images are brought to a
fixed size to allow batch processing.
      </p>
      <p>The Adam optimizer is an optimization algorithm for iteratively
updating the weights of a neural network based on training data. It
is an improved analogue of the classical procedure of stochastic
gradient descent.</p>
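      <p>The update rule can be sketched for a single scalar parameter. The learning rate 0.001 is the one given in the text; the values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used defaults and are an assumption here, as are the function names and the toy objective. In practice the optimizer provided by the PyTorch library would be used instead of hand-written code.</p>
      <preformat>
```python
import math
from typing import Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a single scalar parameter; t starts at 1."""
    m = beta1 * m + (1.0 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def run_adam(steps: int, start: float = 1.0) -> float:
    """Apply Adam to the toy objective f(theta) = theta**2 for a number of steps."""
    theta, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # derivative of theta**2
        theta, m, v = adam_step(theta, grad, m, v, t)
    return theta


if __name__ == "__main__":
    print(run_adam(1), run_adam(10))
```
      </preformat>
      <p>Because the step is the ratio of the two running averages, its size is close to the learning rate regardless of the raw gradient magnitude, and each parameter of a network gets its own effective step.</p>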
    </sec>
    <sec id="sec-results">
      <title>5. Results Achieved</title>
      <p>After training, the neural network was given monochrome
images to colorize. The results were quite good for most
images. Fig. 4 illustrates the results for some examples. Images
of nature were processed with high accuracy: the colors
were not distorted and were as close as possible to the originals; the
processing time per photo was __ seconds. Images
containing people were processed with results similar to those for images of the
environment. The speed and the accuracy of color rendition
depend primarily on the selection of photographs on which the
neural network was trained, their quantity and subject matter.
Tested on a subset of the Places365 dataset, ResNet-Gray achieves
75.7% accuracy. The per-pixel mean squared error (MSE) on the
Places365 validation set is 0.0025 after 10 epochs and 0.0019 after 40
epochs.</p>
      <p>For training and testing the described architecture, scripts were
written in Python using the PyTorch library. For
training, a data loader was used to load a color image and convert it
into the CIE Lab color space. The brightness (black and white) channel was
fed to the network input. The result was compared with the original
in order to update the weights of the neural network. For testing,
color images were also used, so that the
output of the neural network could be compared visually with the originals.</p>
    </sec>
    <sec id="sec-5">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Zhang</surname>, <given-names>Richard</given-names></string-name>,
          <string-name><given-names>Phillip</given-names> <surname>Isola</surname></string-name>, and
          <string-name><given-names>Alexei A.</given-names> <surname>Efros</surname></string-name>.
          «<article-title>Colorful image colorization</article-title>»
          <source>European Conference on Computer Vision</source>
          . Springer International Publishing,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Liang</surname>, <given-names>Xiangguo</given-names></string-name>, et al.
          «<article-title>Deep patch-wise colorization model for grayscale images</article-title>»
          <source>SIGGRAPH ASIA 2016 Technical Briefs</source>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
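          3.
          <string-name><surname>Cheng</surname>, <given-names>Zezhou</given-names></string-name>,
          <string-name><given-names>Qingxiong</given-names> <surname>Yang</surname></string-name>, and
          <string-name><given-names>Bin</given-names> <surname>Sheng</surname></string-name>.
          «<article-title>Deep colorization</article-title>»
          <source>Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <year>2015</year>
          .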
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dahl</surname>
          </string-name>
          , Ryan. «Automatic colorization» (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Goodfellow</surname>, <given-names>Ian</given-names></string-name>, et al.
          «<article-title>Generative adversarial nets</article-title>»
          <source>Advances in neural information processing systems</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Medsker</surname>, <given-names>L. R.</given-names></string-name>, and
          <string-name><given-names>L. C.</given-names> <surname>Jain</surname></string-name>.
          «<article-title>Recurrent neural networks</article-title>»
          <source>Design and Applications</source>
          <volume>5</volume>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>