<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural network technology to search for targets in remote sensing images of the Earth</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>N S Abramov</string-name>
          <email>nikolay.s.abramov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A A Talalayev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V P Fralenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O G Shishkin</string-name>
          <email>shishkinog@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V M Khachumov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aylamazyan Program Systems Institute of Russian Academy of Sciences</institution>
          ,
          <addr-line>Peter the First Street, 4 “a”, Veskovo Village, Yaroslavl Region</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Peoples' Friendship University of Russia</institution>
          ,
          <addr-line>Miklukho-Maklaya Street, 6, Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>180</fpage>
      <lpage>186</lpage>
      <abstract>
        <p>The paper describes solutions to the multi-class and single-class problems of searching for and classifying target objects in Earth remote sensing (ERS) images. To improve recognition efficiency, tools were developed for the preparation of training samples and for the optimal configuration and use of deep learning neural networks with high-performance computing technologies. Two types of convolutional neural network (CNN) were used to process ERS images: a network from the nnForge library and a network of the Darknet type. A comparative analysis of the results is presented. The research showed that convolutional neural networks can simultaneously solve the problems of searching for (localizing) and recognizing objects in ERS images with high accuracy and completeness.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Today, there is an upsurge of activity in the field of Earth remote sensing (ERS) data processing: new
software systems are being created and high-resolution image processing methods are being modernized.
The current situation is characterized by improvements in the equipment of spacecraft (SC) and
ground control stations and by an expanding range of image processing tasks. Applications of these
spacecraft include monitoring of forest, agricultural and arctic zones, analysis of natural disasters,
environmental protection, public safety, etc. The growing volumes of ERS data have significantly
increased the requirements for the speed and quality of information processing. Recently, artificial
neural networks (ANN) and high-performance computing technologies have been increasingly used.</p>
      <p>
        An analysis of recent work on ANN applications shows that neural networks are
mainly used to search for and recognize targets that belong to the nonrigid category [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ].
The authors of this paper have laid scientific and practical groundwork for solving various problems
based on intelligent processing of ERS images (multispectral, panchromatic, color), including the
search for rigid objects and zones of interest (fires, inundations, ice conditions assessment, etc.)
using the developed spectrographic approach and the generalized metric [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7 ref8">3-8</xref>
        ]. This paper presents the results of new in-depth studies on the use of modern convolutional
neural networks (CNN) for processing panoramic full-color ERS images obtained from unmanned aerial
vehicles (UAVs). It proposes methods and tools that improve the efficiency and performance of CNNs
when searching for and recognizing objects of military equipment with the necessary completeness and
accuracy, a problem that remains unresolved despite an abundance of software. The modern formulation
of the task of finding and recognizing an object with a neural network includes selecting the type of
ANN, setting its parameters and preparing the input data. Both multi-class and single-class problems
were considered in the study. The multi-class problem is the search for and recognition of objects of
several classes simultaneously. The single-class problem is the search by a neural network for objects
of a single class.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods, software tools and results of image processing using ANN</title>
      <p>
        Two types of CNN were used to process ERS images: a convolutional neural network from the
nnForge library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and a network of the Darknet type [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Both implementations support
various types of layers, which allows flexible configuration and makes it possible to change the
network structure to suit particular needs. In addition, the network of the second type not only
classifies the target objects but also reports their positions in the image. A distinctive feature of
both CNNs is support for computational speedup on graphics processing units (GPU), both
during training and during operation. Copies of the trained ANN are distributed among the available
GPUs, where data are processed independently and asynchronously. A special software complex for
designing neural network application systems was used to implement the computational process [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
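      <p>The independent, asynchronous processing described above can be illustrated with a minimal sketch. It assumes one worker per device that pulls fragments from a shared queue; the function names and the stand-in classifier are hypothetical and are not taken from the software complex used in the study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty


def process_on_devices(fragments, device_ids, classify):
    """Process fragments with one worker per device; workers pull work independently."""
    tasks = Queue()
    for index, fragment in enumerate(fragments):
        tasks.put((index, fragment))
    results = [None] * len(fragments)

    def worker(device_id):
        # Each worker stands for one copy of the trained network bound to one device.
        while True:
            try:
                index, fragment = tasks.get_nowait()
            except Empty:
                return
            results[index] = classify(device_id, fragment)

    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        futures = [pool.submit(worker, device_id) for device_id in device_ids]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return results


def fake_classify(device_id, fragment):
    # Hypothetical stand-in for a network call: "target" if any value is positive.
    return sum(fragment) > 0
```
]]></preformat>
      <p>Because each worker takes the next fragment as soon as it is free, a faster device simply processes more fragments, and no device waits for another.</p>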
      <p>A special tool for the automated preparation of training samples has been implemented to improve
the quality of classification. A human expert first prepares a set of images with a transparency
marker, in which background pixels are made invisible using the alpha channel. Figure 1 shows an
original fragment of an ERS image and, on the right, the same fragment after the alpha channel
change. For convenience, images with objects of different classes are sorted into
different directories.</p>
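      <p>As an illustration of how such markup could be consumed, the sketch below derives a foreground mask and a bounding box from the alpha channel. It assumes an image given as rows of (R, G, B, A) tuples and treats any pixel with non-zero alpha as part of the object; both the data layout and the function names are assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def foreground_mask(rgba_rows):
    """Return rows of booleans: True where the pixel is visible (alpha > 0)."""
    return [[pixel[3] > 0 for pixel in row] for row in rgba_rows]


def bounding_box(mask):
    """Return (left, top, right, bottom) of the visible pixels, or None if there are none."""
    xs = [x for row in mask for x, visible in enumerate(row) if visible]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```
]]></preformat>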
      <p>Various settings of the ANN from the nnForge library (layer configurations and characteristics)
can be tuned using scaled copies of these images. The scaling factor is chosen so that each
target object fits in a separate scanning window, the size of which coincides with the size of the
ANN input window. The experiments were conducted on images of military equipment of 6000x4000
pixels, taken from a height of 300 meters at the Russia Arms Expo – 2015 (RAE-2015) international
exhibition.</p>
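      <p>One possible way to choose such a scaling factor is sketched below. It assumes that the largest side of the object must fit the square input window (39 pixels for the nnForge network described in this section); the actual rule used in the study is not specified, so this is only an illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def scaling_factor(object_width, object_height, window_size=39):
    """Largest scale at which the object still fits inside a square window."""
    return window_size / max(object_width, object_height)


def scaled_size(width, height, factor):
    """Image size in whole pixels after applying the scaling factor."""
    return (round(width * factor), round(height * factor))
```
]]></preformat>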
      <p>
        The following CNN architecture from the nnForge library was chosen experimentally:
      </p>
      <p>− contrast extraction layer with a 9x9 pixel Gaussian window [<xref ref-type="bibr" rid="ref13">13</xref>]; the original size of the data window is 39x39;</p>
      <p>− convolution layer with 6x6 kernels (136 feature maps in total), the hyperbolic tangent module is used for normalization, the window size after processing is 34x34;</p>
      <p>− average subsampling layer with a 2x2 mask, the window size after processing is 17x17;</p>
      <p>− convolution layer with 6x6 kernels (272 feature maps in total), the window size after processing is 12x12;</p>
      <p>− average subsampling layer with a 2x2 mask, the window size after processing is 6x6;</p>
      <p>− convolution layer with 6x6 kernels (544 feature maps in total), the window size after processing is 1x1;</p>
      <p>− dropout layer with an adjustable probability of disabling connections between neurons (experimentally set to 0.05);</p>
      <p>− convolution layer with 1x1 kernels, where the number of feature maps corresponds to the number of distinguished classes and the hyperbolic tangent activation function is used.</p>
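      <p>The window sizes listed above can be checked with a short calculation. The sketch assumes convolutions without padding and with unit stride, non-overlapping subsampling, and a size-preserving contrast extraction layer; under these assumptions it reproduces the sequence 39, 34, 17, 12, 6, 1.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def conv_out(size, kernel):
    """Output side of a convolution without padding and with stride 1."""
    return size - kernel + 1


def pool_out(size, mask):
    """Output side of non-overlapping subsampling."""
    return size // mask


# (layer description, function mapping input side to output side)
LAYERS = [
    ("contrast extraction 9x9", lambda s: s),  # assumed to keep the window size
    ("convolution 6x6, 136 maps", lambda s: conv_out(s, 6)),
    ("average subsampling 2x2", lambda s: pool_out(s, 2)),
    ("convolution 6x6, 272 maps", lambda s: conv_out(s, 6)),
    ("average subsampling 2x2", lambda s: pool_out(s, 2)),
    ("convolution 6x6, 544 maps", lambda s: conv_out(s, 6)),
    ("dropout", lambda s: s),
    ("convolution 1x1, one map per class", lambda s: conv_out(s, 1)),
]


def trace_sizes(input_size=39):
    """Return the window side before the first layer and after every layer."""
    sizes = [input_size]
    for _, step in LAYERS:
        sizes.append(step(sizes[-1]))
    return sizes
```
]]></preformat>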
      <p>During recognition, the scanning window moves across the image in increments of one pixel,
and each window position is processed by the neural network. Sequential processing of the entire image
yields a colored map in which the target objects are separated from the background. Figure 2 shows an
example of an original image and the result of its processing.</p>
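      <p>The scanning procedure can be sketched as an enumeration of window positions. The example assumes that only windows lying fully inside the image are processed; how the image border is handled in the actual implementation is not stated, so this is an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def window_positions(width, height, window=39, step=1):
    """Yield the top-left corner of every full window inside the image."""
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            yield left, top


def count_positions(width, height, window=39, step=1):
    """Closed-form count of the positions produced by window_positions."""
    if width < window or height < window:
        return 0
    columns = (width - window) // step + 1
    rows = (height - window) // step + 1
    return columns * rows
```
]]></preformat>
      <p>With a step of one pixel the number of network evaluations grows with the image area, which is why GPU acceleration matters for this approach.</p>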
      <p>A total of 3.626 million objects automatically extracted from 73 images were used for training. The following
results were achieved. Classification completeness: background – 0.9976, materiel – 0.9354.
Classification accuracy: background – 0.9392, materiel – 0.9974. Training time: 24 hours on one
Nvidia GeForce GTX 1060 using a single core of an Intel Core i7 6850K CPU (6 cores,
3.6 to 4.0 GHz). Processing time for ten panoramic images: 3051 s on one GPU and 1640 s on two GPUs.</p>
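      <p>The reported figures can be related to standard definitions with a short sketch. It assumes that "completeness" corresponds to recall and "accuracy" to precision, which is an interpretation rather than something the text states; the speedup and efficiency are computed directly from the processing times given above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def completeness(true_positives, false_negatives):
    """Share of real objects of a class that were found (assumed to mean recall)."""
    return true_positives / (true_positives + false_negatives)


def accuracy(true_positives, false_positives):
    """Share of detections of a class that are correct (assumed to mean precision)."""
    return true_positives / (true_positives + false_positives)


def speedup(time_one_gpu, time_n_gpus):
    """How many times faster the multi-GPU run is."""
    return time_one_gpu / time_n_gpus


def efficiency(time_one_gpu, time_n_gpus, gpu_count):
    """Speedup per GPU; 1.0 would be ideal linear scaling."""
    return speedup(time_one_gpu, time_n_gpus) / gpu_count
```
]]></preformat>
      <p>For the times reported above, 3051 s on one GPU and 1640 s on two GPUs give a speedup of about 1.86 and a parallel efficiency of about 0.93.</p>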
      <p>
        Through numerous experiments, the developers of the Darknet network have selected a
successful architecture that works well on different training and test samples [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. Within certain limits, the network
tolerates input images of different sizes and images subjected to geometric
distortions. The main adjustable parameter is the size of the CNN input layer.
      </p>
      <p>The software tool developed as part of this study includes the programs YOLORotate,
YOLOAnchors and YOLOGetObjects.</p>
      <p>
        The YOLORotate program converts images into a format suitable for training
YOLO v2 type ANNs. The data for our series of experiments consist of
many panoramic images taken from a UAV. They contain four classes of target objects: IFV (infantry
fighting vehicles), Military vehicles, SPG (self-propelled gun mounts) and Tanks; the
coordinates and sizes of each object are collected in advance and stored in text files. Each
image is at least 832x832 pixels in size, and the target objects occupy a relatively small part
of it. YOLORotate rotates the images of the training sample with a given step, for example 15
degrees, and cuts out the maximum possible number of fragment images containing target objects.
A total of 4361 fragments for the training sample and 1173 fragments for the test sample were
created automatically. Each fragment is guaranteed to contain at least one target object. A
pseudorandom number generator is used so that the target objects are located at random positions
within the resulting fragments. The analysis showed that with the selected fragment size
of 832x832 pixels, up to 20 targets fall into the frame. The YOLOAnchors program uses
the k-means method [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to determine the width and height of typical targets in the output window of a
neural network. The program finds several such pairs of sizes, which are later used in training
the ANN.
      </p>
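      <p>The idea behind selecting typical target sizes can be sketched with a small k-means routine over (width, height) pairs. The sketch assumes the distance 1 − IoU proposed for anchor selection in the YOLO v2 paper [<xref ref-type="bibr" rid="ref10">10</xref>] and a deterministic initialization from the first k boxes; whether YOLOAnchors uses the same distance and initialization is not stated, so both are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height) and aligned at a common corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (width, height) pairs; the nearest centroid is the one with the highest IoU."""
    centroids = list(boxes[:k])  # assumption: start from the first k boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                updated.append((mean_w, mean_h))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return sorted(centroids)
```
]]></preformat>
      <p>The returned pairs play the role of the typical target sizes that are passed to the network before training.</p>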
      <p>The YOLOGetObjects program splits each image into fragments of 832x832 pixels with
partial overlap: each successive window covers a quarter of the previous one (both
horizontally and vertically). In total, there are 70 fragments per panoramic picture. All
fragments are then processed independently using a GPU and a general-purpose processor. The next step is
to combine the information about all the target objects found. Figure 3 shows an example of the result of
using a trained ANN.
      </p>
      <p>[Table: share of target objects found, average share of target objects found, completeness and normalized accuracy for the classes IFV, Military cars, SPG and Tanks.]</p>
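      <p>The fragmentation performed by YOLOGetObjects can be illustrated with a tiling sketch. It assumes a step of three quarters of the window (624 pixels) and a last window clamped to the image border; under these assumptions a 6000x4000 image yields 10 x 7 = 70 fragments, which agrees with the count given in the text, although the actual program may place the windows differently.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def tile_origins(length, tile=832, overlap_fraction=0.25):
    """Start coordinates of overlapping tiles along one image axis."""
    step = int(tile * (1 - overlap_fraction))  # 624 px for an 832 px tile
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)  # assumption: last tile is aligned to the border
    return origins


def tiles(width, height, tile=832, overlap_fraction=0.25):
    """Top-left corners of all tiles, row by row."""
    return [
        (x, y)
        for y in tile_origins(height, tile, overlap_fraction)
        for x in tile_origins(width, tile, overlap_fraction)
    ]
```
]]></preformat>
      <p>Because neighbouring fragments overlap, the same object can be detected more than once, which is why the information about the objects found has to be combined afterwards.</p>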
      <p>
        Batch training was used in all experiments: each step of adjusting the weighting factors is based
on the results of processing a limited group (batch) of images from the training sample.
In each new training epoch the groups of images are formed randomly; preference is given
to groups containing representatives of all classes of target objects. Batch training improves
the quality of the neural network and makes it possible to abandon the resource-intensive dropout layer [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The best
batch size is chosen experimentally for each problem to be solved.
      </p>
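      <p>One way to form random groups with a preference for full class coverage is sketched below. The strategy shown, drawing several random candidate groups and keeping the one that covers the most classes, is a hypothetical illustration and not the procedure used in the study; all names are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def form_batch(samples, batch_size, all_classes, rng, attempts=20):
    """Pick a random batch, preferring candidates that cover more classes.

    samples is a list of (sample_id, set_of_classes_present) pairs.
    """
    best, best_cover = None, -1
    for _ in range(attempts):
        candidate = rng.sample(samples, batch_size)
        covered = set().union(*(classes for _, classes in candidate))
        cover = len(covered & all_classes)
        if cover > best_cover:
            best, best_cover = candidate, cover
        if cover == len(all_classes):
            break  # every class is represented; no need to look further
    return best
```
]]></preformat>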
      <p>Tables 1-3 show the refinement characteristics and the results of the experiments performed in
solving the multi-class problem, namely the simultaneous search and recognition of objects of four classes.</p>
      <p>Table 4 shows the comparative results of processing the test sample when solving single-class and
multi-class problems. The training time of the selected network configuration on each of the four
classes of military equipment was 6.84, 13.6 and 26.8 hours when working with batches of 28, 56 and
112 images, respectively. In the single-class case, the ANN works with only one class, and the user can
choose the best option, that is, the network trained with batches of the optimal size. The last
column of Table 4 collects the best coefficients for the mixed mode, in which a separately trained network
is used for each class. For instance, a network trained with batches of 56 images is used for the IFV class,
and a network trained with batches of 112 images is used for the Tanks class. The average processing time
of a 6000x4000 pixel image by one single-class neural network is the same as that of the
multi-class neural network, namely four seconds.</p>
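      <p>The cost of the single-class approach can be estimated from the figures above. The sketch assumes that the four single-class networks are trained and applied one after another on the same hardware; this sequential scenario is an assumption made for the estimate.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Training time of one single-class network in hours, by batch size (from the text)
TRAINING_HOURS = {28: 6.84, 56: 13.6, 112: 26.8}
NUM_CLASSES = 4
SECONDS_PER_IMAGE = 4  # one network, one 6000x4000 image


def total_training_hours(batch_size, networks=NUM_CLASSES):
    """Training time for one network per class, trained sequentially."""
    return networks * TRAINING_HOURS[batch_size]


def sequential_inference_seconds(networks=NUM_CLASSES):
    """Time to run every single-class network on one image, one after another."""
    return networks * SECONDS_PER_IMAGE
```
]]></preformat>
      <p>Under this assumption, four single-class networks need about 16 seconds per image against four seconds for one multi-class network, and between roughly 27 and 107 hours of training depending on the batch size.</p>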
      <p>[Table 4: share of target objects found by class of objects (IFV, Military cars, SPG, Tanks), average share of target objects found, and training time of one separate neural network in hours.]</p>
      <p>The results of the experiments confirmed the effectiveness of single-class neural
networks. However, training a set of such networks requires more computing resources than
training one multi-class network. With the same number of feature maps, an increase in the
complexity of the task leads to a decrease in the average share of target objects found by a
single-class ANN.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>The article presents the results of research on the use of modern convolutional neural networks
for processing panoramic full-color aerial ERS images. Multi-class and single-class problems of
searching for and classifying targets were solved using the nnForge and Darknet CNNs. Methods for
preparing training samples, optimal configuration, and the use of high-performance computing were
developed to improve recognition efficiency. A comparative analysis showed that the
single-class approach has an advantage in recognition quality but loses in operation time. Overall,
the capabilities of convolutional neural networks make it possible to solve simultaneously the
problems of searching for (localizing) and recognizing objects in ERS images with high accuracy and
completeness.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was supported by the Russian Foundation for Basic Research (projects
No. 18-29-03011-mk Research and Development of New Methods and Technologies for the Tasks of
Intellectual Analysis and Optimization of Processing Large Data Streams of the Earth Remote Sensing
and No. 17-29-07003-ofi_m Development of Methods and Models of Dynamic Behavior Planning and
Hierarchical Intellectual Motion Control of Unmanned Aerial Vehicles in an Uncertain Environment
with Computing Resources Constraints).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Vizilter</surname>
            <given-names>Yu V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorbatsevich</surname>
            <given-names>V S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorotnikov</surname>
            <given-names>A V</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kostromov</surname>
            <given-names>N A</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Real-time face identification via CNN and boosted hashing forest</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>2</issue>
          )
          <fpage>254</fpage>
          -
          <lpage>265</lpage>
          DOI: 10.18287/2412-6179-2017-41-2-254-265
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ivanov</surname>
            <given-names>A I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lozhnikov</surname>
            <given-names>P S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sulavko</surname>
            <given-names>A E</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Evaluation of signature verification reliability based on artificial neural networks, Bayesian multivariate functional and quadratic forms</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>5</issue>
          )
          <fpage>765</fpage>
          -
          <lpage>774</lpage>
          DOI: 10.18287/2412-6179-2017-41-5-765-774
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          <year>2010</year>
          <article-title>Spectrographic texture analysis for earth remote sensing data</article-title>
          <source>Artificial Intelligence and Decision Making</source>
          <volume>2</volume>
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          <year>2018</year>
          <article-title>Intelligent analysis of aerospace images using high-performance computing devices</article-title>
          <source>Proceedings of the conference “Artificial Intelligence: Problems and Solutions”</source>
          (Moscow region, Patriot Park)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Abramov</surname>
            <given-names>N S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agronik</surname>
            <given-names>A Yu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emelyanova</surname>
            <given-names>Yu G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Latyshev</surname>
            <given-names>A V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talalaev</surname>
            <given-names>A A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          and
          <string-name>
            <surname>Khachumov</surname>
            <given-names>M V</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Methods, models and software for processing data for space monitoring of the Arctic zone</article-title>
          <source>Aerospace Instrument-Making</source>
          <volume>7</volume>
          <fpage>38</fpage>
          -
          <lpage>51</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Localization and classification of military equipment in the stream of images from UAVs</article-title>
          <source>Materials of the conference “Fundamental Science for Army” within the Third International Military-Technical Forum “ARMY-2017”</source>
          (Moscow region, Patriot Park)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] URL: https://www.science-education.ru/ru/article/view?id=18607
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Khachumov</surname>
            <given-names>V M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          ,
          <string-name>Chen Guo Xian</string-name>
          and
          <string-name>Zhang Guo Liang</string-name>
          <year>2015</year>
          <article-title>Construction perspectives of the remote sensing data high-performance processing system</article-title>
          <source>Program Systems: Theory and Applications</source>
          <volume>1</volume>
          <fpage>121</fpage>
          -
          <lpage>133</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] URL: http://milakov.github.io/nnForge</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] URL: https://arxiv.org/abs/1612.08242</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Talalaev</surname>
            <given-names>A A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          <year>2013</year>
          <article-title>The complex of tools for the design of neural network application systems</article-title>
          <source>Scientific and Technical Volga region Bulletin</source>
          <volume>4</volume>
          <fpage>237</fpage>
          -
          <lpage>243</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Talalaev</surname>
            <given-names>A A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Fralenko</surname>
            <given-names>V P</given-names>
          </string-name>
          <year>2013</year>
          <article-title>The architecture of a parallel-pipeline data processing complex for heterogeneous computing environment</article-title>
          <source>Bulletin of Peoples' Friendship University of Russia. Mathematics series. Computer science. Physics</source>
          <volume>3</volume>
          <fpage>113</fpage>
          -
          <lpage>117</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0040027
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Everingham</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Gool</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            <given-names>C K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winn</surname>
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zisserman</surname>
            <given-names>A</given-names>
          </string-name>
          <year>2010</year>
          <article-title>The PASCAL visual object classes (VOC) challenge</article-title>
          <source>International journal of computer vision</source>
          <volume>88</volume>
          <fpage>303</fpage>
          -
          <lpage>338</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Lin</surname>
            <given-names>T-Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maire</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hays</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanan</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            <given-names>P</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zitnick</surname>
            <given-names>C L</given-names>
          </string-name>
          <year>2014</year>
          <article-title>Microsoft COCO: Common objects in context</article-title>
          In
          <source>European Conference on Computer Vision</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Celebi</surname>
            <given-names>M E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kingravi</surname>
            <given-names>H A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Vela</surname>
            <given-names>P A</given-names>
          </string-name>
          <year>2013</year>
          <article-title>A comparative study of efficient initialization methods for the k-means clustering algorithm</article-title>
          <source>Expert Systems with Applications</source>
          <volume>40</volume>
          <fpage>200</fpage>
          -
          <lpage>210</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Srivastava</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            <given-names>G E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            <given-names>I</given-names>
          </string-name>
          and
          <string-name>
            <surname>Salakhutdinov</surname>
            <given-names>R</given-names>
          </string-name>
          <year>2014</year>
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          <source>Journal of Machine Learning Research</source>
          <volume>15</volume>
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>