<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamical Change of the Perceiving Properties of Neural Networks as Training with Noise and Its Impact on Pattern Recognition</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Systems &amp; Technologies, North-Caucasus Federal University.</institution>
          <addr-line>2, Kulakov Prospect, Stavropol, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <fpage>35</fpage>
      <lpage>40</lpage>
      <abstract>
        <p>The main parameters of convolutional networks (the kernels) are set during training. Besides the training method, the quality of this tuning depends on the quantity of information passed through each kernel, which in turn depends on the size of the training sample and on the concentration of receptive fields. For a fixed training set size, the concentration of receptive fields can be increased by covering arbitrary maps in several layers with fields of different types, which is equivalent to using a noisy training sample. This can improve the network's performance on the test set.</p>
      </abstract>
      <kwd-group>
        <kwd>convolutional networks</kwd>
        <kwd>training with noise</kwd>
        <kwd>different types of receptive fields</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Roman Nemkov
To date, invariance remains the main unsolved problem of pattern
recognition: instances of the same object may differ substantially in their
external characteristics (shape, colour, texture, etc.), and the same object may be
projected onto the retina in different ways (for example, when viewed from different angles), which greatly complicates
its classification. Within neural network technologies, this global problem
can be addressed by creating large training sets [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. If such sets are difficult to create,
they are expanded by adding noise [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4–6</xref>
        ]. If a neural network
is regarded as a pyramidal hierarchical graph, noise can be created by
changing the connections between nodes of the graph [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] or by changing the perceiving
properties of the nodes of the graph [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A node (neuron) of a convolutional neural network (CNN) has
three perceiving properties: a receptive field (RF), an
activation function, and a method for producing a weighted sum (a simple weighted sum
or a higher-order polynomial). Changing the RFs is the easiest and most promising
of the three options. Changing the RFs makes the network perceive the same pattern differently,
which creates noise. This article investigates how noise created by changing the shape
of the RFs affects pattern recognition.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Generation of Noise by Changing the Receptive Fields</title>
      <p>
Training with noise in the context of gradient descent for a neural network can
be written as
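<disp-formula id="eq1"><label>(1)</label><tex-math>\frac{\partial E}{\partial w} + \varepsilon = \nabla,</tex-math></disp-formula>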
where ∂E/∂w is the gradient of the error E with respect to the network's weights w, ε is an additional
noisy component that corrects the gradient vector, and ∇ is the corrected gradient. After the
correction the gradient vector may not point exactly towards the local minimum, but training with noise has
two benefits: generalization ability increases, and local minima are overcome more easily
during gradient descent. Changing the perceiving properties
in the nodes of a network amounts to the same training with noise, where ε can be
written explicitly:
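<disp-formula id="eq2"><label>(2)</label><tex-math>\varepsilon = \frac{\partial E}{\partial w}(\text{new perception}) - \frac{\partial E}{\partial w}(\text{standard perception}),</tex-math></disp-formula>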
where ∂E/∂w (new perception) is the gradient with respect to the weights after the perceiving
properties have been changed, and ∂E/∂w (standard perception) is the gradient
with respect to the weights under the standard perceiving properties (square RFs).
      </p>
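      <p>The relation between Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It assumes a single linear neuron with error E = 0.5 (y − t)², which is a simplification and not the network used in this paper. The function names mse_gradient and perception_noise and all numeric values are hypothetical; the sketch only shows that the standard gradient plus ε reproduces the gradient under the new perception.</p>
      <preformat><![CDATA[
```python
def mse_gradient(weights, inputs, target):
    """Gradient of E = 0.5 * (y - target)**2 for a linear neuron y = sum(w * x)."""
    y = sum(w * x for w, x in zip(weights, inputs))
    return [(y - target) * x for x in inputs]


def perception_noise(weights, standard_inputs, new_inputs, target):
    """Eq. (2): gradient under the new perception minus gradient under the standard one."""
    g_new = mse_gradient(weights, new_inputs, target)
    g_std = mse_gradient(weights, standard_inputs, target)
    return [a - b for a, b in zip(g_new, g_std)]


# Hypothetical values: a 3-element RF whose middle element is replaced by its neighbor.
weights = [0.5, -0.25, 1.0]
standard_inputs = [1.0, 2.0, 3.0]
new_inputs = [1.0, 3.0, 3.0]
target = 1.0

epsilon = perception_noise(weights, standard_inputs, new_inputs, target)
g_std = mse_gradient(weights, standard_inputs, target)
# Eq. (1): the standard gradient plus epsilon gives the corrected gradient.
corrected = [g + e for g, e in zip(g_std, epsilon)]
print(epsilon)
print(corrected)
```
]]></preformat>
      <p>In this toy setting ε is nonzero in every component, even though only one input was replaced, because the changed input also changes the neuron output.</p>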
      <p>The perceiving properties of the nodes of a network change when the shape
of the RFs changes. Each element of an RF has neighbors: the elements located
one or two discrete steps away from it. The value of the current element
(within the RF) can therefore be replaced by the value of a neighbor.
If this is done for all elements of the RF, the weighted sum
changes, and hence the output of the neuron changes as well. The replacement is shown
in Fig. 1.</p>
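      <p>The replacement of RF elements by neighboring values can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the function rf_weighted_sum, the per-element offsets argument, and the clamping of coordinates at the map border are all hypothetical choices.</p>
      <preformat><![CDATA[
```python
def rf_weighted_sum(feature_map, kernel, top, left, offsets=None):
    """Weighted sum of one RF placed with its top-left corner at (top, left).

    offsets[i][j] = (dy, dx) tells RF element (i, j) to read a neighbor that is
    dy rows and dx columns away; None means the standard square RF.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    total = 0.0
    for i, kernel_row in enumerate(kernel):
        for j, weight in enumerate(kernel_row):
            dy, dx = offsets[i][j] if offsets is not None else (0, 0)
            # Assumption: coordinates are clamped to the map border.
            r = min(max(top + i + dy, 0), rows - 1)
            c = min(max(left + j + dx, 0), cols - 1)
            total += weight * feature_map[r][c]
    return total


# Hypothetical 4x4 input map with values 0..15 and a 2x2 kernel of ones.
feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
kernel = [[1.0, 1.0], [1.0, 1.0]]

standard = rf_weighted_sum(feature_map, kernel, 1, 1)
# Every element reads its right-hand neighbor (one discrete step).
shift_right = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]
shifted = rf_weighted_sum(feature_map, kernel, 1, 1, shift_right)
print(standard, shifted)
```
]]></preformat>
      <p>The kernel weights are identical in both calls; only the positions from which the RF reads its inputs differ, so the weighted sum changes.</p>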
      <p>The map that serves as input for the current neurons receives a different
covering of RFs, but the kernels of the convolutional layer, through which
this new covering passes, remain the same. The quantity of information affecting the
kernels increases, so the kernels can extract better invariant features. This
process is shown in Fig. 2.</p>
      <p>This technique expands the training set with patterns whose form
depends on how far from the current element the elements of the RFs
take the replacement value. If all elements of the RFs are replaced,
the additional training sets obtained in this way are those shown in
Fig. 3.</p>
      <p>Any convolutional layer may have its own covering, so perception
can be changed on different layers. If a CNN has three convolutional layers,
the number of combinations (or “refracting prisms”) is 2<sup>3</sup> − 1 = 7 (the one
remaining “prism” is the standard perception). Each “prism” scheme yields
its own unique pattern. The strategy for marking up
a particular layer with RFs is also important. There are two opposing strategies: either an RF is
chosen at random and superimposed on each desired location, or the same type
of RF with a specific index is superimposed on all desired locations. The second
strategy can model primitive affine transformations if the RF simulates a
shift for all its elements.</p>
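      <p>The count of “refracting prisms” and the two markup strategies can be sketched in Python. The sketch assumes that a “prism” is an on/off choice of changed RFs for each convolutional layer, with the all-off case being the standard perception; the functions refracting_prisms and mark_layer, the pool size, and the number of locations are hypothetical.</p>
      <preformat><![CDATA[
```python
import random
from itertools import product


def refracting_prisms(num_conv_layers):
    """All per-layer on/off choices of changed RFs, excluding the standard (all-off) case."""
    choices = product((False, True), repeat=num_conv_layers)
    return [combo for combo in choices if any(combo)]


def mark_layer(num_locations, pool_size, strategy, rng, fixed_index=0):
    """Return the RF index assigned to each location of one layer."""
    if strategy == "random":
        # First strategy: an RF is drawn at random for every location.
        return [rng.randrange(pool_size) for _ in range(num_locations)]
    if strategy == "fixed":
        # Second strategy: the same RF index is used at all locations.
        return [fixed_index] * num_locations
    raise ValueError("strategy must be 'random' or 'fixed'")


prisms = refracting_prisms(3)
print(len(prisms))  # 2**3 - 1 combinations for three convolutional layers

rng = random.Random(0)
print(mark_layer(6, 4, "random", rng))
print(mark_layer(6, 4, "fixed", rng, fixed_index=2))
```
]]></preformat>
      <p>With the fixed strategy every location uses the same RF index, which is the case in which an RF that shifts all its elements acts like a shift of the whole map.</p>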
      <p>Patterns should be created with all combinations of “refracting prisms”,
using both strategies and RFs in which every element is replaced, to achieve
maximum coverage of any of the three additional sets.</p>
    </sec>
    <sec id="sec-3">
      <title>Experiment</title>
      <p>
MNIST was chosen for the experiments with noise because
most schemes for creating noise have been tested on this set. The
architecture of the CNN is shown in Fig. 4.</p>
      <p>The simplest gradient descent algorithm (without momentum, weight
decay, or other tricks) was used for maximum simplicity and repeatability
of the experiment. The initial learning rate (η) is 0.005;
after every 100 epochs the new value is obtained by multiplying the old value
by 0.3. The error function is the mean-square error (MSE). A pattern is considered recognized if
the error on the output layer does not exceed 0.001. The pools of RFs
for the convolutional layers are shown in Fig. 5.</p>
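      <p>The learning-rate schedule and the recognition criterion described above can be written as a short Python sketch. It assumes zero-based epoch numbering and an MSE averaged over the output neurons, neither of which is stated in the text; the function names learning_rate and is_recognized are hypothetical.</p>
      <preformat><![CDATA[
```python
def learning_rate(epoch, initial=0.005, factor=0.3, step=100):
    """Step decay: multiply the rate by `factor` after every `step` epochs."""
    return initial * factor ** (epoch // step)


def is_recognized(outputs, targets, threshold=0.001):
    """A pattern counts as recognized if the output-layer MSE does not exceed the threshold."""
    # Assumption: the MSE is averaged over the output neurons.
    mse = sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)
    return mse <= threshold


for epoch in (0, 99, 100, 200):
    print(epoch, learning_rate(epoch))

print(is_recognized([0.99, 0.01], [1.0, 0.0]))
print(is_recognized([0.9, 0.1], [1.0, 0.0]))
```
]]></preformat>
      <p>Under these assumptions the rate stays at 0.005 for epochs 0 to 99 and drops to about 0.0015 at epoch 100.</p>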
      <p>The markup strategy places the RF with a given index in the proper
position. The geometrical interpretation of the index as a shift is shown in Fig. 6.</p>
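      <p>Thus, only the noise from the first additional training set (Fig. 3, set (1)) was
used. Comparative results are given in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Algorithm</th><th>Distortion</th><th>Error</th><th>Ref.</th></tr>
          </thead>
          <tbody>
            <tr><td>2-layer MLP (MSE)</td><td>affine</td><td>1.6%</td><td>[<xref ref-type="bibr" rid="ref4">4</xref>]</td></tr>
            <tr><td>SVM</td><td>affine</td><td>1.4%</td><td>[<xref ref-type="bibr" rid="ref5">5</xref>]</td></tr>
            <tr><td>Tangent dist.</td><td>affine + thick</td><td>1.1%</td><td>[<xref ref-type="bibr" rid="ref4">4</xref>]</td></tr>
            <tr><td>LeNet-5 (MSE)</td><td>affine</td><td>0.8%</td><td>[<xref ref-type="bibr" rid="ref4">4</xref>]</td></tr>
            <tr><td>2-layer MLP (MSE)</td><td>elastic</td><td>0.9%</td><td>[<xref ref-type="bibr" rid="ref6">6</xref>]</td></tr>
            <tr><td>CNN (MSE)</td><td>proposed distortions</td><td>1.2%</td><td>this paper</td></tr>
            <tr><td>Best result</td><td>elastic</td><td>0.23%</td><td>[<xref ref-type="bibr" rid="ref7">7</xref>]</td></tr>
          </tbody>
        </table>
      </table-wrap>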
      <p>The error of 1.2% is a good result given that it was achieved without
additional noise from sets (2) or (3) (Fig. 3). The research shows that
changing the perceiving properties in the nodes of a CNN can effectively expand the
training set and reduce the generalization error. This technique is also easily
compatible with elastic distortions and with DropConnect or dropout.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>M.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senior</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          :
          <article-title>Large Scale Distributed Deep Networks</article-title>
          .
          <source>NIPS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Improving Neural Networks by Preventing Co-adaptation of Feature Detectors</article-title>
          .
          <source>CoRR</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeiler</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Regularization of Neural Networks using DropConnect</article-title>
          .
          <source>ICML (3)</source>
          , volume
          <volume>28</volume>
          of JMLR Proceedings, pp.
          <fpage>1058</fpage>
          -
          <lpage>1062</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haffner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Gradient-based Learning Applied to Document Recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          , vol.
          <volume>86</volume>
          , pp.
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Decoste</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scholkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Training Invariant Support Vector Machines</article-title>
          .
          <source>Machine Learning Journal</source>
          , vol.
          <volume>46</volume>
          , no.
          <issue>1-3</issue>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Simard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinkraus</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Platt</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis</article-title>
          .
          <source>ICDAR</source>
          , pp.
          <fpage>958</fpage>
          -
          <lpage>962</lpage>
          . IEEE Computer Society,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ciresan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meier</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Multicolumn Deep Neural Networks for Image Classification</article-title>
          .
          <source>Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12)</source>
          , pp.
          <fpage>3642</fpage>
          -
          <lpage>3649</lpage>
          , Washington, DC, USA,
          <year>2012</year>
          . IEEE Computer Society. ISBN
          <isbn>978-1-4673-1226-4</isbn>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Nemkov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezentseva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Use of Convolutional Neural Networks with Nonspecific Receptive Fields</article-title>
          .
          <source>The 4th International Scientific Conference “Applied Natural Science 2013”</source>
          , Novy Smokovec, High Tatras, Slovak Republic, October 2-4,
          <year>2013</year>
          , p.
          <fpage>148</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>