<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Learning Neural Networks with Controlled Switching of Neural Planes</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Saint Petersburg Electrotechnical University "LETI" Saint Petersburg</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The algorithm of network topology construction and training for two-dimensional fast neural networks with additional switched planes is considered. It is noted that the structure of fast neural networks has a fractal nature. The constructed topology is ideologically close to the topology of convolutional deep learning networks, but it is regular, with the number of layers determined by the factor decomposition of the dimensions of the image and of the output plane of classes. The learning algorithm has an analytical representation; it is stable and converges in a finite number of steps. Additional planes extend the information capacity of the tunable transformation to the maximum possible. Control of the planes in the training and processing modes is realized by the numerical coordinate codes of the output plane. The architecture of a regular neural network with additional planes is presented. Variants of image ordering in the output plane are considered. Examples are given.</p>
      </abstract>
      <kwd-group>
        <kwd>fast tunable transformation</kwd>
        <kwd>neural network</kwd>
        <kwd>learning</kwd>
        <kwd>convolutional neural network</kwd>
        <kwd>bitmap</kwd>
        <kwd>planes of the neural layers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Deep learning technology builds informative features of increasing complexity across a sequence of neural layers. Starting with the neocognitron of K. Fukushima [<xref ref-type="bibr" rid="ref1">1</xref>], several realizations of this idea have been proposed. One of the most successful is the architecture of convolutional neural networks [<xref ref-type="bibr" rid="ref2">2</xref>], which has shown high efficiency in a wide range of problems. A distinctive feature of this architecture is the presence in each convolutional layer of several data-processing channels (called maps or planes). In each plane, the output image of the previous layer is convolved with a fixed kernel of small dimensions. Convolutional layers alternate with pooling layers that reduce the dimension of the feature space by an integer factor. Pooling layers are optional, and variants that eliminate them from the network architecture entirely have been proposed [<xref ref-type="bibr" rid="ref3">3</xref>]. The second distinctive feature of the architecture is the use of semi-linear activation functions that act as switching keys controlled by the values of hidden-layer variables [<xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>].</p>
      <p>A disadvantage of the convolutional network architecture is the lack of theoretically justified methods for choosing the network structure and the convolution kernel parameters. Several well-functioning network configurations exist for specific tasks, but it is not clear how to build a network for a new task; until now, the choice of the structure of a convolutional network has remained more an art than a science. The second significant drawback is the training time of convolutional networks. On a typical processor it can vary from several hours to several days, so high-performance GPUs are often used to train the networks. In [<xref ref-type="bibr" rid="ref8">8</xref>] the authors proposed using fast transformation algorithms to construct the structure and topology of multilayer perceptron neural networks. We show that this approach, with some modification, can also be used to construct the structure and topology of convolutional neural networks.</p>
      <p>Fast algorithms for the linear Fourier, Walsh, Haar, and similar transformations are widely known. With fast algorithms, the savings in computational operations grow rapidly with the dimension of the transformation. Since the end of the 20th century, a direction of fast tunable transformations has developed [<xref ref-type="bibr" rid="ref9">9</xref>]; these are essentially neural networks with limited connections and linear activation functions. Training methods have been developed for such neural networks that converge in a finite number of steps. The number of layers in a fast transformation and their configuration are determined by the dimension of the processed images. For a fast algorithm to exist, the dimension of the transformation must be a composite number, and the more factors in the decomposition of the dimension, the higher the computational efficiency of the fast algorithm. Despite the wide variety of fast algorithms, the configurations of their structures satisfy the system invariant of self-similarity [<xref ref-type="bibr" rid="ref10">10</xref>]. Fractals are known to have the same property; therefore, fast algorithms can be interpreted as quasi-fractals. The property of structural fractality allows two tasks to be solved simultaneously: fast data processing and fast training of the transformation.</p>
      <p>In this paper we show that a small modification of the system invariant of fast algorithms leads to convolutional neural network structures. At the same time, it is possible to preserve the fast learning algorithm and to increase the information capacity of network recognition up to the maximum possible, determined by the number of neurons in the output layer of the network. The proposed architecture cannot be called a convolutional network, because the planes of the neural layers use transformations more general than convolution, and there are no pooling layers in the transformation, although the principle of pooling is used in training the network. All neurons have linear activation functions; nonlinear processing nevertheless exists, but it is implemented not through activation functions but by switching the planes of the neural network. To some extent this is similar to the switching semi-linear activation functions of convolutional neural networks. The switching of planes is controlled by the coordinates of neurons in the output plane of the network. We call this class of networks neural networks with controlled switching of planes (CSPNN).</p>
    </sec>
    <sec id="sec-2">
      <title>The topology of Two-Dimensional Fast Tunable</title>
    </sec>
    <sec id="sec-3">
      <title>Transformations</title>
      <p>Let us denote by $F(U_y, U_x)$ an image matrix of dimensionality $N_y \times N_x$. Acting on the image with a linear transformation $H(U_y, U_x; V_y, V_x)$ produces an array of $M_y \times M_x$ coefficients. The two-dimensional transformation is executed by the rule:</p>
      <p>Ny1 Nx1
S Vy ,Vx     F U y ,U x H U y ,U x ;Vy ,Vx  .</p>
      <p>Uy0 Ux0
A necessary condition of the existence of a fast algorithm is the possibility of
multiplicative decomposition of values of input and output dimensionalities of the
transformation to an equal number of multiplicands:
(1)
(2)
(3)
(4)
N y  p0y p1y
Nx  p0x p1x
pny1,
pnx1,</p>
      <p>M y  g0y g1y
M x  g0x g1x
gny1,
gnx1.</p>
      <p>Here the indices $x, y$ denote the coordinate axes of the source image, and the value $n$ defines the number of layers in the graph of the fast algorithm. Using the factors of the decomposition, the coordinates of image points are written in a positional number system with mixed radices:</p>
      <p>$$U_y = u_{n-1}^y u_{n-2}^y \cdots u_1^y u_0^y, \qquad U_x = u_{n-1}^x u_{n-2}^x \cdots u_1^x u_0^x, \qquad (4)$$</p>
      <p>where the weight of the $m$-th position digit is given by the product $p_{m-1}^{*} p_{m-2}^{*} \cdots p_1^{*} p_0^{*}$, and $u_m^{*}$ is a digit variable taking the values $0, \ldots, p_m^{*} - 1$ (the asterisk stands for the indices $x, y$). The coordinates of the spectral coefficients in the plane $(V_y, V_x)$ can be represented similarly:</p>
      <p>$$V_y = v_{n-1}^y v_{n-2}^y \cdots v_1^y v_0^y, \qquad V_x = v_{n-1}^x v_{n-2}^x \cdots v_1^x v_0^x, \qquad (5)$$</p>
      <p>where the weight of the $m$-th position digit is given by $g_{m-1}^{*} g_{m-2}^{*} \cdots g_1^{*} g_0^{*}$, and $v_m^{*}$ is a digit variable taking the values $0, \ldots, g_m^{*} - 1$.</p>
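      <p>The mixed-radix representations (4) and (5) are straightforward to compute. The following sketch (illustrative code, not taken from the paper) converts a coordinate into its digit vector and back for given radices.</p>
      <preformat>
def to_digits(U, radices):
    """Digits [u0, u1, ..., u_{n-1}] of U in the mixed-radix system with radices
    [p0, p1, ..., p_{n-1}]; the weight of digit u_m is p_{m-1} * ... * p0."""
    digits = []
    for p in radices:
        digits.append(U % p)
        U //= p
    return digits

def from_digits(digits, radices):
    """Inverse of to_digits."""
    U, weight = 0, 1
    for u, p in zip(digits, radices):
        U += u * weight
        weight *= p
    return U

p = [2, 2, 2]                       # Ny = 2 * 2 * 2 = 8, as in the 8x8 example below
assert all(from_digits(to_digits(U, p), p) == U for U in range(8))
print(to_digits(6, p))              # [0, 1, 1]: u0 = 0, u1 = 1, u2 = 1
      </preformat>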
      <p>The algorithm of a fast transformation is usually presented in the form of a graph, and different topologies are possible. The digit-by-digit form is convenient for the analytical description of the topology graph of a fast algorithm. For example, the "Cooley–Tukey topology with decimation in time" can be described by the following linguistic sentence [<xref ref-type="bibr" rid="ref10">10</xref>] (topological model):</p>
      <p>$$\left\langle\; u_{n-1}^{*} u_{n-2}^{*} \cdots u_1^{*} u_0^{*},\;\; u_{n-1}^{*} u_{n-2}^{*} \cdots u_1^{*} v_0^{*},\;\; \ldots,\;\; u_{n-1}^{*} \cdots u_{m+1}^{*} u_m^{*}\, v_{m-1}^{*} v_{m-2}^{*} \cdots v_1^{*} v_0^{*},\;\; \ldots,\;\; v_{n-1}^{*} v_{n-2}^{*} \cdots v_1^{*} v_0^{*} \;\right\rangle,$$</p>
      <p>where the words are digit-by-digit representations of coordinate numbers and the letters are the names of digit variables. The number of words in the sentence is $n + 1$. The first and last words correspond to the coordinates of points of the terminal planes given by expressions (4) and (5). The intermediate words define the input coordinates $U_y^m, U_x^m$ and the output coordinates $V_y^m, V_x^m$ in the planes of the inner layers of the fast algorithm. For an algorithm with substitution of values, the following condition is satisfied:</p>
      <p>Uym1  Vym,</p>
      <p>Uxm1  Vxm
Graph of topology contains basic operations in a layer Wixmm,imy umyumx ;vmyvmx  , representing
four-dimensional matrixes of dimensionality  pmy , pmx; gmy , g mx  . Where digit-by-digit
expressions of indexes of kernels of a layer m for the selected topology have viewed:</p>
      <p>This expression is an analytical representation of the system invariant of fast algorithms [<xref ref-type="bibr" rid="ref10">10</xref>]. In general, the topologies for the directions $x$ and $y$ may differ. Connections between the basic operations are defined by the structural model of the fast transformation, in which each node corresponds to a basic operation (hereinafter also called a neural kernel). For the selected topology, the structural model is described by the following linguistic sentence:</p>
      <p>$$\left\langle\; u_{n-1}^{*} u_{n-2}^{*} \cdots u_1^{*},\;\; u_{n-1}^{*} u_{n-2}^{*} \cdots u_2^{*} v_0^{*},\;\; \ldots,\;\; u_{n-1}^{*} \cdots u_{m+1}^{*}\, v_{m-1}^{*} v_{m-2}^{*} \cdots v_0^{*},\;\; \ldots,\;\; v_{n-2}^{*} \cdots v_1^{*} v_0^{*} \;\right\rangle.$$</p>
      <p>Each word of this sentence defines the number $i_m^{*}$ of a basic operation in the layer $m$. The number of words in the sentence is $n$.</p>
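      <p>As a sanity check of the linguistic-sentence notation, the sketch below (illustrative code using the digit conventions introduced above, not taken from the paper) lists, for one axis, the topological word and the structural (kernel-index) word of every layer of a three-layer radix-2 transformation.</p>
      <preformat>
# Digit lists are least significant first: u = [u0, u1, u2], v = [v0, v1, v2].
u = [1, 0, 1]       # digits of an input coordinate U
v = [1, 1, 0]       # digits of an output coordinate V
n = len(u)

# Topological model: n + 1 words; word m is u_{n-1}..u_m v_{m-1}..v_0 (most significant first).
topological = [u[m:][::-1] + v[:m][::-1] for m in range(n + 1)]
# Structural model: n words; word m is the kernel index u_{n-1}..u_{m+1} v_{m-1}..v_0.
structural = [u[m + 1:][::-1] + v[:m][::-1] for m in range(n)]

print(topological)   # first word = digits of U, last word = digits of V
print(structural)    # kernel-index word for layers m = 0, 1, 2
      </preformat>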
      <p>Fig. 1 shows the structural model of a fast two-dimensional transformation of dimensionality $8 \times 8$. The input image enters at the lowest layer, and the spectral coefficients are obtained in the highest layer. The nodes of the model correspond to basic operations (neural kernels) of dimensionality $[2, 2; 2, 2]$. The kernels of layer $m$ execute a two-dimensional transformation over a spatial unit of size $p_m^y \times p_m^x$:</p>
      <p>$$S^m(V_y^m, V_x^m) = \sum_{u_y^m} \sum_{u_x^m} F^m(U_y^m, U_x^m)\, W^m_{i_x^m, i_y^m}(u_y^m u_x^m;\, v_y^m v_x^m). \qquad (6)$$</p>
      <p>Setting specific values for all digit variables $u_m^{*}, v_m^{*}$ (where $m$ runs through the values $0, 1, \ldots, n-1$) defines a path in the topological graph between a pair of nodes of the initial and final layers. From the uniqueness of the digit-by-digit representation of coordinate numbers it follows that this path is unique for each pair of spatial points of the input and output planes. This circumstance yields a convenient analytical expression connecting the array elements of the fast transformation with the elements of the kernels. From expression (1) it follows:</p>
      <p>H U y ,Ux ;Vy ,Vx  
S Vy ,Vx 
F U y ,Ux 
.</p>
      <p>Differentiating by the chain rule for composite functions, we obtain:</p>
      <p>$$H(U_y, U_x; V_y, V_x) = \frac{\partial S^{n-1}}{\partial F^{n-1}} \cdot \frac{\partial F^{n-1}}{\partial S^{n-2}} \cdot \frac{\partial S^{n-2}}{\partial F^{n-2}} \cdots \frac{\partial F^{1}}{\partial S^{0}} \cdot \frac{\partial S^{0}}{\partial F^{0}}. \qquad (8)$$</p>
      <p>From the substitution condition it follows that $\partial F^{m+1} / \partial S^{m} = 1$ for all $m$, and from (6) that $\partial S^{m} / \partial F^{m} = W^m_{i_x^m, i_y^m}(u_y^m u_x^m;\, v_y^m v_x^m)$. Thus, each element of the four-dimensional transformation matrix $H$ is expressed through elements of the kernels as the following product:</p>
      <p>H U y ,U x ;Vy ,Vx   Wixnn11,iyn1 uny1unx1; vny1vnx1 </p>
      <p>Wixnn2 2,iyn2 uny2unx2; vny2vnx2 </p>
      <p>Wix00,i0y u0yu0x ; v0yv0x  ,
(9)
where digit-by-digit expressions of kernel indexes for a layer m for the selected
topology are defined by expression Multiplicative Decomposition of Two-Dimensional
Images.</p>
    </sec>
    <sec id="sec-3">
      <title>Multiplicative Decomposition of Two-Dimensional Images</title>
      <p>The algorithm of multiplicative decomposition is based on the ideas of fractal filtering [<xref ref-type="bibr" rid="ref10">10</xref>] (in the notation of convolutional neural networks this operation corresponds to pooling). In the two-dimensional case, fractal filtering is a multi-scale processing of the image that sequentially shrinks its size down to a single point. The diagram of fractal filtering can be represented as the pyramid shown in Fig. 2.</p>
      <p>F U y,Ux </p>
      <p>F2 U y,Ux </p>
      <p>F1U y,Ux 
The base of the pyramid is the source image, F U y ,Ux  for which arguments U y and
Ux are presented in a radix notation (see expression. In this positional representation,
we will fix all digits except two the lowest u0y and u0x . If to vary these digits on all
possible values, then we will obtain a two-dimensional selection with the size p0y  p0x
. The fractal filter is understood as any functional  , acting on this selection. Formally,
it can be written in the form of the following expression:
The image F1 will be multiply reduced by the sizes with the source image. For example,
the rule of average calculation of selection or its median line can be such functional.
The source image can be now formally presented in the form of a product:
F  uny1un2</p>
      <p>y
 F1  uny1un2
y
u u x
1y 0y , unx1un2
u1y , unx1un2
x
u1xu0x  
u1x  f j0y jx0 u0y , u0x  ,
(10)
(11)
where f j0y j0x u0y , u0x  - is a set of the two-dimensional function factors depending on
digit variables u0y and u0x , and indexes jy0, jx0 selecting a two-dimensional function
from this set. The value of these indexes is set to equal values of arguments of the image
F1 , so that jy0  uny1uny2 u1y and jx0  unx1unx2 u1x . For obtaining the factor
functions, it is enough to execute scalar division of the image F to the image F1 in case of
variation of all digit variables. In turn, the image F1 can also be represented as the
product of the image F2 on factors from the set f j1y j1x u1y , u1x  . Repeating multiply the
operation of fractal filtering and decomposition, we will reach the peak of the pyramid
of images and we will obtain multiplicative decomposition:</p>
      <p>F  uny1un2
y
u u x
1y 0y , unx1un2</p>
      <p>u1xu0x  
f jyn1 jxn1 uny1, unx1  f jyn2 jxn2 uny2 , unx2 
f j1y j1x u1y , u1x  f j0y jx0 u0y , u0x ,
where indexes of multiplicands are defined by expressions:
jxm  unx1un2
x</p>
      <p>x
um1 ,
jym  uny1un2
y</p>
      <p>y
um1 .</p>
      <p>F  uny1un2</p>
      <p>y
1
  F  uny1uny2
u0y ,u0x  
u1y , unx1un2
x</p>
      <p>u1x  
u u x
1y 0y , unx1un2
u1xu0x  .
4</p>
    </sec>
    <sec id="sec-4">
      <title>Attuning of Adapted Transformations</title>
      <p>We call a transformation adapted to an image if one of its basis functions, with coordinates $(V_y, V_x)$, coincides with this image. The scalar product of the image with this function is then maximal among all coefficients of the spectral domain of the transformation; this is the purpose of attuning. The coordinates $(V_y, V_x)$ of this function in the spectral plane will be called the adaptation point.</p>
      <p>Attuning can also be realized for several images. Comparing the obtained multiplicative decomposition of the image with the decomposition of the fast transformation, it is easy to see that they are similar, and the set of kernel indices in each layer covers the set of indices of the factor functions. From this constructive result it follows that the fast transformation is attuned to the image when the transformation kernels are attuned to the factor functions. Attuning of the transformation kernels is defined by the rule:</p>
      <sec id="sec-4-1">
        <title>Wixmm,iym umyumx ;vmyvmx   f jxm jmy umy ,umx  .</title>
        <p>(12)
Comparing expression for ixm ,iym and for jxm, jym , it is possible to obtain the result
conclusion that quantity of components in the multiplicative expansion of the image and
quantity of kernels of transformation coincide for a layer m  0 (thus it takes place
following equals ix0  jx0 and iy0  jy0 ), and less number of kernels for all remaining
layers. Therefore, in the case of attuning a part of degrees of freedom of a
transformation is not used. Digit variables v0y , v0x are freely variational variables; therefore, the
kernel may be attuned to D  g0y g0x images. The remaining layers have a bigger number
of degrees of freedom and cannot worsen this value. Thus, it is possible to conclude
that a fast transformation cannot adapt more than to D different images. On this, the
opportunities of this algorithm of attuning are exhausted. Value D let us call as the
level of transformation attuning.
5</p>
      </sec>
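      <p>The counting argument above can be illustrated numerically. The sketch below (illustrative code based on the digit-word definitions introduced earlier) compares the number of kernels with the number of factor functions in each layer for one axis of the 8x8 example.</p>
      <preformat>
from math import prod

p = [2, 2, 2]          # input radices for one axis (N = 8)
g = [2, 2, 2]          # output radices for one axis (M = 8)
n = len(p)

for m in range(n):
    n_kernels = prod(p[m + 1:]) * prod(g[:m])   # possible kernel indices i_m
    n_factors = prod(p[m + 1:])                 # possible factor indices j_m
    print(f"layer {m}: kernels = {n_kernels}, factor functions = {n_factors}")
# Only layer 0 has equal counts; the higher layers have spare kernels,
# i.e. unused degrees of freedom after attuning to a single image.

# Attuning level of the two-dimensional transformation: D = g0^y * g0^x images.
D = g[0] * g[0]
print("attuning level D =", D)
      </preformat>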
    </sec>
    <sec id="sec-5">
      <title>Regular Neural Networks with Additional Planes</title>
      <p>Remaining within the considered topology, the degrees of freedom not used in attuning can be assigned in several ways (see [<xref ref-type="bibr" rid="ref10">10</xref>] for details). In this case the level of attuning does not change, but the remaining transformation functions do.</p>
      <p>Let us consider an alternative solution, which consists in extending the topology with additional planes so that the remaining degrees of freedom are used to increase the level of attuning. The number of computational operations in the new topology increases, but the structural regularity of the network is preserved.</p>
      <p>First, let us specify the rule for choosing the adapted kernels in the former topology. Let us write the coordinates of the adaptation point in radix notation, denoting the digit variables by $y$ and $x$:</p>
      <p>$$V_y = y_{n-1} y_{n-2} \cdots y_0, \qquad V_x = x_{n-1} x_{n-2} \cdots x_0. \qquad (13)$$</p>
      <p>The fixed values of the digit variables $y_m, x_m$ correspond to the variables $v_m^y, v_m^x$; therefore, when this adaptation point is chosen, only the kernels with the numbers</p>
      <p>$$i_y^m = u_{n-1}^y u_{n-2}^y \cdots u_{m+1}^y\, y_{m-1} y_{m-2} \cdots y_0, \qquad i_x^m = u_{n-1}^x u_{n-2}^x \cdots u_{m+1}^x\, x_{m-1} x_{m-2} \cdots x_0$$</p>
      <p>need to be adapted according to rule (12). In particular, for $m = 0$ we have $i_y^0 = u_{n-1}^y u_{n-2}^y \cdots u_1^y$ and $i_x^0 = u_{n-1}^x u_{n-2}^x \cdots u_1^x$, i.e. the indices run over all kernels of the zero layer.</p>
      <p>Thus, irrespective of the choice of adaptation points, all kernels of the zero layer must always be adapted. At the same time, the level of transformation attuning remains restricted to the value $D$.</p>
      <p>To increase the level of transformation attuning, we introduce additional planes that copy the structure of the main plane in each layer. The ordinal number of an additional plane within a layer (denoted here by $\pi_m$) is determined by the rule:</p>
      <p>$$\pi_m = \left( x_{n-1} x_{n-2} \cdots x_{m+1},\; y_{n-1} y_{n-2} \cdots y_{m+1} \right).$$</p>
      <p>The maximum number of additional planes occurs in the zero layer; as the layer number grows, the number of additional planes decreases, and in the last layer $\pi_{n-1}$ is empty, i.e. there are no additional planes. Thus, in the new topology the plane of the last layer remains the same, while additional planes appear in the lower layers.</p>
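      <p>The plane-numbering rule above implies a simple count of planes per layer. The sketch below (illustrative code using the notation introduced here) evaluates it for the 8x8 radix-2 example.</p>
      <preformat>
from math import prod

gy = [2, 2, 2]          # output radices along y (My = 8)
gx = [2, 2, 2]          # output radices along x (Mx = 8)
n = len(gy)

for m in range(n):
    # Number of possible plane numbers pi_m = (x_{n-1}..x_{m+1}, y_{n-1}..y_{m+1}).
    planes = prod(gx[m + 1:]) * prod(gy[m + 1:])
    print(f"layer {m}: {planes} plane(s)")
# Prints 16, 4, 1: the zero layer has the most planes, the last layer keeps a single plane.
      </preformat>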
      <p>The architecture of the neural network with additional planes is shown in Fig. 3. The input image is fed simultaneously to all planes of the input layer. The layers are separated by switchboards controlled by the position digits of the coordinate numbers of the output class.</p>
      <p>(Fig. 3: the layers are separated by Switchboard 0 and Switchboard 1, controlled by the digit groups $(x_2 x_1, y_2 y_1)$ and $(x_2, y_2)$ of the output coordinates $V_y = y_2 y_1 y_0$, $V_x = x_2 x_1 x_0$.)</p>
      <p>Since the rule for generating the new planes does not contradict the substitution condition, the former rule can be used for attuning the transformation kernels, with an additional index selecting the plane:</p>
      <sec id="sec-5-1">
        <title>Wixmm,iym  m umyumx;vmy vmx   f jxm jym umy ,umx  .</title>
        <p>k
Image
```
`
x  x2x1x0 
The index k in the right part enumerates an adaptation point. For m  0 we have
ix0  jx0 and iy0  jy0 , here variational variables are the number of plane
 0  xn1xn2 x1, yn1 yn2 y1 and digit variables v0y , v0x . Together, they cover the
full coordinate range of the output plane. Possible index values k correspond to this
range. The remaining layers do not impair the level of transformation attuning. Thus,
the transformation with additional planes can be adapted to D  M y  M x images, i.e.
each point of the output plane will exactly correspond to one image of the learning set.</p>
        <p>If the image corresponds to one of the adaptation points, the value of this spectral plane
coefficient will be maximal. The transformation result for above the image of the digit
"0" is shown in Fig. 5. It is seen that the coefficients corresponding to the subclasses of
the number “0” have maximum values.</p>
      </sec>
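      <p>Returning to the switchboard control described above, the following sketch (illustrative code under the same notation) shows which plane is selected in every layer for a given point of the output plane in the 8x8 radix-2 example.</p>
      <preformat>
n = 3

def digs(V):
    """Digits [v0, v1, v2] of a coordinate, least significant first."""
    return [(V // 2**m) % 2 for m in range(n)]

def selected_planes(Vy, Vx):
    """Plane chosen in each layer m: the digit groups x_{n-1}..x_{m+1}, y_{n-1}..y_{m+1}."""
    y, x = digs(Vy), digs(Vx)
    return [(tuple(x[m + 1:][::-1]), tuple(y[m + 1:][::-1])) for m in range(n)]

# For the output point (Vy, Vx) = (5, 3): layer 0 uses plane ((x2, x1), (y2, y1)),
# layer 1 uses plane ((x2,), (y2,)), and layer 2 uses the single main plane ((), ()).
print(selected_planes(5, 3))
      </preformat>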
    </sec>
    <sec id="sec-6">
      <title>Ordering of the Adaptation Points</title>
      <p>For each $k$-th adaptation point in the range $k \in [0, D-1]$, values must be assigned to the digit variables $y_m, x_m$. A one-to-one correspondence $k \leftrightarrow (V_y, V_x)$ defines a rule for ordering the adaptation points in the output plane. Let us consider some typical variants. Recall that the digit-by-digit representation of the coordinates is defined by expressions (13). The task of ordering is to establish a correspondence between the ordinal number $k$ and the digit variables $y_i, x_i$. We assume that moving along a column of the spectral plane corresponds to changing the coordinate $V_y$, and moving along a row to changing the coordinate $V_x$.</p>
      <sec id="sec-6-1">
        <title>Ordering Along Columns</title>
        <p>The ordering algorithm can be specified by the following representation of the sequence number:</p>
        <p>$$k = x_{n-1} x_{n-2} \cdots x_0\; y_{n-1} y_{n-2} \cdots y_0.$$</p>
        <p>In this expression the digit $y_0$ is the lowest, so as the number $k$ increases, the digits $y_i$ change first, and as a result the adaptation points are placed along the columns. Fig. 4 shows the variant of ordering in which classes are placed along columns and subclasses along rows.</p>
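        <p>A minimal sketch of this ordering (illustrative code): with the $y$-digits in the lowest positions, the sequence number reduces to $k = V_y + M_y V_x$, so consecutive values of $k$ walk down a column.</p>
        <preformat>
My, Mx = 8, 8

def point_of(k):
    """Adaptation point (Vy, Vx) of the k-th image under column ordering."""
    return k % My, k // My            # Vy changes first, so the points fill columns

for k in range(0, 17, 4):
    print(k, point_of(k))             # (0,0), (4,0), (0,1), (4,1), (0,2)
        </preformat>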
      </sec>
      <sec id="sec-6-2">
        <title>Ordering Along Rows</title>
        <p>The ordering algorithm can be specified by the following representation of the sequence number:</p>
        <p>$$k = y_{n-1} y_{n-2} \cdots y_1 y_0\; x_{n-1} x_{n-2} \cdots x_1 x_0.$$</p>
        <p>In this expression the digit $x_0$ is the lowest, so as the number $k$ increases, the digits $x_i$ change first, and as a result the adaptation points are placed along the rows. Fig. 6 shows a variant of implementing the transformation with the adaptation points ordered along rows.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Ordering Along Circular Segments</title>
        <p>The ordering algorithm can also be specified by the interleaved representation:</p>
        <p>$$k = x_{n-1} y_{n-1} x_{n-2} y_{n-2} \cdots x_1 y_1 x_0 y_0.$$</p>
        <p>In this case, the spectral plane is filled with increasing values of $k$ when moving clockwise along "circular" segments. Fig. 7 shows a variant of implementing the transformation with the basis functions ordered along circular segments.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>Regular tunable transformations have the unique property of admitting an analytical representation of the topology of the implementing network, which makes it possible to develop learning algorithms that converge in a finite number of steps. It has been shown that the implementing topology is easily extended with additional planes, so that the number of recognized images increases dramatically and covers all elements of the output plane. Moreover, the topology extension does not violate the principle on which the training algorithm is built. The constructed topology is ideologically close to the topology of convolutional deep learning networks [<xref ref-type="bibr" rid="ref2">2</xref>] but is regular. The presented solution provides a constructive answer to two fundamental questions of deep learning neural networks: how to choose a topology and how to reduce the training time of the network.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fukushima</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miyake</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ito</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>Neocognitron: A neural network model for a mechanism of visual pattern recognition</article-title>
          .
          <source>IEEE Transaction on Systems, Man and Cybernetics</source>
          SMC-
          <volume>13</volume>
          (
          <issue>5</issue>
          ):
          <fpage>826</fpage>
          -
          <lpage>834</lpage>
          .
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Denker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hubbard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Jackel</surname>
          </string-name>
          : Backpropagation Applied to Handwritten Zip Code Recognition,
          <source>Neural Computation</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>541</fpage>
          -
          <lpage>551</lpage>
          ,
          <year>Winter 1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Springenberg</surname>
          </string-name>
          , Jost Tobias; Dosovitskiy, Alexey; Brox, Thomas &amp; Riedmiller,
          <string-name>
            <surname>Martin</surname>
          </string-name>
          (
          <year>2014</year>
          - 12-21),
          <article-title>"Striving for Simplicity: The All Convolutional Net"</article-title>
          , arXiv:
          <fpage>1412</fpage>
          .6806 https://arxiv.org/pdf/1412.6806.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Romanuke</surname>
          </string-name>
          , Vadim.
          <article-title>Appropriate number and allocation of ReLUs in convolutional neural networks</article-title>
          .
          <source>Research Bulletin of NTUU “Kyiv Polytechnic Institute”</source>
          , 2017, Vol.
          <volume>1</volume>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>X.</given-names>
            <surname>Glorot</surname>
          </string-name>
          , A. Bordes, Y. Bengio.
          <article-title>Deep Sparse Rectifier Neural Networks</article-title>
          .
          <source>Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&amp;CP 15</source>
          , 2011.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>V.</given-names>
            <surname>Nair</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Rectified linear units improve restricted boltzmann machines</article-title>
          .
          <source>In Proc. 27th. International Conference on Machine Learning</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Maas</surname>
          </string-name>
          , Andrew L.;
          <string-name>
            <surname>Hannun</surname>
          </string-name>
          , Awni Y.;
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>Andrew Y.</given-names>
          </string-name>
          (
          <year>June 2013</year>
          ).
          <article-title>"Rectifier nonlinearities improve neural network acoustic models" (PDF)</article-title>
          .
          <source>Proc. ICML</source>
          .
          <volume>30</volume>
          (
          <issue>1</issue>
          ).
          <source>Retrieved 2 January</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dorogov</surname>
            <given-names>A. Yu.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alekseev</surname>
            <given-names>A. A.</given-names>
          </string-name>
          .
          <article-title>Mathematical models of fast neural networks</article-title>
          .
          <source>In: Collection of scientific works of SPbGETU “Information management and processing systems”, Issue</source>
          <volume>490</volume>
          ,
          <year>1996</year>
          , p.
          <fpage>79</fpage>
          -
          <lpage>84</lpage>
          . In Russian.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Solodovnikov</surname>
            <given-names>A. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spivakovsky</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <article-title>Fundamentals of the theory and methods of spectral information processing</article-title>
          . Textbook.
          <source>Leningrad: Publishing House of Leningrad State University</source>
          ,
          <year>1986</year>
          . 272 p. In Russian.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dorogov</surname>
            <given-names>A. Yu.</given-names>
          </string-name>
          .
          <article-title>Theory and design of fast tunable transformations and weakly connected neural networks</article-title>
          .
          <source>SPb.: "Polytechnic"</source>
          ,
          <year>2014</year>
          . 328 pp. In Russian. http://dorogov.su/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>