Deep Learning Neural Networks with Controlled Switching of Neural Planes*

Alexander Yu. Dorogov [0000-0002-7596-6761]
Saint Petersburg Electrotechnical University "LETI", Saint Petersburg, Russia
vaksa2006@yandex.ru

* Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. An algorithm for constructing the network topology and training two-dimensional fast neural networks with additional switched planes is considered. It is noted that the structure of fast neural networks has a fractal nature. The constructed topology is ideologically close to the topology of convolutional deep learning neural networks, but it is regular, with the number of layers determined by the factored representation of the dimensions of the image and of the output plane of classes. The learning algorithm has an analytical representation; it is stable and converges in a finite number of steps. The additional planes extend the information capacity of the tunable transformation to the maximum possible value. Control of the planes in the training and processing modes is realized by the numerical coordinate codes of the output plane. The architecture of a regular neural network with additional planes is presented, variants of image ordering in the output plane are considered, and examples are given.

Keywords: fast tunable transformation; neural network; learning; convolutional neural network; bitmap; planes of neural layers

1 Introduction

Deep learning technology involves a process of building up the complexity of informative features along a sequence of neural layers. Starting with the neocognitron of K. Fukushima [1], several variants of realizing this idea have been proposed. One of the successful solutions is the architecture of convolutional neural networks [2], which has shown high efficiency in solving various problems. A distinctive feature of this architecture is the presence in the convolutional layers of several data-processing channels (called maps or planes). In each plane, the output image of the previous layer is convolved with a fixed kernel of small dimensions. Convolutional layers alternate with pooling layers that reduce the dimension of the feature space by an integer factor. The pooling layers are optional, and there are variants that eliminate them from the network architecture entirely [3]. The second distinctive feature of the architecture is the use of semi-linear activation functions that act as switching keys controlled by the values of the hidden-layer variables [4, 5, 6, 7].

A disadvantage of the convolutional network architecture is the lack of theoretically justified methods for choosing the network structure and the convolution kernel parameters. There are several well-functioning network configurations for specific tasks, but it is not clear how to build a network for a new task; until now, the choice of the structure of a convolutional network has remained an art rather than a science. The second significant drawback is related to the training time of convolutional networks. On a typical processor, the time can vary from several hours to several days, so high-performance GPUs are often used to train networks. In [8] the authors proposed the idea of using fast transformation algorithms to construct the structure and topology of multilayer perceptron neural networks.
We show that this approach, with some modification, can also be used to construct the structure and topology of convolutional neural networks.

At present, fast algorithms for the linear Fourier, Walsh, Haar, and similar transformations are widely known. With fast algorithms, the gain in the number of computational operations grows rapidly with increasing transformation dimension. Since the end of the 20th century, the direction of fast tunable transformations has developed [9]; these are essentially neural networks with limited connections and linear activation functions, and training methods have been developed for them that converge in a finite number of steps. The number of layers in a fast transformation and their configuration are determined by the dimensions of the processed images. To construct a fast algorithm, the dimension of the transformation must be a composite number, and the more factors in the decomposition of the dimension, the higher the computational efficiency of the fast algorithm. Despite the wide variety of fast algorithms, the configurations of their structures satisfy a system invariant of self-similarity [10]. Fractals are known to have the same property; therefore, fast algorithms can be interpreted as quasi-fractals. The property of structural fractality allows solving two tasks simultaneously: fast data processing and fast training of the transformation.

In this paper we show that a small modification of the system invariant of fast algorithms leads to convolutional neural network structures. At the same time, it is possible to preserve the fast learning algorithm and to increase the information capacity of the network up to the maximum possible value, determined by the number of neurons in the output layer of the network. The proposed architecture cannot be called a convolutional network, because the planes of the neural layers use transformations more general than convolution, and there are no pooling layers, although the principle of pooling is used in training the network. All neurons have linear activation functions; nevertheless, non-linear processing is present, implemented not by the activation functions but by switching the planes of the neural network. To some extent this is similar to the switching semi-linear activation functions of convolutional neural networks. The switching of the planes is controlled by the coordinates of the neurons in the output plane of the network. We call this class of networks neural networks with controlled switching of planes (CSPNN).

2 The Topology of Two-Dimensional Fast Tunable Transformations

Let us denote by F(U_y, U_x) an image matrix of dimensionality N_y \times N_x. The action of a linear transformation H(U_y, U_x; V_y, V_x) on the image produces an array of M_y \times M_x coefficients. The two-dimensional transformation is executed by the rule:

S(V_y, V_x) = \sum_{U_y=0}^{N_y-1} \sum_{U_x=0}^{N_x-1} F(U_y, U_x) \, H(U_y, U_x; V_y, V_x).    (1)

A necessary condition for the existence of a fast algorithm is the possibility of a multiplicative decomposition of the input and output dimensionalities of the transformation into an equal number of factors:

N_y = p_0^y p_1^y \cdots p_{n-1}^y,  M_y = g_0^y g_1^y \cdots g_{n-1}^y,
N_x = p_0^x p_1^x \cdots p_{n-1}^x,  M_x = g_0^x g_1^x \cdots g_{n-1}^x.    (2)

Here the indexes x, y denote the coordinate axes of the source image, and the value n defines the number of layers in the graph of the fast algorithm.
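As a point of reference (this code is not from the paper; the array names and the use of NumPy are illustrative assumptions), the direct transformation of Eq. (1) can be written in a few lines; the factorizations of Eq. (2) are what the fast algorithm exploits to avoid this full-cost computation:

```python
import numpy as np

def direct_transform(F, H):
    """Direct evaluation of Eq. (1): S[Vy, Vx] = sum over (Uy, Ux) of F[Uy, Ux] * H[Uy, Ux, Vy, Vx].

    F has shape (Ny, Nx) and H has shape (Ny, Nx, My, Mx); the cost is
    O(Ny * Nx * My * Mx), which the factorized fast algorithm avoids.
    """
    return np.einsum('yx,yxvw->vw', F, H)

# Dimensionalities factored as in Eq. (2): Ny = Nx = My = Mx = 2 * 2 * 2 = 8, so n = 3 layers.
rng = np.random.default_rng(0)
F = rng.standard_normal((8, 8))
H = rng.standard_normal((8, 8, 8, 8))
S = direct_transform(F, H)
print(S.shape)  # (8, 8) spectral plane of coefficients
```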
Using the factors of the decomposition, the coordinates of the image points can be written in a positional number system with mixed radices:

U_y = u_{n-1}^y u_{n-2}^y \cdots u_1^y u_0^y,
U_x = u_{n-1}^x u_{n-2}^x \cdots u_1^x u_0^x,    (3)

where the weight of the m-th digit position is defined by the product p_{m-1}^* p_{m-2}^* \cdots p_1^* p_0^*, and u_m^* is a digit variable taking values 0, ..., p_m^* - 1 (the asterisk here stands for the index x or y). The coordinates of the spectral coefficients in the plane (V_y, V_x) can be represented similarly:

V_y = v_{n-1}^y v_{n-2}^y \cdots v_1^y v_0^y,
V_x = v_{n-1}^x v_{n-2}^x \cdots v_1^x v_0^x,    (4)

where the weight of the m-th digit position is defined by the product g_{m-1}^* g_{m-2}^* \cdots g_1^* g_0^*, and v_m^* is a digit variable taking values 0, ..., g_m^* - 1.

A fast transformation algorithm is usually presented in the form of a graph, which may have various topologies. The digit-by-digit form is convenient for the analytical description of the topology graph of a fast algorithm. For example, the "Cooley–Tukey topology with decimation in time" can be described by the following linguistic sentence [10] (topological model):

( u_{n-1}^* u_{n-2}^* \cdots u_1^* u_0^*,\; u_{n-1}^* u_{n-2}^* \cdots u_1^* v_0^*,\; \ldots,\; u_{n-1}^* u_{n-2}^* \cdots u_{m+1}^* u_m^* v_{m-1}^* v_{m-2}^* \cdots v_0^*,\; \ldots,\; v_{n-1}^* v_{n-2}^* \cdots v_1^* v_0^* ),

where the words are digit-by-digit representations of coordinate numbers, and the letters are the names of digit variables. The number of words in the sentence is n + 1. The first and last words correspond to the coordinates of points of the terminal planes given by expressions (3) and (4). The intermediate words define the input coordinates (U_y^m, U_x^m) and the output coordinates (V_y^m, V_x^m) in the planes of the inner layers of the fast algorithm. For an in-place algorithm (with substitution of values), the following condition is satisfied:

U_y^{m+1} = V_y^m,  U_x^{m+1} = V_x^m.    (5)

The topology graph contains basic operations of layer m, W_{i_x^m, i_y^m}^m [u_m^y u_m^x; v_m^y v_m^x], which are four-dimensional matrices of dimensionality [p_m^y, p_m^x; g_m^y, g_m^x]. The digit-by-digit expressions of the kernel indexes of layer m for the selected topology have the form:

i_x^m = u_{n-1}^x u_{n-2}^x \cdots u_{m+1}^x v_{m-1}^x v_{m-2}^x \cdots v_0^x,
i_y^m = u_{n-1}^y u_{n-2}^y \cdots u_{m+1}^y v_{m-1}^y v_{m-2}^y \cdots v_0^y.    (6)

Expression (6) is an analytical representation of the system invariant of fast algorithms [10]. In general, the topologies for the x and y directions may be different. The connections between basic operations are defined by the structural model of the fast transformation, in which each node corresponds to a basic operation (hereinafter also called a neural kernel). For the selected topology, the structural model is described by the following linguistic sentence:

( u_{n-1}^* u_{n-2}^* \cdots u_1^*,\; u_{n-1}^* u_{n-2}^* \cdots u_2^* v_0^*,\; \ldots,\; u_{n-1}^* u_{n-2}^* \cdots u_{m+1}^* v_{m-1}^* v_{m-2}^* \cdots v_0^*,\; \ldots,\; v_{n-2}^* v_{n-3}^* \cdots v_1^* v_0^* ).

Each word of this sentence defines the numbers i_*^m of the basic operations in layer m. The number of words in the sentence is n.

Fig. 1 shows the structural model of a fast two-dimensional transformation for dimensionality 8 \times 8. The input image enters at the lowest layer, and the spectral coefficients are obtained in the highest layer. The nodes of the model correspond to basic operations (neural kernels) of dimensionality [2, 2; 2, 2]. The kernels of layer m execute a two-dimensional transformation over a spatial block of size p_m^y \times p_m^x:

S^m(V_y^m, V_x^m) = \sum_{u_m^y, u_m^x} F^m(U_y^m, U_x^m) \, W_{i_x^m, i_y^m}^m [u_m^y u_m^x; v_m^y v_m^x].    (7)

Fig. 1. Structural model of the fast two-dimensional transformation for dimensionality 8 \times 8
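A minimal sketch (not from the paper; the helper names and the digit-ordering conventions are assumptions made for illustration) of how the mixed-radix digits of Eqs. (3)-(4) and the kernel index words of Eq. (6) can be computed:

```python
def to_mixed_radix(value, radices):
    """Digits (d_{n-1}, ..., d_1, d_0) of `value` in the mixed-radix system of Eqs. (3)-(4).

    `radices` is (p_0, p_1, ..., p_{n-1}); the weight of digit position m is p_{m-1} * ... * p_0.
    """
    digits = []
    for r in radices:              # peel off digits starting from the least significant one
        digits.append(value % r)
        value //= r
    return digits[::-1]            # most significant digit first

def kernel_index(u_digits, v_digits, m):
    """Kernel index word of Eq. (6) for one axis and layer m (Cooley-Tukey, decimation in time).

    u_digits and v_digits are most-significant-first digit lists of equal length n;
    the index word is u_{n-1} ... u_{m+1} v_{m-1} ... v_0, i.e. digit position m is skipped.
    """
    n = len(u_digits)
    high = u_digits[:n - 1 - m]              # u_{n-1}, ..., u_{m+1}
    low = v_digits[n - m:] if m > 0 else []  # v_{m-1}, ..., v_0
    return high + low

# Example for N = 2 * 2 * 2 = 8 (n = 3): the coordinate U = 6 has digits (u_2, u_1, u_0) = (1, 1, 0).
print(to_mixed_radix(6, (2, 2, 2)))             # [1, 1, 0]
print(kernel_index([1, 1, 0], [0, 1, 1], m=1))  # [1, 1], i.e. the word u_2 v_0
```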
3 Multiplicative Decomposition of the Fast Transformation

Setting specific values of all digit variables u_m^*, v_m^* (where m runs over the values 0, 1, ..., n - 1) defines a certain path in the topological graph between a pair of nodes of the initial and the final layer. From the uniqueness of the digit-by-digit representation of coordinate numbers it follows that such a path is unique for each pair of spatial points of the input and output planes. This circumstance allows obtaining a convenient analytical expression connecting the array elements of the fast transformation with the elements of the kernels. From expression (1) it follows that:

H(U_y, U_x; V_y, V_x) = \frac{\partial S(V_y, V_x)}{\partial F(U_y, U_x)}.    (8)

Differentiating by the chain rule for composite functions, we obtain:

H(U_y, U_x; V_y, V_x) = \frac{\partial S^{n-1}}{\partial F^{n-1}} \cdot \frac{\partial F^{n-1}}{\partial S^{n-2}} \cdot \frac{\partial S^{n-2}}{\partial F^{n-2}} \cdots \frac{\partial F^{1}}{\partial S^{0}} \cdot \frac{\partial S^{0}}{\partial F^{0}}.

From condition (5) it follows that for all m the equalities \partial F^{m+1} / \partial S^{m} = 1 hold, and from (7) that \partial S^{m} / \partial F^{m} = W_{i_x^m, i_y^m}^m [u_m^y u_m^x; v_m^y v_m^x]. Thus, each element of the four-dimensional transformation matrix H is expressed through the elements of the kernels as the following product:

H(U_y, U_x; V_y, V_x) = W_{i_x^{n-1}, i_y^{n-1}}^{n-1} [u_{n-1}^y u_{n-1}^x; v_{n-1}^y v_{n-1}^x] \cdot W_{i_x^{n-2}, i_y^{n-2}}^{n-2} [u_{n-2}^y u_{n-2}^x; v_{n-2}^y v_{n-2}^x] \cdots W_{i_x^0, i_y^0}^{0} [u_0^y u_0^x; v_0^y v_0^x],    (9)

where the digit-by-digit expressions of the kernel indexes of layer m for the selected topology are defined by expression (6).

Multiplicative Decomposition of Two-Dimensional Images. The algorithm of multiplicative decomposition is based on the idea of fractal filtering [10] (in the terminology of convolutional neural networks this operation corresponds to pooling). In the two-dimensional case, fractal filtering is a multi-scale processing of the image that sequentially shrinks its size down to a single point. The scheme of fractal filtering can be presented in the form of the pyramid shown in Fig. 2.

Fig. 2. The scheme of fractal image filtering: the pyramid of images F(U_y, U_x), F_1(U_y, U_x), F_2(U_y, U_x), ...

The base of the pyramid is the source image F(U_y, U_x), whose arguments U_y and U_x are written in the mixed-radix notation (3). In this positional representation, we fix all digits except the two lowest ones, u_0^y and u_0^x. If these digits are varied over all possible values, a two-dimensional selection of size p_0^y \times p_0^x is obtained. A fractal filter is any functional \Phi acting on this selection. Formally, it can be written in the form:

F_1(u_{n-1}^y u_{n-2}^y \cdots u_1^y,\; u_{n-1}^x u_{n-2}^x \cdots u_1^x) = \Phi_{u_0^y, u_0^x} [ F(u_{n-1}^y u_{n-2}^y \cdots u_1^y u_0^y,\; u_{n-1}^x u_{n-2}^x \cdots u_1^x u_0^x) ].

The image F_1 is reduced in size by an integer factor relative to the source image. The rule of computing the average of the selection, or its median, can serve as such a functional, for example. The source image can now be formally presented in the form of a product:

F(u_{n-1}^y u_{n-2}^y \cdots u_1^y u_0^y,\; u_{n-1}^x u_{n-2}^x \cdots u_1^x u_0^x) = F_1(u_{n-1}^y u_{n-2}^y \cdots u_1^y,\; u_{n-1}^x u_{n-2}^x \cdots u_1^x) \, f_{j_y^0 j_x^0}(u_0^y, u_0^x),

where f_{j_y^0 j_x^0}(u_0^y, u_0^x) is a set of two-dimensional factor functions depending on the digit variables u_0^y and u_0^x, with the indexes j_y^0, j_x^0 selecting a two-dimensional function from this set. The values of these indexes are set equal to the values of the arguments of the image F_1, so that j_y^0 = u_{n-1}^y u_{n-2}^y \cdots u_1^y and j_x^0 = u_{n-1}^x u_{n-2}^x \cdots u_1^x.
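A minimal sketch of one level of this construction (not the author's code; it assumes block averaging as the functional \Phi and a strictly positive image so that the pointwise division producing the factor functions is well defined; all names are illustrative):

```python
import numpy as np

def fractal_filter_step(F, py, px):
    """One level of fractal filtering with block averaging as the functional Phi.

    F is reshaped into blocks of size (py, px); averaging each block gives the
    reduced image F1 (the next pyramid level), and pointwise division F / F1
    gives the factor functions f_{jy jx}(u0y, u0x) indexed by the coordinates of F1.
    """
    Ny, Nx = F.shape
    blocks = F.reshape(Ny // py, py, Nx // px, px)   # splits off the lowest digits u0y, u0x
    F1 = blocks.mean(axis=(1, 3))                    # reduced image
    factors = blocks / F1[:, None, :, None]          # factors[jy, u0y, jx, u0x]
    return F1, factors

# Toy example: Ny = Nx = 12 with first radices p0y = 4, p0x = 3, cf. Eq. (2).
rng = np.random.default_rng(1)
F = rng.uniform(0.5, 1.5, size=(12, 12))             # positive image, so division is safe
F1, factors = fractal_filter_step(F, 4, 3)
# For this choice of Phi the decomposition F = F1 * f holds exactly:
assert np.allclose(F, (F1[:, None, :, None] * factors).reshape(F.shape))
```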
To obtain the factor functions, it is enough to perform element-wise division of the image F by the image F_1 while varying all digit variables. In turn, the image F_1 can itself be represented as the product of the image F_2 and factors from the set f_{j_y^1 j_x^1}(u_1^y, u_1^x). Repeating the operations of fractal filtering and decomposition, we reach the top of the pyramid of images and obtain the multiplicative decomposition:

F(u_{n-1}^y u_{n-2}^y \cdots u_1^y u_0^y,\; u_{n-1}^x u_{n-2}^x \cdots u_1^x u_0^x) = f_{j_y^{n-1} j_x^{n-1}}(u_{n-1}^y, u_{n-1}^x) \, f_{j_y^{n-2} j_x^{n-2}}(u_{n-2}^y, u_{n-2}^x) \cdots f_{j_y^1 j_x^1}(u_1^y, u_1^x) \, f_{j_y^0 j_x^0}(u_0^y, u_0^x),    (10)

where the indexes of the factors are defined by the expressions:

j_x^m = u_{n-1}^x u_{n-2}^x \cdots u_{m+1}^x,  j_y^m = u_{n-1}^y u_{n-2}^y \cdots u_{m+1}^y.    (11)

4 Attuning of Adapted Transformations

We call a transformation adapted to an image if one of its basis functions, with coordinates (V_y, V_x), coincides with this image. The value of the scalar product of the image with this function will then be maximal among the coefficients of the spectral plane of the transformation; this is the purpose of attuning the transformation. The coordinate value (V_y, V_x) of this function in the spectral plane will be called the adaptation point. Attuning can also be performed for several images.

If the obtained multiplicative decomposition of the image (10) is compared with the decomposition of the fast transformation (9), it is easy to note that they are similar, and the set of kernel indexes in each layer covers the set of indexes of the factor functions. From this constructive result it follows that the fast transformation will be attuned to the image when the transformation kernels are attuned to the factor functions. Attuning of the transformation kernels is defined by the rule:

W_{i_x^m, i_y^m}^m [u_m^y u_m^x; v_m^y v_m^x] = f_{j_y^m j_x^m}(u_m^y, u_m^x).    (12)

Comparing expression (6) for i_x^m, i_y^m with expression (11) for j_x^m, j_y^m, one can conclude that the number of components in the multiplicative expansion of the image and the number of transformation kernels coincide for layer m = 0 (where the equalities i_x^0 = j_x^0 and i_y^0 = j_y^0 hold), while in all remaining layers the number of factor functions is smaller than the number of kernels. Therefore, in the case of attuning, a part of the degrees of freedom of the transformation is not used. The digit variables v_0^y, v_0^x are free variables; therefore, a kernel of the zero layer can be attuned to D = g_0^y g_0^x images. The remaining layers have a larger number of degrees of freedom and cannot worsen this value. Thus, a fast transformation cannot be adapted to more than D different images; at this point the possibilities of this attuning algorithm are exhausted. We call the value D the attuning level of the transformation.
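A minimal sketch of the attuning rule (12) for the zero layer, where the kernel indexes and factor indexes coincide (the array layouts, the helper name, and the choice of the free digits v_0^y, v_0^x are illustrative assumptions, not the author's code):

```python
import numpy as np

def attune_zero_layer(kernels, factors, v0y, v0x):
    """Attune the zero-layer kernels to the factor functions per Eq. (12).

    For m = 0 the kernel indexes and factor indexes coincide (i^0 = j^0), so every
    kernel receives the factor with the same index; the chosen output digits
    (v0y, v0x) select which output column of the kernel stores the adapted values.

    kernels : (Jy, Jx, p0y, p0x, g0y, g0x) array, kernels[iy, ix, uy, ux, vy, vx]
    factors : (Jy, Jx, p0y, p0x) array,           factors[jy, jx, uy, ux]
    """
    kernels[:, :, :, :, v0y, v0x] = factors
    return kernels

# Toy zero layer of an 8x8 transform with p0y = p0x = g0y = g0x = 2 (4 x 4 kernel index values).
rng = np.random.default_rng(2)
factors = rng.uniform(0.5, 1.5, size=(4, 4, 2, 2))   # f_{j0y j0x}(u0y, u0x)
kernels = np.zeros((4, 4, 2, 2, 2, 2))
kernels = attune_zero_layer(kernels, factors, v0y=0, v0x=0)
# Since v0y, v0x are free digits, up to D = g0y * g0x = 4 different images
# can be stored in this layer by choosing different (v0y, v0x) pairs.
```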
5 Regular Neural Networks with Additional Planes

Remaining within the considered topology, the degrees of freedom not used in attuning can be assigned in several ways (see [10] for details). In this case the attuning level does not change, but the remaining transformation functions do. Let us consider an alternative solution, which consists in extending the topology with additional planes so that the remaining degrees of freedom are used to increase the attuning level. In this case the number of computing operations in the new topology increases, but the structural regularity of the network is preserved.

To begin with, let us specify the rule for choosing the adapted kernels in the former topology. Let us express the coordinates of the adaptation point in positional notation, denoting the digit variables by y and x:

V_y = y_{n-1} y_{n-2} \cdots y_0,  V_x = x_{n-1} x_{n-2} \cdots x_0.    (13)

The fixed values of the digit variables y_m, x_m correspond to the variables v_m^y, v_m^x; therefore (as follows from (6)), when this adaptation point is chosen, according to rule (12) only the kernels with the following numbers need to be adapted:

i_x^m = u_{n-1}^x u_{n-2}^x \cdots u_{m+1}^x x_{m-1} x_{m-2} \cdots x_0,
i_y^m = u_{n-1}^y u_{n-2}^y \cdots u_{m+1}^y y_{m-1} y_{m-2} \cdots y_0.    (14)

In particular, for m = 0 we have i_x^0 = u_{n-1}^x u_{n-2}^x \cdots u_1^x and i_y^0 = u_{n-1}^y u_{n-2}^y \cdots u_1^y. Thus, regardless of the choice of adaptation points, all kernels of the zero layer must always be adapted. At the same time, the attuning level of the transformation remains restricted to the value D.

To increase the attuning level of the transformation, we introduce additional planes that copy the structure of the main plane in each layer. We determine the order numbers of the additional planes within a layer by the rule:

\lambda_m = ( x_{n-1} x_{n-2} \cdots x_{m+1},\; y_{n-1} y_{n-2} \cdots y_{m+1} ).

The maximum number of additional planes is in the zero layer; as the layer number increases, the number of additional planes decreases, and in the last layer we obtain \lambda_{n-1} = \varnothing, i.e., additional planes are absent altogether. Thus, in the new topology the plane of the last layer remains the same, while additional planes appear in the earlier layers. The architecture of the neural network with additional planes is shown in Fig. 3. The input image is fed simultaneously to all planes of the input layer. The layers are separated by switchboards controlled by the position digits of the coordinate number of the output class.

Fig. 3. Regular neural network architecture with additional planes: the image is fed to the planes of the zero layer, the layers are separated by Switchboard 0 and Switchboard 1, the output coordinates are V_y = y_2 y_1 y_0 and V_x = x_2 x_1 x_0, and the plane numbers are (x_2 x_1, y_2 y_1) and (x_2, y_2)

Since the rule for generating the new planes does not contradict the conditions above, the former rule can be used for attuning the transformation kernels, with an additional argument selecting the plane:

W_{i_x^m, i_y^m}^m (\lambda_m) [u_m^y u_m^x; v_m^y v_m^x] = f_{j_y^m j_x^m}^k (u_m^y, u_m^x).

The index k on the right-hand side enumerates the adaptation points. For m = 0 we have i_x^0 = j_x^0 and i_y^0 = j_y^0; here the free variables are the plane number \lambda_0 = ( x_{n-1} x_{n-2} \cdots x_1, y_{n-1} y_{n-2} \cdots y_1 ) and the digit variables v_0^y, v_0^x. Together they cover the full coordinate range of the output plane, and the possible values of the index k correspond to this range. The remaining layers do not impair the attuning level of the transformation. Thus, the transformation with additional planes can be adapted to D = M_y M_x images, i.e., each point of the output plane corresponds to exactly one image of the training set.

Fig. 4. Two-dimensional basis functions of the adapted transformation

Fig. 4 shows the two-dimensional basis functions of a transformation adapted to the average images of handwritten digits [2]. For this transformation:

N_y = N_x = p_0 p_1 p_2 = 4 \cdot 3 \cdot 2 = 24,  M_y = M_x = g_0 g_1 g_2 = 2 \cdot 2 \cdot 2 = 8.

If the input image corresponds to one of the adaptation points, the value of the corresponding coefficient of the spectral plane will be maximal. The result of the transformation for the image of the digit "0" is shown in Fig. 5. It can be seen that the coefficients corresponding to the subclasses of the digit "0" have the maximum values.

Fig. 5. The result of the two-dimensional transformation for the image of the digit "0"
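As a small sketch of the plane-switching control described in this section (the helper name and digit conventions are illustrative assumptions, not the author's code): the plane number \lambda_m selected by the switchboards for layer m is simply the pair of high-order digit words of the output coordinates (13):

```python
def plane_number(vy_digits, vx_digits, m):
    """Plane number lambda_m selected by the switchboards for layer m.

    vy_digits, vx_digits : most-significant-first digit lists (y_{n-1}, ..., y_0) and
    (x_{n-1}, ..., x_0) of the output coordinates Vy, Vx, cf. Eq. (13).
    Returns the pair of digit words (x_{n-1}..x_{m+1}, y_{n-1}..y_{m+1});
    for m = n - 1 both words are empty, i.e. the last layer has a single plane.
    """
    n = len(vy_digits)
    return tuple(vx_digits[:n - 1 - m]), tuple(vy_digits[:n - 1 - m])

# For the 8x8 output plane of Fig. 3 (n = 3, binary digits): output class (Vy, Vx) = (6, 3).
vy = [1, 1, 0]   # y2 y1 y0
vx = [0, 1, 1]   # x2 x1 x0
print([plane_number(vy, vx, m) for m in range(3)])
# [((0, 1), (1, 1)), ((0,), (1,)), ((), ())]
```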
6 Ordering of the Adaptation Points

For each k-th adaptation point in the range k \in [0, D - 1], the values of the digit variables y_m, x_m must be set. A one-to-one correspondence k \leftrightarrow (V_y, V_x) defines the rule for ordering the adaptation points in the output plane. Let us consider some typical variants. Recall that the digit-by-digit representation of the coordinates is defined by expressions (13). The task of ordering is to establish a correspondence between the ordinal number k and the digit variables y_i, x_i. We assume that moving along a column of the spectral plane corresponds to changing the coordinate V_y, and moving along a row corresponds to changing the coordinate V_x.

6.1 Ordering Along Columns

The ordering algorithm can be specified by the following representation of the sequence number:

k = x_{n-1} x_{n-2} \cdots x_0 y_{n-1} y_{n-2} \cdots y_0.

In this expression the digit y_0 is the lowest, so as the number k increases, the digits y_i change first, and as a result the adaptation points are placed along the columns. Fig. 4 shows the variant of ordering in which classes are placed along columns and subclasses along rows.

6.2 Ordering Along Rows

The ordering algorithm can be specified by the following representation of the sequence number:

k = y_{n-1} y_{n-2} \cdots y_0 x_{n-1} x_{n-2} \cdots x_0.

In this expression the digit x_0 is the lowest, so as the number k increases, the digits x_i change first, and as a result the adaptation points are placed along the rows. Fig. 6 shows a variant of the transformation with the adaptation points ordered along the rows.

Fig. 6. The transformation with the basis functions ordered along the rows

6.3 Ordering Along Circular Segments

The ordering algorithm can be specified by the following representation of the sequence number:

k = x_{n-1} y_{n-1} x_{n-2} y_{n-2} \cdots x_1 y_1 x_0 y_0.

In this case, the spectral plane is filled with increasing values of k when moving clockwise along "circular" segments. Fig. 7 shows a variant of the transformation with the basis functions ordered along circular segments.

Fig. 7. The transformation with the basis functions ordered by circular segments
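All three orderings amount to decomposing k in a mixed-radix system whose digit order differs per scheme. A compact sketch (illustrative helper, not from the paper) of the mapping from the ordinal number k to the adaptation-point coordinates (V_y, V_x):

```python
from itertools import product
from math import prod

def ordering_map(gy, gx, scheme):
    """Map the ordinal number k to adaptation-point coordinates (Vy, Vx), Section 6.

    gy, gx : output radices (g_0, ..., g_{n-1}) along y and x, cf. Eq. (2).
    scheme : 'columns'  -> k = x_{n-1}..x_0 y_{n-1}..y_0      (Sec. 6.1)
             'rows'     -> k = y_{n-1}..y_0 x_{n-1}..x_0      (Sec. 6.2)
             'circular' -> k = x_{n-1} y_{n-1} ... x_0 y_0    (Sec. 6.3)
    Returns a list whose k-th element is the point (Vy, Vx).
    """
    n = len(gy)
    # digit slots of k, listed from the most significant position down to the least significant
    if scheme == 'columns':
        slots = [('x', m) for m in reversed(range(n))] + [('y', m) for m in reversed(range(n))]
    elif scheme == 'rows':
        slots = [('y', m) for m in reversed(range(n))] + [('x', m) for m in reversed(range(n))]
    else:  # 'circular'
        slots = [s for m in reversed(range(n)) for s in (('x', m), ('y', m))]
    radices = [gx[m] if axis == 'x' else gy[m] for axis, m in slots]

    points = []
    for digits in product(*(range(r) for r in radices)):   # lexicographic order = increasing k
        y_dig = {m: d for (axis, m), d in zip(slots, digits) if axis == 'y'}
        x_dig = {m: d for (axis, m), d in zip(slots, digits) if axis == 'x'}
        Vy = sum(y_dig[m] * prod(gy[:m]) for m in range(n))   # digit weight g_{m-1} * ... * g_0
        Vx = sum(x_dig[m] * prod(gx[:m]) for m in range(n))
        points.append((Vy, Vx))
    return points

# 8 x 8 output plane with binary digits (n = 3): the first four adaptation points of each scheme.
for s in ('columns', 'rows', 'circular'):
    print(s, ordering_map((2, 2, 2), (2, 2, 2), s)[:4])
# columns  [(0, 0), (1, 0), (2, 0), (3, 0)]
# rows     [(0, 0), (0, 1), (0, 2), (0, 3)]
# circular [(0, 0), (1, 0), (0, 1), (1, 1)]
```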
7 Summary

Regular tunable transformations possess the unique possibility of an analytical representation of the topology of the implementing network, which allows developing learning algorithms that converge in a finite number of steps. It is shown that the implementing topology is easily extended by additional planes, while the number of recognized images increases dramatically and covers all elements of the output plane. Moreover, the topology extension does not violate the principle on which the training algorithm is built. The constructed topology is ideologically close to the topology of convolutional deep learning networks [2] but is regular. The presented solution provides a constructive answer to the fundamental questions of deep learning neural networks: how to choose a topology and how to reduce the training time of the network.

References

1. Fukushima K., Miyake S., Ito T.: Neocognitron: A neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics SMC-13(5), 826-834 (1983).
2. LeCun Y., Boser B., Denker J. S., Henderson D., Howard R. E., Hubbard W., Jackel L. D.: Backpropagation applied to handwritten zip code recognition. Neural Computation 1(4), 541-551 (1989).
3. Springenberg J. T., Dosovitskiy A., Brox T., Riedmiller M.: Striving for simplicity: The all convolutional net. arXiv:1412.6806 (2014). https://arxiv.org/pdf/1412.6806
4. Romanuke V.: Appropriate number and allocation of ReLUs in convolutional neural networks. Research Bulletin of NTUU "Kyiv Polytechnic Institute" 1, 69-78 (2017).
5. Glorot X., Bordes A., Bengio Y.: Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 15 (2011).
6. Nair V., Hinton G. E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (2010).
7. Maas A. L., Hannun A. Y., Ng A. Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proc. ICML, 30(1) (2013).
8. Dorogov A. Yu., Alekseev A. A.: Mathematical models of fast neural networks. In: Collection of Scientific Works of SPbGETU "Information Management and Processing Systems", Issue 490, pp. 79-84 (1996). In Russian.
9. Solodovnikov A. I., Spivakovsky A. M.: Fundamentals of the Theory and Methods of Spectral Information Processing: Textbook. Leningrad: LSU Publishing House (1986). 272 pp. In Russian.
10. Dorogov A. Yu.: Theory and Design of Fast Tunable Transformations and Weakly Connected Neural Networks. St. Petersburg: Politekhnika (2014). 328 pp. In Russian. http://dorogov.su/