<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Matrix Deep Neural Network and Its Rapid Learning in Data Science Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iryna Pliss</string-name>
          <email>iryna.pliss@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Boiko</string-name>
          <email>olena.boiko@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentyna Volkova</string-name>
          <email>v.volkova@samsung.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yevgeniy Bodyanskiy</string-name>
          <email>yevgeniy.bodyanskiy@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>. Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, UKRAINE</institution>
          ,
          <addr-line>Kharkiv, Nauky ave., 14</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>. Samsung Electronics Ukraine Company, LLC R&amp;D (SRK), UKRAINE</institution>
          ,
          <addr-line>Kyiv, Lva Tolstogo St., 57</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>1</fpage>
      <lpage>3</lpage>
      <abstract>
        <p>The matrix deep neural network and its learning algorithm are proposed. This system reduces the number of tunable weights by rejecting the vectorization-devectorization operations. It also preserves the information contained in the relations between rows and columns of 2D inputs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>I. INTRODUCTION</p>
      <p>
        Nowadays, artificial neural networks (ANNs) are widely
used to solve many problems arising in Data Science. Here,
the multilayer perceptron (MLP) [
        <xref ref-type="bibr" rid="ref1 ref13 ref14 ref15 ref16 ref17 ref18">1,13-18</xref>
        ] is the most widely
used. On the basis of the MLP, deep neural networks (DNNs)
[
        <xref ref-type="bibr" rid="ref19 ref2 ref21 ref3 ref4">2-4,19,21</xref>
        ] were developed that have improved
characteristics in comparison with their prototypes, namely
traditional shallow neural networks.
      </p>
      <p>In the general case, a multilayer perceptron that contains
L information processing layers ($L-1$ hidden layers and one
output layer) realizes a nonlinear transformation that can be
written in the form</p>
      <p>$$\hat{Y}(k) = \Psi(X(k)) = \Psi^{[L]}\bigl(W^{[L]}(k-1)\,\Psi^{[L-1]}\bigl(W^{[L-1]}(k-1)\cdots\Psi^{[1]}\bigl(W^{[1]}(k-1)\,X(k)\bigr)\cdots\bigr)\bigr)$$
where
- $\Psi^{[l]}$ are diagonal matrices of activation functions of each
layer;
- $W^{[l]}(k-1)$ are matrices of synaptic weights that are
adjusted during the learning process based on error
backpropagation;
- $l = 1, 2, \ldots, L$;
- $k = 1, 2, \ldots$ is the discrete time index.</p>
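      <p>As an illustration only (a minimal NumPy sketch under our own naming; the tanh activation is an assumption, not specified by the paper), this layered transformation reads:</p>
      <preformat>
import numpy as np

def mlp_forward(x, weights, psi=np.tanh):
    """L-layer perceptron: o[l] = Psi[l](W[l](k-1) o[l-1]), o[0] = x."""
    o = x
    for W in weights:              # W[1], ..., W[L]
        o = psi(W @ o)             # elementwise activation of internal signals
    return o

rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((1, 2))]  # a 3-2-1 net
y_hat = mlp_forward(rng.standard_normal(3), weights)
      </preformat>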
      <p>
        In the DNN family, the most popular are the convolutional
neural networks (CNNs) [
        <xref ref-type="bibr" rid="ref20 ref22 ref23 ref24 ref25">20,22-25</xref>
        ] that are mainly designed
to process images represented in the form of $(n_1 \times n_2)$-matrices
$X(k) = \{x_{i_1 i_2}(k)\}$ (where $i_1 = 1, 2, \ldots, n_1$
and $i_2 = 1, 2, \ldots, n_2$), which must be vectorized before
submission to the network, i.e. they must be presented in the
form of vectors [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the dimension of which can be quite
large, which leads to the effect of “curse of dimensionality”.
      </p>
      <p>This effect can be avoided by processing the original
matrix using convolution, pooling and encoding operations.
As a result, a vector of dimension smaller than $(n_1 n_2 \times 1)$ is
fed to the perceptron’s input.</p>
      <p>Although DNNs provide high quality of information
processing, their training time is too long, and the training
process itself may require considerable computing resources.
However, it is possible to speed up the information
processing by bypassing the operations of
vectorization-devectorization, i.e. by storing the information to be
processed not in the form of a vector but in the form of a
matrix.</p>
      <p>
        The abovementioned problem is solved by the matrix
neural networks [
        <xref ref-type="bibr" rid="ref11 ref12 ref5 ref6">5,6,11,12</xref>
        ], which are, however, quite complex from the
computational point of view.
      </p>
      <p>In this connection, it seems expedient to develop an
architecture and tuning algorithms for a deep matrix neural
network characterized by simplicity of numerical realization
and high speed of synaptic weight learning.</p>
    </sec>
    <sec id="sec-2">
      <title>II. ADAPTIVE BILINEAR MODEL</title>
      <p>
        The proposed matrix DNN is based on the adaptive matrix
bilinear model introduced earlier by the authors [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]
$$\hat{Y}(k) = \{\hat{y}_{j_1 j_2}(k)\} = A(k-1)\,X(k)\,B(k-1), \quad j_1 = 1, 2, \ldots, n_1;\; j_2 = 1, 2, \ldots, n_2$$ (1)
where $A(k-1)$, $B(k-1)$ are $(n_1 \times n_1)$- and $(n_2 \times n_2)$-matrices
of tunable parameters that are adjusted during the online
learning-identification process.
      </p>
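      <p>As a sketch (our own illustration, not the authors’ code), the forward computation of Eq. (1) is just two matrix products:</p>
      <preformat>
import numpy as np

def bilinear_forward(X, A, B):
    """Eq. (1): Y_hat(k) = A(k-1) X(k) B(k-1) for an (n1 x n2) input X(k)."""
    return A @ X @ B

n1, n2 = 4, 3
rng = np.random.default_rng(1)
A = rng.standard_normal((n1, n1))   # (n1 x n1) tunable parameters
B = rng.standard_normal((n2, n2))   # (n2 x n2) tunable parameters
X = rng.standard_normal((n1, n2))   # 2D input, no vectorization needed
Y_hat = bilinear_forward(X, A, B)   # (n1 x n2) matrix output
      </preformat>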
      <p>
        For this, either the gradient adaptation procedure
$$A(k) = A(k-1) + \eta_A(k)\, E(k)\, B^T(k-1)\, X^T(k),$$
$$B(k) = B(k-1) + \eta_B(k)\, X^T(k)\, A^T(k)\, E_A(k)$$ (2)
is used, or its version optimized by speed [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that can be
written as
$$A(k) = A(k-1) + \bigl(\operatorname{Tr} E(k)\, B^T(k-1)\, X^T(k)\, X(k)\, B(k-1)\, E^T(k)\bigr) \times \bigl(\operatorname{Tr} E(k)\, B^T(k-1)\, X^T(k)\, X(k)\, B(k-1)\, B^T(k-1)\, X^T(k)\, X(k)\, B(k-1)\, E^T(k)\bigr)^{-1}\, E(k)\, B^T(k-1)\, X^T(k),$$
$$B(k) = B(k-1) + \bigl(\operatorname{Tr} E_A^T(k)\, A(k)\, X(k)\, X^T(k)\, A^T(k)\, E_A(k)\bigr) \times \bigl(\operatorname{Tr} A(k)\, X(k)\, X^T(k)\, A^T(k)\, E_A(k)\, E_A^T(k)\, A(k)\, X(k)\, X^T(k)\, A^T(k)\bigr)^{-1}\, X^T(k)\, A^T(k)\, E_A(k)$$ (3)
that is the matrix generalization of the Kaczmarz–Widrow–Hoff
learning algorithm (here $\eta_A(k)$, $\eta_B(k)$ are learning
rate parameters,
$$E(k) = Y(k) - A(k-1)\, X(k)\, B(k-1),$$
$$E_A(k) = Y(k) - A(k)\, X(k)\, B(k-1),$$
and $Y(k)$ is the reference matrix signal).
      </p>
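      <p>As a hedged NumPy sketch (our reconstruction of one step of Eq. (3); all names are ours), the trace ratios play the role of the optimized learning rates:</p>
      <preformat>
import numpy as np

def step_optimized(A, B, X, Y):
    """One step of the matrix Kaczmarz-Widrow-Hoff rule, Eq. (3)."""
    E = Y - A @ X @ B                    # E(k) = Y(k) - A(k-1) X(k) B(k-1)
    num_A = np.trace(E @ B.T @ X.T @ X @ B @ E.T)
    den_A = np.trace(E @ B.T @ X.T @ X @ B @ B.T @ X.T @ X @ B @ E.T)
    A = A + (num_A / den_A) * (E @ B.T @ X.T)

    E_A = Y - A @ X @ B                  # E_A(k) = Y(k) - A(k) X(k) B(k-1)
    num_B = np.trace(E_A.T @ A @ X @ X.T @ A.T @ E_A)
    den_B = np.trace(A @ X @ X.T @ A.T @ E_A @ E_A.T @ A @ X @ X.T @ A.T)
    B = B + (num_B / den_B) * (X.T @ A.T @ E_A)
    return A, B
      </preformat>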
      <p>The learning algorithm in Eq. (3) can be given additional
filtering properties if the learning rate parameters in Eq. (2)
are calculated using the recurrence relations that can be
written in the form</p>
      <p>$$\eta_A^{-1}(k) = r_A(k) = \beta\, r_A(k-1) + \operatorname{Tr}\bigl(E(k)\, B^T(k-1)\, X^T(k)\, X(k)\, B(k-1)\, B^T(k-1)\, X^T(k)\, X(k)\, B(k-1)\, E^T(k)\bigr)$$
and
$$\eta_B^{-1}(k) = r_B(k) = \beta\, r_B(k-1) + \operatorname{Tr}\bigl(A(k)\, X(k)\, X^T(k)\, A^T(k)\, E_A(k)\, E_A^T(k)\, A(k)\, X(k)\, X^T(k)\, A^T(k)\bigr)$$
where $0 \le \beta \le 1$ is a smoothing parameter [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].</p>
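      <p>A minimal sketch of these recurrences (our reconstruction; $r_A$ and $r_B$ must be carried over between successive calls, and the value of beta is an assumption):</p>
      <preformat>
import numpy as np

def smoothed_rates(A, B, X, E, E_A, r_A, r_B, beta=0.95):
    """eta_A^{-1}(k) = r_A(k) and eta_B^{-1}(k) = r_B(k); beta=0.95 is assumed."""
    r_A = beta * r_A + np.trace(E @ B.T @ X.T @ X @ B @ B.T @ X.T @ X @ B @ E.T)
    r_B = beta * r_B + np.trace(A @ X @ X.T @ A.T @ E_A @ E_A.T @ A @ X @ X.T @ A.T)
    return 1.0 / r_A, 1.0 / r_B, r_A, r_B   # rates plus carried-over state
      </preformat>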
      <p>On the basis of the model from Eq. (1), it is easy to
introduce its nonlinear modification
$$\hat{Y}(k) = \Psi\bigl(A(k-1)\, X(k)\, B(k-1)\bigr) = \Psi\bigl(U(k)\bigr)$$ (4)
which is in fact the matrix generalization of the
transformation that is realized by any of the layers of a
multilayer perceptron. In Eq. (4), $\Psi$ denotes an $(n_1 \times n_2)$-matrix
of activation functions that acts elementwise on the matrix of internal
activation signals of the system, denoted by
$U(k) = \{u_{j_1 j_2}(k)\}$.</p>
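      <p>A sketch of the nonlinear layer in Eq. (4), with tanh standing in for the elementwise activation matrix (our choice of activation):</p>
      <preformat>
import numpy as np

def matrix_layer(X, A, B, psi=np.tanh):
    """Eq. (4): Y_hat(k) = Psi(U(k)), U(k) = A(k-1) X(k) B(k-1)."""
    U = A @ X @ B                  # internal activation signals u_{j1 j2}(k)
    return psi(U)                  # Psi acts elementwise on U
      </preformat>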
      <p>In this case, the adjustment of the parameters of the
nonlinear matrix model in Eq. (4) can be realized on the basis
of the modified $\delta$-rule
$$a_{j_1 j_2}(k) = a_{j_1 j_2}(k-1) + \eta_A(k)\, e_{j_1 j_2}(k)\, \psi'(u_{j_1 j_2}(k)) \sum_{i_2=1}^{n_2} b_{j_1 j_2}(k-1)\, x_{i_1 i_2}(k) = a_{j_1 j_2}(k-1) + \eta_A(k)\, e_{j_1 j_2}(k)\, \psi'(u_{j_1 j_2}(k))\, \hat{x}_{i_1}(k) = a_{j_1 j_2}(k-1) + \eta_A(k)\, \delta_{j_1 j_2}(k)\, \hat{x}_{i_1}(k),$$
$$b_{j_1 j_2}(k) = b_{j_1 j_2}(k-1) + \eta_B(k)\, e_{A\,j_1 j_2}(k)\, \psi'(u_{A\,j_1 j_2}(k)) \sum_{i_1=1}^{n_1} a_{j_1 j_2}(k-1)\, x_{i_1 i_2}(k) = b_{j_1 j_2}(k-1) + \eta_B(k)\, e_{A\,j_1 j_2}(k)\, \psi'(u_{A\,j_1 j_2}(k))\, \hat{x}_{i_2}(k) = b_{j_1 j_2}(k-1) + \eta_B(k)\, \delta_{A\,j_1 j_2}(k)\, \hat{x}_{i_2}(k).$$ (5)</p>
      <p>On the basis of Eq. (4) it is easy to introduce into
consideration a multilayer matrix neural network that realizes
the transformation</p>
      <p>$$\hat{Y}(k) = \Psi\bigl(A^{[L]}(k-1)\,\Psi\bigl(A^{[L-1]}(k-1) \times \cdots \Psi\bigl(A^{[1]}(k-1)\, X(k)\, B^{[1]}(k-1)\bigr) \cdots \times B^{[L-1]}(k-1)\bigr)\, B^{[L]}(k-1)\bigr)$$ (6)</p>
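      <p>A minimal sketch of Eq. (6) as a stack of such layers (our illustration; the square $A^{[l]}$, $B^{[l]}$ keep every hidden layer at the same $n_1 \times n_2$ size, as in the experiments below):</p>
      <preformat>
import numpy as np

def matrix_dnn_forward(X, As, Bs, psi=np.tanh):
    """Eq. (6): O[l] = Psi(A[l] O[l-1] B[l]), O[0] = X(k)."""
    O = X
    for A, B in zip(As, Bs):
        O = psi(A @ O @ B)
    return O

n1, n2, L = 28, 28, 3
rng = np.random.default_rng(2)
As = [0.1 * rng.standard_normal((n1, n1)) for _ in range(L)]
Bs = [0.1 * rng.standard_normal((n2, n2)) for _ in range(L)]
Y_hat = matrix_dnn_forward(rng.standard_normal((n1, n2)), As, Bs)
      </preformat>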
      <p>Using the learning algorithm from Eq. (5) and error
backpropagation, it is possible to obtain the adaptive
procedure for tuning all parameters of the matrix DNN in
Eq. (6):
- for the output layer:
$$a^{[L]}_{j_1 j_2}(k) = a^{[L]}_{j_1 j_2}(k-1) + \eta_A(k)\, \delta^{[L]}_{j_1 j_2}(k)\, \hat{o}^{[L-1]}_{i_1}(k),$$
$$b^{[L]}_{j_1 j_2}(k) = b^{[L]}_{j_1 j_2}(k-1) + \eta_B(k)\, \delta^{[L]}_{A\,j_1 j_2}(k)\, \hat{o}^{[L-1]}_{A\,i_2}(k)$$
where
$$\delta^{[L]}_{j_1 j_2}(k) = \psi'(u^{[L]}_{j_1 j_2}(k))\, e_{j_1 j_2}(k), \qquad \hat{o}^{[L-1]}_{i_1}(k) = \sum_{i_2=1}^{n_2} b^{[L]}_{j_1 j_2}(k-1)\, o^{[L-1]}_{i_1 i_2}(k),$$
$$\delta^{[L]}_{A\,j_1 j_2}(k) = \psi'(u^{[L]}_{A\,j_1 j_2}(k))\, e_{A\,j_1 j_2}(k), \qquad \hat{o}^{[L-1]}_{A\,i_2}(k) = \sum_{i_1=1}^{n_1} a^{[L]}_{j_1 j_2}(k)\, o^{[L-1]}_{i_1 i_2}(k);$$
- for the l-th hidden layer, $1 &lt; l &lt; L$:
$$a^{[l]}_{j_1 j_2}(k) = a^{[l]}_{j_1 j_2}(k-1) + \eta_A(k)\, \delta^{[l]}_{j_1 j_2}(k)\, \hat{o}^{[l-1]}_{i_1}(k),$$
$$b^{[l]}_{j_1 j_2}(k) = b^{[l]}_{j_1 j_2}(k-1) + \eta_B(k)\, \delta^{[l]}_{A\,j_1 j_2}(k)\, \hat{o}^{[l-1]}_{A\,i_2}(k)$$
where
$$\delta^{[l]}_{j_1 j_2}(k) = \psi'(u^{[l]}_{j_1 j_2}(k)) \sum_{i_1=1}^{n_1} \delta^{[l+1]}_{j_1 j_2}(k)\, a^{[l+1]}_{j_1 j_2}(k), \qquad \hat{o}^{[l-1]}_{i_1}(k) = \sum_{i_2=1}^{n_2} b^{[l]}_{j_1 j_2}(k-1)\, o^{[l-1]}_{i_1 i_2}(k),$$
$$\delta^{[l]}_{A\,j_1 j_2}(k) = \psi'(u^{[l]}_{A\,j_1 j_2}(k)) \sum_{i_2=1}^{n_2} \delta^{[l+1]}_{A\,j_1 j_2}(k)\, b^{[l+1]}_{j_1 j_2}(k), \qquad \hat{o}^{[l-1]}_{A\,i_2}(k) = \sum_{i_1=1}^{n_1} a^{[l]}_{j_1 j_2}(k)\, o^{[l-1]}_{i_1 i_2}(k);$$
- for the first hidden layer:
$$a^{[1]}_{j_1 j_2}(k) = a^{[1]}_{j_1 j_2}(k-1) + \eta_A(k)\, \delta^{[1]}_{j_1 j_2}(k)\, \hat{o}^{[0]}_{i_1}(k),$$
$$b^{[1]}_{j_1 j_2}(k) = b^{[1]}_{j_1 j_2}(k-1) + \eta_B(k)\, \delta^{[1]}_{A\,j_1 j_2}(k)\, \hat{o}^{[0]}_{A\,i_2}(k)$$
where
$$\delta^{[1]}_{j_1 j_2}(k) = \psi'(u^{[1]}_{j_1 j_2}(k)) \sum_{i_1=1}^{n_1} \delta^{[2]}_{j_1 j_2}(k)\, a^{[2]}_{j_1 j_2}(k), \qquad \hat{o}^{[0]}_{i_1}(k) = \sum_{i_2=1}^{n_2} b^{[1]}_{j_1 j_2}(k-1)\, x_{i_1 i_2}(k),$$
$$\delta^{[1]}_{A\,j_1 j_2}(k) = \psi'(u^{[1]}_{j_1 j_2}(k)) \sum_{i_2=1}^{n_2} \delta^{[2]}_{A\,j_1 j_2}(k)\, b^{[2]}_{j_1 j_2}(k), \qquad \hat{o}^{[0]}_{A\,i_2}(k) = \sum_{i_1=1}^{n_1} a^{[1]}_{j_1 j_2}(k)\, x_{i_1 i_2}(k).$$</p>
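      <p>The saving in tunable weights behind this procedure is easy to quantify: each matrix layer carries only the $n_1^2 + n_2^2$ entries of $A^{[l]}$ and $B^{[l]}$, against the $(n_1 n_2)^2$ weights of a comparable fully connected MLP layer. A quick check for the $28 \times 28$ case used below (our arithmetic):</p>
      <preformat>
n1 = n2 = 28
matrix_layer_params = n1**2 + n2**2           # A[l] and B[l]: 1568 weights
mlp_layer_params = (n1 * n2)**2               # dense 784-to-784 layer: 614656 weights
print(matrix_layer_params, mlp_layer_params)  # 1568 614656
      </preformat>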
    </sec>
    <sec id="sec-3">
      <title>III. COMPUTATIONAL EXPERIMENTS</title>
      <p>
        The efficiency of the proposed system and learning
methods was demonstrated on a classification task.
A number of experiments were carried out on the MNIST
dataset introduced by Yann LeCun and Corinna
Cortes [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p>This dataset is widely used for training and testing in
machine learning, particularly for classification tasks. It
contains 60000 training observations and 10000 test
observations.</p>
      <p>Each observation is a 28×28-pixel image that
represents a handwritten digit. In total the dataset has
10 classes (digits from 0 to 9).</p>
      <p>Some examples of the images from this dataset are
presented in Fig. 1.</p>
      <p>The elements of an image are represented by pixel values
from 0 to 255, where 0 means a white pixel (background) and
255 a black pixel (foreground). These values were
normalized before training. The inputs
of the network were $(n_1 \times n_2)$-matrices with $n_1 = n_2 = 28$;
every hidden layer also had the size $n_1 \times n_2 = 28 \times 28$.</p>
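      <p>A sketch of this preprocessing (our code; scaling pixel values into [0, 1] is one common normalization, since the paper does not specify the exact scheme):</p>
      <preformat>
import numpy as np

def preprocess(images):
    """Scale 28x28 uint8 pixel values 0..255 into [0, 1] floats."""
    return images.astype(np.float64) / 255.0   # each sample stays a (28 x 28) matrix

batch = np.random.randint(0, 256, size=(10, 28, 28), dtype=np.uint8)  # MNIST stand-in
X = preprocess(batch)
      </preformat>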
      <p>The results of the computational experiments are presented
in Table 1.</p>
    </sec>
    <sec id="sec-4">
      <title>IV. CONCLUSION</title>
      <p>In this paper, the matrix deep neural network and its
learning algorithm are proposed. They make it possible to
significantly reduce the number of adjustable weights due to
the rejection of the vectorization-devectorization operations
on 2D input signals.</p>
      <p>One of the main advantages of the proposed system is that
it also preserves the information between rows and columns
of 2D inputs of the system.</p>
      <p>In comparison with traditional multilayer perceptrons, the
considered DNN has increased speed, determined by the
reduced number of adjustable parameters and the optimization
of the learning algorithm, as well as simplicity of numerical
implementation.</p>
      <p>The proposed system can be used to solve a wide range of
machine learning tasks, particularly those connected with
image processing, where input signals are presented to the
system in the form of a matrix.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <article-title>Neural Networks for Pattern Recognition</article-title>
          . Oxford : Clarendon Press,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Y. Bengio, and G. Hinton, “
          <article-title>Deep Learning</article-title>
          ,”
          <source>Nature</source>
          , vol.
          <volume>521</volume>
          , pp.
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          , “
          <article-title>Deep Learning in neural networks: An overview</article-title>
          ,”
          <source>Neural Networks</source>
          , vol.
          <volume>61</volume>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <source>Deep Learning</source>
          . MIT Press,
          <year>2016</year>
          .
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Daniušis</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Vaitkus</surname>
          </string-name>
          , “
          <article-title>Neural networks with matrix inputs</article-title>
          ,”
          <source>Informatica</source>
          , vol.
          <volume>19</volume>
          , no. 4, pp.
          <fpage>477</fpage>
          -
          <lpage>486</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>Matrix neural networks</article-title>
          ,” in
          <source>Proceedings of the 14th International Symposium on Neural Networks (ISNN)</source>
          , Part II
          , Sapporo, Japan,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ye. V.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. P.</given-names>
            <surname>Pliss</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Timofeev</surname>
          </string-name>
          , “
          <article-title>Discrete adaptive identification and extrapolation of two-dimensional fields</article-title>
          ,”
          <source>Pattern Recognition and Image Analysis</source>
          , vol.
          <volume>5</volume>
          , no. 3, pp.
          <fpage>410</fpage>
          -
          <lpage>416</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <source>Neural Networks: A Comprehensive Foundation</source>
          . Upper Saddle River, N.J.: Prentice Hall, Inc.,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vorobyov</surname>
          </string-name>
          , Ye. Bodyanskiy, “
          <article-title>On a non-parametric algorithm for smoothing parameter control in adaptive filtering</article-title>
          ,”
          <source>Engineering Simulation</source>
          , vol.
          <volume>16</volume>
          , pp.
          <fpage>314</fpage>
          -
          <lpage>320</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oerlemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lew</surname>
          </string-name>
          , “
          <article-title>Deep learning for visual understanding: A review</article-title>
          ,”
          <source>Neurocomputing</source>
          , vol.
          <volume>187</volume>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Stubberud</surname>
          </string-name>
          , “
          <article-title>A vector matrix real time backpropagation algorithm for recurrent neural networks that approximate multi-valued periodic functions</article-title>
          ,”
          <source>International Journal on Computational Intelligence and Application</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>395</fpage>
          -
          <lpage>411</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohamadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Afarideh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Babapour</surname>
          </string-name>
          , “
          <article-title>New 2D Matrix-Based Neural Network for Image Processing Applications</article-title>
          ,” IAENG (International Association of Engineers)
          <source>International Journal of Computer Science</source>
          ,
          <volume>42</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>265</fpage>
          -
          <lpage>274</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          ,
          <article-title>Artificial Neural Networks: Architectures and Applications</article-title>
          . NY: InTech,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Du</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Swamy</surname>
          </string-name>
          ,
          <source>Neural Networks and Statistical Learning</source>
          . Springer-Verlag London,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Graupe</surname>
          </string-name>
          ,
          <source>Principles of Artificial Neural Networks (Advanced Series in Circuits and Systems)</source>
          .
          <source>Singapore: World Scientific Publishing Co. Pte. Ltd.</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rutkowski</surname>
          </string-name>
          ,
          <article-title>Computational intelligence</article-title>
          .
          <source>Methods and techniques</source>
          , Berlin-Heidelberg: Springer-Verlag,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kruse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Borgelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Klawonn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Moewes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Steinbrecher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Held</surname>
          </string-name>
          , Computational intelligence, Berlin: Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Pham</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <source>Neural Networks for Identification, Prediction and Control</source>
          , London: Springer-Verlag,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>I.</given-names>
            <surname>Arel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rose</surname>
          </string-name>
          , and T. Karnowski, “
          <article-title>Deep machine learning - a new frontier in artificial intelligence research</article-title>
          ,”
          <source>IEEE Computational Intelligence Magazine</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sermanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y-L.</given-names>
            <surname>Boureau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gregor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathieu</surname>
          </string-name>
          , and Y. LeCun, “
          <article-title>Learning Convolutional Feature Hierarchies for Visual Recognition</article-title>
          ,”
          <source>in Proceedings of the 23rd International Conference on Neural Information Processing Systems</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>1090</fpage>
          -
          <lpage>1098</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ciresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Meier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          , “
          <article-title>Multi-column deep neural networks for image classification</article-title>
          ,”
          <source>in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pp.
          <fpage>3642</fpage>
          -
          <lpage>3649</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , “
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          ,” in
          <source>Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12)</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          , “
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          ,” in
          <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , K. Kavukcuoglu, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Farabet</surname>
          </string-name>
          , “
          <article-title>Convolutional networks and applications in vision</article-title>
          ,”
          <source>in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS)</source>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>256</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sermanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anguelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          , “
          <article-title>Going deeper with convolutions</article-title>
          ,” in
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] http://yann.lecun.com/exdb/mnist/</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>