<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of Multilayer Neural Networks with Direct and Cross-Forward Connection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stanislaw Placzek</string-name>
          <email>stanislaw.placzek@wp.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijaya Adhikari</string-name>
          <email>bijaya.adhikari1991@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vistula University</institution>
          ,
          <addr-line>Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>355</fpage>
      <lpage>370</lpage>
      <abstract>
        <p>Artificial Neural Networks are of much interest for many practical reasons. As of today, they are widely implemented. Of the many possible ANNs, the most widely used is the back-propagation model with direct connection. In this model the input layer is fed with input data and each subsequent layer is fed with the output of the preceding layer. This model can be extended by feeding the input data to each layer. This article argues that this new model, named cross-forward connection, is more efficient than the widely used direct connection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Artificial Neural Networks have broad implementation in Machine Learning, engineering and scientific applications. Their ability to provide solutions to problems involving imprecision and uncertainty, with trivial implementation, has enabled us to find solutions to real-life problems such as [1]:</p>
    </sec>
    <sec id="sec-2">
      <title>1. Result approximation and data interpolation</title>
    </sec>
    <sec id="sec-3">
      <title>2. Pattern recognition and feature classification</title>
    </sec>
    <sec id="sec-4">
      <title>3. Data compression</title>
    </sec>
    <sec id="sec-5">
      <title>4. Trend prediction</title>
    </sec>
    <sec id="sec-6">
      <title>5. Error identification</title>
    </sec>
    <sec id="sec-7">
      <title>6. Control</title>
      <p>The problems mentioned above are solved by implementing an ANN as a universal approximator of functions with multidimensional variables. The function can be represented as:</p>
      <p>Y = F(X)   (1)</p>
      <p>where:
- X - input vector
- Y - output vector</p>
      <p>Selecting a network to solve a specific problem is a tedious task. Decisions regarding the following must be made prior to attempting a solution:
- the structure of the neural network, i.e. the number of hidden layers and the number of neurons in each layer; conventionally, the sizes of the input and output layers are defined by the dimensions of the X and Y vectors respectively;
- the structure of individual neurons, including the activation function, which takes the requirements of the learning algorithm into account;
- the data transfer methods between layers;
- the optimization criteria and the type of learning algorithm.</p>
      <p>The structure of a network can be defined in an arbitrary way to accomplish complex tasks. The structure plays a vital role in determining the functionality of an ANN. This paper will compare and contrast two multilayer network structures:
- Direct Connection: this structure consists of at least one hidden layer; data is fed from the preceding layer to the succeeding one;
- Cross Forward Connection: in this structure, the input signal is passed on to every layer in the network. Therefore, a layer j = 1, 2, 3, ..., W, where W is the output layer, has two inputs: the vector X and the vector Vj-1, the output of the preceding layer.</p>
      <p>The structure of the Cross Forward Connection is simpler than that of the Direct Connection in terms of neuron distribution in the hidden layers. Learning time, as a second parameter, is shorter for the Cross Forward Connection. In a later part of the paper, we will analyze a particular optimization problem for an ANN where the total number of neurons, N, and the number of layers, W, are given. Our target is to maximize the total number of subspaces created by the neurons of every hidden layer. We will solve this complex problem with respect to the relation between the dimensionality of the feature space, N0, and the number of neurons in all hidden layers, Ni.</p>
    </sec>
    <sec id="sec-8">
      <title>This problem can be divided into two sub-problems.</title>
      <p>- Ni ≤ N0 - linear optimization problem,
- Ni &gt; N0 - non-linear optimization problem,</p>
      <p>where i = 1, 2, 3, ..., W-1.</p>
      <p>We can solve the linear target function using the linear-programming method. The nonlinear task, with linear constraints, can be solved using the Kuhn-Tucker conditions. As examples, we solved both sub-problems and discussed different ANN structures. In the conclusion, we summarize our results, giving recommendations for different ANN structures.</p>
      <p>2 Criteria of ANN Structure Selection</p>
      <p>The threshold function for each neuron is defined as follows:</p>
      <p>g(x) = 1 if x &gt; 0;   g(x) = 0 if x ≤ 0</p>
    </sec>
    <sec id="sec-9">
      <title>We say that the network in Fig. 3 has the structure 2-3-1, where:</title>
      <p>- N0 = 2: number of neurons in the input layer,
- N1 = 3: number of neurons in the hidden layer,
- N2 = 1: number of neurons in the output layer.</p>
      <p>Signal transfer from the input layer to the output layer in this structure can be represented in the following way:</p>
      <p>U = W1 · X   (3)</p>
      <p>V = F1(U)   (4)</p>
      <p>E = W2 · V + C2 · X   (5)</p>
      <p>Y = F2(E)   (6)</p>
      <p>where:
- X[0:N0] - input signal
- W1[1:N1; 0:N0] - weight coefficient matrix of the hidden layer
- U[1:N1] - analog signal of the hidden layer
- V[1:N1] - output signal of the hidden layer
- W2[1:N2; 0:N1] - weight coefficient matrix of the output layer
- E[1:N2] - analog signal of the output layer
- Y[1:N2] - output signal of the output layer
- C2[1:N2; 0:N0] - weight coefficient matrix of the cross connection
      </p>
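      <p>A minimal sketch of this forward pass in Python/NumPy may help make the data flow concrete. The array shapes, the bias entry X[0] = 1 prepended to X and V, the 0/1 threshold activation, and the helper names are assumptions inferred from the index ranges listed above; the weights below are random, so this is an illustration rather than the authors' implementation.</p>
      <preformat>
import numpy as np

def threshold(x):
    # Threshold activation g(x): 1 if x > 0, otherwise 0 (Section 2).
    return (x > 0).astype(float)

def cross_forward_2_3_1(x, W1, W2, C2):
    # Forward pass of the 2-3-1 cross-forward network, equations (3)-(6).
    # x  : input vector, shape (N0,)
    # W1 : hidden-layer weights, shape (N1, N0 + 1); column 0 multiplies the bias entry
    # W2 : output-layer weights, shape (N2, N1 + 1)
    # C2 : cross-connection weights, shape (N2, N0 + 1)
    x_b = np.concatenate(([1.0], x))   # X[0:N0] with X[0] = 1 acting as the bias input
    u = W1 @ x_b                       # U = W1 * X           (3)
    v = threshold(u)                   # V = F1(U)            (4)
    v_b = np.concatenate(([1.0], v))
    e = W2 @ v_b + C2 @ x_b            # E = W2 * V + C2 * X  (5)
    return threshold(e)                # Y = F2(E)            (6)

# Example with N0 = 2, N1 = 3, N2 = 1 and random (untrained) weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))
W2 = rng.normal(size=(1, 4))
C2 = rng.normal(size=(1, 3))
print(cross_forward_2_3_1(np.array([0.5, -1.0]), W1, W2, C2))
      </preformat>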
      <p>This network will be used for pattern recognition after being trained with teacher data.</p>
      <p>The architecture of the ANN in Fig. 3 can be represented using hyper-spaces. Let us imagine a hyperspace whose dimension equals the number of neurons in the input layer. The first hidden layer, described by equations (3) and (4), divides the feature space, X, into subspaces.</p>
      <p>The two-dimensional feature space is divided into seven sub-spaces. These subspaces correspond to the internal structure of the input data.</p>
      <p>The function Φ(p, q) gives the maximum number of subspaces into which a p-dimensional space is divided by q hyper-planes of dimension p-1. The function has the following recursive form [3]:</p>
      <p>Φ(p, q) = Φ(p, q-1) + Φ(p-1, q-1)   (7)</p>
      <p>By definition of Φ(p, q), it is clear that</p>
      <p>Φ(p, 1) = 2   (8)</p>
      <p>and</p>
      <p>Φ(1, q) = q + 1   (9)</p>
      <p>In the context of neural networks, q is the number of neurons in the first hidden layer, Ni, and p is the dimension of the input vector, N0. The output layer receives two inputs:
- the input received from the output of the previous layer - vector V,
- the raw input received - vector X.
      </p>
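      <p>The recursion (7)-(9) can be evaluated directly. The short sketch below is a hypothetical helper, not code from the paper; it reproduces, for instance, the seven subspaces obtained when three hidden neurons act on a two-dimensional feature space.</p>
      <preformat>
from functools import lru_cache

@lru_cache(maxsize=None)
def phi(p, q):
    # Maximum number of subspaces into which q hyper-planes of dimension p-1
    # divide a p-dimensional space, equations (7)-(9).
    if q == 1:
        return 2            # (8)
    if p == 1:
        return q + 1        # (9)
    return phi(p, q - 1) + phi(p - 1, q - 1)   # (7)

print(phi(2, 3))   # 7: three lines cut the plane into at most seven regions
      </preformat>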
      <p>All input signals are multiplied by the adjustable weights of the associated neurons, i.e., by the matrices W2 and C2 respectively.</p>
    </sec>
    <sec id="sec-10">
      <title>For the ANN presented in Fig. 3, we can write:</title>
      <p>(13) (14) (15) (16)</p>
    </sec>
    <sec id="sec-11">
      <title>And, finally,</title>
      <p>ek = Σ(i=1..N1) W2[k,i]·Vi + Σ(j=0..N0) C2[k,j]·Xj</p>
      <p>For ek = 0, the input space, X, in (14) represents the set of parallel hyper-planes. The number of hyper-planes depends on Vi. For a two-dimensional space, the second layer of the ANN is composed of four parallel lines formed by all possible combinations of the values of Vi and Vj, i.e., 0,0; 0,1; 1,0; 1,1.</p>
      <p>Every subspace which is formed by the hidden layer is further divided into two smaller sub-spaces by the output neuron. For an N0-dimensional input space and N1 neurons in the first hidden layer, the maximum number of subspaces is therefore 2 · Φ(N0, N1).</p>
      <p>For example, to divide the input space into 14 subspaces, we require 3 neurons in the first hidden layer and 1 in the output layer. In contrast, we need 5 neurons in the first hidden layer and 1 neuron in the output layer to obtain the same number of subspaces in the standard Direct Connection. It can be concluded that the ANN with cross-forward connection is more efficient than the regular straight-forward connection.</p>
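      <p>The counts quoted above can be checked numerically with the closed-form count Φ(p, q) = Σ(k=0..p) C(q, k), which solves the recursion (7)-(9). Treating the output neuron as doubling every hidden-layer subspace in the cross-forward case, and as adding no extra factor in the direct case, is an interpretation of the preceding paragraphs; the snippet below is a sketch under those assumptions.</p>
      <preformat>
from math import comb

def phi(p, q):
    # Closed form of the count defined by equations (7)-(9):
    # regions created by q hyper-planes in a p-dimensional space.
    return sum(comb(q, k) for k in range(p + 1))

# Cross-forward 2-N1-1 network: the output neuron doubles every hidden-layer subspace.
print(2 * phi(2, 3))          # 14 subspaces with 3 hidden neurons
# Direct connection: the count comes from the hidden layer alone.
print(phi(2, 4), phi(2, 5))   # 11 with 4 hidden neurons, 16 with 5, so 5 are needed for >= 14
      </preformat>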
    </sec>
    <sec id="sec-12">
      <title>Learning Algorithm for Cross Forward Connection Network</title>
      <p>A smaller number of neurons helps the convergence of the algorithm during the learning process. We use the standard back-propagation algorithm; the aim function (the goal of learning) is the squared error e². The cross-connection weights are updated with a learning-rate term and a momentum term:</p>
      <p>Cij(n+1) = Cij(n) - η · ∂e²/∂Cij + α · [Cij(n) - Cij(n-1)]</p>
      <p>where η is the learning rate and α is the momentum coefficient.</p>
      <p>4 Structure Optimization of Cross Forward Connection Network</p>
      <p>ANN structure optimization is a very complicated task and can be solved in different ways. Experience has taught us that an ANN with 1 or 2 hidden layers is able to solve most practical problems. The problem of ANN structure optimization can be described as: maximizing the number of subspaces, Φ(N0, W), when the total number of neurons, N, and the number of layers, W, are given.</p>
      <p>4.1 Optimization task for ANN with one hidden layer</p>
      <p>For an ANN with one hidden layer, the number of input neurons, N0, is defined by the structure of the input vector X and is known a priori. The number of output neurons, N2, is given by the structure of the output vector Y, known from the task definition. We can calculate the number of neurons in the hidden layer, N1, using equation 16. According to the optimization criterion and formula 16, the total number of subspaces for an ANN with one hidden layer is given by:</p>
      <p>Φ(N0, W) = Φ(N0, 2) = Φ(N0, N1)</p>
      <p>For an ANN with 2 or more hidden layers, optimization is more complicated. As the first criterion, we assume that:
- the number of layers W is given, and
- the total number of neurons N is given for all hidden layers.</p>
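      <p>As an illustration of the single-hidden-layer sizing described in subsection 4.1 above, the sketch below searches for the smallest N1 whose cross-forward subspace count reaches a required value. The factor of 2 for the output neuron and the helper names are assumptions carried over from Section 2, not the paper's own procedure.</p>
      <preformat>
from math import comb

def phi(p, q):
    # Maximum number of subspaces created by q hyper-planes in a p-dimensional space.
    return sum(comb(q, k) for k in range(p + 1))

def smallest_hidden_layer(n0, required_subspaces, n1_max=64):
    # Smallest N1 for which a cross-forward N0-N1-1 network reaches the required
    # number of subspaces, assuming the count 2 * phi(N0, N1) used in Section 2.
    for n1 in range(1, n1_max + 1):
        if 2 * phi(n0, n1) >= required_subspaces:
            return n1
    return None

print(smallest_hidden_layer(2, 14))   # -> 3, matching the example in Section 2
      </preformat>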
    </sec>
    <sec id="sec-13">
      <title>N can be calculated using:</title>
      <p>N = Σ(i=1..W-1) Ni = N1 + N2 + N3 + ... + N(W-1)</p>
      <p>In practice we have to calculate the neuron distribution between the hidden layers 1 to W-1. To find the neuron distribution, we have to maximize the number of subspaces according to equation 22 with 23 as a constraint.</p>
      <p>Φ(N0, Ni) = C(Ni-1, N0) + 2·Σ(k=0..N0-1) C(Ni-1, k),   for i in [1, W-1]   (22)</p>
      <p>N = Σ(i=1..W-1) Ni   (23)</p>
      <p>C(Ni-1, N0) = 0   when Ni - 1 - N0 &lt; 0, i.e. Ni ≤ N0   (24)</p>
      <p>C(Ni-1, k) = 0   for Ni - 1 &lt; k   (25)</p>
      <p>Here C(n, m) denotes the binomial coefficient "n choose m". Taking 22, 23, 24, and 25 into account, our optimization task can be written as follows.</p>
    </sec>
    <sec id="sec-14">
      <title>1. For all hidden layers Ni ≤ N0 and Ni ≤ k - linear task</title>
    </sec>
    <sec id="sec-15">
      <title>2. For all hidden layers Ni &gt; N0 and Ni &gt; k - non-linear task</title>
    </sec>
    <sec id="sec-16">
      <title>The set of hidden layers can be divided into two subsets:</title>
      <p>- S1 = {N1, N2, N3, ..., Nj}, where j ≤ W-1. For S1, Ni ≤ N0 and Ni ≤ K.
- S2 = {Nj+1, Nj+2, Nj+3, ..., N(W-1)}. For S2, Ni &gt; N0 and Ni &gt; K.</p>
      <p>Here W is the number of layers and W-1 is the number of hidden layers. This is a mixed structure, for which the final solution can be found using a mixture of both methods from points 1 and 2.</p>
      <p>4.3 Neuron distribution in the hidden layers, where the number of neurons in each hidden layer is less than or equal to the dimension of the initial feature space</p>
    </sec>
    <sec id="sec-17">
      <title>In this case, we have</title>
      <p>Ni ≤ N0   for i in {1, ..., W-1}</p>
    </sec>
    <sec id="sec-18">
      <title>So, the total number of subspaces is defined by</title>
      <p>Φ(N0, Ni) = (Ni-1)! / (N0! · (Ni-1-N0)!) + 2^Ni,   for Ni ≤ N0 and Ni, N0 ≥ 0,</p>
      <p>where the first term vanishes for Ni ≤ N0, so that Φ(N0, Ni) = 2^Ni. The optimization criterion is therefore</p>
      <p>max over Ni, i in [1, W-1], of { Π(i=1..W-1) 2^Ni },   subject to N = Σ(i=1..W-1) Ni   (33)</p>
    </sec>
    <sec id="sec-19">
      <title>Equation 33 is monotonically increasing and can be written as</title>
      <p>max over Ni, i in [1, W-1], of { 2^(Σ(i=1..W-1) Ni) }</p>
      <p>Under the given number of layers, the total number of neurons has to satisfy the new constraints</p>
      <p>Ni ≤ N0   and   N ≤ (W-1)·N0   (35)</p>
    </sec>
    <sec id="sec-20">
      <title>Example:</title>
      <p>For an ANN with N0 = 3, N1 ≤ 3, N2 ≤ 3, N3 = 1, and W = 3, find the optimum neuron distribution between the two hidden layers N1 and N2.</p>
      <p>It is known that for the output layer N3 = 1, and therefore we will only consider the two hidden layers in the optimization process. For all Ni, where i = 1, 2 and Ni ≤ N0, using 35 we can write:</p>
      <p>N ≤ (W-1)·N0 = (3-1)·3 = 6</p>
      <p>Finally, we have three optimal solutions with three different ANN structures.</p>
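      <p>A short enumeration reproduces this kind of result. The total of N = 4 hidden neurons used below is an assumption chosen only because 2^4 matches the 16 subspaces quoted for Table 2; the rest follows the linear-case criterion of equation 33.</p>
      <preformat>
from itertools import product

N0, W = 3, 3   # three inputs, two hidden layers (W - 1 = 2)
N = 4          # assumed total number of hidden neurons, since 2**4 = 16 subspaces

solutions = []
for dist in product(range(1, N0 + 1), repeat=W - 1):   # each Ni in 1..N0 (linear case)
    if sum(dist) == N:
        subspaces = 2 ** sum(dist)                      # criterion of equation 33
        solutions.append((dist, subspaces))

for dist, subspaces in solutions:
    print(dist, subspaces)
# (1, 3) 16, (2, 2) 16, (3, 1) 16 -- three equivalent structures
      </preformat>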
    </sec>
    <sec id="sec-21">
      <title>Every structure generates 16 subspaces, and the structures are equivalent (Table 2).</title>
      <p>In conclusion, we can say that for every given total number of neurons, N, we have many possible neuron distributions between the layers. The optimal number of subspaces in the initial feature space has the same value for each of them.</p>
      <p>Neuron distribution in the hidden layers, where the number of neurons in each hidden layer is greater than the dimension of the initial feature space</p>
      <p>Let us assume the number of layers W = 3. It implies that we have only two hidden layers. According to formula 24,</p>
      <p>Φ(N0, Ni) = C(Ni-1, N0) + 2·Σ(k=0..N0-1) C(Ni-1, k),   for i in [1, W-1] and Ni &gt; N0</p>
      <p>For the whole ANN, the total number of subspaces is given by the product of the per-layer counts, Π(i=1..W-1) Φ(N0, Ni)   (38)</p>
    </sec>
    <sec id="sec-22">
      <title>Taking all assumptions into account we can write,</title>
      <p>Φ(N0, Ni) = C(Ni-1, N0) + 2·(C(Ni-1, 0) + C(Ni-1, 1) + ... + C(Ni-1, N0-1)),   for N0 &lt; Ni</p>
      <p>Φ(N0, Ni) &lt; 2^Ni   (39)</p>
      <p>In this situation we do not know in advance how many subspaces there are for Φ(N0, Ni). To find the neuron distribution between the hidden layers, we should know the relations between N0, Ni and N.</p>
    </sec>
    <sec id="sec-23">
      <title>Example:</title>
      <p>For N0 = 3, W = 3 and N = 8, N = 10, N = 12, find the neuron distribution in the layers, where Ni &gt; 3. We should maximize the quality criterion. We solve the problem using the Kuhn-Tucker conditions; taking 42 into account, we can write the corresponding Lagrange equation.</p>
      <p>For most practical purposes, ANNs with one hidden layer are sufficient. Learning algorithms for these networks are time consuming and depend on the number of layers and the number of neurons in each layer. The running time of the learning algorithm has a dependency, greater than linear, on the number of neurons. Hence, the running time increases faster than the total number of neurons.</p>
      <p>The Cross Forward connection provides us with an opportunity to decrease the number of neurons and thus the running time of the learning algorithm.</p>
      <p>We implemented both Direct Connection Neural Networks and Cross
Forward Neural Networks with one hidden layer and used them for pattern
recognition.</p>
      <p>Our implementation required three input neurons and two output neurons. We varied the number of neurons in the hidden layer and trained both networks for a limited number of epochs, noting the sum of squared errors of each output neuron. The procedure was repeated 20 times and the average sum of squared errors was recorded. Data for the two cases are presented in Tables 4 and 5.</p>
      <p>Tables 4 and 5 clearly demonstrate that, for a given number of neurons in the hidden layer, the Cross-Forward Connection performs better. If we closely examine the error term in Table 4 for the Direct Connection and the same in Table 5 for the Cross Forward Connection, we will notice that they are fairly comparable. This demonstrates that a Cross Forward Connection structure with one neuron in the hidden layer is almost as good as a Direct Connection with four neurons in the hidden layer. Thus, the Cross-Forward connection reduces the required number of neurons in ANNs.</p>
      <p>In addition, using the optimization criterion for Cross Forward Connection structures, we have solved two different tasks. For the linear one, where Ni ≤ N0 for i = 1, 2, ..., W-1, we obtained equivalent ANN structures with the same total number of subspaces Φ(N0, W-1). This means that for a given total number of neurons, N, and number of layers, W, there are multiple equivalent ANN structures (Table 2). In practice these ANN structures can be used for tasks with a very large dimensionality of the input vector X (initial feature space). For the nonlinear optimization task, where Ni &gt; N0 for i = 1, 2, 3, ..., W-1, the target function is nonlinear with linear constraints. There can be one or more optimum solutions. The final solution depends on the dimensionality of the feature space N0 and the relation between N, Ni and W. In our example, for an ANN with N0 = 3, W = 3, and N = 8, 9, 10, 11, 12, ... we obtained one optimum solution for even values of N and two solutions for odd values of N (Table 3).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Stanisław</given-names>
            <surname>Osowski</surname>
          </string-name>
          ,
          <source>Sieci Neuronowe do Przetwarzania Informacji</source>
          , Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Osowski</surname>
          </string-name>
          ,
          <article-title>Sieci neuronowe w ujeciu algorytmicznym</article-title>
          .
          <source>WNT</source>
          , Warszawa
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>O.B.</given-names>
            <surname>Lapunow</surname>
          </string-name>
          ,
          <source>On Possibility of Circuit Synthesis of Diverse Elements, Mathematical Institute of V.A. Steklov</source>
          ,
          <year>1958</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Toshinori</given-names>
            <surname>Munakata</surname>
          </string-name>
          ,
          <source>Fundamentals of the New Artificial Intelligence</source>
          .
          <source>Second Edition</source>
          , Springer
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Colin</given-names>
            <surname>Fyfe</surname>
          </string-name>
          ,
          <source>Artificial Neural Networks and Information Theory</source>
          , Department of Computing and Information Systems, The University of Paisley,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Joarder</given-names>
            <surname>Kamruzzaman</surname>
          </string-name>
          , Rezaul Begg,
          <source>Artificial Neural Networks in Finance and Manufacturing</source>
          , Idea Group Publishing,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Marciniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Korbicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kus</surname>
          </string-name>
          , Wstepne przetwarzanie danych,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Marciniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Korbicz</surname>
          </string-name>
          , Neuronowe sieci modularne,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mikrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tadeusiewicz</surname>
          </string-name>
          ,
          <article-title>Sieci neuronowe w przetwarzaniu i rozpoznawaniu obrazow</article-title>
          ,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. L. Rutkowski,
          <article-title>Metody i techniki sztucznej inteligencji</article-title>
          ,
          <source>Wydawnictwo Naukowe PWN</source>
          , Warszawa
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Juan R.</given-names>
            <surname>Rabuñal</surname>
          </string-name>
          , Julian Dorado,
          <article-title>Artificial Neural Networks in Real-Life Applications</article-title>
          , Idea Group Publishing
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>