<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of Multilayer Neural Networks with Direct and Cross-Forward Connection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stanislaw Placzek</string-name>
          <email>stanislaw.placzek@wp.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijaya Adhikari</string-name>
          <email>bijaya.adhikari1991@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vistula University</institution>
          ,
          <addr-line>Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>355</fpage>
      <lpage>370</lpage>
      <abstract>
        <p>Artificial Neural Networks are of much interest for many practical reasons. As of today, they are widely implemented. Of the many possible ANNs, the most widely used is the back-propagation model with direct connection. In this model the input layer is fed with input data and each subsequent layer is fed with the output of the preceding layer. This model can be extended by feeding the input data to each layer. This article argues that this new model, named cross-forward connection, is more efficient than the widely used direct connection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Artificial Neural Networks have broad implementation in Machine Learning, engineering and scientific applications. Their ability to provide solutions to problems involving imprecision and uncertainty, with trivial implementation, has enabled us to find solutions to real-life problems such as [1]:</p>
    </sec>
    <sec id="sec-2">
      <title>1. Result approximation and data interpolation</title>
    </sec>
    <sec id="sec-3">
      <title>2. Pattern recognition and feature classification</title>
    </sec>
    <sec id="sec-4">
      <title>3. Data compression</title>
    </sec>
    <sec id="sec-5">
      <title>4. Trend prediction</title>
    </sec>
    <sec id="sec-6">
      <title>5. Error identification</title>
    </sec>
    <sec id="sec-7">
      <title>6. Control</title>
      <p>The problems mentioned above are solved by implementing an ANN as a universal approximator of functions with multidimensional variables. The function can be represented as:</p>
      <p>Y = F(X)   (1)</p>
      <p>where:
- X - input vector
- Y - output vector</p>
      <p>Selecting a network to solve a specific problem is a tedious task. Decisions regarding the following must be made prior to attempting a solution:
- the structure of the neural network, i.e. the number of hidden layers and the number of neurons in each layer; conventionally, the sizes of the input and output layers are defined by the dimensions of the X and Y vectors respectively;
- the structure of individual neurons, including the activation function, which takes the requirements of the learning algorithm into account;
- the data transfer methods between layers;
- the optimization criteria and the type of learning algorithm.</p>
      <p>The structure of a network can be defined in an arbitrary way to accomplish complex tasks. The structure plays a vital role in determining the functionality of an ANN. This paper will compare and contrast two multilayer network structures:
- Direct Connection: this structure consists of at least one hidden layer; data is fed from the preceding layer to the succeeding one;
- Cross Forward Connection: in this structure, the input signal is passed on to every layer in the network. Therefore, a layer j = 1, 2, 3, ..., W, where W is the output layer, has two inputs: the vector X and the vector Vj-1, the output of the preceding layer.</p>
      <p>The structure of the Cross Forward Connection is simpler than that of the Direct Connection in terms of neuron distribution in the hidden layers. Learning time, as a second parameter, is shorter for the Cross Forward Connection. In a later part of the paper, we will analyze a particular optimization problem for an ANN where the total number of neurons, N, and the number of layers, W, are given. Our target is to maximize the total number of subspaces created by the neurons of every hidden layer. We will solve this complex problem with respect to the relation between the dimensionality of the feature space, N0, and the number of neurons in all hidden layers, Ni.</p>
    </sec>
    <sec id="sec-8">
      <title>This problem can be divided into two sub-problems.</title>
      <p>- Ni ≤ N0 - linear optimization problem,
- Ni &gt; N0 - non-linear optimization problem,</p>
      <p>where i = 1, 2, 3, ..., W-1.</p>
      <p>We can solve the linear target function using the linear-programming method. The nonlinear task, with linear constraints, can be solved using the Kuhn-Tucker conditions. As examples, we solved both sub-problems and discussed different ANN structures. In the conclusion, we summarize our results, giving recommendations for different ANN structures.</p>
      <p>2 Criteria of ANN Structure Selection</p>
      <p>The threshold function for each neuron is defined as follows:</p>
      <p>g(x) = 1 if x &gt; 0;   g(x) = 0 if x ≤ 0</p>
    </sec>
    <sec id="sec-9">
      <title>We say that the network in Fig. 3 has the structure 2-3-1, where:</title>
      <p>- N0 = 2: number of neurons in the input layer,
- N1 = 3: number of neurons in the hidden layer,
- N2 = 1: number of neurons in the output layer.</p>
      <p>Signal transfer from the input layer to the output layer in this structure can be represented in the following way:</p>
      <p>U = W1 · X   (3)</p>
      <p>V = F1(U)   (4)</p>
      <p>E = W2 · V + C2 · X   (5)</p>
      <p>Y = F2(E)   (6)</p>
      <p>where:
- X[0:N0] - input signal
- W1[1:N1; 0:N0] - weight coefficient matrix of the hidden layer
- U[1:N1] - analog signal of the hidden layer
- V[1:N1] - output signal of the hidden layer
- W2[1:N2; 0:N1] - weight coefficient matrix of the output layer
- E[1:N2] - analog signal of the output layer
- Y[1:N2] - output signal of the output layer
- C2[1:N2; 0:N0] - weight coefficient matrix of the cross connection
      </p>
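      <p>A minimal sketch of this forward pass in Python/NumPy may help make the data flow concrete. The array shapes, the bias entry X[0] = 1 prepended to X and V, the 0/1 threshold activation, and the helper names are assumptions inferred from the index ranges listed above; the weights below are random, so this is an illustration rather than the authors' implementation.</p>
      <preformat>
import numpy as np

def threshold(x):
    # Threshold activation g(x): 1 if x > 0, otherwise 0 (Section 2).
    return (x > 0).astype(float)

def cross_forward_2_3_1(x, W1, W2, C2):
    # Forward pass of the 2-3-1 cross-forward network, equations (3)-(6).
    # x  : input vector, shape (N0,)
    # W1 : hidden-layer weights, shape (N1, N0 + 1); column 0 multiplies the bias entry
    # W2 : output-layer weights, shape (N2, N1 + 1)
    # C2 : cross-connection weights, shape (N2, N0 + 1)
    x_b = np.concatenate(([1.0], x))   # X[0:N0] with X[0] = 1 acting as the bias input
    u = W1 @ x_b                       # U = W1 * X           (3)
    v = threshold(u)                   # V = F1(U)            (4)
    v_b = np.concatenate(([1.0], v))
    e = W2 @ v_b + C2 @ x_b            # E = W2 * V + C2 * X  (5)
    return threshold(e)                # Y = F2(E)            (6)

# Example with N0 = 2, N1 = 3, N2 = 1 and random (untrained) weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))
W2 = rng.normal(size=(1, 4))
C2 = rng.normal(size=(1, 3))
print(cross_forward_2_3_1(np.array([0.5, -1.0]), W1, W2, C2))
      </preformat>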
      <p>This network will be used for pattern recognition after being trained with teacher data.</p>
      <p>The architecture of the ANN in Fig. 3 can be represented using hyper-spaces. Let us imagine a hyperspace whose dimension equals the number of neurons in the input layer. The first hidden layer, described by equations (3) and (4), divides the feature space, X, into subspaces.</p>
      <p>The two-dimensional feature space is divided into seven sub-spaces. These subspaces correspond to the internal structure of the input data.</p>
      <p>The function Φ(p, q) gives the maximum number of subspaces into which a p-dimensional space is divided by q hyper-planes of dimension p-1. The function has the following recursive form [3]:</p>
      <p>Φ(p, q) = Φ(p, q-1) + Φ(p-1, q-1)   (7)</p>
      <p>By definition of Φ(p, q), it is clear that</p>
      <p>Φ(p, 1) = 2   (8)</p>
      <p>and</p>
      <p>Φ(1, q) = q + 1   (9)</p>
      <p>In the context of neural networks, q is the number of neurons in the first hidden layer, Ni, and p is the dimension of the input vector, N0. The output layer receives two inputs:
- the input received from the output of the previous layer - vector V,
- the raw input received - vector X.
      </p>
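      <p>The recursion (7)-(9) can be evaluated directly. The short sketch below is a hypothetical helper, not code from the paper; it reproduces, for instance, the seven subspaces obtained when three hidden neurons act on a two-dimensional feature space.</p>
      <preformat>
from functools import lru_cache

@lru_cache(maxsize=None)
def phi(p, q):
    # Maximum number of subspaces into which q hyper-planes of dimension p-1
    # divide a p-dimensional space, equations (7)-(9).
    if q == 1:
        return 2            # (8)
    if p == 1:
        return q + 1        # (9)
    return phi(p, q - 1) + phi(p - 1, q - 1)   # (7)

print(phi(2, 3))   # 7: three lines cut the plane into at most seven regions
      </preformat>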
      <p>All input signals are multiplied by the adjustable weights of the associated neurons, i.e., by the matrices W2 and C2 respectively.</p>
    </sec>
    <sec id="sec-10">
      <title>For the ANN presented in Fig. 3, we can write:</title>
      <p>(13) (14) (15) (16)</p>
    </sec>
    <sec id="sec-11">
      <title>And, finally,</title>
      <p>ek = Σ(i=1..N1) W2[k,i]·Vi + Σ(j=0..N0) C2[k,j]·Xj</p>
      <p>For ek = 0, the input space, X, in (14) represents the set of parallel hyper-planes. The number of hyper-planes depends on Vi. For a two-dimensional space, the second layer of the ANN is composed of four parallel lines formed by all possible combinations of the values of Vi and Vj, i.e., 0,0; 0,1; 1,0; 1,1.</p>
      <p>Every subspace which is formed by the hidden layer is further divided into two smaller sub-spaces by the output neuron. For an N0-dimensional input space and N1 neurons in the first hidden layer, the maximum number of subspaces is therefore 2 · Φ(N0, N1).</p>
      <p>For example, to divide the input space into 14 subspaces, we require 3 neurons in the first hidden layer and 1 in the output layer. In contrast, we need 5 neurons in the first hidden layer and 1 neuron in the output layer to obtain the same number of subspaces in the standard Direct Connection. It can be concluded that the ANN with cross-forward connection is more efficient than the regular straight-forward connection.</p>
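      <p>The counts quoted above can be checked numerically with the closed-form count Φ(p, q) = Σ(k=0..p) C(q, k), which solves the recursion (7)-(9). Treating the output neuron as doubling every hidden-layer subspace in the cross-forward case, and as adding no extra factor in the direct case, is an interpretation of the preceding paragraphs; the snippet below is a sketch under those assumptions.</p>
      <preformat>
from math import comb

def phi(p, q):
    # Closed form of the count defined by equations (7)-(9):
    # regions created by q hyper-planes in a p-dimensional space.
    return sum(comb(q, k) for k in range(p + 1))

# Cross-forward 2-N1-1 network: the output neuron doubles every hidden-layer subspace.
print(2 * phi(2, 3))          # 14 subspaces with 3 hidden neurons
# Direct connection: the count comes from the hidden layer alone.
print(phi(2, 4), phi(2, 5))   # 11 with 4 hidden neurons, 16 with 5, so 5 are needed for >= 14
      </preformat>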
    </sec>
    <sec id="sec-12">
      <title>Learning Algorithm for Cross Forward Connection Network</title>
      <p>A smaller number of neurons helps the convergence of the algorithm during the learning process. We use the standard back-propagation algorithm; the aim function (the goal of learning) is the squared error e². The cross-connection weights are updated with a learning-rate term and a momentum term:</p>
      <p>Cij(n+1) = Cij(n) - η · ∂e²/∂Cij + α · [Cij(n) - Cij(n-1)]</p>
      <p>where η is the learning rate and α is the momentum coefficient.</p>
      <p>4 Structure Optimization of Cross Forward Connection Network</p>
      <p>ANN structure optimization is a very complicated task and can be solved in different ways. Experience has taught us that an ANN with 1 or 2 hidden layers is able to solve most practical problems. The problem of ANN structure optimization can be described as: maximizing the number of subspaces, Φ(N0, W), when the total number of neurons, N, and the number of layers, W, are given.</p>
      <p>4.1 Optimization task for ANN with one hidden layer</p>
      <p>For an ANN with one hidden layer, the number of input neurons, N0, is defined by the structure of the input vector X and is known a priori. The number of output neurons, N2, is given by the structure of the output vector Y, known from the task definition. We can calculate the number of neurons in the hidden layer, N1, using equation 16. According to the optimization criterion and formula 16, the total number of subspaces for an ANN with one hidden layer is given by:</p>
      <p>Φ(N0, W) = Φ(N0, 2) = Φ(N0, N1)</p>
      <p>For an ANN with 2 or more hidden layers, optimization is more complicated. As the first criterion, we assume that:
- the number of layers W is given, and
- the total number of neurons N is given for all hidden layers.</p>
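      <p>As an illustration of the single-hidden-layer sizing described in subsection 4.1 above, the sketch below searches for the smallest N1 whose cross-forward subspace count reaches a required value. The factor of 2 for the output neuron and the helper names are assumptions carried over from Section 2, not the paper's own procedure.</p>
      <preformat>
from math import comb

def phi(p, q):
    # Maximum number of subspaces created by q hyper-planes in a p-dimensional space.
    return sum(comb(q, k) for k in range(p + 1))

def smallest_hidden_layer(n0, required_subspaces, n1_max=64):
    # Smallest N1 for which a cross-forward N0-N1-1 network reaches the required
    # number of subspaces, assuming the count 2 * phi(N0, N1) used in Section 2.
    for n1 in range(1, n1_max + 1):
        if 2 * phi(n0, n1) >= required_subspaces:
            return n1
    return None

print(smallest_hidden_layer(2, 14))   # -> 3, matching the example in Section 2
      </preformat>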
    </sec>
    <sec id="sec-13">
      <title>N can be calculated using:</title>
      <p>N = Σ(i=1..W-1) Ni = N1 + N2 + N3 + ... + N(W-1)</p>
      <p>In practice we have to calculate the neuron distribution between the hidden layers 1 to W-1. To find the neuron distribution, we have to maximize the number of subspaces according to equation 22 with 23 as a constraint.</p>
      <p>Φ(N0, Ni) = C(Ni-1, N0) + 2·Σ(k=0..N0-1) C(Ni-1, k),   for i in [1, W-1]   (22)</p>
      <p>N = Σ(i=1..W-1) Ni   (23)</p>
      <p>C(Ni-1, N0) = 0   when Ni - 1 - N0 &lt; 0, i.e. Ni ≤ N0   (24)</p>
      <p>C(Ni-1, k) = 0   for Ni - 1 &lt; k   (25)</p>
      <p>Here C(n, m) denotes the binomial coefficient "n choose m". Taking 22, 23, 24, and 25 into account, our optimization task can be written as follows.</p>
    </sec>
    <sec id="sec-14">
      <title>1. For all hidden layers Ni ≤ N0 and Ni ≤ k - linear task</title>
    </sec>
    <sec id="sec-15">
      <title>2. For all hidden layers Ni &gt; N0 and Ni &gt; k - non-linear task</title>
    </sec>
    <sec id="sec-16">
      <title>The set of hidden layers can be divided into two subsets:</title>
      <p>- S1 = {N1, N2, N3, ..., Nj}, where j ≤ W-1. For S1, Ni ≤ N0 and Ni ≤ K.
- S2 = {Nj+1, Nj+2, Nj+3, ..., N(W-1)}. For S2, Ni &gt; N0 and Ni &gt; K.</p>
      <p>Here W is the number of layers and W-1 is the number of hidden layers. This is a mixed structure, for which the final solution can be found using a mixture of both methods from points 1 and 2.</p>
      <p>4.3 Neuron distribution in the hidden layers, where the number of neurons in each hidden layer is less than or equal to the dimension of the initial feature space</p>
    </sec>
    <sec id="sec-17">
      <title>In this case, we have</title>
      <p>Ni ≤ N0   for i in {1, ..., W-1}</p>
    </sec>
    <sec id="sec-18">
      <title>So, the total number of subspaces is defined by</title>
      <p>Φ(N0, Ni) = (Ni-1)! / (N0! · (Ni-1-N0)!) + 2^Ni,   for Ni ≤ N0 and Ni, N0 ≥ 0,</p>
      <p>where the first term vanishes for Ni ≤ N0, so that Φ(N0, Ni) = 2^Ni. The optimization criterion is therefore</p>
      <p>max over Ni, i in [1, W-1], of { Π(i=1..W-1) 2^Ni },   subject to N = Σ(i=1..W-1) Ni   (33)</p>
    </sec>
    <sec id="sec-19">
      <title>Equation 33 is monotonically increasing and can be written as</title>
      <p>max over Ni, i in [1, W-1], of { 2^(Σ(i=1..W-1) Ni) }</p>
      <p>Under the given number of layers, the total number of neurons has to satisfy the new constraints</p>
      <p>Ni ≤ N0   and   N ≤ (W-1)·N0   (35)</p>
    </sec>
    <sec id="sec-20">
      <title>Example:</title>
      <p>For an ANN with N0 = 3, N1 ≤ 3, N2 ≤ 3, N3 = 1, and W = 3, find the optimum neuron distribution between the two hidden layers N1 and N2.</p>
      <p>It is known that for the output layer N3 = 1, and therefore we will only consider the two hidden layers in the optimization process. For all Ni, where i = 1, 2 and Ni ≤ N0, using 35 we can write:</p>
      <p>N ≤ (W-1)·N0 = (3-1)·3 = 6</p>
      <p>Finally, we have three optimal solutions with three different ANN structures.</p>
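      <p>A short enumeration reproduces this kind of result. The total of N = 4 hidden neurons used below is an assumption chosen only because 2^4 matches the 16 subspaces quoted for Table 2; the rest follows the linear-case criterion of equation 33.</p>
      <preformat>
from itertools import product

N0, W = 3, 3   # three inputs, two hidden layers (W - 1 = 2)
N = 4          # assumed total number of hidden neurons, since 2**4 = 16 subspaces

solutions = []
for dist in product(range(1, N0 + 1), repeat=W - 1):   # each Ni in 1..N0 (linear case)
    if sum(dist) == N:
        subspaces = 2 ** sum(dist)                      # criterion of equation 33
        solutions.append((dist, subspaces))

for dist, subspaces in solutions:
    print(dist, subspaces)
# (1, 3) 16, (2, 2) 16, (3, 1) 16 -- three equivalent structures
      </preformat>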
    </sec>
    <sec id="sec-21">
      <title>Every structure generates 16 subspaces, and the structures are equivalent (Table 2).</title>
      <p>In conclusion, we can say that for every given total number of neurons, N, we have many possible neuron distributions between the layers. The optimal number of subspaces in the initial feature space has the same value for each of them.</p>
      <p>Neuron distribution in the hidden layers, where the number of neurons in each hidden layer is greater than the dimension of the initial feature space</p>
      <p>Let us assume the number of layers W = 3. It implies that we have only two hidden layers. According to formula 24,</p>
      <p>Φ(N0, Ni) = C(Ni-1, N0) + 2·Σ(k=0..N0-1) C(Ni-1, k),   for i in [1, W-1] and Ni &gt; N0</p>
      <p>For the whole ANN, the total number of subspaces is given by the product of the per-layer counts, Π(i=1..W-1) Φ(N0, Ni)   (38)</p>
    </sec>
    <sec id="sec-22">
      <title>Taking all assumptions into account we can write,</title>
      <p>Φ(N0, Ni) = C(Ni-1, N0) + 2·(C(Ni-1, 0) + C(Ni-1, 1) + ... + C(Ni-1, N0-1)),   for N0 &lt; Ni</p>
      <p>Φ(N0, Ni) &lt; 2^Ni   (39)</p>
      <p>In this situation we do not know in advance how many subspaces there are for Φ(N0, Ni). To find the neuron distribution between the hidden layers, we should know the relations between N0, Ni and N.</p>
    </sec>
    <sec id="sec-23">
      <title>Example:</title>
      <p>For N0 = 3, W = 3 and N = 8, N = 10, N = 12, find the neuron distribution in the layers, where Ni &gt; 3. We should maximize the quality criterion. We solve the problem using the Kuhn-Tucker conditions; taking 42 into account, we can write the corresponding Lagrange equation.</p>
      <p>For most practical purposes, ANNs with one hidden layer are sufficient. Learning algorithms for these networks are time consuming and depend on the number of layers and the number of neurons in each layer. The running time of the learning algorithm has a dependency, greater than linear, on the number of neurons. Hence, the running time increases faster than the total number of neurons.</p>
      <p>The Cross Forward connection provides us with an opportunity to decrease the number of neurons and thus the running time of the learning algorithm.</p>
      <p>We implemented both Direct Connection Neural Networks and Cross
Forward Neural Networks with one hidden layer and used them for pattern
recognition.</p>
      <p>Our implementation required three input neurons and two output neurons. We varied the number of neurons in the hidden layer and trained both networks for a limited number of epochs, noting the sum of squared errors of each output neuron. The procedure was repeated 20 times and the average sum of squared errors was recorded. Data for the two cases are presented in Tables 4 and 5.</p>
      <p>Tables 4 and 5 clearly demonstrate that, for a given number of neurons in the hidden layer, the Cross-Forward Connection performs better. If we closely examine the error term in Table 4 for the Direct Connection and the same in Table 5 for the Cross Forward Connection, we will notice that they are fairly comparable. This demonstrates that a Cross Forward Connection structure with one neuron in the hidden layer is almost as good as a Direct Connection with four neurons in the hidden layer. Thus, the Cross-Forward connection reduces the required number of neurons in ANNs.</p>
      <p>In addition, using the optimization criterion for Cross Forward Connection structures, we have solved two different tasks. For the linear one, where Ni ≤ N0 for i = 1, 2, ..., W-1, we obtained equivalent ANN structures with the same total number of subspaces Φ(N0, W-1). This means that for a given total number of neurons, N, and number of layers, W, there are multiple equivalent ANN structures (Table 2). In practice these ANN structures can be used for tasks with a very large dimensionality of the input vector X (initial feature space). For the nonlinear optimization task, where Ni &gt; N0 for i = 1, 2, 3, ..., W-1, the target function is nonlinear with linear constraints. There can be one or more optimum solutions. The final solution depends on the dimensionality of the feature space N0 and the relation between N, Ni and W. In our example, for an ANN with N0 = 3, W = 3, and N = 8, 9, 10, 11, 12, ... we obtained one optimum solution for even values of N and two solutions for odd values of N (Table 3).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Stanisław</given-names>
            <surname>Osowski</surname>
          </string-name>
          ,
          <source>Sieci Neuronowe do Przetwarzania Informacji</source>
          , Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Osowski</surname>
          </string-name>
          ,
          <article-title>Sieci neuronowe w ujeciu algorytmicznym</article-title>
          .
          <source>WNT</source>
          , Warszawa
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>O.B.</given-names>
            <surname>Lapunow</surname>
          </string-name>
          ,
          <source>On Possibility of Circuit Synthesis of Diverse Elements, Mathematical Institute of V.A. Steklov</source>
          ,
          <year>1958</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Toshinori</given-names>
            <surname>Munakata</surname>
          </string-name>
          ,
          <source>Fundamentals of the New Artificial Intelligence</source>
          .
          <source>Second Edition</source>
          , Springer
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Colin</given-names>
            <surname>Fyfe</surname>
          </string-name>
          ,
          <source>Artificial Neural Networks and Information Theory</source>
          , Department of Computing and Information Systems, The University of Paisley,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Joarder</given-names>
            <surname>Kamruzzaman</surname>
          </string-name>
          , Rezaul Begg,
          <source>Artificial Neural Networks in Finance and Manufacturing</source>
          , Idea Group Publishing,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Marciniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Korbicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kus</surname>
          </string-name>
          , Wstepne przetwarzanie danych,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Marciniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Korbicz</surname>
          </string-name>
          , Neuronowe sieci modularne,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mikrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tadeusiewicz</surname>
          </string-name>
          ,
          <article-title>Sieci neuronowe w przetwarzaniu i rozpoznawaniu obrazow</article-title>
          ,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. L. Rutkowski,
          <article-title>Metody i techniki sztucznej inteligencji</article-title>
          ,
          <source>Wydawnictwo Naukowe PWN</source>
          , Warszawa
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Juan R.</given-names>
            <surname>Rabuñal</surname>
          </string-name>
          , Julian Dorado,
          <article-title>Artificial Neural Networks in Real-Life Applications</article-title>
          , Idea Group Publishing
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>