<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coordination algorithm in hierarchical structure of the learning process of Artificial Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stanislaw Placzek</string-name>
          <email>stanislaw.placzek@wp.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijaya Adhikari</string-name>
          <email>bijaya.adhikari1991@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vistula University</institution>
          ,
          <addr-line>Warsaw</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>While analyzing Artificial Neural Network structures, one usually finds that the first parameter is the number of ANN layers. A hierarchical structure is the accepted default way to define an ANN structure. This structure can be described using different methods, mathematical tools, and software and/or hardware realizations. In this article, we propose an ANN decomposition into hidden and output sub-networks. To build this kind of learning algorithm, information is exchanged between the first, sub-network level and the second, coordinator level in every iteration. The learning coefficients are tuned in every iteration. The main coordination task is to choose the coordination parameters in order to minimize both the global target function and all local target functions. In each iteration, their values should decrease in an asymptotic way towards the minimum. In this article, a learning algorithm using the forecast of the connections between sub-networks is studied.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Many ANN structures are used in practice. The most popular among them is the
one with Forward Connections, having a complete or semi-complete set of weight
coefficients. For special needs, ANNs with Forward Cross Connections and
Back Connections are used. The full structure of an ANN is depicted in Fig. 1. To
describe the structure, independent of the ANN complexity, a partition into layers
is used: the input layer, one or more hidden layers, and the output layer. The input
layer connects the ANN with the external world (environment) and performs initial
processing, calibration or filtering of the input data. The hidden layers are used
for the main data processing.</p>
      <p>In the most common structures, the hidden layers include more neurons than the
input layer, and they use non-linear activation functions. The output layer, which
sums all signals from the hidden layers, uses two types of activation functions: a
linear activation function for approximation tasks and non-linear sigmoid or tanh
activation functions for classification tasks. In this paper, to avoid confusion
regarding the number of layers, only the hidden layers and the output layer are
counted. The concept of layers in ANN structures reflects the tacit assumption
that ANN structures are hierarchical. Taking this into account as a very
important feature of ANNs, a couple of concepts can be used to describe the
network characteristics.
1.1
To analyze ANN structure, verbal description is used so as to help everybody
understand how ANN is built. For more detailed analysis, mathematical
description using algebra and/or di erential equations is required. Based on these
descriptions, ANNs are then implemented by a computer program or an
electronic device. So, to achieve complete description of ANN, concepts and models
from di erent elds of science and technology have to be used.</p>
      <p>Every model uses its own set of variables and terminology at a different
level of abstraction. To describe and understand how a particular ANN works,
a hierarchical set of abstract concepts is used. To separate these concepts
from the layer description, a new name is used [15]: delamination of the ANN into
abstract strata.
1.2 Calculation complexity or decision taking</p>
      <p>For a multi-layered ANN, a number of hidden layers and the output layer can be sectioned
off. Every layer has its own output vector, which is the input vector of the next
layer: Vi, i = 1, 2, ..., n. Both the hidden layers and the output layer can be described
as sub-networks; "n" defines the total number of sub-networks. The logical
decomposition of the ANN relies on separating the layers by establishing the extra output
vectors Vi, i = 1, 2, ..., n. Now the network consists of a set of sub-networks,
for each of which a local target function Φi is defined, with Φ = (Φ1, Φ2, ..., Φn).</p>
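      <p>As a minimal illustration of this logical decomposition (a sketch only; the
function names and the NumPy representation are our own assumptions, not taken from
the paper), each sub-network can be modelled as a callable that maps its input to
its own output vector Vi:</p>
      <preformat>
from typing import Callable, List
import numpy as np

# Each sub-network maps the previous output vector to its own output V_i;
# chaining the sub-networks reproduces the layered forward pass of the full ANN.
SubNet = Callable[[np.ndarray], np.ndarray]

def forward(subnets: List[SubNet], x: np.ndarray) -> List[np.ndarray]:
    outputs = []
    for net in subnets:
        x = net(x)          # V_i becomes the input of sub-network i+1
        outputs.append(x)
    return outputs          # [V_1, V_2, ..., V_n]
      </preformat>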
      <p>Similar to the ANN structure decomposition, the learning algorithm using error back
propagation can be decomposed too (Fig. 3). We can sort out:
- the first-level tasks, in which the minimum of the local target functions
Φi, i = 1, 2, ..., n, is searched for;</p>
      <p>- the second-level task, which has to coordinate all the first-level tasks.</p>
      <p>In a learning algorithm constructed this way, there is a set of optimization
tasks on the first level. These tasks search for the minimum values of their
target functions. Unfortunately, these are non-linear tasks without constraints.
In practice, standard procedures to solve these problems exist. But in the two-level
learning algorithm structure, the coordinator is not responsible for solving the global
task. The coordinator is obliged to calculate the values of the coordination parameters
γ = (γ1, γ2, ..., γn) for every task on the first level. The first level, searching
for the solution of all tasks, has to use the coordination parameter values. It
is an iterative process. In every iteration cycle, the coordinator receives new values
of the feedback parameters β = (β1, β2, ..., βn) from the first-level tasks. Using this
information, the coordinator has to make new decisions: calculate the new
coordination parameter values. These procedures can be relatively complicated,
and in most situations they happen to be non-gradient procedures. In the
hierarchical learning algorithm, the target functions can be defined as:
the global target function Φ,</p>
      <p>the set of local target functions Φi, where i = 1, 2, ..., n, one for every sub-network,
and the coordinator target function Ψ.</p>
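      <p>The iteration scheme can be summarized in the following skeleton (a sketch
under assumed names: "tasks" stands for the first-level procedures and
"coordinator" for the second-level rule; neither comes from the paper):</p>
      <preformat>
# Two-level iteration: the coordinator sends the coordination parameters gamma
# down to the first-level tasks and receives the feedback parameters beta back.
def two_level_learning(tasks, coordinator, gamma, n_iter):
    for n in range(n_iter):
        # first level: every task i performs one minimization step of Phi_i
        # given gamma[i] and reports its feedback beta[i]
        beta = [task(g) for task, g in zip(tasks, gamma)]
        # second level: a (generally non-gradient) update of gamma from beta
        gamma = coordinator(beta, gamma)
    return gamma
      </preformat>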
      <p>According to [15][2], the solution of the primary task depends on the minimum of the
global target function. The first-level tasks should be built in such a way that,
when all the first-level tasks are solved, the final solution is achieved:
the minimum of the global target function. This kind of stratified structure is
known as a level hierarchy [15].</p>
      <p>To summarize, we conclude:</p>
      <p>The complexity of the problem increases from the first level to the second.
The coordinator needs more time to solve its own tasks.</p>
      <p>Coordination tasks can be non-parametric procedures. To study the
dynamics of the changing target function values, the coordinator should have the
ability to change the learning parameters in the first-level tasks.
As stressed above, all the first-level tasks are non-linear and have to be
solved using iterative procedures.</p>
      <p>For different tasks, the characteristics of the ANN learning process can
differ. The coordinator, studying the feedback information from the first-level
tasks, should have the ability to change all parameters in both the
coordinator and the first-level procedures.</p>
      <p>2 Decomposition and coordination of the ANN learning algorithm</p>
      <p>The two-layered ANN, with one hidden layer and an output layer using full internal
forward connections, has no Cross-Forward or Back Connections. This
kind of network can be used for both approximation and classification tasks.
According to the concept introduced above, this ANN can be described using two
strata.</p>
      <p>2.1 Verbal description of the structure. Stratum 2.</p>
      <p>An ANN with full forward connections contains one hidden layer. In this layer,
the connections between the input vector X and the output vector V1 are represented by
the matrix W1. All matrix coefficients are defined. The connections in the output
layer are defined by the matrix W2, which connects the input vector V1 and the output
vector Y. In this matrix, all weight coefficients are defined, too. The number of input
neurons is defined by the vector X, which has dimensionality N0. In the same
way, the number of neurons in the output layer is defined by the vector Y, which has
dimensionality N2. The number of neurons in the hidden layer, N1, depends on the
complexity of the problem. Usually N1 &gt; N0, so data is not compressed in the first
layer. Based on the description introduced above, the ANN can be set up as a
hierarchical level structure (Fig. 4). On the first level, two local target functions,
Φ1 for the first sub-network and Φ2 for the second sub-network, are defined. On
the second level, the coordinator is established. Its main goal is to coordinate all the
first-level tasks and to achieve the minimum of the global target function. For the
coordinator, two functions G and H are defined, which transform the coordination
signals (V21, V12) and the feedback signals (V1, V2). At the same time, the coordinator
should have the ability to change the values of the learning coefficients α1 and α2 by using
the transformation functions h1(Φ1, Φ2) and h2(Φ1, Φ2) (Fig. 4).
In the decomposed ANN structure, we can define the following target functions.
The global target function, for the whole epoch:</p>
      <p>\Phi(W1, W2, X, Y) = \sum_{k=1}^{N_2} \sum_{p=1}^{N_p} \Phi_{2pk} = \sum_{k=1}^{N_2} \sum_{p=1}^{N_p} (v_{2pk} - z_{kp})^2 \quad (6)</p>
      <p>v_{2pk} = f(e_{pk}) \quad (7)</p>
      <p>e_{pk} = \sum_{i=0}^{N_1} W2_{ki} \, v12_{ip} \quad (8)</p>
      <p>where f is the sigmoid function, i = 1, 2, ..., N1, k = 1, 2, ..., N2, and
Φ2pk is the local target function for the k-th output of the second sub-network and
the p-th element of the training set.</p>
      <p>On the first level, two minimization tasks, for Φ1 and Φ2, have to be solved. These
target functions have additive structures: they can be divided into N1 and
N2 sub-tasks respectively. This can be used to build programming procedures
in an appropriate programming language. So, we can formulate N1 sub-tasks:</p>
      <p>\min \Phi_1 = \min \sum_{i=1}^{N_1} \Phi_{1i} \quad (9)</p>
      <p>\Phi_{1i} = \sum_{p=1}^{N_p} \left( f\left[ \sum_{j=0}^{N_0} W1_{ij} \, x_{jp} \right] - v21_{ip} \right)^2 \quad (10)</p>
      <p>W1_{ij}(n+1) = W1_{ij}(n) - \alpha_1 \frac{\partial \Phi_{1i}}{\partial W1_{ij}}, \qquad \frac{\partial \Phi_{1i}}{\partial W1_{ij}} = \sum_{p=1}^{N_p} (v1_{ip} - v21_{ip}) \, f'(e_{1ip}) \, x_{jp} \quad (11)</p>
      <p>for i = 1, 2, ..., N1 and j = 0, 1, 2, ..., N0, where Φ1i is the local target function
for the i-th output of the first sub-network and for the whole
training set.</p>
      <p>In the same way, N2 sub-tasks can be formulated:</p>
      <p>\min \Phi_2 = \min \sum_{k=1}^{N_2} \Phi_{2k} \quad (12)</p>
      <p>\Phi_{2k} = \sum_{p=1}^{N_p} \left( f\left[ \sum_{i=0}^{N_1} W2_{ki} \, v12_{ip} \right] - z_{kp} \right)^2 \quad (13)</p>
      <p>W2_{ki}(n+1) = W2_{ki}(n) - \alpha_2 \frac{\partial \Phi_{2k}}{\partial W2_{ki}}, \qquad \frac{\partial \Phi_{2k}}{\partial W2_{ki}} = \sum_{p=1}^{N_p} (v2_{pk} - z_{kp}) \, f'(e_{pk}) \, v12_{ip} \quad (14)</p>
      <p>for k = 1, 2, ..., N2 and i = 0, 1, 2, ..., N1, where Φ2k is the local target function
for the k-th output of the second sub-network and for the whole training set.</p>
      <p>The coordinator target function:</p>
      <p>\Psi = \Psi(\Phi_1, \Phi_2, V1, V2) \quad (15)</p>
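      <p>A compact NumPy sketch may make equations (6)-(15) concrete. The weight
updates follow (11) and (14); the coordination rule in the last lines (moving the
forecast V21 towards the real hidden output V1 with the rate β2, and feeding
V12 = V21 to the second sub-network) is an assumption based on the forecast
principle discussed below, not a formula given in the paper:</p>
      <preformat>
import numpy as np

def sigmoid(e):
    return 1.0 / (1.0 + np.exp(-e))

# One epoch of the decomposed learning algorithm.
# Shapes: X is (N0+1, Np) with a bias row, Z is (N2, Np),
# W1 is (N1, N0+1), W2 is (N2, N1+1), V21 is (N1, Np).
def epoch_step(W1, W2, X, Z, V21, alpha1, alpha2, beta2):
    Np = X.shape[1]
    # first sub-network: Phi1 = sum_i sum_p (v1_ip - v21_ip)^2, eqs (9)-(11)
    V1 = sigmoid(W1 @ X)
    Phi1 = np.sum((V1 - V21) ** 2)
    d1 = (V1 - V21) * V1 * (1.0 - V1)          # error times sigmoid derivative
    W1 = W1 - alpha1 * (d1 @ X.T)              # update (11)
    # second sub-network: Phi2 = sum_k sum_p (v2_pk - z_kp)^2, eqs (12)-(14)
    V12 = np.vstack([np.ones((1, Np)), V21])   # coordination signal plus bias
    V2 = sigmoid(W2 @ V12)
    Phi2 = np.sum((V2 - Z) ** 2)
    d2 = (V2 - Z) * V2 * (1.0 - V2)
    W2 = W2 - alpha2 * (d2 @ V12.T)            # update (14)
    # coordinator (assumed rule): move the forecast V21 towards the real V1
    V21 = V21 + beta2 * (V1 - V21)
    return W1, W2, V21, Phi1, Phi2
      </preformat>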
      <p>The first-level tasks calculate control parameters and send them to the
coordinator. Additionally, in every iteration, the coordinator analyzes the local target
functions Φ1i(n) and Φ2k(n). This information is necessary to calculate the
new vector value V21. At the same time, the coordinator should have the ability
to interfere in the learning process by selecting new values of the learning parameters
such as α1, α2 and β2. The coordinator can calculate the value of the target function by
itself, using the data sent to it by the first level. We should stress that the values of the
target functions change dramatically during the learning process. We observed that the
values of Φ1i(n) and Φ2k(n) changed significantly over several hundred iterations. At the
same time, during the learning process the values of the target functions can increase
to a large value and then decrease drastically. This behaviour shows that the ANN,
at the beginning of the learning process, has to attune the weight coefficients of
the W1 matrix. In the next step, both the Φ1 and Φ2 target functions change their
values in an asymptotic way towards their minimum. This means that the weight
coefficients of both the W1 and W2 matrices are near their stable values and only
small corrections are applied. So, the coordinator should study not only the target
functions but also their dynamic changing process.</p>
      <p>3 Example</p>
      <p>In this example, the main dynamic characteristics of the learning process are
shown. The stress is put on the characteristics of the first-level local target
functions Φ1 and Φ2. The structure of the ANN is simple and can be described as
ANN(3-5-1). This means that the ANN includes 3 input neurons, 5 neurons in the
hidden layer and 1 output neuron. Sigmoid activation functions are implemented
in both the hidden and output layers. The three arguments of the XOR function are fed
as input data, so every epoch includes 8 vectors. Changing different learning
parameters, such as α1, α2 and β2, the dynamic characteristics have been studied.</p>
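      <p>For reference, the ANN(3-5-1) example can be reproduced with the earlier sketch
(it reuses sigmoid and epoch_step; the initial weights, the starting forecast
V21 = V1(0) and the rate values below are illustrative assumptions, not the
values used in the experiment):</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
# three-argument XOR (parity): every epoch includes Np = 8 input vectors
bits = np.array([[b // 4 % 2, b // 2 % 2, b % 2] for b in range(8)])
X = np.vstack([np.ones(8), bits.T])            # (N0+1, Np) with a bias row
Z = (bits.sum(axis=1) % 2).reshape(1, 8)       # (N2, Np) target outputs
N0, N1, N2 = 3, 5, 1
W1 = rng.normal(0.0, 0.5, (N1, N0 + 1))
W2 = rng.normal(0.0, 0.5, (N2, N1 + 1))
V21 = sigmoid(W1 @ X)                          # start the forecast at V1
for n in range(5000):
    W1, W2, V21, Phi1, Phi2 = epoch_step(
        W1, W2, X, Z, V21, alpha1=0.5, alpha2=0.5, beta2=0.3)
      </preformat>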
      <p>In the second part of the test, a simple adaptive coordination algorithm
was used. Fig. 5 shows how the two target functions Φ1 and Φ2 changed their values
during the learning process (over the iterations). The quality of the dynamic processes
differs. The function Φ2 represents the second local target function (the
output one). This process is smooth: during the learning
process, the value of Φ2 decreases at a constant rate towards the minimum value.
Midway through the process, its value decreases very slowly. This is correlated
with the first target function Φ1 (hidden layer), whose behaviour is quite different.
From the start up to 3700 iterations, the target function Φ1 increased its value. Two
local maxima, around 1000 iterations and 3700 iterations, are seen. After that, both Φ1
and Φ2 decrease their values and asymptotically reach the
minimum.</p>
      <p>As we stressed in the previous sections, the hidden layer can be divided into 5
sub-networks. Fig. 6 shows the outputs of three of the sub-networks (Φ11, Φ13, Φ14).
The qualitative dynamic characteristics are the same, but the maxima of the
amplitudes are different.</p>
      <p>In the next figure (Fig. 7), we can see that the quality of the learning process
depends on the β2 parameter. This parameter is calculated by the coordinator and has
an impact on the forecast of the vector V21 value. For β2 = 0.1, which is too small, the
learning process isn't smooth: small oscillations can be seen. But if β2 = 0.5, which is
too big, the amplitude increases its value more than 5 times. So, the coordinator
should calculate β2 using its own adaptive algorithm, which should receive the feedback
from the first level and analyze the target functions Φ1 and Φ2.</p>
      <p>To study the impact of the coordinator on the quality of the learning process, the
adaptive algorithm changes two parameters: α1, the learning rate for the hidden
layer, and β2, the learning rate for the vector V21. The vector V21 forecasts the hidden
layer's output (Fig. 8). When Φ2 is greater than Φ1, the learning rates increase.
Their values were increased in very small steps of only 0.05. The learning rates
α1 and β2 shouldn't be extremely large or small,
so two extra constraints were used. Fig. 5 shows the coordinator's final
impact on the quality of the learning process. The target function Φ2 decreases its
value throughout the learning process, but the target function Φ1 still has the two
maximum values. This problem will be studied in future work.</p>
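      <p>A sketch of this simple adaptive rule follows (the decrease branch and the
clamping bounds are assumptions; the text above only fixes the 0.05 step and the
existence of two constraints that keep the rates from becoming extreme):</p>
      <preformat>
# Raise alpha1 and beta2 by 0.05 when Phi2 exceeds Phi1, otherwise lower them;
# both rates are clamped to an assumed range [lo, hi].
def adapt_rates(alpha1, beta2, Phi1, Phi2, step=0.05, lo=0.05, hi=1.0):
    if Phi2 > Phi1:
        alpha1, beta2 = alpha1 + step, beta2 + step
    else:
        alpha1, beta2 = alpha1 - step, beta2 - step
    clamp = lambda r: min(max(r, lo), hi)      # the two extra constraints
    return clamp(alpha1), clamp(beta2)
      </preformat>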
      <p>Fig. 10 shows how the values of the two learning rates are changed by the
coordinator.
In [15], a few coordination principles are defined for big hierarchical system
structures. In this article, the following principle is used: the forecast of the
connections between sub-networks. In the hierarchical structure of the ANN, the
coordinator should forecast the value of the vector V21. This value should be the
same as the real value of the hidden layer output V1. In this situation, the global
target function should achieve its minimum value, and then the learning process
finishes.</p>
      <p>If the first-level local target functions Φ1 and Φ2 both meet a couple of
conditions [2][15], then convergence is guaranteed. Unfortunately, the global
target function isn't convex and can have a lot of local minima. Therefore,
it is not possible to prove mathematically that the algorithm is stable and
convergent. But the first-level local target functions don't include any constraints,
and that helps while building the learning algorithm. Fig. 10 shows the final result
of the different characteristics of the learning processes.</p>
      <p>In the learning processes shown in Fig. 10, all rates were constant. The coordinator
calculates the new V21 value using β2. Fig. 5 shows that the value of the target
function Φ2 doesn't change between 2000 and 3700 iterations. This
is due to the fact that the ANN first has to stabilize the W1 matrix
weight coefficients. This process depends on the V21 vector. When all the W1
weight coefficients are stable, the matrix W2 then stabilizes its weight coefficients.
In this ANN, the first layer played the most important role. The sub-networks'
impact on the final value of the first layer's target function Φ1 differs:
there are components whose impact is very small. This can be explained
by the hidden layer structure; the hidden layer includes structural neuron
redundancy. Finally, the coordination algorithm is analyzed. The learning rates α1
and β2 didn't achieve their maximum values. Probably the values of the learning
rates should be calculated using not only the relation between Φ1 and Φ2, but
also their dynamic characteristics, such as the first differences</p>
      <p>\Delta\Phi_1(n) = \Phi_1(n) - \Phi_1(n-1) \quad \text{and} \quad \Delta\Phi_2(n) = \Phi_2(n) - \Phi_2(n-1).</p>
      <p>This implies that the coordinator should implement a PID controller algorithm:</p>
      <p>\beta_2(n+1) = \beta_2(n) + \kappa_1 \, \Delta\Phi_1(n) + \kappa_2 \left( \Delta\Phi_1(n) - \Delta\Phi_1(n-1) \right) \quad (16)</p>
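      <p>Equation (16) translates directly into a one-line update (κ1 and κ2 are the
unspecified proportional and derivative gains; the default values below are
placeholders):</p>
      <preformat>
# PID-style coordinator update of beta2, eq. (16):
# beta2(n+1) = beta2(n) + kappa1*dPhi1(n) + kappa2*(dPhi1(n) - dPhi1(n-1))
def pid_beta2(beta2, dPhi1_n, dPhi1_prev, kappa1=0.01, kappa2=0.01):
    return beta2 + kappa1 * dPhi1_n + kappa2 * (dPhi1_n - dPhi1_prev)
      </preformat>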
      <p>The two problems described above should be studied in future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ch. M. Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , Springer Science + Business Media, LLC,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Findeisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Szymanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wierzbicki</surname>
          </string-name>
          ,
          <article-title>Teoria i metody obliczeniowe optymalizacji</article-title>
          .
          <source>Panstwowe Wydawnictwo Naukowe</source>
          ,
          <year>Warszawa 1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Montana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <article-title>Training Feed Forward Neural Networks Using Genetic Algorithms</article-title>
          . IJCAI, Detroit, Michigan,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Osowski</surname>
          </string-name>
          ,
          <source>Sieci Neuronowe do Przetwarzania Informacji. Oficyna Wydawnicza Politechniki Warszawskiej</source>
          ,
          <year>Warsaw 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Osowski</surname>
          </string-name>
          ,
          <article-title>Sieci neuronowe w ujeciu algorytmicznym</article-title>
          .
          <source>WNT</source>
          ,Warszawa
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Toshinori</given-names>
            <surname>Munakata</surname>
          </string-name>
          ,
          <source>Fundamentals of the New Artificial Intelligence</source>
          .
          <source>Second Edition</source>
          , Springer
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fyfe</surname>
          </string-name>
          ,
          <source>Artificial Neural Networks and Information Theory, Department of Computing and Information Systems</source>
          , The University of Paisley,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marciniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Korbicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kus</surname>
          </string-name>
          , Wstepne przetwarzanie danych,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>Warsaw 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mikrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tadeusiewicz</surname>
          </string-name>
          ,
          <article-title>Sieci neuronowe w przetwarzaniu i rozpoznawaniu obrazow</article-title>
          ,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Rabunal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Dorado</surname>
          </string-name>
          ,
          <article-title>Artificial Neural Networks in Real-Life Applications</article-title>
          , Idea Group Publishing
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Placzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Adhikari</surname>
          </string-name>
          ,
          <article-title>Analysis of Multilayer Neural Networks with Direct and Cross-Forward Connection</article-title>
          , CS&amp;P Conference in the University of Warsaw, Warsaw 2013
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marciniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Korbicz</surname>
          </string-name>
          , Neuronowe sieci modularne,
          <source>Sieci Neuronowe</source>
          tom
          <volume>6</volume>
          , Akademicka Oficyna Wydawnicza EXIT,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Zeng-Guang Hou</surname>
          </string-name>
          ,
          <string-name>
            <surname>Madan M. Gupta</surname>
          </string-name>
          ,
          <string-name>
            <surname>Peter N. Nikiforuk</surname>
          </string-name>
          , Min Tan, and Long Cheng,
          <article-title>A Recurrent Neural Network for Hierarchical Control of Interconnected Dynamic Systems</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          , Vol.
          <volume>18</volume>
          , No. 2, March
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rutkowski</surname>
          </string-name>
          ,
          <article-title>Metody i techniki sztucznej inteligencji</article-title>
          , Wydawnictwo Naukowe PWN,
          <year>Warsaw 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Mesarovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Macko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Takahara</surname>
          </string-name>
          ,
          <article-title>Theory of hierarchical multilevel systems</article-title>
          , Academic Press, New York and London,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>