<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhanced Error Correction Algorithm for RBF Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pawel Rozycki</string-name>
          <email>prozycki@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janusz Kolbusz</string-name>
          <email>jkolbusz@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Information Technology and Management in Rzeszow</institution>
        </aff>
      </contrib-group>
      <fpage>120</fpage>
      <lpage>129</lpage>
      <abstract>
        <p>Using RBF units in neural networks is an interesting option that makes the network more powerful. The paper presents a new training algorithm based on the second-order ErrCor algorithm. The effectiveness of the proposed algorithm has been confirmed by several experiments.</p>
      </abstract>
      <kwd-group>
        <kwd>Error Correction</kwd>
        <kwd>ErrCor</kwd>
        <kwd>RBF networks</kwd>
        <kwd>training algorithms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The rapid development of intelligent computational systems has made it possible to solve
thousands of practical problems using neural networks. Major achievements have been made
mainly with the MLP (Multi-Layer Perceptron) architecture, but it turns out that other
neural network architectures can be used as well. Although EBP (Error Back
Propagation) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was a real breakthrough, it turned out to be a very slow algorithm, not
capable of training compact network architectures other than MLP. The most visible progress
in this field was the development of the LM (Levenberg-Marquardt) algorithm for training neural
networks. This algorithm is able to train a network in 100 to 1000 times fewer
iterations, but its use for more complex problems is significantly limited, since the size of
the Jacobian matrix is proportional to the number of patterns.
      </p>
      <p>In order to solve increasingly complex problems with neural networks,
we should thoroughly understand the network architecture and its impact on
the operation of the system, and then develop appropriate training processes for these
networks. Modification of existing algorithms and development of new training algorithms
will allow networks to be trained faster and more effectively.</p>
      <p>
        The commonly used MLP networks have limited capabilities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but newer neural network
architectures such as BMLP (Bridged MLP) [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] or DNN (Dual Neural Networks) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with
the same number of neurons can solve problems up to 100 times more complex [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ].
It can therefore be concluded that the way neurons are interconnected in the network is
fundamental.
      </p>
      <p>
        The use of an appropriate architecture has a significant impact on the solution of a given
problem. An example is the FCC (Fully Connected Cascade) network architecture.
Such a network with 10 neurons can solve the Parity-1023 problem, while the most
widely used MLP architecture with 10 neurons in a three-layer topology with one
hidden layer is only able to solve the Parity-9 problem. Thus, moving away from
the commonly used MLP architecture, while maintaining the same number of neurons,
can increase network capacity even a hundred times [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ]. However, a problem arises
in that currently known network learning algorithms, such as EBP [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or LM, cannot
handle such network architectures. The LM algorithm is not able to train
architectures other than MLP; moreover, the size of the Jacobian, which must be processed, is
proportional to the number of learning patterns, which limits the LM algorithm to
relatively small problems. The only known algorithm that can
train these new architectures is the NBN (Neuron-by-Neuron) algorithm [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ]. It is faster
than LM, can be used for all architectures, including BMLP, FCC, DNN and, of course, MLP,
and gives good learning results. However, the ISO algorithm published in 2012
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the ErrCor (Error Correction) algorithm published in 2014 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] allow even better results to be obtained.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Enhanced Error Correction Algorithm</title>
      <sec id="sec-2-1">
        <title>2.1 Error Correction Fundamentals</title>
        <p>
          Error Correction (ErrCor) is a second-order, LM-based algorithm that has been designed for RBF
networks in which the neurons are RBF units with the Gaussian activation function defined by (1):
\[ \varphi_h(\mathbf{x}_p) = \exp\left( -\frac{\lVert \mathbf{x}_p - \mathbf{c}_h \rVert^2}{\sigma_h} \right) \tag{1} \]
where c_h and σ_h are the center and width of RBF unit h, respectively, and ‖·‖ denotes the
Euclidean norm. The output of such a network is given by (2):
\[ O_p = \sum_{h=1}^{H} w_h \, \varphi_h(\mathbf{x}_p) + w_0 \tag{2} \]
where w_h is the weight of the connection between RBF unit h and the network
output, and w_0 is the bias weight of the output unit. Note that RBF networks can be
implemented using neurons with sigmoid activation functions in an MLP architecture [
          <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
          ].
The main idea of the ErrCor algorithm is to increase the number of RBF units one by
one and to adjust all RBF units in the network by training after each unit is added. A new
unit is initially set to compensate for the largest error in the current error surface; after that,
all units are trained, changing their centers and widths as well as the output weights. Details
of the algorithm can be found in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
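        <p>As an illustration only, a minimal Python sketch of the forward pass defined by (1) and (2) may look as follows; the function and argument names are our own and are not part of the original paper.</p>
        <p>import numpy as np

def rbf_output(X, centers, widths, weights, bias):
    """Network outputs O_p for all patterns, following (1) and (2).

    X        -- (n_patterns, n_inputs) input patterns x_p
    centers  -- (H, n_inputs) RBF centers c_h
    widths   -- (H,) RBF widths sigma_h
    weights  -- (H,) output weights w_h
    bias     -- scalar bias weight w_0
    """
    # squared Euclidean distances ||x_p - c_h||^2 for every pattern/unit pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Gaussian activations phi_h(x_p) from (1)
    phi = np.exp(-d2 / widths[None, :])
    # weighted sum plus bias from (2)
    return phi @ weights + bias</p>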
        <p>
          As shown in [
          <xref ref-type="bibr" rid="ref10 ref13">10, 13</xref>
          ], the ErrCor algorithm has been successfully used to solve
several problems such as function approximation, classification and forecasting. The main
disadvantage of the ErrCor algorithm is its long computation time, caused mainly by the
requirement of training the whole network at each iteration.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Enhanced ErrCor</title>
        <p>
          The long computation time depends on many factors. Among the most important are the number
of patterns used in training and the long training of the whole network after each new
RBF unit is added. In order to improve this process we suggest the following modifications of the
ErrCor algorithm:
– after adding a new RBF unit, only this new unit is trained using the LM-based method
used in the ErrCor algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and after that all output weights are adjusted using
regression;
– after N new RBF units have been added, the whole network is trained using the same LM-based
method used in the ErrCor algorithm, where N is an arbitrarily assigned value.
        </p>
        <p>Such a modification shortens the training process because the time-critical full-network
training is limited to the cases when N new units have been added to the network. In the other cases
the training is much faster, because in fact only one RBF unit is trained and the regression
step is computationally inexpensive.</p>
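        <p>The regression step can be realized, for example, by ordinary least squares on the matrix of RBF activations. The sketch below is only an illustration under that assumption, not the authors' code; it fits all output weights and the bias at once.</p>
        <p>import numpy as np

def adjust_output_weights(phi, targets):
    """Fit output weights w_h and bias w_0 by linear regression.

    phi     -- (n_patterns, H) activations of the (fixed) RBF units
    targets -- (n_patterns,) desired network outputs
    Returns (weights, bias).
    """
    # append a column of ones so the bias is fitted together with the weights
    A = np.hstack([phi, np.ones((phi.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return solution[:-1], solution[-1]</p>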
        <p>
          Pseudo code of the Enhanced ErrCor algorithm is shown below. Changes relative to the original
ErrCor algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] are shown in bold.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Enhanced ErrCor pseudo code</title>
        <p>evaluate error of each pattern;
while 1
    C = pattern with biggest error;
    add a new RBF unit with center = C;
    if N new RBF units are added
        train the whole network using LM-based method;
    else
        train only one new added RBF unit using LM-based method;
        adjust output weights for whole network by regression;
    end
    evaluate error of each pattern;
    calculate SSE = Sum of Squared Errors;
    if SSE &lt; desired SSE
        break;
    end
end;</p>
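        <p>For readers who prefer an executable form, the following Python skeleton mirrors the control flow of the pseudo code above. The callables train_new_unit, train_whole_network, fit_output_weights and network_output stand for the LM-based procedures of the original ErrCor algorithm, the regression step and the forward pass of (1)-(2); they, as well as the initial width and the max_units safeguard, are assumptions of this sketch rather than part of the algorithm description.</p>
        <p>def enhanced_errcor(X, targets, N, desired_sse,
                    train_new_unit, train_whole_network,
                    fit_output_weights, network_output,
                    max_units=100):
    """Control-flow skeleton of Enhanced ErrCor (illustrative only)."""
    units, weights = [], None
    errors = targets.copy()          # with no units the network output is taken as zero
    while True:
        c = X[abs(errors).argmax()]  # pattern with the biggest error becomes the new center
        units.append([c, 1.0])       # initial width 1.0 is an arbitrary choice of this sketch
        if len(units) % N == 0:
            # every N-th unit: full-network training (LM-based, as in ErrCor)
            weights = train_whole_network(units, X, targets)
        else:
            # otherwise: train only the new unit, then refit output weights by regression
            train_new_unit(units[-1], X, errors)
            weights = fit_output_weights(units, X, targets)
        errors = targets - network_output(units, weights, X)
        sse = float((errors ** 2).sum())
        if sse &lt; desired_sse or len(units) >= max_units:
            break
    return units, weights</p>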
        <p>In the next section experimental results for this approach are presented.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Experimental Results</title>
      <p>To validate the suggested approach, several experiments with different approximation
benchmark functions and training parameters have been prepared. The following functions
have been selected: the Peaks function, the Second Schaffer function and the Schwefel function. In
the next three subsections the ErrCor algorithm and the Enhanced ErrCor algorithm
are used to solve the approximation problem for these functions. In all
experiments 900 training patterns and 3481 testing patterns have been generated. For the data
prepared in this way, experiments have been carried out with different values of the parameter N
and compared to the results achieved using the original ErrCor algorithm. The
experiments were run in Matlab 2009b under Windows 7 64-bit on an Intel Core i5-M560 CPU with 8 GB
of memory.</p>
      <sec id="sec-3-1">
        <title>3.1 Schwefel Function</title>
        <p>The first experiment was prepared for the Schwefel function, given by
\[ z(x, y) = 2 \cdot 418.9829 - x \sin\left( \sqrt{|x|} \right) - y \sin\left( \sqrt{|y|} \right) \tag{3} \]
and shown in Figure 2.</p>
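        <p>As an aside, training and testing data of this kind can be generated on regular grids. The sketch below reproduces only the pattern counts used in the experiments (30 × 30 = 900 and 59 × 59 = 3481); the grid layout and the domain [-500, 500] are assumptions made purely for illustration.</p>
        <p>import numpy as np

def schwefel(x, y):
    # two-dimensional Schwefel function from (3)
    return 2 * 418.9829 - x * np.sin(np.sqrt(np.abs(x))) - y * np.sin(np.sqrt(np.abs(y)))

def make_grid(n_per_axis, lo=-500.0, hi=500.0):
    # n_per_axis ** 2 patterns on a regular grid; the domain [lo, hi] is only an assumption here
    v = np.linspace(lo, hi, n_per_axis)
    xx, yy = np.meshgrid(v, v)
    X = np.column_stack([xx.ravel(), yy.ravel()])
    return X, schwefel(X[:, 0], X[:, 1])

X_train, y_train = make_grid(30)   # 30 * 30 = 900 training patterns
X_test, y_test = make_grid(59)     # 59 * 59 = 3481 testing patterns</p>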
        <p>Results achieved for the Schwefel function are shown in Table 1. The result for the original
ErrCor algorithm, which can be treated as a reference, is denoted as OrgErrCor. Parameter N is
the number of units that are added to the network between full trainings. The case when the
training process is done without any full-network training is denoted as X in column N.
RMSE is the Root Mean Square Error, given by:</p>
        <p>\[ \mathrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{n} \left( \mathrm{out}_T - \mathrm{out}_E \right)^2 }{ n } } \]
where out_T is the output of the trained network, out_E is the expected value, and n is the
number of patterns.</p>
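        <p>For illustration, a direct Python implementation of this error measure could be:</p>
        <p>import numpy as np

def rmse(out_trained, out_expected):
    # Root Mean Square Error as defined above
    diff = np.asarray(out_trained, dtype=float) - np.asarray(out_expected, dtype=float)
    return float(np.sqrt((diff ** 2).mean()))</p>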
        <p>As shown in Table 1, training time decreases with increasing values of N. This is
obvious, because the frequency of full training, which is the most time-consuming part of the
training process, is lower for higher N. More importantly, the testing and
training RMSE values for small values of N (2 and 3) are better than those achieved with the
original ErrCor, and for higher values of N they are only slightly worse. Note that the results for
N=10 are only 53% worse but were achieved almost 5 times faster.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Second Schaffer Function</title>
        <p>Results achieved with the Enhanced Error Correction algorithm for the Second Schaffer function are
shown in Table 2. As for the Schwefel function, training time decreases with N, while the RMSE values
remain close to, or even lower than, those of the original ErrCor.</p>
        <p>Fig. 4. Training process for approximation of the Second Schaffer function with: (a) original
ErrCor algorithm, (b) Enhanced ErrCor (N=2)</p>
      </sec>
      <sec id="sec-3-2">
        <title>Peaks Function</title>
        <p>In the last experiment the described Enhanced Error Correction algorithm has been used
for approximation of the Peaks function, given by
\[ z(x, y) = -\tfrac{3}{10}\, e^{-1 - 6x - 9x^2 - 9y^2}
+ \left( -0.6x - 27x^3 - 243y^5 \right) e^{-9x^2 - 9y^2}
+ \left( 0.3 - 1.8x + 2.7x^2 \right) e^{-1 - 6y - 9x^2 - 9y^2} \tag{6} \]
and shown in Figure 5.</p>
        <p>Results achieved for this function, obtained in the same way as for the previous functions, are
shown in Table 3. Unfortunately, they are not as clear-cut as for the previous functions. While
training time decreases with N, the RMSE values increase at the same time.</p>
        <p>Examples of the training process with the original ErrCor and the Enhanced ErrCor with N=3
are presented in Figure 6. As can be observed, full-network training shows up as a
rapid RMSE decrease, while adding and training a single RBF unit initially produces a
similar effect but later does not decrease the RMSE. When the value of N is higher than the
maximal number of units in the network, the training is limited to adding new units and
training them one by one, without any full-network training. Such a training process is shown
in Figure 7. Note that starting from the 14th unit added to the network the RMSE values do not
decrease. This is because each new unit added to the network is located according to
the pattern with the highest error, and in this case each new unit, starting from the 14th, is
initially located in the same place.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusions</title>
      <p>The achieved results confirm the effectiveness of the suggested method for improving the Error
Correction algorithm, which is currently one of the most powerful algorithms for training RBF
networks. The proposed modification allows training time to be reduced, in most cases without
losing the low training and testing errors. Further work will focus on
improving the proposed algorithm by refining the method for selecting the initial location
of new RBF units and on applying the described algorithm to a wider spectrum of functions
and to real-world classification datasets from the UCI Machine Learning Repository.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>"Learning representations by backpropagating errors,"</article-title>
          <source>Nature</source>
          , vol.
          <volume>323</volume>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>536</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Fahlman</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lebiere</surname>
          </string-name>
          ,
          <article-title>"The cascade-correlation learning architecture"</article-title>
          . In D. S. Touretzky (ed.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>2</volume>
          . Morgan Kaufmann, San Mateo, CA,
          <year>1990</year>
          , pp.
          <fpage>524</fpage>
          -
          <lpage>532</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Lang</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Witbrock</surname>
          </string-name>
          ,
          <article-title>"Learning to Tell Two Spirals Apart"</article-title>
          .
          <source>Proceedings of the 1988 Connectionist Models Summer School</source>
          , Morgan Kaufmann.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Challenges in Applications of Computational Intelligence in Industrial Electronics"</article-title>
          ,
          <source>IEEE International Symposium on Industrial Electronics (ISIE</source>
          <year>2010</year>
          ),
          <source>Jul 04-07</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>"Learning deep architectures for AI"</article-title>
          .
          <source>Foundations and Trends in Machine Learning</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>127</lpage>
          .
          <article-title>Also published as a book</article-title>
          .
          <source>Now Publishers</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Ciresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Meier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>"Deep big simple neural nets excel on handwritten digit recognition"</article-title>
          ,
          <source>CoRR</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>"Neural Network Learning Without Backpropagation,"</article-title>
          <source>IEEE Trans. on Neural Networks</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1793</fpage>
          -
          <lpage>1803</lpage>
          , Nov.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Werbos</surname>
          </string-name>
          ,
          <article-title>"Back-propagation: Past and Future"</article-title>
          .
          <source>Proceeding of International Conference on Neural Networks</source>
          , San Diego, CA,
          <volume>1</volume>
          ,
          <fpage>343</fpage>
          -
          <lpage>354</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewlett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rozycki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Fast and Efficient Second Order Method for Training Radial Basis Function Networks"</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <year>2012</year>
          , Vol.
          <volume>24</volume>
          , issue
          <issue>4</issue>
          , pp.
          <fpage>609</fpage>
          -
          <lpage>619</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bartczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"An Incremental Design of Radial Basis Function Networks"</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>10</issue>
          ,
          <year>Oct 2014</year>
          , pp.
          <fpage>1793</fpage>
          -
          <lpage>1803</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          ,
          <article-title>"Implementation of RBF type networks by MLP networks"</article-title>
          ,
          <source>IEEE International Conference on Neural Networks (ICNN 96)</source>
          , pp.
          <fpage>1670</fpage>
          -
          <lpage>1675</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Advantage analysis of sigmoid based RBF networks"</article-title>
          .
          <source>In: Proceedings of the 17th IEEE International Conference on Intelligent Engineering Systems (INES'13)</source>
          .
          <year>2013</year>
          . p.
          <fpage>243</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cecati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kolbusz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rozycki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Siano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"A Novel RBF Training Algorithm for Short-Term Electric Load Forecasting and Comparative Studies"</article-title>
          ,
          <source>IEEE Trans. on Ind. Electronics, Early Access</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>