<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhanced Error Correction Algorithm for RBF Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pawel Rozycki</string-name>
          <email>prozycki@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janusz Kolbusz</string-name>
          <email>jkolbusz@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Information Technology and Management in Rzeszow</institution>
        </aff>
      </contrib-group>
      <fpage>120</fpage>
      <lpage>129</lpage>
      <abstract>
        <p>Using RBF units in neural networks is an interesting option that makes the network more powerful. The paper presents a new training algorithm based on the second-order ErrCor algorithm. The effectiveness of the proposed algorithm has been confirmed by several experiments.</p>
      </abstract>
      <kwd-group>
        <kwd>Error Correction</kwd>
        <kwd>ErrCor</kwd>
        <kwd>RBF networks</kwd>
        <kwd>training algorithms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The rapid development of intelligent computational systems has made it possible to solve
thousands of practical problems using neural networks. Major achievements have been made
mainly with the MLP (Multi-Layer Perceptron) architecture, but it turns out that other
neural network architectures can be used as well. Although EBP (Error Back
Propagation) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was a real breakthrough, it turned out to be a very slow algorithm, not
capable of training compact network architectures other than MLP. The most visible progress
in this field was the development of the LM (Levenberg-Marquardt) algorithm for training neural
networks. This algorithm is able to train a network in 100 to 1000 times fewer
iterations, but its use for more complex problems is significantly limited, since the size of
the Jacobian matrix is proportional to the number of patterns.
      </p>
      <p>In order to solve increasingly complex problems with neural networks,
we should thoroughly understand the network architecture and its impact on
the operation of the system, and then develop appropriate training processes for these
networks. Modification of existing algorithms and development of new training algorithms
will allow networks to be trained faster and more effectively.</p>
      <p>
        The commonly used MLP networks have limited capabilities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but newer neural network
architectures such as BMLP (Bridged MLP) [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] or DNN (Dual Neural Networks) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with
the same number of neurons can solve problems up to 100 times more complex [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ].
It can therefore be concluded that the way neurons are interconnected in the network is
fundamental.
      </p>
      <p>
        The use of an appropriate architecture has a significant impact on the solution of a given
problem. An example is the FCC (Fully Connected Cascade) network architecture.
Such a network with 10 neurons can solve the Parity-1023 problem, while the most
widely used MLP architecture with 10 neurons in a three-layer topology with one
hidden layer is only able to solve the Parity-9 problem. Thus, moving away from
the commonly used MLP architecture, while maintaining the same number of neurons,
can increase network capacity even a hundred times [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ]. However, a problem arises
in that currently known network learning algorithms, such as EBP [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or LM, cannot
handle such network architectures. The LM algorithm is not able to train
architectures other than MLP; moreover, the size of the Jacobian, which must be processed, is
proportional to the number of learning patterns, which limits the LM algorithm to
relatively small problems. The only known algorithm that can
train these new architectures is the NBN (Neuron-by-Neuron) algorithm [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ]. It is faster
than LM, can be used for all architectures, including BMLP, FCC, DNN and, of course, MLP,
and gives good learning results. However, the ISO algorithm published in 2012
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the ErrCor (Error Correction) algorithm published in 2014 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] allow even better results to be obtained.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Enhanced Error Correction Algorithm</title>
      <sec id="sec-2-1">
        <title>2.1 Error Correction Fundamentals</title>
        <p>
          Error Correction (ErrCor) is a second-order, LM-based algorithm that has been designed for RBF
networks in which the neurons are RBF units with the Gaussian activation function defined by (1):
\[ \varphi_h(\mathbf{x}_p) = \exp\left( -\frac{\lVert \mathbf{x}_p - \mathbf{c}_h \rVert^2}{\sigma_h} \right) \tag{1} \]
where c_h and σ_h are the center and width of RBF unit h, respectively, and ‖·‖ denotes the
Euclidean norm. The output of such a network is given by (2):
\[ O_p = \sum_{h=1}^{H} w_h \, \varphi_h(\mathbf{x}_p) + w_0 \tag{2} \]
where w_h is the weight of the connection between RBF unit h and the network
output, and w_0 is the bias weight of the output unit. Note that RBF networks can be
implemented using neurons with sigmoid activation functions in an MLP architecture [
          <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
          ].
The main idea of the ErrCor algorithm is to increase the number of RBF units one by
one and to adjust all RBF units in the network by training after each unit is added. A new
unit is initially set to compensate for the largest error in the current error surface; after that,
all units are trained, changing their centers and widths as well as the output weights. Details
of the algorithm can be found in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
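        <p>As an illustration only, a minimal Python sketch of the forward pass defined by (1) and (2) may look as follows; the function and argument names are our own and are not part of the original paper.</p>
        <p>import numpy as np

def rbf_output(X, centers, widths, weights, bias):
    """Network outputs O_p for all patterns, following (1) and (2).

    X        -- (n_patterns, n_inputs) input patterns x_p
    centers  -- (H, n_inputs) RBF centers c_h
    widths   -- (H,) RBF widths sigma_h
    weights  -- (H,) output weights w_h
    bias     -- scalar bias weight w_0
    """
    # squared Euclidean distances ||x_p - c_h||^2 for every pattern/unit pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Gaussian activations phi_h(x_p) from (1)
    phi = np.exp(-d2 / widths[None, :])
    # weighted sum plus bias from (2)
    return phi @ weights + bias</p>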
        <p>
          As shown in [
          <xref ref-type="bibr" rid="ref10 ref13">10, 13</xref>
          ], the ErrCor algorithm has been successfully used to solve
several problems such as function approximation, classification and forecasting. The main
disadvantage of the ErrCor algorithm is its long computation time, caused mainly by the
requirement of training the whole network at each iteration.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Enhanced ErrCor</title>
        <p>
          The long computation time depends on many factors. Among the most important are the number
of patterns used in training and the long training of the whole network after each new
RBF unit is added. In order to improve this process we suggest the following modifications of the
ErrCor algorithm:
– after adding a new RBF unit, only this new unit is trained using the LM-based method
used in the ErrCor algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and after that all output weights are adjusted using
regression;
– after N new RBF units have been added, the whole network is trained using the same LM-based
method used in the ErrCor algorithm, where N is an arbitrarily assigned value.
        </p>
        <p>Such a modification shortens the training process because the time-critical full-network
training is limited to the cases when N new units have been added to the network. In the other cases
the training is much faster, because in fact only one RBF unit is trained and the regression
step is computationally inexpensive.</p>
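        <p>The regression step can be realized, for example, by ordinary least squares on the matrix of RBF activations. The sketch below is only an illustration under that assumption, not the authors' code; it fits all output weights and the bias at once.</p>
        <p>import numpy as np

def adjust_output_weights(phi, targets):
    """Fit output weights w_h and bias w_0 by linear regression.

    phi     -- (n_patterns, H) activations of the (fixed) RBF units
    targets -- (n_patterns,) desired network outputs
    Returns (weights, bias).
    """
    # append a column of ones so the bias is fitted together with the weights
    A = np.hstack([phi, np.ones((phi.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return solution[:-1], solution[-1]</p>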
        <p>
          Pseudo code of the Enhanced ErrCor algorithm is shown below. Changes relative to the original
ErrCor algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] are shown in bold.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Enhanced ErrCor pseudo code</title>
        <p>evaluate error of each pattern;
while 1
    C = pattern with biggest error;
    add a new RBF unit with center = C;
    if N new RBF units are added
        train the whole network using LM-based method;
    else
        train only one new added RBF unit using LM-based method;
        adjust output weights for whole network by regression;
    end
    evaluate error of each pattern;
    calculate SSE = Sum of Squared Errors;
    if SSE &lt; desired SSE
        break;
    end
end;</p>
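        <p>For readers who prefer an executable form, the following Python skeleton mirrors the control flow of the pseudo code above. The callables train_new_unit, train_whole_network, fit_output_weights and network_output stand for the LM-based procedures of the original ErrCor algorithm, the regression step and the forward pass of (1)-(2); they, as well as the initial width and the max_units safeguard, are assumptions of this sketch rather than part of the algorithm description.</p>
        <p>def enhanced_errcor(X, targets, N, desired_sse,
                    train_new_unit, train_whole_network,
                    fit_output_weights, network_output,
                    max_units=100):
    """Control-flow skeleton of Enhanced ErrCor (illustrative only)."""
    units, weights = [], None
    errors = targets.copy()          # with no units the network output is taken as zero
    while True:
        c = X[abs(errors).argmax()]  # pattern with the biggest error becomes the new center
        units.append([c, 1.0])       # initial width 1.0 is an arbitrary choice of this sketch
        if len(units) % N == 0:
            # every N-th unit: full-network training (LM-based, as in ErrCor)
            weights = train_whole_network(units, X, targets)
        else:
            # otherwise: train only the new unit, then refit output weights by regression
            train_new_unit(units[-1], X, errors)
            weights = fit_output_weights(units, X, targets)
        errors = targets - network_output(units, weights, X)
        sse = float((errors ** 2).sum())
        if sse &lt; desired_sse or len(units) >= max_units:
            break
    return units, weights</p>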
        <p>In the next section experimental results for this approach are presented.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Experimental Results</title>
      <p>To validate the suggested approach, several experiments with different approximation
benchmark functions and training parameters have been prepared. The following functions
have been selected: the Peaks function, the Second Schaffer function and the Schwefel function. In
the next three subsections the ErrCor algorithm and the Enhanced ErrCor algorithm
are used to solve the approximation problem for these functions. In all
experiments 900 training patterns and 3481 testing patterns have been generated. For the data
prepared in this way, experiments have been carried out with different values of the parameter N
and compared to the results achieved using the original ErrCor algorithm. The
experiments were run in Matlab 2009b under Windows 7 64-bit on an Intel Core i5-M560 CPU with 8 GB
of memory.</p>
      <sec id="sec-3-1">
        <title>3.1 Schwefel Function</title>
        <p>The first experiment was prepared for the Schwefel function, given by
\[ z(x, y) = 2 \cdot 418.9829 - x \sin\left( \sqrt{|x|} \right) - y \sin\left( \sqrt{|y|} \right) \tag{3} \]
and shown in Figure 2.</p>
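        <p>As an aside, training and testing data of this kind can be generated on regular grids. The sketch below reproduces only the pattern counts used in the experiments (30 × 30 = 900 and 59 × 59 = 3481); the grid layout and the domain [-500, 500] are assumptions made purely for illustration.</p>
        <p>import numpy as np

def schwefel(x, y):
    # two-dimensional Schwefel function from (3)
    return 2 * 418.9829 - x * np.sin(np.sqrt(np.abs(x))) - y * np.sin(np.sqrt(np.abs(y)))

def make_grid(n_per_axis, lo=-500.0, hi=500.0):
    # n_per_axis ** 2 patterns on a regular grid; the domain [lo, hi] is only an assumption here
    v = np.linspace(lo, hi, n_per_axis)
    xx, yy = np.meshgrid(v, v)
    X = np.column_stack([xx.ravel(), yy.ravel()])
    return X, schwefel(X[:, 0], X[:, 1])

X_train, y_train = make_grid(30)   # 30 * 30 = 900 training patterns
X_test, y_test = make_grid(59)     # 59 * 59 = 3481 testing patterns</p>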
        <p>Results achieved for the Schwefel function are shown in Table 1. The result for the original
ErrCor algorithm, which can be treated as a reference, is denoted as OrgErrCor. Parameter N is
the number of units that are added to the network between full trainings. The case when the
training process is done without any full-network training is denoted as X in column N.
RMSE is the Root Mean Square Error, given by:</p>
        <p>\[ \mathrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{n} \left( \mathrm{out}_T - \mathrm{out}_E \right)^2 }{ n } } \]
where out_T is the output of the trained network, out_E is the expected value, and n is the
number of patterns.</p>
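        <p>For illustration, a direct Python implementation of this error measure could be:</p>
        <p>import numpy as np

def rmse(out_trained, out_expected):
    # Root Mean Square Error as defined above
    diff = np.asarray(out_trained, dtype=float) - np.asarray(out_expected, dtype=float)
    return float(np.sqrt((diff ** 2).mean()))</p>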
        <p>As shown in Table 1, training time decreases with increasing values of N. This is
obvious, because the frequency of full training, which is the most time-consuming part of the
training process, is lower for higher N. More importantly, the testing and
training RMSE values for small values of N (2 and 3) are better than those achieved with the
original ErrCor, and for higher values of N they are only slightly worse. Note that the results for
N=10 are only 53% worse but were achieved almost 5 times faster.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Second Schaffer Function</title>
        <p>Results achieved with the Enhanced Error Correction algorithm for the Second Schaffer function are
shown in Table 2. As for the Schwefel function, training time decreases with N, while the RMSE values
remain close to, or even lower than, those of the original ErrCor.</p>
        <p>Fig. 4. Training process for approximation of the Second Schaffer function with: (a) original
ErrCor algorithm, (b) Enhanced ErrCor (N=2)</p>
      </sec>
      <sec id="sec-3-2">
        <title>Peaks Function</title>
        <p>In the last experiment the described Enhanced Error Correction algorithm has been used
for approximation of the Peaks function, given by
\[ z(x, y) = -\tfrac{3}{10}\, e^{-1 - 6x - 9x^2 - 9y^2}
+ \left( -0.6x - 27x^3 - 243y^5 \right) e^{-9x^2 - 9y^2}
+ \left( 0.3 - 1.8x + 2.7x^2 \right) e^{-1 - 6y - 9x^2 - 9y^2} \tag{6} \]
and shown in Figure 5.</p>
        <p>Results achieved for this function, obtained in the same way as for the previous functions, are
shown in Table 3. Unfortunately, they are not as clear-cut as for the previous functions. While
training time decreases with N, the RMSE values increase at the same time.</p>
        <p>Examples of the training process with the original ErrCor and the Enhanced ErrCor with N=3
are presented in Figure 6. As can be observed, full-network training shows up as a
rapid RMSE decrease, while adding and training a single RBF unit initially produces a
similar effect but later does not decrease the RMSE. When the value of N is higher than the
maximal number of units in the network, the training is limited to adding new units and
training them one by one, without any full-network training. Such a training process is shown
in Figure 7. Note that starting from the 14th unit added to the network the RMSE values do not
decrease. This is because each new unit added to the network is located according to
the pattern with the highest error, and in this case each new unit, starting from the 14th, is
initially located in the same place.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusions</title>
      <p>The achieved results confirm the effectiveness of the suggested method for improving the Error
Correction algorithm, which is currently one of the most powerful algorithms for training RBF
networks. The proposed modification allows training time to be reduced, in most cases without
losing the low training and testing errors. Further work will focus on
improving the proposed algorithm by refining the method for selecting the initial location
of new RBF units and on applying the described algorithm to a wider spectrum of functions
and to real-world classification datasets from the UCI Machine Learning Repository.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>"Learning representations by backpropagating errors,"</article-title>
          <source>Nature</source>
          , vol.
          <volume>323</volume>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>536</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Fahlman</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lebiere</surname>
          </string-name>
          ,
          <article-title>"The cascade-correlation learning architecture"</article-title>
          . In D. S. Touretzky (ed.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>2</volume>
          . Morgan Kaufmann, San Mateo, CA,
          <year>1990</year>
          , pp.
          <fpage>524</fpage>
          -
          <lpage>532</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Lang</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Witbrock</surname>
          </string-name>
          ,
          <article-title>"Learning to Tell Two Spirals Apart"</article-title>
          .
          <source>Proceedings of the 1988 Connectionist Models Summer School</source>
          , Morgan Kaufmann.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Challenges in Applications of Computational Intelligence in Industrial Electronics"</article-title>
          ,
          <source>IEEE International Symposium on Industrial Electronics (ISIE</source>
          <year>2010</year>
          ),
          <source>Jul 04-07</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>"Learning deep architectures for AI"</article-title>
          .
          <source>Foundations and Trends in Machine Learning</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>127</lpage>
          .
          <article-title>Also published as a book</article-title>
          .
          <source>Now Publishers</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Ciresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Meier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>"Deep big simple neural nets excel on handwritten digit recognition"</article-title>
          ,
          <source>CoRR</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>"Neural Network Learning Without Backpropagation,"</article-title>
          <source>IEEE Trans. on Neural Networks</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1793</fpage>
          -
          <lpage>1803</lpage>
          , Nov.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Werbos</surname>
          </string-name>
          ,
          <article-title>"Back-propagation: Past and Future"</article-title>
          .
          <source>Proceeding of International Conference on Neural Networks</source>
          , San Diego, CA,
          <volume>1</volume>
          ,
          <fpage>343</fpage>
          -
          <lpage>354</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewlett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rozycki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Fast and Efficient Second Order Method for Training Radial Basis Function Networks"</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <year>2012</year>
          , Vol.
          <volume>24</volume>
          , issue
          <issue>4</issue>
          , pp.
          <fpage>609</fpage>
          -
          <lpage>619</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bartczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"An Incremental Design of Radial Basis Function Networks"</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>10</issue>
          ,
          <year>Oct 2014</year>
          , pp.
          <fpage>1793</fpage>
          -
          <lpage>1803</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          ,
          <article-title>"Implementation of RBF type networks by MLP networks"</article-title>
          ,
          <source>IEEE International Conference on Neural Networks (ICNN 96)</source>
          , pp.
          <fpage>1670</fpage>
          -
          <lpage>1675</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Advantage analysis of sigmoid based RBF networks"</article-title>
          .
          <source>In: Proceedings of the 17th IEEE International Conference on Intelligent Engineering Systems (INES'13)</source>
          .
          <year>2013</year>
          . p.
          <fpage>243</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cecati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kolbusz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rozycki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Siano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"A Novel RBF Training Algorithm for Short-Term Electric Load Forecasting and Comparative Studies"</article-title>
          ,
          <source>IEEE Trans. on Ind. Electronics, Early Access</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>