<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Outliers Elimination for Error Correction Algorithm Improvement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Janusz Kolbusz</string-name>
          <email>jkolbusz@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pawel Rozycki</string-name>
          <email>prozycki@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Information Technology and Management in Rzeszow</institution>
        </aff>
      </contrib-group>
      <fpage>223</fpage>
      <lpage>234</lpage>
      <abstract>
        <p>Neural networks remain a very important part of artificial intelligence. RBF networks seem to be more powerful than networks based on sigmoid functions. Error Correction is a second-order training algorithm dedicated to RBF networks. The paper proposes a method for improving this algorithm by eliminating inconsistent patterns. The approach is also confirmed experimentally.</p>
      </abstract>
      <kwd-group>
        <kwd>Error Correction</kwd>
        <kwd>ErrCor</kwd>
        <kwd>outliers</kwd>
        <kwd>RBF networks</kwd>
        <kwd>training algorithms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Our civilization encounters increasingly complex problems that often exceed human
capabilities. Until recently, the aim was to create artificial intelligence systems as
perfect as a human. Currently, we are able to create intelligent learning systems that exceed
human intelligence. For example, we can create a model and predict the
behavior of complex natural processes that cannot be described mathematically. We can
also identify economic trends that are invisible to humans. In order to efficiently model
complex multidimensional nonlinear systems, unconventional methods should be used.
For such multidimensional, nonlinear problems, algorithmic or statistical
methods give unsatisfactory solutions. Methods based on computational intelligence allow
complex problems, such as forecasting economic trends or modeling natural phenomena,
to be addressed more effectively. To harness the power of this type of network to a greater
extent than now, one must:
– understand the neural network architecture and its impact on the functioning of the
system and the learning process;
– find effective learning algorithms that allow a network to be trained faster and more
effectively using its properties.</p>
      <p>Both problems are strictly connected.</p>
      <p>
        The commonly used MLP (Multi-Layer Perceptron) networks have relatively
limited capacity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It turns out that newer neural networks such as BMLP (Bridged
MLP) [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] or DNN (Dual Neural Networks) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with the same number of neurons are able
to solve problems 10 or 100 times more complex [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ].
      </p>
      <p>
        The way neurons are connected in the network is fundamental. For example, if you
combine 10 neurons in the most commonly used three-layer MLP architecture (with
one hidden layer), the biggest problem that can be solved with such a network is
the Parity-9 problem. If the same 10 neurons are connected in the FCC
architecture (Fully Connected Cascade), it is possible to solve the Parity-1023
problem. As can be seen, a departure from the commonly used MLP architecture, while
maintaining the same number of neurons, increases network capacity even a hundred
times [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ]. The problem is that the commonly known learning algorithms, such as EBP
(Error Back Propagation) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or LM (Levenberg-Marquardt), are not able to effectively
train these new, highly efficient architectures. It is important to note that not only the
architecture but also the training algorithm is needed to solve a given problem. Currently, the
only algorithm that is able to train the new architectures is the NBN (Neuron by
Neuron) algorithm published recently in [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ]. This algorithm can be used for all architectures with
arbitrarily connected neurons, including BMLP and DNN, and it works well on
problems impossible to solve by other algorithms.
      </p>
      <p>Already now we can build intelligent systems, such as artificial neural networks, by
initially setting weights to random values and then using an algorithm that teaches
the system by adjusting these weights in order to solve complex problems. It is
interesting that such a system can achieve a higher level of competence than its teachers. Such
systems can be very useful wherever decisions are taken, even if humans are not able to
understand the details of their actions. Neural networks have helped solve thousands of
practical problems. Most scientists used the MLP architecture and the EBP algorithm. However, since
the EBP algorithm is not efficient, the number of neurons was usually inflated, which
meant that the network, with its high degree of freedom, used its capacity to
learn the noise. Consequently, after the learning step, the system responded poorly
to patterns that were not used during learning, which resulted in frustration. A
new breakthrough in intelligent systems is possible due to new, better architectures and
better, more effective learning algorithms.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Training Algorithms</title>
      <p>
        Currently, the most effective and commonly known ANN training algorithms are
based on LM [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Unfortunately, the LM algorithm is not able to train architectures other than
MLP, and because the size of the Jacobian that must be processed is
proportional to the number of learning patterns, the LM algorithm may be used
only for relatively small problems. Our newly developed second-order learning
algorithm NBN [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ] is even slightly faster than LM, allows problems with a
virtually unlimited number of patterns to be solved, and can very effectively train new, powerful
ANN architectures such as BMLP, FCC, or DNN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Using the NBN we can
solve much more complex problems with more powerful system architectures.
      </p>
      <p>Training an RBF (Radial Basis Function) network with a second-order algorithm
is even more complicated than training sigmoidal networks, where only the weights need to be
adjusted. Our preliminary research shows that if we can also learn the widths and
locations of the RBF centers, it is possible to solve many problems with just a few RBF units
instead of hundreds of sigmoid neurons.</p>
      <p>
        The discovery of the EBP algorithm [
        <xref ref-type="bibr" rid="ref5 ref9">5,9</xref>
        ] started a rapid growth of computational
intelligence systems. Thousands of practical problems have been solved with the help of
neural networks. Although other neural networks are possible, the main
accomplishments were achieved using feed-forward neural networks, primarily with MLP
architectures. Although EBP was a real breakthrough, it is not only a very slow algorithm,
but it is also not capable of training networks with super compact architectures [
        <xref ref-type="bibr" rid="ref1 ref6">1,6</xref>
        ].
Many improvements to EBP were proposed, but most of them did not address its
main faults. The most noticeable progress was made with the adaptation of the
LM algorithm to neural network training [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The LM algorithm is capable of training
networks with 100 to 1000 times fewer iterations. The above-mentioned LM algorithm [
        <xref ref-type="bibr" rid="ref10 ref3">3,10</xref>
        ]
was adapted only for MLP architectures, and only relatively small problems can be
solved with it, because the size of the computed Jacobian is proportional to
the number of training patterns multiplied by the number of network outputs.
      </p>
      <p>
        Several years ago the LM algorithm was adapted to train arbitrarily connected feed-forward
ANN architectures [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], but the limitation on the number of patterns
in the LM algorithm remained unsolved until recently, when we developed the
NBN algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Now we have a tool that is not only very fast, but also lets us train,
using a second-order algorithm, problems with a basically unlimited number of patterns.
The NBN algorithm can also train compact, close-to-optimal architectures that cannot be
trained by the EBP algorithm.
      </p>
      <p>
        Both technologies (SVM and ELM) adjust only the parameters that are easy
to adjust, such as the output weights, while other essential parameters, such as the radii of the RBF
units σh and the locations of the centers of the RBF units ch, are either fixed or selected
randomly. As a consequence, the SVM and ELM algorithms produce significantly
larger networks than needed. One may notice that the SVR
(Support Vector Regression) [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ], the Incremental Extreme Learning Machine
(IELM) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and the Convex I-ELM (CI-ELM) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] need 30 to 100 times more RBF units
than the NBN [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the ISO [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and the ErrCor [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] algorithms. Another advantage
of ErrCor is that there is no randomness in the learning process, so only one learning
run is needed, while in the case of SVM (or SVR) a lengthy and tedious trial-and-error
process is needed before optimal training parameters are found.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Error Correction Algorithm Improvement</title>
      <sec id="sec-3-1">
        <title>3.1 Error Correction Fundamentals</title>
        <p>Error Correction (ErrCor) is a second-order, LM-based algorithm designed
for RBF networks in which RBF units with the Gaussian activation function
defined by (1) are used as neurons.</p>
        <p>ϕ_h(x_p) = exp( −‖x_p − c_h‖² / σ_h )   (1)
where c_h and σ_h are the center and width of RBF unit h, respectively, and ‖·‖ denotes
the Euclidean norm.</p>
        <p>
          The output of such a network is given by:
o_p = Σ_h w_h ϕ_h(x_p) + w_0   (2)
where w_h is the weight on the connection between RBF unit h and the network
output, and w_0 is the bias weight of the output unit. Note that RBF networks can be
implemented using neurons with a sigmoid activation function [
          <xref ref-type="bibr" rid="ref19 ref20">19,20</xref>
          ].
        </p>
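        <p>As an illustration only, a minimal NumPy sketch of the forward pass defined by (1) and (2) is given below; the array and function names (rbf_outputs, network_output, centers, widths, weights, bias) are assumptions of this sketch and not part of the ErrCor specification.</p>
        <preformat>
import numpy as np

def rbf_outputs(X, centers, widths):
    # Eq. (1): Gaussian response of each RBF unit for each pattern.
    # X: (P, d) patterns, centers: (H, d), widths: (H,)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (P, H)
    return np.exp(-sq_dist / widths[None, :])

def network_output(X, centers, widths, weights, bias):
    # Eq. (2): weighted sum of RBF unit outputs plus the output bias w_0.
    return rbf_outputs(X, centers, widths) @ weights + bias
        </preformat>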
        <p>
          The main idea of the ErrCor algorithm is to increase the number of RBF units one by
one, retraining all RBF units in the network after each unit is added. A new
unit is initially set to compensate for the largest error on the current error surface, and after that
all units are trained, changing centers and widths as well as output weights. Details
of the algorithm can be found in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. As shown in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], the ErrCor algorithm has
been successfully used to solve several problems such as function approximation,
classification, and forecasting. The main disadvantage of the ErrCor algorithm is its long computation
time, caused mainly by the requirement to retrain the whole network at each iteration.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 ErrCor Improvement</title>
        <p>
          The long computation time depends on many factors. One of the most important is the number
of patterns used in training. We can reduce their number by removing from the training
dataset outlier patterns, i.e. patterns whose data are inconsistent with the rest of the patterns. This
approach has been used in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] to eliminate patterns that contain unusual data such as
hurricanes or political or criminal events. Such an operation allows not only the number of
patterns and the training time to be reduced, but also the training results to be improved, achieving a lower
training error and better generalization. The important issue is how to identify the inconsistent
patterns (outliers). We suggest removing patterns whose error is higher than an
Outlier Threshold (OT), which can be an arbitrarily assumed value. In our experiments OT was
a value dependent on the current MERR (Mean Error), given by:
        </p>
        <p>OT = n · MERR   (3)
where n is typically in the range 5-10.</p>
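        <p>A minimal sketch of this criterion, assuming per-pattern errors are already available in an array, might look as follows; the helper name remove_outliers is ours, not the paper's.</p>
        <preformat>
import numpy as np

def remove_outliers(X, y, errors, n=5.0):
    # Eq. (3): OT = n * MERR, where MERR is the mean per-pattern error.
    ot = n * np.abs(errors).mean()
    keep = np.abs(errors) &lt;= ot   # patterns with error above OT are treated as outliers
    return X[keep], y[keep]
        </preformat>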
        <p>
          The removal of outliers can be done after adding a certain number of units to the network.
The pseudo code of the enhanced ErrCor algorithm is shown below. The change with respect to the original
ErrCor algorithm [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is the outlier removal step.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Improved ErrCor pseudo code</title>
        <preformat>
evaluate error of each pattern;
while 1
    C = pattern with biggest error;
    add a new RBF unit with center = C;
    train the whole network using the ISO-based method;
    evaluate error of each pattern;
    calculate SSE = Sum of Squared Errors;
    if SSE &lt; desired SSE
        break;
    end;
    after each N added RBF units, remove outliers with error &gt; OT;
end
        </preformat>
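        <p>For illustration, the Python sketch below mirrors the pseudo code above, reusing rbf_outputs and network_output from the earlier sketch. The function train_network here only refits the output weights by least squares and is a placeholder for the ISO-based second-order training of all RBF parameters described in [18]; parameter names and defaults are assumptions of this sketch.</p>
        <preformat>
import numpy as np

def train_network(X, y, centers, widths):
    # Placeholder for the ISO-based training of all RBF parameters [18]:
    # here only output weights and bias are refit by linear least squares.
    Phi = np.column_stack([rbf_outputs(X, centers, widths), np.ones(len(X))])
    sol, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return sol[:-1], sol[-1]                # weights, bias

def improved_errcor(X, y, desired_sse, n_ot=5.0, N=5, max_units=30):
    centers = np.empty((0, X.shape[1])); widths = np.empty(0)
    weights = np.empty(0); bias = y.mean()
    errors = y - bias                       # network with no RBF units yet
    while centers.shape[0] &lt; max_units:
        c = X[np.argmax(np.abs(errors))]    # pattern with the biggest error
        centers = np.vstack([centers, c])   # new RBF unit centered on it
        widths = np.append(widths, 1.0)
        weights, bias = train_network(X, y, centers, widths)
        errors = y - network_output(X, centers, widths, weights, bias)
        if (errors ** 2).sum() &lt; desired_sse:
            break
        if centers.shape[0] % N == 0:       # every N added units remove outliers
            keep = np.abs(errors) &lt;= n_ot * np.abs(errors).mean()   # Eq. (3)
            X, y, errors = X[keep], y[keep], errors[keep]
    return centers, widths, weights, bias
        </preformat>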
        <p>
          The described mechanism has been successfully used in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] to improve the training
process of an RBF network for energy load forecasting. It allowed both a better
training error and a better validation error to be achieved, as well as a lower training time.
        </p>
        <p>
          To confirm the suggested approach, several experiments with different datasets and
training parameters have been prepared. The first experiment was the approximation of a noised
Schwefel function. The noised function has been built by adding random values to about
20% of randomly selected Schwefel function samples. The original and noised functions are
shown in Figure 2. In the presented experiments, 503 of the 2500 samples were noised.
The data created in this way have been divided into training and testing datasets in the ratio of 4
to 1, giving 2000 training and 500 testing patterns. First, the training process has
been carried out using the original ErrCor algorithm and then repeated for different values of
the parameter OT (from 1.5 to 4.65) and the parameter N (5 and 10). In all experiments the
number of RBF units has been limited to 30. The results, containing training MSE (Mean Square
Error) and testing MSE, are shown in Table 1.
        </p>
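        <p>As a rough illustration of how such a dataset could be generated, the sketch below assumes the two-dimensional Schwefel function sampled on a 50×50 grid over [-500, 500]², with uniform noise added to 20% of the samples; the grid, noise magnitude, and random seed are assumptions of this sketch, not values taken from the paper.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)

def schwefel(X):
    # Schwefel function: f(x) = 418.9829 * d - sum_i x_i * sin(sqrt(|x_i|))
    return 418.9829 * X.shape[1] - (X * np.sin(np.sqrt(np.abs(X)))).sum(axis=1)

# 2500 samples on a 50x50 grid over [-500, 500]^2 (grid layout is an assumption)
g = np.linspace(-500, 500, 50)
X = np.array(np.meshgrid(g, g)).reshape(2, -1).T
y = schwefel(X)

# add random offsets to about 20% of the samples (noise magnitude is an assumption)
idx = rng.choice(len(y), size=int(0.2 * len(y)), replace=False)
y[idx] += rng.uniform(-200.0, 200.0, size=idx.size)

# split 4:1 into 2000 training and 500 testing patterns
perm = rng.permutation(len(y))
X_train, y_train = X[perm[:2000]], y[perm[:2000]]
X_test, y_test = X[perm[2000:]], y[perm[2000:]]
        </preformat>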
        <p>The results show that removing outliers allows better results to be achieved than with the original
ErrCor algorithm. The best testing MSE for the improved ErrCor has been achieved for OT=3.5,
for both N=5 and N=10. Similarly, the best training MSE for both values of N has been
achieved for the same value, OT=1.5. This is because for a lower value of OT many more
patterns are removed during the training process, which yields better training. Table 2 shows
the number of removed patterns during the experiments. As can be observed, for OT=5 the
number of removed patterns is higher than the number of noised samples. Moreover, for the
best results, with OT=3.5, the number of removed patterns is lower than the number of noised
samples. Note that for both values of N, at OT=4.61 no outliers
have been detected and removed, which means the results are the same as for the original ErrCor.
Figure 3 shows the training and testing process for the original ErrCor. The training error is
marked with blue stars and the testing error with red circles. It can be observed that the
best result is reached very quickly, at a level of about 0.035 for both the training and
testing datasets. Figures 4-7 show the training process for selected values of OT and both
analyzed values of N. As can be observed, the training error changes abruptly when
patterns are removed, while the testing error decreases rather slowly, following the changes of the
training error. Interesting results have been achieved for OT = 3, where both
training and validation errors are relatively high and very close to each other. It means that
for some values of OT the training process can fall into a local minimum and is not able to
reach better results. This is especially visible in the case of N=5, where the achieved result
is not significantly better than for the original ErrCor. In this case only 29 outliers have
been removed during the training process, which was too few to eliminate the noised patterns.
In the second case, for N=10, better results have been obtained only for a larger RBF
network, reaching testing MSE = 0.004294 and training MSE as low as 0.000046.
Fig. 5. The learning process of the modified ErrCor algorithm: a) OT=2.5, N=5; b) OT=2.5, N=10.
Fig. 6. The learning process of the modified ErrCor algorithm: a) OT=3, N=5; b) OT=3, N=10.</p>
        <p>In the second experiment, real-world datasets from the UCI Machine
Learning Repository, commonly used as benchmarks, such as Airplane Delay,
Machine CPU, Auto Price, and California Housing, have been used. For each dataset the results of the original ErrCor
algorithm have been compared with the discussed modified ErrCor with parameters OT=5 and
N=5. The results of these experiments are shown in Figure 8 and Figure 9. Again, blue stars
show the training MSE for a given number of RBF units and red circles the testing MSE. As can
be observed, eliminating outliers allows better results to be reached with a smaller number of
units also for real-world datasets.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        The paper presents a proposed improvement of the Error Correction algorithm based on the
elimination of inconsistent patterns from the training process. The achieved experimental results
confirm the effectiveness of the proposed method, which was originally suggested in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
This effectiveness, however, depends on the content of the processed dataset and will be
higher for noisier data with more randomly corrupted samples, which are easily
eliminated. Further work in this area will focus on improving the proposed approach by
searching for a way to find optimal training parameters for a given dataset, as well
as on applying the presented method to other training algorithms such as ELM or NBN.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>"Learning representations by backpropagating errors"</article-title>
          ,
          <source>Nature</source>
          , vol.
          <volume>323</volume>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>536</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Dahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>"Phone recognition with the meancovariance restricted Boltzmann machine"</article-title>
          ,
          <source>NIPS</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>"Learning deep architectures for AI"</article-title>
          ,
          <source>Foundations and Trends in Machine Learning</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>127</lpage>
          .
          <article-title>Also published as a book</article-title>
          .
          <source>Now Publishers</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Challenges in Applications of Computational Intelligence in Industrial Electronics"</article-title>
          ,
          <source>IEEE International Symposium on Industrial Electronics (ISIE 2010)</source>
          , July 4-7,
          <year>2010</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Neural Network Architectures and Learning algorithms- How Not to Be Frustrated with Neural Networks"</article-title>
          ,
          <source>IEEE Industrial Electronics Magazine</source>
          , vol
          <volume>3</volume>
          , no 4, pp.
          <fpage>56</fpage>
          -
          <lpage>63</lpage>
          , (
          <year>2009</year>
          )
          <article-title>(best paper award).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Ciresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Meier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>"Deep big simple neural nets excel on handwritten digit recognition"</article-title>
          ,
          <source>CoRR</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>"Neural Network Learning Without Backpropagation,"</article-title>
          <source>IEEE Trans. on Neural Networks</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1793</fpage>
          -
          <lpage>1803</lpage>
          , Nov.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Werbos</surname>
          </string-name>
          ,
          <article-title>"Back-propagation: Past and Future"</article-title>
          .
          <source>Proceeding of International Conference on Neural Networks</source>
          , San Diego, CA,
          <volume>1</volume>
          ,
          <fpage>343</fpage>
          -
          <lpage>354</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hao</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Pukish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kolbusz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Selection of Proper Neural Network Sizes and Architectures: Comparative Study"</article-title>
          ,
          <source>IEEE Trans. on Industrial Informatics</source>
          , vol.
          <volume>8</volume>
          , May
          <year>2012</year>
          , pp.
          <fpage>228</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Lang</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Witbrock</surname>
          </string-name>
          ,
          <article-title>"Learning to Tell Two Spirals Apart"</article-title>
          <source>Proceedings of the 1988 Connectionists Models Summer School</source>
          , Morgan Kaufman.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Fahlman</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lebiere</surname>
          </string-name>
          ,
          <article-title>"The cascade-correlation learning architecture"</article-title>
          , In D. S. Touretzky (ed.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>2</volume>
          . Morgan Kaufmann, San Mateo, CA,
          <year>1990</year>
          , pp.
          <fpage>524</fpage>
          -
          <lpage>532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          , Statistical Learning Theory. New York: Wiley,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>A.</given-names>
            <surname>Smola</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Scholkopf</surname>
          </string-name>
          ,
          <article-title>"A tutorial on support vector regression"</article-title>
          ,
          <source>NeuroCOLT2 Tech. Rep. NC2-TR-1998-030</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>G.-B.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-K.</given-names>
            <surname>Siew</surname>
          </string-name>
          ,
          <article-title>"Universal approximation using incremental constructive feedforward networks with random hidden nodes"</article-title>
          ,
          <source>IEEE Transactions on Neural Network</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>879</fpage>
          -
          <lpage>892</lpage>
          ,
          <year>July 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Huang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>"Convex incremental extreme learning machine"</article-title>
          ,
          <source>Neurocomputing</source>
          , vol.
          <volume>70</volume>
          , no.
          <fpage>16</fpage>
          -
          <issue>18</issue>
          , pp.
          <fpage>3056</fpage>
          -
          <lpage>3062</lpage>
          , Oct.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>"Improved Computation for Levenberg Marquardt Training,"</article-title>
          <source>IEEE Trans. on Neural Networks</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>930</fpage>
          -
          <lpage>937</lpage>
          ,
          <year>June 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewlett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rozycki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Fast and Efficient Second Order Method for Training Radial Basis Function Networks"</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <year>2012</year>
          , vol.
          <volume>24</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>609</fpage>
          -
          <lpage>619</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bartczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"An Incremental Design of Radial Basis Function Networks"</article-title>
          <source>IEEE Trans. on Neural Networks and Learning Systems</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>10</issue>
          ,
          <year>Oct 2014</year>
          , pp.
          <fpage>1793</fpage>
          -
          <lpage>1803</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          ,
          <article-title>"Implementation of RBF type networks by MLP networks"</article-title>
          ,
          <source>1996 IEEE International Conference on Neural Networks (ICNN 96)</source>
          , pp.
          <fpage>1670</fpage>
          -
          <lpage>1675</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"Advantage analysis of sigmoid based RBF networks"</article-title>
          ,
          <source>Proceedings of the 17th IEEE International Conference on Intelligent Engineering Systems (INES'13)</source>
          .
          <year>2013</year>
          . p.
          <fpage>243</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cecati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kolbusz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rozycki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Siano</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilamowski</surname>
          </string-name>
          ,
          <article-title>"A Novel RBF Training Algorithm for Short-Term Electric Load Forecasting and Comparative Studies"</article-title>
          ,
          <source>IEEE Trans. on Ind. Electronics, Early Access</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>