         Performance Analysis of the Activation Neuron
           Function in the Flexible Neural Tree Model

                        Tomáš Buriánek and Sebastián Basterrech
                                    IT4Innovation
                          VŠB–Technical University of Ostrava,
                                    Czech Republic
              {Tomas.Burianek.St1,Sebastian.Basterrech.Tiscordio}@vsb.cz


           Abstract. Time-series prediction and forecasting is an important
           area in the field of Machine Learning. Around ten years ago, a kind of
           Multilayer Neural Network was introduced under the name of Flexible
           Neural Tree (FNT). This model uses meta-heuristic techniques to deter-
           mine its topology and its embedded parameters. The FNT model has
           been successfully employed on time-series modeling and temporal learn-
           ing tasks. The activation function used in the FNT belongs to the family
           of radial basis functions. It is a parametric function and the parameters
           are set employing a heuristic procedure. In this article, we analyze the
           impact on the performance of the FNT model when other families of
           neuron activation functions are used. For that, we use the hyperbolic
           tangent and Fermi activation functions on the tree nodes. Both functions
           have been extensively used in the field of Neural Networks. Moreover,
           we study the FNT technique with a linear variant of the Fermi function.
           We present an experimental comparison of our approaches on two widely
           used time-series benchmarks.

           Keywords: Neural Network, Flexible Neural Tree, Neuron Activation
           Function, Time-series modeling, Forecasting


    1    Introduction
    The Flexible Neural Tree (FNT) has been used as a learning predictor for several
    time-series problems. The model consists of interconnected nodes forming a
    tree architecture [9, 10]. There are two kinds of nodes: functional and terminal
    nodes. The terminal nodes contain the information of the input patterns. The
    functional nodes process the information using a specific activation function.
    The parameters of the model are: the weight connections among the nodes, the
    parameters in the activation function and the pattern of connectivity on the
    tree. The method combines two heuristic algorithms in order to find these pa-
    rameters. Several bio-inspired methods can be used as meta-heuristic techniques
    to find the topology of the tree, such as: Probabilistic Incremental Program Evo-
    lution (PIPE), Genetic Programming (GP) and Ant Programming (AP). In order
    to find the embedded parameters, techniques such as Genetic Algorithms (GA),
    Particle Swarm Optimization (PSO) and Differential Evolution (DE) can be used.



The first FNT model was developed with a Gaussian activation function in the
functional nodes.
    In this paper we analyze the FNT performance when other kinds of activation
functions are used in the nodes. We study nodes with the hyperbolic tangent
and the Fermi activation function; both families of functions have been extensively
studied in the field of Neural Networks. Additionally, we test the model with a linear
variant of the Fermi activation function. We present an experimental comparison
of the different activation functions on two widely used time-series benchmarks.
The first one is a simulated benchmark commonly used in the forecasting liter-
ature, the fixed 20th-order NARMA data set. The other one is a real data set
of Internet traffic from a European Internet service provider.
    The paper is organized as follows. In Section 2, we present a description
of the Flexible Neural Tree model. Section 3 contains the description of other
activation functions. Next, we present our experimental results. Finally, the last
section presents the conclusions.


2     The Flexible Neural Tree model description
About ten years ago a new kind of Neural Network (NN) was introduced under
the name of Flexible Neural Tree (FNT) [9, 10]. An FNT is a multilayer feed-
forward NN with an irregular topology. In the original FNT model each neuron
has a parametric activation function. The network architecture is designed in an
automatic way considering a pre-defined set of instructions and functional oper-
ators. The automatic process involves the parameters of the activation function,
defines the pattern of connectivity among the units and selects the input vari-
ables. An evolutionary procedure is used to evaluate the performance of the
tree topology. In temporal learning tasks it is hard to select the proper input
variables; the FNT technique uses an automatic procedure for this selection. Another
meta-heuristic algorithm is employed to find the neural tree parameters. The
FNT model proposes to use a simple random search method for setting the tree
parameters. For this task, techniques such as Particle Swarm Optimization (PSO)
[13, 19] and Differential Evolution (DE) [16, 20] can be used. The pioneering FNT
approach used Probabilistic Incremental Program Evolution (PIPE) [17] for
encoding the NN in a tree [9]. In the literature, other techniques have been studied
to find the topology and tree parameters, such as Genetic Programming (GP)
[4–6, 8] and Ant Programming (AP) [?].

2.1   The tree construction
The topology of the tree is generated using a pre-defined instruction set. We
consider the following function set F = {+2 , +3 , . . . , +Nf }. A node operator +i
is an internal vertex instruction with i inputs. In the original FNT, the activation
neuron function of any unit i is the following parametric function:
                       f(a_i, b_i, x) = e^{-\left(\frac{x - a_i}{b_i}\right)^2},                 (1)

where the parameters ai and bi are adjustable parameters. The terminal set of
the tree consists of T = {x1 , x2 , . . . , xNt }. The instruction set S is defined as
follows:
               S = F ∪ T = {+2 , +3 , . . . , +Nf } ∪ {x1 , x2 , . . . , xNt }.    (2)
Each functional node +i has i random nodes as its inputs, which can be internal
or terminal nodes. The model outputs can be computed using a depth-
first strategy for traversing the tree. Given a functional node +i with inputs
{x1 , . . . , xi }, the total input charge of +i is defined as:

                                  net_i = \sum_{j=1}^{i} w_j x_j,                                (3)

where wj is the weight connection between xj and +i . The output of the node
+i is obtained using the expression (1):

                       out_i = e^{-\left(\frac{net_i - a_i}{b_i}\right)^2}.                      (4)

If some of the inputs of a functional node i are themselves functional nodes, then
the output of i is computed recursively following the expressions (3) and (4).
The complexity of traversing the tree is O(N), where N is the number of units
in the tree. Figure 1 illustrates an example of an FNT.
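
To make the computation concrete, the following minimal Python sketch (our
illustration only, with hypothetical class names, not the authors' implementation)
shows how the output of a tree built from expressions (1), (3) and (4) can be
computed with a depth-first recursion:

    import math

    class Terminal:
        """Leaf node carrying one input variable x_index."""
        def __init__(self, index):
            self.index = index

        def output(self, x):
            return x[self.index]

    class FunctionalNode:
        """Internal node +i with one weight per child and Gaussian parameters a, b."""
        def __init__(self, children, weights, a, b):
            self.children, self.weights = children, weights
            self.a, self.b = a, b

        def output(self, x):
            # expression (3): weighted sum of the children outputs
            net = sum(w * c.output(x) for w, c in zip(self.weights, self.children))
            # expression (4): Gaussian activation of the total input charge
            return math.exp(-((net - self.a) / self.b) ** 2)

    # Example: a +2 node over the input variables x1 and x2
    root = FunctionalNode([Terminal(0), Terminal(1)], weights=[0.5, -0.3], a=0.0, b=1.0)
    print(root.output([0.2, 0.7]))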

2.2   The fitness function
In order to evaluate the accuracy of the FNT model in learning tasks an error
distance is considered. In this context, this performance measure is called the fitness
function. In a supervised learning problem, given a training set {(x(t), ytarget(t)), t =
1, . . . , T}, the goal is to infer a mapping ϕ(·) in order to predict ytarget such
that the fitness function is minimized. Most often the Mean Square Error (MSE)
or the Root Mean Square Error (RMSE) [10] is used, given by the expression:

                  MSE = \frac{1}{T}\sum_{t=1}^{T} \left(y(t) - y_{target}(t)\right)^2.            (5)
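
As a small illustration (reusing the hypothetical tree sketch given above), the
fitness of a candidate tree under expression (5) could be evaluated as:

    def mse_fitness(tree, inputs, targets):
        # expression (5): mean squared error between the tree outputs and the targets
        errors = [(tree.output(x) - y) ** 2 for x, y in zip(inputs, targets)]
        return sum(errors) / len(errors)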

Another important parameter of the model performance is the number of nodes
in the tree. Obviously, between two trees with equal accuracy the smaller one is
preferred.

2.3   The parameter optimization using meta-heuristic techniques
Several strategies have been used to find “good” parameters and topology [4,
7–10]. The model parameters embedded in the tree are the activation function
parameters (ai and bi, for all +i) and the weight connections. In this paper,
we use Genetic Programming (GP) for finding a “good” connectivity among the
neurons in the tree and we use the PSO method to find the embedded tree
parameters.


[Figure 1: diagram of an example flexible neural tree; see caption.]

Fig. 1: Example of a Flexible Neural Tree with the function instruction
set F = {+2 , +3 , +4 , +5 , +6 , +7 } and the terminal instruction set T =
{x1 , x2 , x3 , x4 , x5 , x6 , x7 }. The root in the tree (+7 ) is the output layer and the
input layer is composed by the leaves in the bottom level of the tree.




    The GP theory was intensely developed in the 90s [15]. A GP algorithm
starts by defining an initial population of individuals of some specific type. The
procedure is iterative: at each epoch it transforms a selected group of individuals,
producing a new generation. This transformation consists of applying some
bio-inspired rules to the individuals. In our problem the individuals are the flexible
trees. A set of individuals is probabilistically selected and a set of genetic rules is
applied. The operating rules arise from biological genetic operations, which basi-
cally consist of: reproduction, crossover and mutation. Reproduction is the
identity operation: an individual i of the generation at time t is also present in
the generation at time t + 1. Given two parent trees, the crossover consists of inter-
changing two of their sub-trees. The mutation consists of selecting a tree and realizing
one of the following operations: changing a leaf node to another leaf node,
replacing a leaf node by a sub-tree, or replacing a functional node by a leaf node.
The GP algorithm applied to our specific problem is described in Algorithm 1.
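
As a purely illustrative sketch (reusing the hypothetical Terminal and
FunctionalNode classes from the earlier example, and not reproducing the exact
operators of the original FNT papers), the three mutation operations listed above
could be written as follows, applied to one randomly selected node of a tree:

    import random

    def random_terminal(n_inputs):
        return Terminal(random.randrange(n_inputs))

    def mutate_node(node, n_inputs):
        """Apply one of the three mutation operations to the given node."""
        op = random.choice(["leaf_to_leaf", "leaf_to_subtree", "node_to_leaf"])
        if op == "leaf_to_leaf" and isinstance(node, Terminal):
            # change a leaf node to another leaf node
            return random_terminal(n_inputs)
        if op == "leaf_to_subtree" and isinstance(node, Terminal):
            # replace a leaf node by a small randomly generated sub-tree
            children = [random_terminal(n_inputs) for _ in range(2)]
            weights = [random.uniform(-1.0, 1.0) for _ in range(2)]
            return FunctionalNode(children, weights, a=0.0, b=1.0)
        if op == "node_to_leaf" and isinstance(node, FunctionalNode):
            # replace a functional node by a leaf node
            return random_terminal(n_inputs)
        return node  # selected operation not applicable to this node type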
    There are two kinds of adjustable parameters in the tree: one refers to the
activation function parameters ai and bi for each functional node i, presented in
the expression (1); the other refers to the weight connections between the nodes in the
tree. In this paper, we use Particle Swarm Optimization (PSO) [13] for finding
these parameters. The PSO algorithm is an evolutionary computation technique


 Algorithm 1: Specification of the Genetic Programming algorithm used
 for finding the optimal flexible tree topology.
   Inputs : Nt , Nf , training data set, algorithm parameters
   Outputs: Tree
 1 Initialize the population of trees;
 2 Compute the fitness function for all trees;
 3 while (Termination criterion is not satisfied) do
       // Creation of the new population
 4     while (Size of population is not satisfied) do
 5         Select genetic operation;
 6         if (Selection is crossover) then
 7             Select two trees from population using tournament selection;
 8             Realize the crossover operation;
 9             Insert the two new offspring into the new population;
10            if (Selection is mutation) then
11                Select one tree from population using tournament selection;
12                Realize the mutation operation;
13                Insert the new offspring into the new population;
14            if (Selection is reproduction) then
15                Select one tree from population;
16                Insert it to the new population;
17            Compute the fitness function of the new trees;
18        Replace old population by the new population;
19     Return the tree with best fitness function.



based on the social behaviors in a simplified social environment. A swarm is a
set of particles, which are characterized by their position and their velocity in
a multidimensional space. We denote the position of a particle i with the column
N_x-vector x^{(i)} = (x_1^{(i)}, \ldots, x_{N_x}^{(i)}). The velocity of i is defined by the column
N_x-vector v^{(i)} = (v_1^{(i)}, \ldots, v_{N_x}^{(i)}). Besides, we use auxiliary vectors p^* and p^{(i)},
∀i, each one of dimension N_x × 1. The vector p^{(i)} denotes the best position of
i presented until the current iteration. The best swarm position is represented
by the vector p^*. Besides, we use two auxiliary random weights r_1^{(i)} and r_2^{(i)} of
dimension N_x × 1, which are randomly initialized in [0, 1] for each particle i. At
any iteration t, the simulation of the dynamics among the particles is given by
the following expressions [18]:

  v_j^{(i)}(t+1) = c_0 v_j^{(i)}(t) + c_1 r_{1j}^{(i)}(t)\,(p_j^{(i)}(t) - x_j^{(i)}(t)) + c_2 r_{2j}^{(i)}(t)\,(p^*_j(t) - x_j^{(i)}(t)), \quad j \in [1, N_x],
                                                                                     (6)
and
                  x_j^{(i)}(t+1) = x_j^{(i)}(t) + v_j^{(i)}(t+1), \quad j \in [1, N_x],                    (7)
40       Tomáš Buriánek, Sebastián Basterrech

where the constant c0 is called the inertia weight, and the constants c1 and c2
regulate the attraction towards the local and global best positions, respectively.
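
As an illustrative sketch only (constants and termination details are simplified
assumptions), one update of a single particle following expressions (6) and (7)
can be written as:

    import random

    def pso_step(x, v, p_best, p_star, c0=0.7, c1=1.5, c2=1.5):
        """One velocity/position update of a particle, expressions (6) and (7)."""
        new_x, new_v = [], []
        for j in range(len(x)):
            r1, r2 = random.random(), random.random()  # random weights in [0, 1]
            vj = c0 * v[j] + c1 * r1 * (p_best[j] - x[j]) + c2 * r2 * (p_star[j] - x[j])
            new_v.append(vj)
            new_x.append(x[j] + vj)
        return new_x, new_v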
    In order to use PSO for estimating the embedded tree parameters, the po-
sition p(i) of the particle i is associated with the embedded parameters in one
flexible tree (the weights and aj , bj , ∀j ∈ F). The PSO algorithm returns the
global optimal position according to the fitness function presented in the expres-
sion (5). The relationship between the tree parameters and the particle position
is given by:
           (p_1^{(i)}, \ldots, p_{N_x}^{(i)}) = (a_1, \ldots, a_{N_f}, b_1, \ldots, b_{N_f}, \mathbf{w}),      (8)
where w is a vector with the tree weights.
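
For illustration, a particle position laid out as in expression (8) could be decoded
into the tree parameters as follows (the ordering and sizes are our assumptions):

    def decode_particle(p, n_func, n_weights):
        """Split a particle position vector into (a, b, w), following expression (8)."""
        a = p[:n_func]                                # a_1, ..., a_{N_f}
        b = p[n_func:2 * n_func]                      # b_1, ..., b_{N_f}
        w = p[2 * n_func:2 * n_func + n_weights]      # weight vector w
        return a, b, w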
   We present in the Algorithm 2 the PSO method used as meta-heuristic tech-
nique for finding the tree parameters.


    Algorithm 2: Specification of the Particle Swarm Optimization used for
    finding the embedded tree parameters.
      Inputs : Nx , number of particles, S, training data set, algorithmic
                 parameters
      Outputs: Tree parameters
 1 t = 0;
 2 Random initialization of p(i) (t) and v(i) (t) for all i;
 3 Compute the fitness value associated with i using (8) and the fitness
   function;
 4 Set p∗ (t) and p(i) (t) for all i;

 5 while (Termination criterion is not satisfied) do
 6    for (Each particle i) do
 7         Compute v(i) (t + 1) using the expression (6);
 8         Compute x(i) (t + 1) using the expression (7);
 9         Compute the fitness value associated with i using (8) and the
           fitness function;
10         Compute p(i) (t + 1);
11      Compute p∗ (t + 1);
12      t=t+1;
13   Return the parameters using p∗ (t) and the expression (8);




3     Performance of other neuron activation functions in
      the nodes of the tree

In this paper we analyze the impact of other neuron activation functions on the
performance of the FNT model. The original model uses the Gaussian function
presented in the expression (4). This kind of function has been widely used in
Support Vector Machines (SVM), where the model uses the radial basis function
(RBF) kernel [11]. Additionally, it has been used in Self-Organizing Maps (SOM)
as a neighbourhood function among the neurons in Kohonen networks [14]. In
the area of Neural Networks, two kinds of activation functions have been exten-
sively used: the tanh(·) function and the Fermi function. The hyperbolic tangent
activation function is
                   f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.                       (9)
The Fermi activation function is
                                    f(x) = \frac{1}{1 + e^{-x}}.                                  (10)
In this paper we test the FNT model using the tanh(·) and Fermi function.
Moreover, we analyze a parametric variation of these functions, given by:

                                 g(x) = ai f (x) + bi ,                             (11)

where f (x) is the function of expression (9) or (10), and the parameters ai and
bi are specific to each tree node i. We use PSO in order to find the parameters
ai and bi .
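
For concreteness, the activation functions compared in this paper can be sketched
in Python as follows (a simple illustration; the parameter handling inside the
actual FNT implementation may differ):

    import math

    def gaussian(x, a, b):
        # original FNT activation, expressions (1) and (4)
        return math.exp(-((x - a) / b) ** 2)

    def tanh_act(x):
        # hyperbolic tangent, expression (9)
        return math.tanh(x)

    def fermi(x):
        # Fermi (logistic) function, expression (10)
        return 1.0 / (1.0 + math.exp(-x))

    def linear_variant(x, a, b, f=fermi):
        # parametric variation g(x) = a*f(x) + b, expression (11)
        return a * f(x) + b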


4     Empirical results
4.1    Description of the benchmarks
We use two benchmarks: a simulated data set which has been widely used in the
forecasting literature and a real data set of Internet traffic.
(1) Fixed kth order NARMA data set. This data set presents a high non-
    linearity; for this reason it has been extensively analyzed in the literature [1, 3]:
        b(t+1) = 0.3\, b(t) + 0.05\, b(t) \sum_{i=0}^{k-1} b(t-i) + 1.5\, s(t-(k-1))\, s(t) + 0.1,

    where k = 20 and s(t) ∼ Unif[0, 0.5]. The task consists of predicting the value
    b(t + 1) based on the history of b(t) up to time t. The size of the training
    data was 3980 and the test data numbered 780. A data-generation sketch is
    given after this list.
(2) The Internet traffic data from the United Kingdom Education and Research Net-
    working Association (UKERNA). The data was collected between 19 November
    and 27 January, 2005. This data set was analyzed in [2, 12]. The problem
    was studied collecting the data at a five-minute scale. The goal is to pre-
    dict the traffic at time t using the information of the traffic at times
    {t − 1, t − 2, t − 3, t − 4, t − 5, t − 6}. This sliding window was studied in [12].
    The training data corresponds to the initial 66% of the data. The training
    set has 13126 patterns and the test set has 6762 patterns. We normalized
    the data in [0, 1].
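
A minimal sketch of how the fixed 20th-order NARMA series above could be
generated (our illustration; the zero initial conditions and the sequence lengths
are assumptions):

    import random

    def narma_series(n, k=20):
        """Generate n samples of the fixed k-th order NARMA series defined above."""
        s = [random.uniform(0.0, 0.5) for _ in range(n)]   # s(t) ~ Unif[0, 0.5]
        b = [0.0] * n                                       # assumed zero initial conditions
        for t in range(k - 1, n - 1):
            b[t + 1] = (0.3 * b[t]
                        + 0.05 * b[t] * sum(b[t - i] for i in range(k))
                        + 1.5 * s[t - (k - 1)] * s[t]
                        + 0.1)
        return s, b

    # e.g. s, b = narma_series(3980 + 780) for the training and test splits used here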

4.2   Results

We can see in the two graphics of Figure 2 the performance of the FNT prediction
for the NARMA test data set using the Gaussian activation function and the Fermi
activation function. Figure 3 shows the prediction of the model for the NARMA
data set using the Gaussian activation function. In this last case we present
the estimation on the last 200 samples. An example of estimation of the FNT
using the Gaussian function is shown in the left figure of Figure 4. The right figure
of Figure 4 shows the estimation of the model when the activation function is the
Fermi function. An example of prediction of the FNT model using the Fermi
activation function for the Internet traffic data set is illustrated in Figure 5. We can
see in Figure 4 that the FNT estimations with the Fermi activation function and
with the Gaussian activation function are very close. The Fermi function is not
parametric; therefore the PSO algorithm is used only to estimate the weights in the
tree. As a consequence, the FNT model with the Fermi function is faster in the
training process than the FNT with the exponential parametric function. The accuracy
for the Internet traffic data using the FNT technique and the Fermi activation
function was better than when we used the Gaussian activation function. Table 1
shows a comparison of the accuracy of the model using the different kinds of
activation functions in the nodes.


                 Function         NARMA             Internet Traffic data
                 Gaussian         2.25646 × 10^−3   10.0642 × 10^−5
                 tanh(·)          3.47218 × 10^−3   10.2117 × 10^−5
                 Fermi function   2.37082 × 10^−3    8.67417 × 10^−5
                 Linear Fermi     3.23204 × 10^−3    9.95649 × 10^−5

Table 1: Accuracy of the FNT model for different kinds of activation functions
in the tree nodes. The first and second columns show the MSE obtained by the
model; the error is presented in scientific notation. The first row refers to the
function used in the original FNT. The last three rows refer to the functions (9),
(10) and (11).




5     Conclusions and future work

Ten years ago, the Flexible Neural Tree (FNT), a specific type of Neural Network,
was presented. The method has proved to be a very powerful tool for time-series
processing and forecasting problems. The model uses meta-heuristic techniques
for defining the topology of the tree and for finding “good” parameters in learning
tasks. The FNT uses an activation function with exponential form in the nodes.
In this paper we analyzed the performance of the model with other families of
functions in its nodes: we studied the hyperbolic tangent and the Fermi functions.
We tested the performance on two widely used benchmark data sets: a simulated and
a real data set. The results are promising, specifically for the FNT model that
uses the Fermi function in its nodes. In future work, we will use statistical tests for
comparing the different approaches presented here, and we will also compare
our results with other forecasting techniques.



[Figure 2: two scatter plots, “FNT prediction with Gaussian function” (left) and
“FNT prediction with Fermi function” (right), each plotting prediction against
target.]




Fig. 2: Example of the FNT prediction for the NARMA data set. Both figures
show the estimation of the last 80 time units of the test data set. The left
figure shows the FNT prediction using the Gaussian activation function. The
right figure illustrates the estimation of the model using the Fermi activation
function. The red line corresponds to the identity function.




Acknowledgments

This article has been elaborated in the framework of the project New creative
teams in priorities of scientific research, reg. no. CZ.1.07/2.3.00/30.0055, sup-
ported by Operational Program Education for Competitiveness and co-financed
by the European Social Fund and the state budget of the Czech Republic. Addi-
tionally, this work was partially supported by the Grant of SGS No. SP2014/110,
VŠB - Technical University of Ostrava, Czech Republic, and was supported by
the European Regional Development Fund in the IT4Innovations Centre of Ex-
cellence project (CZ.1.05/1.1.00/02.0070) and by the Bio-Inspired Methods: re-
search, development and knowledge transfer project, reg. no. CZ.1.07/2.3.00/20.0073
funded by Operational Programme Education for Competitiveness, co-financed
by ESF and state budget of the Czech Republic.


[Figure 3: time-series plot “FNT prediction with Gaussian function”; Narma
dataset value versus time.]




Fig. 3: FNT prediction using the Gaussian activation function on the NARMA
data set. The estimation was realized on the last 200 time units of the test data.
The red line is the prediction of the data and the black line is the target data.


[Figure 4: two time-series plots, “FNT prediction with Gaussian function” (left)
and “FNT prediction with Fermi function” (right); Traffic dataset value versus
time.]




Fig. 4: FNT prediction on the Internet traffic dataset. Both figures show the
estimation of the last 200 time units of the test data set. The left figure shows
the prediction with the Gaussian activation function (red line). The right figure
illustrates the estimation of the model (blue line) using the Fermi activation
function. In both cases the black line is the target data.



References
 1. Sebastián Basterrech, Colin Fyfe, and Gerardo Rubino. Self-Organizing Maps and
    Scale-Invariant Maps in Echo State Networks. In Intelligent Systems Design and


[Figure 5: scatter plot “FNT prediction with Gaussian function”; prediction
against target.]




Fig. 5: FNT prediction using the Gaussian activation function on the Internet
traffic data set. The estimation was realized on the last 80 time units of the test
data set. The red line is the identity function.




    Applications (ISDA), 2011 11th International Conference on, pages 94–99, nov.
    2011.
 2. Sebastián Basterrech and Gerardo Rubino. Echo State Queueing Network: a new
    Reservoir Computing learning tool. IEEE Consumer Communications & Networking
    Conference (CCNC’13), pages 118–123, January 2013.
 3. Sebastián Basterrech and Václav Snášel. Initializing Reservoirs With Exhibitory
    And Inhibitory Signals Using Unsupervised Learning Techniques. In International
    Symposium on Information and Communication Technology (SoICT), pages 53–60,
    Danang, Viet Nam, December 2013. ACM Digital Library.
 4. Yuehui Chen, Ajith Abraham, and Bo Yang. Feature selection and classification
    using flexible neural tree. Neurocomputing, 70(1-3):305–313, 2006.
 5. Yuehui Chen, Ajith Abraham, and Bo Yang. Hybrid flexible neural-tree-based in-
    trusion detection systems. International Journal of Intelligent Systems, 22(4):337–
    352, 2007.
 6. Yuehui Chen, Ajith Abraham, and Ju Yang. Feature selection and intrusion detec-
    tion using hybrid flexible neural tree. In Jun Wang, Xiao-Feng Liao, and Zhang Yi,
    editors, Advances in Neural Networks, volume 3498 of Lecture Notes in Computer
    Science, pages 439–444. Springer Berlin Heidelberg, 2005.
 7. Yuehui Chen, Lizhi Peng, and Ajith Abraham. Exchange rate forecasting using
    flexible neural trees. In Jun Wang, Zhang Yi, Jacek M. Zurada, Bao-Liang Lu,
    and Hujun Yin, editors, Advances in Neural Networks - ISNN 2006, volume 3973
    of Lecture Notes in Computer Science, pages 518–523. Springer Berlin Heidelberg,
    2006.

 8. Yuehui Chen, Bo Yang, and Ajith Abraham. Flexible neural trees ensemble for
    stock index modeling. Neurocomputing, 70(4-6):697–703, 2007.
 9. Yuehui Chen, Bo Yang, and Jiwen Dong. Nonlinear System Modelling Via Optimal
    Design of Neural Trees. International Journal of Neural Systems, 14(02):125–137,
    2004.
10. Yuehui Chen, Bo Yang, Jiwen Dong, and Ajith Abraham. Time-series forecasting
    using flexible neural tree model. Inf. Sci., 174(3-4):219–235, August 2005.
11. Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. Mach. Learn.,
    20(3):273–297, September 1995.
12. P. Cortez, M. Rio, M. Rocha, and P. Sousa. Multiscale Internet traffic forecasting
    using Neural Networks and time series methods. Expert Systems, 2012.
13. J. Kennedy and R. Eberhart. Particle Swarm Optimization. In Neural Networks,
    1995. Proceedings., IEEE International Conference on, volume 4, pages 1942–1948,
    1995.
14. Teuvo Kohonen. Self-Organizing Maps. Springer Series in Information Sciences,
    third edition, 2001.
15. John R. Koza, Forrest H. Bennett III, and Oscar Stiffelman. Genetic Programming
    as a Darwinian Invention Machine. In Riccardo Poli, Peter Nordin, William B.
    Langdon, and Terence C. Fogarty, editors, Genetic Programming, volume 1598 of
    Lecture Notes in Computer Science, pages 93–108. Springer Berlin Heidelberg,
    1999.
16. Kenneth Price, Rainer M. Storn, and Jouni A. Lampinen. Differential Evolution: A
    Practical Approach to Global Optimization (Natural Computing Series). Springer-
    Verlag New York, Inc., Secaucus, NJ, USA, 2005.
17. Rafal Salustowicz and Jürgen Schmidhuber. Probabilistic incremental program
    evolution. Evolutionary Computation, 5(2):123–141, 1997.
18. Yuhui Shi and R. Eberhart. A modified particle swarm optimizer. In Evolutionary
    Computation Proceedings, 1998. IEEE World Congress on Computational Intelli-
    gence., The 1998 IEEE International Conference on, pages 69–73, 1998.
19. Yuhui Shi and Russell C. Eberhart. Parameter Selection in Particle Swarm Op-
    timization. In V.W. Porto, N. Saravanan, D. Waagen, and A.E. Eiben, editors,
    Evolutionary Programming VII, volume 1447 of Lecture Notes in Computer Sci-
    ence, pages 591–600. Springer Berlin Heidelberg, 1998.
20. Rainer Storn and Kenneth Price. Differential Evolution – A Simple and Efficient
    Heuristic for global Optimization over Continuous Spaces. Journal of Global Op-
    timization, 11(4):341–359, 1997.