Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Training feedforward neural networks using hybrid particle swarm optimization and Multi-Verse Optimization

1st Rabab Bousmaha, GeCoDe Laboratory, Department of Computer Science, University of Saida, Saida, Algeria, Rabab.bousmaha@gmail.com
2nd Reda Mohamed Hamou, GeCoDe Laboratory, Department of Computer Science, University of Saida, Saida, Algeria, hamoureda@yahoo.fr
3rd Amine Abdelmalek, GeCoDe Laboratory, Department of Computer Science, University of Saida, Saida, Algeria, amineabd1@yahoo.fr

Abstract—The learning process of artificial neural networks is an important and complex task in the supervised learning field. The main difficulty of training a neural network is fine-tuning the best set of control parameters in terms of weights and biases. This paper presents a new training method based on hybrid particle swarm optimization with Multi-Verse Optimization (PMVO) to train feedforward neural networks. The hybrid algorithm is used to search the solution space more effectively, which helps reduce the problem of becoming trapped in local minima. The performance of the proposed approach was compared with five evolutionary techniques and with standard backpropagation using a momentum term and an adaptive learning rate. The comparison was benchmarked and evaluated on six bio-medical datasets. The results of the comparative study show that PMVO outperformed the other training methods on most datasets and can be an alternative to them.

Index Terms—Particle swarm optimization, Multi-Verse Optimization, Training feedforward neural networks, Real-world datasets

I. INTRODUCTION

The artificial neural network (ANN) is one of the most important data mining techniques and has been successfully applied in many fields. The feedforward multilayer perceptron (MLP) is one of the best-known neural networks. It consists of neurons organized into three layers: input, hidden and output. The success of an MLP generally depends on the training process, which is determined by the training algorithm. The objective of a training algorithm is to find the connection weights and biases that minimize the classification error. Training algorithms can be classified into two classes: gradient-based and stochastic search methods. Backpropagation (BP) and its variants are gradient-based methods and are among the most popular techniques used to train MLP neural networks. Gradient-based methods have several drawbacks, such as slow convergence, a high dependency on the initial values of weights and biases, and a tendency to become trapped in local minima [1]. To address these problems, stochastic search methods such as metaheuristics have been proposed as alternative methods for training feedforward neural networks.

Metaheuristics have many advantages: they apply to any type of ANN with any activation function [2], and they are particularly useful for dealing with large, complex problems that generate many local optima [3] [4]. The Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are considered the most well-known nature-inspired MLP trainers. Montana and Davis proposed one of the earliest works on training the feedforward neural network (FFNN) with a GA [22]; they showed that the GA outperforms BP when solving real problems. Slowik and Bialko [23] employed Differential Evolution (DE) for training MLPs and showed that it has promising performance compared to BP and Levenberg-Marquardt methods.

Other metaheuristic algorithms have been applied to training feedforward MLPs, such as the modified BAT algorithm [5], Multi-Verse Optimization (MVO) [6], the Whale Optimization Algorithm (WOA) [7], the Grey Wolf Optimizer (GWO) [8] [9], Biogeography-Based Optimization (BBO) [10], Moth-Flame Optimization (MFO) [11] and Improved Monarch Butterfly Optimization (IMBO) [12]. Furthermore, several hybrid algorithms have been proposed to train neural networks. Tarkhaneh and Shen [13] suggested a hybrid approach to neural network training that combines PSO, Mantegna Lévy flight and neighborhood search (LPSONS); their experiments showed that the proposed algorithm can find good solutions. Khan et al. [14] introduced a method named HACPSO based on two algorithms, accelerated particle swarm optimization (APSO) and cuckoo search (CS); the comparison results demonstrated that it outperforms other algorithms in terms of accuracy, MSE and standard deviation.

This paper presents a new training approach based on hybridizing particle swarm optimization (PSO) with Multi-Verse Optimization (MVO), called PMVO, to train the feedforward neural network (FFNN). Six datasets were solved by the proposed trainer, and its application to bio-medical data was investigated. The performance of PMVO was compared with five well-known metaheuristic trainers from the literature: PSO [15], MFO [11], MVO [6], WOA [7] and HACPSO [14].
II. ARTIFICIAL NEURAL NETWORKS (ANNs)

An artificial neural network (ANN) is a computational model inspired by the structure and functions of the biological brain and nervous system. The feedforward neural network (FFNN) is one of the most popular types of artificial neural network [6]. An FFNN has three interconnected layers. The first layer consists of the input neurons, which send the data to the second layer, called the hidden layer, which in turn sends its outputs to the third (output) layer. In an FFNN, the information travels in one direction, from the input layer to the output layer. Each node (artificial neuron) multiplies each of its inputs by a weight and adds a bias, as shown in (1):

S_j = \sum_{i=1}^{n} w_{i,j} I_i + \beta_j    (1)

where n is the total number of neuron inputs, w_{i,j} is the connection weight from input I_i to neuron j, and \beta_j is a bias weight [6]. The neuron then passes this weighted sum through a transfer function, for example the sigmoid function presented in (2):

f(x) = \frac{1}{1 + e^{-x}}    (2)

The output of neuron j can therefore be described as follows (3):

y_j = f_j\left(\sum_{i=1}^{n} w_{i,j} I_i + \beta_j\right)    (3)

After building the neural network, the set of network weights is adjusted to approximate the desired outputs. This is carried out by applying a training algorithm that adapts the weights until the error criteria are met [6].
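For illustration, the following minimal sketch (in Python, the language later used for the experiments) computes the output of a single-hidden-layer FFNN with sigmoid activations according to Eqs. (1)-(3). The array shapes, function names and random example values are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    # Transfer function of Eq. (2): f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(inputs, w_ih, b_h, w_ho, b_o):
    """Forward pass of a single-hidden-layer FFNN, Eqs. (1)-(3).

    inputs : (n,) feature vector
    w_ih   : (n, m) input-to-hidden weights
    b_h    : (m,)   hidden-layer biases
    w_ho   : (m,)   hidden-to-output weights (single output neuron)
    b_o    : scalar output bias
    """
    s_hidden = inputs @ w_ih + b_h      # Eq. (1) for every hidden neuron j
    y_hidden = sigmoid(s_hidden)        # Eq. (3): apply the transfer function
    s_out = y_hidden @ w_ho + b_o       # Eq. (1) for the output neuron
    return sigmoid(s_out)               # network output in [0, 1]

# Example: n = 4 features, m = 2*4 + 1 = 9 hidden neurons (rule used in Section VIII)
rng = np.random.default_rng(0)
x = rng.random(4)
out = mlp_forward(x, rng.uniform(-1, 1, (4, 9)), rng.uniform(-1, 1, 9),
                  rng.uniform(-1, 1, 9), rng.uniform(-1, 1))
print(out)
```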
The following rules i=1 are applied to the universes of the MVO: where, n is the total number of neuron inputs, W ij is the • If the inflation rate rate is higher, the probability of having connection weight connecting Ij to neuron j and βj is a bias a white hole is higher. weight [6]. Then, the node or the artificial neuron adds the • If the inflation rate rate is higher, the probability of having multiplications and sends the sum to a transfer function, for black holes is lower. example, Sigmoid function presented in (2): • Universes having higher inflation rate rate send the ob- 1 jects through white holes. f (x) = (2) • Universes having lower inflation rate rate tend to receive 1 + e−x more objects through black holes. The output of the neuron j can be described as follows (3): • The objects of all universes may be replaced by the n X objects of the universe with the greater inflation rate. yj = fj ( wi,j Ii + βj ) (3) The mathematical model of this algorithm is as follows: i=1 After building the neural network, the set of network weights  are adjusted to approximate the desired results. This process  xj + T DR + ((ubj − lbj ) ∗ r4 + lbj ),if r3 < 0.5 ,if r2 < W EP is carried out by applying a training algorithm to adapt the xji = xj − T DR + ((ubj − lbj ) ∗ r4 + lbj ),if r3 ≥ 0.5 j weights until error criteria are met [6]. xi , if r2 ≥ W EP  (6) III. PARTICLE S WARM O PTIMIZATION (PSO) in 1995 Russell Eberhart and James Kennedy have invented the particle swarm optimization which is a population-based Where xj indicates the jth variable in the bests universe, stochastic optimization technique inspired by birds flocking lbi indicates the lower bound in jth variable, ubi shows the around food sources. like each other evolutionary computa- upper bound in jth variable, r2 ,r3 , r4 are random numbers tional algorithms. In PSO, each individual is a bird in the in the interval of [0, 1], T DP/W EP are coefficients, and xji search space. We call it a particle. All of the particles have indicates the jth parameter in ith universe. fitness values which are evaluated by the fitness function to be optimized and flies in the space with a velocity which is V. H YBRID PSO-MVO: dynamically adjusted according to its own flying experience Hybrid PSO-MVO is sequential combination of PSO and [16]. MVO. The algorithm merges the best strength of both PSO in exploitation and MVO in exploration towards the optimum t+1 t solution when the universe value of MVO replace the Pbest Vi,j = Vi,j W + C1 R1 (P bestt − X t ) + C2 R2 (Gbestt − X t ) value of PSO [20] [21]. 
V. HYBRID PSO-MVO

Hybrid PSO-MVO is a sequential combination of PSO and MVO. The algorithm merges the main strength of PSO in exploitation with that of MVO in exploration when moving towards the optimal solution: the universe value of MVO replaces the Pbest value of PSO [20] [21]. In this paper, this hybrid is used for the first time as a training algorithm; the proposed trainer is described in the following section. The velocity update can be written as follows:

V_{i,j}^{t+1} = w V_{i,j}^{t} + C_1 R_1 (Universes^{t} - X^{t}) + C_2 R_2 (Gbest^{t} - X^{t})    (7)

The hybrid algorithm proceeds as follows:
Step 1: Initialize the PSO values.
Step 2: Evaluate the fitness function of each particle.
Step 3: Determine Gbest from the Pbest values.
Step 4: Update the velocity and position of each particle.
Step 5: Verify whether the solution is feasible or not.
Step 6: Repeat steps 2 to 5 until the maximum number of iterations is reached.
Step 7: Use the optimal solutions of PSO as boundaries for the MVO algorithm.
Step 8: Initialize the MVO values.
Step 9: Evaluate the inflation rate of each universe (fitness function).
Step 10: Update the positions of the universes.
Step 11: If the convergence criterion is reached, return the results.
Step 12: If the convergence criterion is not reached, continue the process from steps 9 to 12.

VI. PMVO FOR TRAINING MLP

This section presents the proposed approach, named PMVO, which applies the hybrid PSO-MVO to train the MLP network. Two important points are taken into consideration: the fitness function and the representation of the PMVO solutions. In this work, the PMVO algorithm is applied to train an MLP network with a single hidden layer, and each PMVO solution (weights and biases) is formed by three parts: the connection weights between the input layer and the hidden layer, the weights between the hidden layer and the output layer, and the bias weights. The length of each solution vector is given by equation (8), where n is the number of input features and m is the number of neurons in the hidden layer [6]:

IndividualLength = (n \times m) + (2 \times m) + 1    (8)

PMVO solutions are implemented as real-number vectors whose components belong to the interval [-1, 1]. The mean squared error (MSE) is used to measure the fitness of PMVO solutions. The MSE is calculated from the difference between the estimated and actual outputs of the neural network on the training dataset, as shown in equation (9), where n is the number of samples in the training dataset and y and \hat{y} are the actual and predicted values, respectively:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y - \hat{y})^2    (9)
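To show how a flat candidate vector of length (n × m) + (2 × m) + 1 from Eq. (8) can be turned into a network and scored with the MSE of Eq. (9), here is a minimal self-contained sketch. The slicing layout (input-hidden weights, then hidden biases, then hidden-output weights, then the output bias) is one plausible reading of the three-part encoding described above, not necessarily the exact layout of the original implementation, and the toy data are invented for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_solution(vec, n, m):
    """Split a flat solution of length (n*m) + (2*m) + 1 (Eq. (8)) into
    input-hidden weights, hidden biases, hidden-output weights and the output bias."""
    w_ih = vec[:n * m].reshape(n, m)          # input -> hidden weights
    b_h = vec[n * m:n * m + m]                # hidden biases
    w_ho = vec[n * m + m:n * m + 2 * m]       # hidden -> output weights
    b_o = vec[-1]                             # output bias
    return w_ih, b_h, w_ho, b_o

def mse_fitness(vec, X, y, n, m):
    """Fitness of one candidate: MSE between network outputs and targets, Eq. (9)."""
    w_ih, b_h, w_ho, b_o = decode_solution(vec, n, m)
    hidden = sigmoid(X @ w_ih + b_h)          # hidden layer, Eqs. (1)-(3)
    preds = sigmoid(hidden @ w_ho + b_o)      # single output neuron
    return float(np.mean((y - preds) ** 2))

# Usage sketch on toy data: n = 4 features, m = 2n + 1 = 9 hidden neurons
rng = np.random.default_rng(2)
n, m = 4, 9
vec = rng.uniform(-1, 1, n * m + 2 * m + 1)   # candidate in [-1, 1]
X_toy = rng.random((20, n))
y_toy = rng.integers(0, 2, 20)
print(mse_fitness(vec, X_toy, y_toy, n, m))
```

In the PMVO trainer, this fitness value plays the role of the particle fitness in steps 2-6 and of the universe inflation rate in steps 8-12 of the hybrid procedure above.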
VII. EXPERIMENTS AND RESULTS

This section presents the evaluation of the proposed PMVO trainer for MLP networks on six well-known datasets selected from the University of California Irvine machine learning repository (UCI, http://archive.ics.uci.edu/ml/) and the Kaggle dataset repository (https://www.kaggle.com/datasets). Table I summarizes these datasets in terms of number of features and of training and testing samples. The comparison of PMVO was carried out with five approaches used to train feedforward neural networks in the literature: PSO [15], MFO [11], MVO [6], WOA [7] and HACPSO [14]. In addition, the proposed algorithm was compared with standard backpropagation (BP) with a momentum term and an adaptive learning rate, which is a gradient-based algorithm.

VIII. EXPERIMENTAL SETUP

The proposed trainer and the other algorithms were implemented in Python on a personal computer with an Intel(R) Core(TM) 1.60 GHz / 2.30 GHz CPU, 4 GB of RAM and a 64-bit Windows 7 operating system.

Metaheuristics are sensitive to the values of their parameters, which requires careful initialization. Therefore, the control parameters recommended in the literature [15] [6] [7] were used; they are summarized in Table II. All datasets were divided into 66% for training and 34% for testing. Moreover, all features were mapped to the interval [0, 1] to eliminate the effect of features having different scales. Min-max normalization is applied to perform a linear transformation of the original data, where v' is the normalized value of v and [min_A, max_A] is the original range of the attribute, as given in (10):

v' = \frac{v - min_A}{max_A - min_A}    (10)

In the literature, there is no standard method for selecting the number of hidden neurons. In this work, the method proposed in [18] [19] [6] was used: the number of neurons in the hidden layer equals 2N + 1, where N is the number of features in the dataset.

TABLE I
SUMMARY OF THE CLASSIFICATION DATASETS

Datasets        Features   Training samples   Testing samples
Blood           4          493                255
Breast cancer   8          461                238
Diabetes        8          506                262
Vertebral       6          204                106
Liver           6          79                 41
Parkinson       22         128                67

TABLE II
INITIAL PARAMETERS OF THE OPTIMIZATION ALGORITHMS

Algorithm   Parameter                  Value
PSO         Acceleration constants     [2.1, 2.1]
            Inertia weights            [0.9, 0.6]
            Number of particles        50
MVO         Minimum wormhole           0.2
            Maximum wormhole           1
            Population size            50
            Number of generations      200
MFO         Number of search agents    50
            b                          1
            t                          [-1, 1]
            Population size            200
            Number of generations      200
WOA         r                          [0, 1]
            Population size            50
            B                          0.5
HACPSO      Population size            50
            Number of generations      200
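As a small illustration of the setup described in Section VIII, the sketch below applies min-max scaling (Eq. (10)), a 66%/34% train/test split and the 2N + 1 hidden-neuron rule. The use of scikit-learn, the random seed, fitting the scaler on the training portion only, and the stand-in data are assumptions made for the example; the paper does not specify these implementation details beyond the use of Python.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def prepare_dataset(X, y, seed=0):
    """66%/34% split, then map every feature to [0, 1] with min-max normalization (Eq. (10))."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.34, random_state=seed)
    scaler = MinMaxScaler()                   # v' = (v - min_A) / (max_A - min_A)
    X_train = scaler.fit_transform(X_train)   # fitted on the training portion only (a design choice)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test

def hidden_neurons(n_features):
    """Hidden-layer size rule used in Section VIII: 2N + 1."""
    return 2 * n_features + 1

# Example with random stand-in data shaped like the Blood dataset (748 samples, 4 features)
rng = np.random.default_rng(3)
X = rng.random((748, 4))
y = rng.integers(0, 2, 748)
X_tr, X_te, y_tr, y_te = prepare_dataset(X, y)
print(X_tr.shape, X_te.shape, hidden_neurons(X.shape[1]))   # -> 9 hidden neurons
```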
IX. RESULTS

All algorithms were run ten times on every dataset, with the population size and the maximum number of generations set to 50 and 200, respectively.

Table IV shows the statistical results: the average, best and worst classification accuracy together with its standard deviation. PMVO outperformed the other approaches on breast cancer, blood, liver and vertebral, with average accuracies of 0.962, 0.766, 0.752 and 0.839, respectively. In addition, PMVO ranked second on the diabetes and Parkinson datasets, with average accuracies of 0.783 and 0.842, respectively. Moreover, PMVO has a smaller standard deviation, which indicates that it is stable. Table V shows the average, best and worst MSE with the standard deviation obtained by each algorithm. PMVO outperforms the other techniques on four datasets, breast cancer, blood, liver and vertebral, with average MSEs of 0.032, 0.168, 0.176 and 0.131, respectively. It can also be noticed that PMVO has a small standard deviation on all datasets, which indicates the efficiency and robustness of the algorithm.

Figures 1 to 6 show the convergence curves of all metaheuristic training algorithms based on the average MSE values. The convergence curves show that PMVO reaches the lowest MSE on four datasets: breast cancer, blood, liver and vertebral. Moreover, PMVO has the fastest convergence speed on the liver, vertebral, blood and European datasets. On the diabetes dataset, PMVO provides performance very close to that of the MVO algorithm. These results show that PMVO converges faster and optimizes better than the other metaheuristic algorithms.

Table III shows the average rank obtained by each optimization technique in the Friedman test. The comparison shows that the proposed algorithm outperforms the other algorithms.

Fig. 1. MSE convergence curve of Breast cancer
Fig. 2. MSE convergence curve of Blood
Fig. 3. MSE convergence curve of Diabetes
Fig. 4. MSE convergence curve of Liver
Fig. 5. MSE convergence curve of Vertebral
Fig. 6. MSE convergence curve of Parkinson

TABLE III
AVERAGE RANKING OF THE TECHNIQUES (FRIEDMAN)

Algorithm   Ranking
PMVO        1.33
PSO         3.25
MFO         4.08
MVO         3
WOA         4.92
HACPSO      4.91
BP          6.5

TABLE IV
ACCURACY RESULTS

                   PMVO    PSO     MFO     MVO     WOA     HACPSO  BP
B.cancer   Avg     0.962   0.959   0.958   0.958   0.956   0.959   0.744
           Std     0.002   0.003   0.002   0.002   0.006   0.004   0.254
           Best    0.964   0.960   0.963   0.960   0.963   0.965   0.945
           Worst   0.957   0.953   0.954   0.954   0.947   0.952   0.680
Blood      Avg     0.766   0.762   0.763   0.764   0.762   0.760   0.744
           Std     0.003   0.002   0.005   0.009   0.004   0.003   0.254
           Best    0.768   0.765   0.765   0.784   0.774   0.760   0.945
           Worst   0.762   0.760   0.760   0.760   0.758   0.758   0.680
Diabetes   Avg     0.783   0.780   0.724   0.792   0.720   0.768   0.619
           Std     0.018   0.011   0.008   0.006   0.028   0.006   0.08
           Best    0.792   0.796   0.735   0.802   0.750   0.774   0.690
           Worst   0.732   0.758   0.709   0.782   0.657   0.764   0.601
Liver      Avg     0.752   0.722   0.722   0.737   0.665   0.679   0.519
           Std     0.003   0.017   0.010   0.006   0.030   0.020   0.055
           Best    0.785   0.748   0.744   0.744   0.700   0.709   0.586
           Worst   0.721   0.700   0.709   0.726   0.603   0.669   0.495
Vertebral  Avg     0.839   0.836   0.836   0.836   0.774   0.717   0.651
           Std     0.006   0.012   0.007   0.005   0.052   0.015   0.170
           Best    0.845   0.848   0.843   0.838   0.862   0.833   0.866
           Worst   0.828   0.808   0.82    0.828   0.676   0.794   0.627
Parkinson  Avg     0.842   0.841   0.802   0.802   0.824   0.875   0.750
           Std     0.029   0.040   0.022   0.046   0.016   0.013   0.199
           Best    0.882   0.898   0.828   0.867   0.852   0.905   0.849
           Worst   0.788   0.773   0.773   0.695   0.800   0.863   0.623

TABLE V
MSE RESULTS

                   PMVO    PSO     MFO     MVO     WOA     HACPSO  BP
B.cancer   Avg     0.032   0.032   0.033   0.032   0.047   0.040   0.049
           Std     0.001   0.001   0.001   0.002   0.004   0.001   0.015
           Best    0.030   0.030   0.310   0.030   0.043   0.038   0.030
           Worst   0.0336  0.326   0.036   0.032   0.058   0.041   0.050
Blood      Avg     0.168   0.178   0.176   0.170   0.180   0.169   0.174
           Std     0.005   0.003   0.005   0.008   0.004   0.003   0.009
           Best    0.160   0.182   0.174   0.155   0.174   0.162   0.172
           Worst   0.174   0.182   0.177   0.181   0.187   0.175   0.175
Diabetes   Avg     0.151   0.155   0.171   0.147   0.186   0.163   0.179
           Std     0.002   0.004   0.005   0.001   0.013   0.002   0.066
           Best    0.1491  0.151   0.165   0.145   0.168   0.160   0.168
           Worst   0.153   0.157   0.175   0.149   0.213   0.166   0.180
Liver      Avg     0.176   0.191   0.193   0.186   0.220   0.212   0.210
           Std     0.004   0.003   0.002   0.004   0.007   0.002   0.003
           Best    0.170   0.185   0.189   0.179   0.210   0.208   0.190
           Worst   0.180   0.195   0.194   0.194   0.233   0.215   0.220
Vertebral  Avg     0.131   0.134   0.136   0.133   0.163   0.146   0.168
           Std     0.006   0.002   0.002   0.002   0.018   0.002   0.015
           Best    0.120   0.132   0.135   0.131   0.137   0.142   0.160
           Worst   0.138   0.135   0.137   0.134   0.202   0.147   0.172
Parkinson  Avg     0.134   0.141   0.158   0.147   0.165   0.119   0.158
           Std     0.014   0.023   0.019   0.020   0.030   0.005   0.018
           Best    0.119   0.099   0.135   0.125   0.127   0.112   0.137
           Worst   0.157   0.182   0.203   0.197   0.228   0.130   0.203

X. CONCLUSION

In this paper, we have proposed a new training approach based on hybridizing particle swarm optimization with Multi-Verse Optimization to train the feedforward neural network. The training method exploits the high exploration and exploitation capabilities of PMVO to locate optimal values for the weights and biases of the FFNN. The approach aims to minimize the training error and to increase the accuracy, and it was benchmarked and evaluated on six standard bio-medical datasets.

The comparison of the proposed algorithm with PSO, MFO, MVO, WOA, HACPSO and standard BP with a momentum term and an adaptive learning rate shows the superiority of the PMVO algorithm, with high accuracy and small MSE on most of the datasets. Moreover, the small standard deviations show that our trainer is robust and stable. Finally, from the experiments we can conclude that PMVO gives good results and can be an alternative to other training methods.
In future work, we will focus on extending this approach to more real-world problems and on testing the performance of PMVO for training other types of neural networks.

REFERENCES
[1] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026-1037, 2007.
[2] S. Kiranyaz, T. Ince, A. Yildirim, and M. Gabbouj, "Evolutionary artificial neural networks by multi-dimensional particle swarm optimization," Neural Networks, vol. 22, no. 10, pp. 1448-1462, 2009.
[3] T. Kenter, A. Borisov, C. Van Gysel, M. Dehghani, M. de Rijke, and B. Mitra, "Neural networks for information retrieval," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18), 2018.
[4] L. Wang, Y. Li, J. Huang, and S. Lazebnik, "Learning two-branch neural networks for image-text matching tasks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 394-407, 2019.
[5] N. S. Jaddi, S. Abdullah, and A. R. Hamdan, "Optimization of neural network model using modified bat-inspired algorithm," Applied Soft Computing, vol. 37, pp. 71-86, 2015.
[6] H. Faris, I. Aljarah, and S. Mirjalili, "Training feedforward neural networks using multi-verse optimizer for binary classification problems," Applied Intelligence, vol. 45, no. 2, pp. 322-332, 2016.
[7] I. Aljarah, H. Faris, and S. Mirjalili, "Optimizing connection weights in neural networks using the whale optimization algorithm," Soft Computing, vol. 22, no. 1, pp. 1-15, 2016.
[8] M. F. Hassanin, A. M. Shoeb, and A. E. Hassanien, "Grey wolf optimizer-based back-propagation neural network algorithm," in 2016 12th International Computer Engineering Conference (ICENCO), 2016.
[9] H. Faris, S. Mirjalili, and I. Aljarah, "Automatic selection of hidden neurons and weights in neural networks using grey wolf optimizer based on a hybrid encoding scheme," International Journal of Machine Learning and Cybernetics, vol. 10, no. 10, pp. 2901-2920, 2019.
[10] I. Aljarah, H. Faris, S. Mirjalili, and N. Al-Madi, "Training radial basis function networks using biogeography-based optimizer," Neural Computing and Applications, vol. 29, no. 7, pp. 529-553, 2016.
[11] H. Faris, I. Aljarah, and S. Mirjalili, "Evolving radial basis function networks using moth-flame optimizer," in Handbook of Neural Computation, pp. 537-550, 2017.
[12] H. Faris, I. Aljarah, and S. Mirjalili, "Improved monarch butterfly optimization for unconstrained global search and neural network training," Applied Intelligence, vol. 48, no. 2, pp. 445-464, 2017.
[13] O. Tarkhaneh and H. Shen, "Training of feedforward neural networks for data classification using hybrid particle swarm optimization, Mantegna Lévy flight and neighborhood search," Heliyon, vol. 5, no. 4, 2019.
[14] A. Khan, R. Shah, M. Imran, A. Khan, J. I. Bangash, and K. Shah, "An alternative approach to neural network training based on hybrid bio meta-heuristic algorithm," Journal of Ambient Intelligence and Humanized Computing, vol. 10, no. 10, pp. 3821-3830, 2019.
[15] R. Mendes, P. Cortez, M. Rocha, and J. Neves, "Particle swarms for feedforward neural network training," in Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN '02), 2002.
[16] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks (ICNN), vol. IV, pp. 1942-1948, 1995.
[17] S. Mirjalili, S. M. Mirjalili, and A. Hatamlou, "Multi-verse optimizer: a nature-inspired algorithm for global optimization," Neural Computing and Applications, vol. 27, no. 2, pp. 495-513, 2015.
[18] S. Mirjalili, S. M. Mirjalili, and A. Lewis, "Let a biogeography-based optimizer train your multi-layer perceptron," Information Sciences, vol. 269, pp. 188-209, 2014.
[19] S. Mirjalili, "How effective is the grey wolf optimizer in training multi-layer perceptrons," Applied Intelligence, vol. 43, no. 1, pp. 150-161, 2015.
[20] Sagarika and T. R. Jyothsna, "Tuning of PSO algorithm for single machine and multi-machine power system using STATCOM controller," International Journal of Engineering and Technology, vol. 2, no. 4, pp. 175-182, 2015.
[21] K. Karthikeyan and P. K. Dhal, "Transient stability enhancement by optimal location and tuning of STATCOM using PSO," Procedia Technology, 2015.
[22] D. J. Montana and L. Davis, "Training feedforward neural networks using genetic algorithms," in Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI '89), vol. 1, pp. 762-767, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1989.
[23] A. Slowik and M. Bialko, "Training of artificial neural networks using differential evolution algorithm," in Conference on Human System Interactions, IEEE, pp. 60-65, 2008.