Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Training feedforward neural networks using hybrid particle swarm optimization and Multi-Verse Optimization

1st Rabab Bousmaha, GeCoDe Laboratory, Department of Computer Science, University of Saida, Saida, Algeria, Rabab.bousmaha@gmail.com
2nd Reda Mohamed Hamou, GeCoDe Laboratory, Department of Computer Science, University of Saida, Saida, Algeria, hamoureda@yahoo.fr
3rd Amine Abdelmalek, GeCoDe Laboratory, Department of Computer Science, University of Saida, Saida, Algeria, amineabd1@yahoo.fr

Abstract—The learning process of artificial neural networks is an important and complex task in the supervised learning field. The main difficulty of training a neural network is fine-tuning the best set of control parameters in terms of weights and biases. This paper presents a new training method based on hybrid particle swarm optimization with Multi-Verse Optimization (PMVO) to train feedforward neural networks. The hybrid algorithm is used to search the solution space more effectively, which helps reduce the problem of becoming trapped in local minima. The performance of the proposed approach was compared with five evolutionary techniques and with standard backpropagation using a momentum term and an adaptive learning rate. The comparison was benchmarked and evaluated on six bio-medical datasets. The results of the comparative study show that PMVO outperformed the other training methods on most datasets and can be an alternative to them.

Index Terms—Particle swarm optimization, Multi-Verse Optimization, Training feedforward neural networks, Real-world datasets

I. INTRODUCTION

The artificial neural network (ANN) is one of the most important data mining techniques and has been successfully applied in many fields. The feedforward multilayer perceptron (MLP) is one of the best-known neural networks. It consists of neurons organized into three layers: input, hidden and output. The success of an MLP generally depends on the training process, which is determined by the training algorithm. The objective of a training algorithm is to find the connection weights and biases that minimize the classification error. Training algorithms can be classified into two classes: gradient-based and stochastic search methods. Backpropagation (BP) and its variants are gradient-based methods and are among the most popular techniques used to train MLP neural networks. Gradient-based methods have several drawbacks, such as slow convergence, a high dependency on the initial values of weights and biases, and a tendency to become trapped in local minima [1]. To address these problems, stochastic search methods such as metaheuristics have been proposed as alternative methods for training feedforward neural networks.

Metaheuristics have many advantages: they apply to any type of ANN with any activation function [2], and they are particularly useful for dealing with large, complex problems that generate many local optima [3] [4]. The Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are considered the most well-known nature-inspired MLP trainers. Montana and Davis proposed one of the earliest works on training the feedforward neural network (FFNN) with a GA [22]; they showed that the GA outperforms BP when solving real problems. Slowik and Bialko [23] employed Differential Evolution (DE) for training MLPs and showed that it has promising performance compared to BP and Levenberg-Marquardt methods.

Other metaheuristic algorithms have been applied to training feedforward MLPs, such as the modified BAT algorithm [5], Multi-Verse Optimization (MVO) [6], the Whale Optimization Algorithm (WOA) [7], the Grey Wolf Optimizer (GWO) [8] [9], Biogeography-Based Optimization (BBO) [10], Moth-Flame Optimization (MFO) [11] and Improved Monarch Butterfly Optimization (IMBO) [12]. Furthermore, several hybrid algorithms have been proposed to train neural networks. Tarkhaneh and Shen [13] suggested a hybrid approach to neural network training that combines PSO, Mantegna Lévy flight and neighborhood search (LPSONS); their experiments showed that the proposed algorithm can find good solutions. Khan et al. [14] introduced a method named HACPSO based on two algorithms, accelerated particle swarm optimization (APSO) and cuckoo search (CS); the comparison results demonstrated that it outperforms other algorithms in terms of accuracy, MSE and standard deviation.

This paper presents a new training approach based on hybridizing particle swarm optimization (PSO) with Multi-Verse Optimization (MVO), called PMVO, to train the feedforward neural network (FFNN). Six datasets were solved by the proposed trainer, and its application to bio-medical data was investigated. The performance of PMVO was compared with five well-known metaheuristic trainers from the literature: PSO [15], MFO [11], MVO [6], WOA [7] and HACPSO [14].
II. ARTIFICIAL NEURAL NETWORKS (ANNs)

An artificial neural network (ANN) is a computational model inspired by the structure and functions of the biological brain and nervous system. The feedforward neural network (FFNN) is one of the most popular types of artificial neural network [6]. An FFNN has three interconnected layers. The first layer consists of the input neurons, which send the data to the second layer, called the hidden layer, which in turn sends its outputs to the third (output) layer. In an FFNN, the information travels in one direction, from the input layer to the output layer. Each node (artificial neuron) multiplies each of its inputs by a weight and adds a bias, as shown in (1):

S_j = \sum_{i=1}^{n} w_{i,j} I_i + \beta_j    (1)

where n is the total number of neuron inputs, w_{i,j} is the connection weight from input I_i to neuron j, and \beta_j is a bias weight [6]. The neuron then passes this weighted sum through a transfer function, for example the sigmoid function presented in (2):

f(x) = \frac{1}{1 + e^{-x}}    (2)

The output of neuron j can therefore be described as follows (3):

y_j = f_j\left(\sum_{i=1}^{n} w_{i,j} I_i + \beta_j\right)    (3)

After building the neural network, the set of network weights is adjusted to approximate the desired outputs. This is carried out by applying a training algorithm that adapts the weights until the error criteria are met [6].
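For illustration, the following minimal sketch (in Python, the language later used for the experiments) computes the output of a single-hidden-layer FFNN with sigmoid activations according to Eqs. (1)-(3). The array shapes, function names and random example values are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    # Transfer function of Eq. (2): f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(inputs, w_ih, b_h, w_ho, b_o):
    """Forward pass of a single-hidden-layer FFNN, Eqs. (1)-(3).

    inputs : (n,) feature vector
    w_ih   : (n, m) input-to-hidden weights
    b_h    : (m,)   hidden-layer biases
    w_ho   : (m,)   hidden-to-output weights (single output neuron)
    b_o    : scalar output bias
    """
    s_hidden = inputs @ w_ih + b_h      # Eq. (1) for every hidden neuron j
    y_hidden = sigmoid(s_hidden)        # Eq. (3): apply the transfer function
    s_out = y_hidden @ w_ho + b_o       # Eq. (1) for the output neuron
    return sigmoid(s_out)               # network output in [0, 1]

# Example: n = 4 features, m = 2*4 + 1 = 9 hidden neurons (rule used in Section VIII)
rng = np.random.default_rng(0)
x = rng.random(4)
out = mlp_forward(x, rng.uniform(-1, 1, (4, 9)), rng.uniform(-1, 1, 9),
                  rng.uniform(-1, 1, 9), rng.uniform(-1, 1))
print(out)
```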
The following rules i=1 are applied to the universes of the MVO: where, n is the total number of neuron inputs, W ij is the • If the inflation rate rate is higher, the probability of having connection weight connecting Ij to neuron j and βj is a bias a white hole is higher. weight [6]. Then, the node or the artificial neuron adds the • If the inflation rate rate is higher, the probability of having multiplications and sends the sum to a transfer function, for black holes is lower. example, Sigmoid function presented in (2): • Universes having higher inflation rate rate send the ob- 1 jects through white holes. f (x) = (2) • Universes having lower inflation rate rate tend to receive 1 + e−x more objects through black holes. The output of the neuron j can be described as follows (3): • The objects of all universes may be replaced by the n X objects of the universe with the greater inflation rate. yj = fj ( wi,j Ii + βj ) (3) The mathematical model of this algorithm is as follows: i=1 After building the neural network, the set of network weights  are adjusted to approximate the desired results. This process  xj + T DR + ((ubj − lbj ) ∗ r4 + lbj ),if r3 < 0.5 ,if r2 < W EP is carried out by applying a training algorithm to adapt the xji = xj − T DR + ((ubj − lbj ) ∗ r4 + lbj ),if r3 ≥ 0.5 j weights until error criteria are met [6]. xi , if r2 ≥ W EP  (6) III. PARTICLE S WARM O PTIMIZATION (PSO) in 1995 Russell Eberhart and James Kennedy have invented the particle swarm optimization which is a population-based Where xj indicates the jth variable in the bests universe, stochastic optimization technique inspired by birds flocking lbi indicates the lower bound in jth variable, ubi shows the around food sources. like each other evolutionary computa- upper bound in jth variable, r2 ,r3 , r4 are random numbers tional algorithms. In PSO, each individual is a bird in the in the interval of [0, 1], T DP/W EP are coefficients, and xji search space. We call it a particle. All of the particles have indicates the jth parameter in ith universe. fitness values which are evaluated by the fitness function to be optimized and flies in the space with a velocity which is V. H YBRID PSO-MVO: dynamically adjusted according to its own flying experience Hybrid PSO-MVO is sequential combination of PSO and [16]. MVO. The algorithm merges the best strength of both PSO in exploitation and MVO in exploration towards the optimum t+1 t solution when the universe value of MVO replace the Pbest Vi,j = Vi,j W + C1 R1 (P bestt − X t ) + C2 R2 (Gbestt − X t ) value of PSO [20] [21]. 
V. HYBRID PSO-MVO

Hybrid PSO-MVO is a sequential combination of PSO and MVO. The algorithm merges the main strength of PSO in exploitation with that of MVO in exploration when moving towards the optimal solution: the universe value of MVO replaces the Pbest value of PSO [20] [21]. In this paper, this hybrid is used for the first time as a training algorithm; the proposed trainer is described in the following section. The velocity update can be written as follows:

V_{i,j}^{t+1} = w V_{i,j}^{t} + C_1 R_1 (Universes^{t} - X^{t}) + C_2 R_2 (Gbest^{t} - X^{t})    (7)

The hybrid algorithm proceeds as follows:
Step 1: Initialize the PSO values.
Step 2: Evaluate the fitness function of each particle.
Step 3: Determine Gbest from the Pbest values.
Step 4: Update the velocity and position of each particle.
Step 5: Verify whether the solution is feasible or not.
Step 6: Repeat steps 2 to 5 until the maximum number of iterations is reached.
Step 7: Use the optimal solutions of PSO as boundaries for the MVO algorithm.
Step 8: Initialize the MVO values.
Step 9: Evaluate the inflation rate of each universe (fitness function).
Step 10: Update the positions of the universes.
Step 11: If the convergence criterion is reached, return the results.
Step 12: If the convergence criterion is not reached, continue the process from steps 9 to 12.

VI. PMVO FOR TRAINING MLP

This section presents the proposed approach, named PMVO, which applies the hybrid PSO-MVO to train the MLP network. Two important points are taken into consideration: the fitness function and the representation of the PMVO solutions. In this work, the PMVO algorithm is applied to train an MLP network with a single hidden layer, and each PMVO solution (weights and biases) is formed by three parts: the connection weights between the input layer and the hidden layer, the weights between the hidden layer and the output layer, and the bias weights. The length of each solution vector is given by equation (8), where n is the number of input features and m is the number of neurons in the hidden layer [6]:

IndividualLength = (n \times m) + (2 \times m) + 1    (8)

PMVO solutions are implemented as real-number vectors whose components belong to the interval [-1, 1]. The mean squared error (MSE) is used to measure the fitness of PMVO solutions. The MSE is calculated from the difference between the estimated and actual outputs of the neural network on the training dataset, as shown in equation (9), where n is the number of samples in the training dataset and y and \hat{y} are the actual and predicted values, respectively:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y - \hat{y})^2    (9)
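To show how a flat candidate vector of length (n × m) + (2 × m) + 1 from Eq. (8) can be turned into a network and scored with the MSE of Eq. (9), here is a minimal self-contained sketch. The slicing layout (input-hidden weights, then hidden biases, then hidden-output weights, then the output bias) is one plausible reading of the three-part encoding described above, not necessarily the exact layout of the original implementation, and the toy data are invented for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_solution(vec, n, m):
    """Split a flat solution of length (n*m) + (2*m) + 1 (Eq. (8)) into
    input-hidden weights, hidden biases, hidden-output weights and the output bias."""
    w_ih = vec[:n * m].reshape(n, m)          # input -> hidden weights
    b_h = vec[n * m:n * m + m]                # hidden biases
    w_ho = vec[n * m + m:n * m + 2 * m]       # hidden -> output weights
    b_o = vec[-1]                             # output bias
    return w_ih, b_h, w_ho, b_o

def mse_fitness(vec, X, y, n, m):
    """Fitness of one candidate: MSE between network outputs and targets, Eq. (9)."""
    w_ih, b_h, w_ho, b_o = decode_solution(vec, n, m)
    hidden = sigmoid(X @ w_ih + b_h)          # hidden layer, Eqs. (1)-(3)
    preds = sigmoid(hidden @ w_ho + b_o)      # single output neuron
    return float(np.mean((y - preds) ** 2))

# Usage sketch on toy data: n = 4 features, m = 2n + 1 = 9 hidden neurons
rng = np.random.default_rng(2)
n, m = 4, 9
vec = rng.uniform(-1, 1, n * m + 2 * m + 1)   # candidate in [-1, 1]
X_toy = rng.random((20, n))
y_toy = rng.integers(0, 2, 20)
print(mse_fitness(vec, X_toy, y_toy, n, m))
```

In the PMVO trainer, this fitness value plays the role of the particle fitness in steps 2-6 and of the universe inflation rate in steps 8-12 of the hybrid procedure above.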
VII. EXPERIMENTS AND RESULTS

This section presents the evaluation of the proposed PMVO trainer for MLP networks on six well-known datasets selected from the University of California Irvine machine learning repository (UCI, http://archive.ics.uci.edu/ml/) and the Kaggle dataset repository (https://www.kaggle.com/datasets). Table I summarizes these datasets in terms of number of features and of training and testing samples. The comparison of PMVO was carried out with five approaches used to train feedforward neural networks in the literature: PSO [15], MFO [11], MVO [6], WOA [7] and HACPSO [14]. In addition, the proposed algorithm was compared with standard backpropagation (BP) with a momentum term and an adaptive learning rate, which is a gradient-based algorithm.

VIII. EXPERIMENTAL SETUP

The proposed trainer and the other algorithms were implemented in Python on a personal computer with an Intel(R) Core(TM) 1.60 GHz / 2.30 GHz CPU, 4 GB of RAM and a 64-bit Windows 7 operating system.

Metaheuristics are sensitive to the values of their parameters, which requires careful initialization. Therefore, the control parameters recommended in the literature [15] [6] [7] were used; they are summarized in Table II. All datasets were divided into 66% for training and 34% for testing. Moreover, all features were mapped to the interval [0, 1] to eliminate the effect of features having different scales. Min-max normalization is applied to perform a linear transformation of the original data, where v' is the normalized value of v and [min_A, max_A] is the original range of the attribute, as given in (10):

v' = \frac{v - min_A}{max_A - min_A}    (10)

In the literature, there is no standard method for selecting the number of hidden neurons. In this work, the method proposed in [18] [19] [6] was used: the number of neurons in the hidden layer equals 2N + 1, where N is the number of features in the dataset.

TABLE I
SUMMARY OF THE CLASSIFICATION DATASETS

Datasets        Features   Training samples   Testing samples
Blood           4          493                255
Breast cancer   8          461                238
Diabetes        8          506                262
Vertebral       6          204                106
Liver           6          79                 41
Parkinson       22         128                67

TABLE II
INITIAL PARAMETERS OF THE OPTIMIZATION ALGORITHMS

Algorithm   Parameter                  Value
PSO         Acceleration constants     [2.1, 2.1]
            Inertia weights            [0.9, 0.6]
            Number of particles        50
MVO         Minimum wormhole           0.2
            Maximum wormhole           1
            Population size            50
            Number of generations      200
MFO         Number of search agents    50
            b                          1
            t                          [-1, 1]
            Population size            200
            Number of generations      200
WOA         r                          [0, 1]
            Population size            50
            B                          0.5
HACPSO      Population size            50
            Number of generations      200
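As a small illustration of the setup described in Section VIII, the sketch below applies min-max scaling (Eq. (10)), a 66%/34% train/test split and the 2N + 1 hidden-neuron rule. The use of scikit-learn, the random seed, fitting the scaler on the training portion only, and the stand-in data are assumptions made for the example; the paper does not specify these implementation details beyond the use of Python.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def prepare_dataset(X, y, seed=0):
    """66%/34% split, then map every feature to [0, 1] with min-max normalization (Eq. (10))."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.34, random_state=seed)
    scaler = MinMaxScaler()                   # v' = (v - min_A) / (max_A - min_A)
    X_train = scaler.fit_transform(X_train)   # fitted on the training portion only (a design choice)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test

def hidden_neurons(n_features):
    """Hidden-layer size rule used in Section VIII: 2N + 1."""
    return 2 * n_features + 1

# Example with random stand-in data shaped like the Blood dataset (748 samples, 4 features)
rng = np.random.default_rng(3)
X = rng.random((748, 4))
y = rng.integers(0, 2, 748)
X_tr, X_te, y_tr, y_te = prepare_dataset(X, y)
print(X_tr.shape, X_te.shape, hidden_neurons(X.shape[1]))   # -> 9 hidden neurons
```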
IX. RESULTS

All algorithms were run ten times on every dataset, with the population size and the maximum number of generations set to 50 and 200, respectively.

Table IV shows the statistical results: the average, best and worst classification accuracy together with its standard deviation. PMVO outperformed the other approaches on breast cancer, blood, liver and vertebral, with average accuracies of 0.962, 0.766, 0.752 and 0.839, respectively. In addition, PMVO ranked second on the diabetes and Parkinson datasets, with average accuracies of 0.783 and 0.842, respectively. Moreover, PMVO has a smaller standard deviation, which indicates that it is stable. Table V shows the average, best and worst MSE with the standard deviation obtained by each algorithm. PMVO outperforms the other techniques on four datasets, breast cancer, blood, liver and vertebral, with average MSEs of 0.032, 0.168, 0.176 and 0.131, respectively. It can also be noticed that PMVO has a small standard deviation on all datasets, which indicates the efficiency and robustness of the algorithm.

Figures 1 to 6 show the convergence curves of all metaheuristic training algorithms based on the average MSE values. The convergence curves show that PMVO reaches the lowest MSE on four datasets: breast cancer, blood, liver and vertebral. Moreover, PMVO has the fastest convergence speed on the liver, vertebral, blood and European datasets. On the diabetes dataset, PMVO provides performance very close to that of the MVO algorithm. These results show that PMVO converges faster and optimizes better than the other metaheuristic algorithms.

Table III shows the average rank obtained by each optimization technique in the Friedman test. The comparison shows that the proposed algorithm outperforms the other algorithms.

Fig. 1. MSE convergence curve of Breast cancer
Fig. 2. MSE convergence curve of Blood
Fig. 3. MSE convergence curve of Diabetes
Fig. 4. MSE convergence curve of Liver
Fig. 5. MSE convergence curve of Vertebral
Fig. 6. MSE convergence curve of Parkinson

TABLE III
AVERAGE RANKING OF THE TECHNIQUES (FRIEDMAN)

Algorithm   Ranking
PMVO        1.33
PSO         3.25
MFO         4.08
MVO         3
WOA         4.92
HACPSO      4.91
BP          6.5

TABLE IV
ACCURACY RESULTS

                   PMVO    PSO     MFO     MVO     WOA     HACPSO  BP
B.cancer   Avg     0.962   0.959   0.958   0.958   0.956   0.959   0.744
           Std     0.002   0.003   0.002   0.002   0.006   0.004   0.254
           Best    0.964   0.960   0.963   0.960   0.963   0.965   0.945
           Worst   0.957   0.953   0.954   0.954   0.947   0.952   0.680
Blood      Avg     0.766   0.762   0.763   0.764   0.762   0.760   0.744
           Std     0.003   0.002   0.005   0.009   0.004   0.003   0.254
           Best    0.768   0.765   0.765   0.784   0.774   0.760   0.945
           Worst   0.762   0.760   0.760   0.760   0.758   0.758   0.680
Diabetes   Avg     0.783   0.780   0.724   0.792   0.720   0.768   0.619
           Std     0.018   0.011   0.008   0.006   0.028   0.006   0.08
           Best    0.792   0.796   0.735   0.802   0.750   0.774   0.690
           Worst   0.732   0.758   0.709   0.782   0.657   0.764   0.601
Liver      Avg     0.752   0.722   0.722   0.737   0.665   0.679   0.519
           Std     0.003   0.017   0.010   0.006   0.030   0.020   0.055
           Best    0.785   0.748   0.744   0.744   0.700   0.709   0.586
           Worst   0.721   0.700   0.709   0.726   0.603   0.669   0.495
Vertebral  Avg     0.839   0.836   0.836   0.836   0.774   0.717   0.651
           Std     0.006   0.012   0.007   0.005   0.052   0.015   0.170
           Best    0.845   0.848   0.843   0.838   0.862   0.833   0.866
           Worst   0.828   0.808   0.82    0.828   0.676   0.794   0.627
Parkinson  Avg     0.842   0.841   0.802   0.802   0.824   0.875   0.750
           Std     0.029   0.040   0.022   0.046   0.016   0.013   0.199
           Best    0.882   0.898   0.828   0.867   0.852   0.905   0.849
           Worst   0.788   0.773   0.773   0.695   0.800   0.863   0.623

TABLE V
MSE RESULTS

                   PMVO    PSO     MFO     MVO     WOA     HACPSO  BP
B.cancer   Avg     0.032   0.032   0.033   0.032   0.047   0.040   0.049
           Std     0.001   0.001   0.001   0.002   0.004   0.001   0.015
           Best    0.030   0.030   0.310   0.030   0.043   0.038   0.030
           Worst   0.0336  0.326   0.036   0.032   0.058   0.041   0.050
Blood      Avg     0.168   0.178   0.176   0.170   0.180   0.169   0.174
           Std     0.005   0.003   0.005   0.008   0.004   0.003   0.009
           Best    0.160   0.182   0.174   0.155   0.174   0.162   0.172
           Worst   0.174   0.182   0.177   0.181   0.187   0.175   0.175
Diabetes   Avg     0.151   0.155   0.171   0.147   0.186   0.163   0.179
           Std     0.002   0.004   0.005   0.001   0.013   0.002   0.066
           Best    0.1491  0.151   0.165   0.145   0.168   0.160   0.168
           Worst   0.153   0.157   0.175   0.149   0.213   0.166   0.180
Liver      Avg     0.176   0.191   0.193   0.186   0.220   0.212   0.210
           Std     0.004   0.003   0.002   0.004   0.007   0.002   0.003
           Best    0.170   0.185   0.189   0.179   0.210   0.208   0.190
           Worst   0.180   0.195   0.194   0.194   0.233   0.215   0.220
Vertebral  Avg     0.131   0.134   0.136   0.133   0.163   0.146   0.168
           Std     0.006   0.002   0.002   0.002   0.018   0.002   0.015
           Best    0.120   0.132   0.135   0.131   0.137   0.142   0.160
           Worst   0.138   0.135   0.137   0.134   0.202   0.147   0.172
Parkinson  Avg     0.134   0.141   0.158   0.147   0.165   0.119   0.158
           Std     0.014   0.023   0.019   0.020   0.030   0.005   0.018
           Best    0.119   0.099   0.135   0.125   0.127   0.112   0.137
           Worst   0.157   0.182   0.203   0.197   0.228   0.130   0.203

X. CONCLUSION

In this paper, we have proposed a new training approach based on hybridizing particle swarm optimization with Multi-Verse Optimization to train the feedforward neural network. The training method exploits the high exploration and exploitation capabilities of PMVO to locate optimal values for the weights and biases of the FFNN. The approach aims to minimize the training error and to increase the accuracy, and it was benchmarked and evaluated on six standard bio-medical datasets.

The comparison of the proposed algorithm with PSO, MFO, MVO, WOA, HACPSO and standard BP with a momentum term and an adaptive learning rate shows the superiority of the PMVO algorithm, with high accuracy and small MSE on most of the datasets. Moreover, the small standard deviations show that our trainer is robust and stable. Finally, from the experiments we can conclude that PMVO gives good results and can be an alternative to other training methods.
In future work, we will focus on extending this approach to more real-world problems and on testing the performance of PMVO for training other types of neural networks.

REFERENCES
[1] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026-1037, 2007.
[2] S. Kiranyaz, T. Ince, A. Yildirim, and M. Gabbouj, "Evolutionary artificial neural networks by multi-dimensional particle swarm optimization," Neural Networks, vol. 22, no. 10, pp. 1448-1462, 2009.
[3] T. Kenter, A. Borisov, C. Van Gysel, M. Dehghani, M. de Rijke, and B. Mitra, "Neural networks for information retrieval," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18), 2018.
[4] L. Wang, Y. Li, J. Huang, and S. Lazebnik, "Learning two-branch neural networks for image-text matching tasks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 394-407, 2019.
[5] N. S. Jaddi, S. Abdullah, and A. R. Hamdan, "Optimization of neural network model using modified bat-inspired algorithm," Applied Soft Computing, vol. 37, pp. 71-86, 2015.
[6] H. Faris, I. Aljarah, and S. Mirjalili, "Training feedforward neural networks using multi-verse optimizer for binary classification problems," Applied Intelligence, vol. 45, no. 2, pp. 322-332, 2016.
[7] I. Aljarah, H. Faris, and S. Mirjalili, "Optimizing connection weights in neural networks using the whale optimization algorithm," Soft Computing, vol. 22, no. 1, pp. 1-15, 2016.
[8] M. F. Hassanin, A. M. Shoeb, and A. E. Hassanien, "Grey wolf optimizer-based back-propagation neural network algorithm," in 2016 12th International Computer Engineering Conference (ICENCO), 2016.
[9] H. Faris, S. Mirjalili, and I. Aljarah, "Automatic selection of hidden neurons and weights in neural networks using grey wolf optimizer based on a hybrid encoding scheme," International Journal of Machine Learning and Cybernetics, vol. 10, no. 10, pp. 2901-2920, 2019.
[10] I. Aljarah, H. Faris, S. Mirjalili, and N. Al-Madi, "Training radial basis function networks using biogeography-based optimizer," Neural Computing and Applications, vol. 29, no. 7, pp. 529-553, 2016.
[11] H. Faris, I. Aljarah, and S. Mirjalili, "Evolving radial basis function networks using moth-flame optimizer," in Handbook of Neural Computation, pp. 537-550, 2017.
[12] H. Faris, I. Aljarah, and S. Mirjalili, "Improved monarch butterfly optimization for unconstrained global search and neural network training," Applied Intelligence, vol. 48, no. 2, pp. 445-464, 2017.
[13] O. Tarkhaneh and H. Shen, "Training of feedforward neural networks for data classification using hybrid particle swarm optimization, Mantegna Lévy flight and neighborhood search," Heliyon, vol. 5, no. 4, 2019.
[14] A. Khan, R. Shah, M. Imran, A. Khan, J. I. Bangash, and K. Shah, "An alternative approach to neural network training based on hybrid bio meta-heuristic algorithm," Journal of Ambient Intelligence and Humanized Computing, vol. 10, no. 10, pp. 3821-3830, 2019.
[15] R. Mendes, P. Cortez, M. Rocha, and J. Neves, "Particle swarms for feedforward neural network training," in Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN '02), 2002.
[16] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks (ICNN), vol. IV, pp. 1942-1948, 1995.
[17] S. Mirjalili, S. M. Mirjalili, and A. Hatamlou, "Multi-verse optimizer: a nature-inspired algorithm for global optimization," Neural Computing and Applications, vol. 27, no. 2, pp. 495-513, 2015.
[18] S. Mirjalili, S. M. Mirjalili, and A. Lewis, "Let a biogeography-based optimizer train your multi-layer perceptron," Information Sciences, vol. 269, pp. 188-209, 2014.
[19] S. Mirjalili, "How effective is the grey wolf optimizer in training multi-layer perceptrons," Applied Intelligence, vol. 43, no. 1, pp. 150-161, 2015.
[20] Sagarika and T. R. Jyothsna, "Tuning of PSO algorithm for single machine and multi-machine power system using STATCOM controller," International Journal of Engineering and Technology, vol. 2, no. 4, pp. 175-182, 2015.
[21] K. Karthikeyan and P. K. Dhal, "Transient stability enhancement by optimal location and tuning of STATCOM using PSO," Procedia Technology, 2015.
[22] D. J. Montana and L. Davis, "Training feedforward neural networks using genetic algorithms," in Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI '89), vol. 1, pp. 762-767, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1989.
[23] A. Slowik and M. Bialko, "Training of artificial neural networks using differential evolution algorithm," in Conference on Human System Interactions, IEEE, pp. 60-65, 2008.