RBF-based surrogate model for evolutionary optimization⋆

Lukáš Bajer1,2 and Martin Holeňa2

1 Faculty of Mathematics and Physics, Charles University,
Malostranské nám. 25, Prague 1, Czech Republic
bajer@cs.cas.cz
2 Institute of Computer Science, Academy of Sciences of the Czech Republic,
Pod Vodárenskou věží 2, Prague 8, Czech Republic
martin@cs.cas.cz

⋆ This work was supported by the Grant Agency of the Charles University (GA UK), grant number 278511/2011 (Lukáš Bajer), and by the Czech Science Foundation (GA ČR), grant number 201/08/0802 (Martin Holeňa).

Abstract. Many of today's engineering tasks rely on approximations of their expensive objective functions. Surrogate models, which are frequently used for this purpose, can save significant costs by substituting some of the experimental evaluations or simulations needed to reach an optimal or near-optimal solution. This paper presents a surrogate model based on RBF networks. In contrast to most surrogate models in the current literature, it can be directly used for problems with mixed continuous and discrete variables – clustering and generalized linear models are employed to deal with the discrete covariates. The model has been tested on a benchmark optimization problem, and its approximation properties are demonstrated on real-world application data.

1 Introduction

Optimization of various kinds of empirical objective functions is part of many of today's engineering or industrial applications – in situations where the value of the objective function is obtained through some measurement, experiment or simulation. High costs or extensive time demands of evaluating such functions motivate engineers to reduce the number of these evaluations.

Surrogate modelling [3, 6] is a popular approach which substitutes an approximating model for some of the original function evaluations. This concept is widely used in connection with evolutionary algorithms (EAs). Here, some of the individuals are assessed with a not necessarily accurate, but much faster model. This brings an important benefit: a notably larger population can be evolved in parallel. Even though the precise evaluation can be made only on a limited number of individuals, the EA can explore a larger part of the input space.

Much of the current literature covers optimization either in continuous or in discrete domains. However, the area of industrial optimization is often characterized by both continuous and discrete variables [16, 7].

This paper describes a particular surrogate model based on radial basis function (RBF) networks and generalized linear models (GLMs). Most of the existing works [22, 17, 9] deal only with continuous domains, or with their combination with integer variables; works dealing with mixed-variable surrogate models are rather few [21, 19]. In our model, multiple RBF networks are trained, and the discrete variables are used either for focusing the training of the networks on the most appropriate data, or for constructing a generalized linear model on this part of the data.

The paper is organized as follows: in the next section, we recall the principles of surrogate modelling, RBF networks and GLMs. Section 3 describes our approach to constructing a surrogate model and using it in optimization. Finally, Section 4 provides the results of testing on a benchmark function and on real-world data.

2 Problem description

For any given objective function f : S → IR, we consider the mixed-variable optimization problem (maximization) as finding the global optimum x⋆ = (x_1^(C), ..., x_n^(C), x_1^(D), ..., x_d^(D)) ∈ S such that

    f(x⋆) = max_{x ∈ S} f(x).                                        (1)

The search space S consists of n continuous and d discrete variables, forming the corresponding subspaces S^(C) and S^(D). In addition, we suppose that the value sets Vs(X_i^(D)), i = 1, ..., d, of the discrete variables are finite, and we do not distinguish between ordinal and nominal categorical variables – we assume no ordering on any of the Vs(X_i^(D)).
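To illustrate the setting, a candidate solution with n = 2 continuous and d = 2 discrete variables can be represented as follows (a minimal Python sketch; the value sets and the toy objective are purely illustrative assumptions of ours, not the benchmark used later):

    # Minimal sketch of a mixed-variable candidate solution (x^(C), x^(D)) and
    # an objective to be maximized over S = S^(C) x S^(D); the value sets and
    # the toy objective below are illustrative assumptions only.
    import numpy as np

    VALUE_SETS = [("A", "B", "C"), ("low", "high")]  # Vs(X1^(D)), Vs(X2^(D))

    def objective(x_cont, x_disc):
        base = -float(np.sum((np.asarray(x_cont) - 0.5) ** 2))  # continuous part
        bonus = 1.0 if x_disc == ("B", "high") else 0.0         # discrete part
        return base + bonus

    print(objective((0.4, 0.6), ("B", "high")))  # 0.98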
2.1 Involved methods

Surrogate modelling. Approximating the fitness function with a regression model is a common remedy for tasks in which an empirical objective function has to be used. Surrogate models simulate the behaviour of the original function while being much cheaper and much less time consuming to evaluate.

As surrogate models, mainly nonlinear regression models are used, for example Gaussian processes [4] or artificial neural networks. In connection with evolutionary optimization, neural networks of the multilayer perceptron type [10] and networks with radial basis functions [22, 17] have been particularly popular. The last-mentioned kind of neural networks also underlies the model reported in this paper.

How the original fitness function and the surrogate model are combined is determined by evolution control (EC). In the literature [10], individual-based and generation-based approaches are distinguished. While individual-based EC chooses only part of an enlarged population for evaluation by the original fitness, the generation-based approach evaluates the whole population in different generations by either the original or the model fitness.

RBF networks compute a mapping from the input space (typically a subspace of IR^n) to IR (for simplicity, we will focus on versions with scalar output) [5]. The mapping can be expressed as

    f(x) = Σ_{i=1}^{g} π_i f_i(||x − c_i||)                           (2)

where x ∈ IR^n is the input, g ∈ IN the number of components, f_i : IR^n → IR are radial basis functions, π_i ∈ IR their weights, c_i ∈ IR^n the radial functions' centres, and ||.|| is a norm. As the functions f_i, Gaussian functions f_i(x; c_i, δ_i) = e^{−δ_i ||x−c_i||²} with scalar width δ_i and the Euclidean norm are used most commonly.
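As an illustration of the mapping (2) with Gaussian components, the response of such a network can be computed as follows (a sketch with our own parameter names; training of the centres, widths and weights is not shown):

    # Sketch of an RBF network response per Eq. (2) with Gaussian components
    # f_i(x; c_i, delta_i) = exp(-delta_i * ||x - c_i||^2); names are ours.
    import numpy as np

    def rbf_response(x, centres, widths, weights):
        """x: (n,); centres: (g, n); widths, weights: (g,)."""
        sq_dists = np.sum((centres - np.asarray(x)) ** 2, axis=1)  # ||x - c_i||^2
        return float(weights @ np.exp(-widths * sq_dists))

    # Example: a network with g = 2 components in IR^2
    print(rbf_response([0.0, 0.0],
                       centres=np.array([[0.0, 0.0], [1.0, 1.0]]),
                       widths=np.array([1.0, 1.0]),
                       weights=np.array([0.5, 0.5])))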
Generalized linear models are a natural generalization of classical linear regression models [13]. They consist of three parts: (1) the random component – independent observed values Y following a distribution from the exponential family with mean E(Y) = µ and constant variance σ²; (2) the systematic component, which relates the values of the explanatory (input) variables (x_1, x_2, ..., x_d) through a linear model with parameters β_1, ..., β_d to a linear predictor η,

    η = Σ_{j=0}^{d} x_j β_j ;                                         (3)

and (3) a link function g that connects the random and systematic components together: η = g(µ). The explanatory variables are usually supplemented with the constant vector of ones corresponding to an intercept parameter β_0.

GLMs are particularly useful for our work because they are able to express a relation between discrete (integer, or after a transformation of values even nominal) input variables and a continuous response.

3 Our strategy for using surrogate-assisted genetic optimization

Our version of the surrogate-assisted genetic algorithm, including detailed pseudo-code, has been introduced in a previous article [1]. This section describes the construction and use of surrogate models based on RBF networks.

3.1 Model construction

RBF networks, as defined in Section 2.1, allow only continuous variables to be used for their fitting. The construction of our first surrogate model [1] therefore starts with clustering the available training data according to their discrete values into several clusters, in order to focus the training of the RBF networks on the most similar datapoints. Let us call this model RBF/discrete clustering, or shortly the RBF/DSCL model. Subsequently, a separate RBF network is fitted to the data of each such cluster using the datapoints' continuous variables. The algorithm is the same as that described in Fig. 1, except that steps (1)–(3) are omitted and the clustering in step (4) is made using the discrete values from the training database D.

This approach does not utilize the relation between the values of the discrete input variables and the response variable. As stated in Section 2.1, such a relation can be expressed by generalized linear models, and these models form an important part of our new RBF/GLM surrogate model.

Training the RBF/GLM model starts with the construction of two auxiliary models: the first, a global RBF network fˆ_RBF : S^(C) → IR, is fitted on the continuous input variables, while the second, a GLM fˆ_GLM : S^(D) → IR, is built using the discrete variables. Both of them make use of all the available training data and regress the response-variable values.

Global RBF network. Training of the auxiliary RBF network works similarly to the training of the RBF networks in the previous RBF/DSCL model [1] – the same starting values for centres and weights, and cross-validation for choosing the best number of components g, are used. However, instead of clusters, all the data in the database D are used at once.

GLM model. The generalized linear model is used in its continuous-response form, and the responses are supposed to follow a normal distribution Y ∼ N(µ, σ²). Even though the latter assumption generally does not hold, GLMs still provide a useful means of regression expressed on the basis of the discrete values.

Before the GLM is fitted or used, the discrete values must be converted to a proper representation. Since we do not expect any ordering on the discrete values, we have chosen dummy coding [13], which establishes one binary indicator variable I_ij ∈ {0, 1} for each nominal value from the value sets Vs(X_i^(D)), i = 1, ..., d, j = 1, ..., |Vs(X_i^(D))|, of the original discrete variables. The assignment between the original discrete values and the dummy coding,

    dummy : S^(D) → {0, 1}^{|Vs(X_1)| + ... + |Vs(X_d)|},             (4)

has to be recorded for later evaluation with the surrogate model.
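For illustration, the coding (4) and the fitting of the normal-identity GLM – which in this special case reduces to ordinary least squares on the indicator variables – might be sketched as follows (our own Python illustration, not the authors' MATLAB implementation):

    # Sketch of the dummy coding of Eq. (4) and of fitting the GLM with normal
    # response and identity link, which amounts to ordinary least squares on
    # the binary indicators; all names here are our own illustration.
    import numpy as np

    def dummy(x_disc, value_sets):
        """Map discrete values to one binary indicator per nominal value."""
        code = []
        for value, values in zip(x_disc, value_sets):
            code.extend(1.0 if value == v else 0.0 for v in values)
        return np.array(code)

    def fit_glm(X_disc, y, value_sets):
        """Return GLM parameters glm = (beta_0, ..., beta_r)."""
        D = np.array([dummy(x, value_sets) for x in X_disc])
        D = np.hstack([np.ones((len(D), 1)), D])   # intercept column of ones
        beta, *_ = np.linalg.lstsq(D, np.asarray(y, dtype=float), rcond=None)
        return beta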
Final RBF clustered model. Having created the global RBF network fˆ_RBF and the GLM fˆ_GLM, we can proceed with the construction of the final RBF clustered surrogate model fˆ : S^(C) → IR. The process starts with clustering the training data from the database D = {(x_i^(D), x_i^(C), y_i)}_{i=1}^{N} according to the difference between the responses of the two auxiliary models on the corresponding input variables (for i = 1, ..., N):

    diff_i = fˆ_RBF(x_i^(C)) − fˆ_GLM(dummy(x_i^(D))).                (5)

The sizes of the clusters have to be at least s_min – the minimal number of data needed for fitting one RBF network. This number is provided by the user, and its best value depends on the particular task. The higher s_min is, the more components each RBF network can have, but the more distinct discrete values are usually grouped together in one cluster.

One separate RBF network rbf_j is trained on the data of each cluster C_j, j = 1, ..., m. The maximal number of components of each network is upper-bounded by g_j^max = ⌊((k−1)/k · |C_j|)/ρ⌋ (with k the number of cross-validation folds, so that (k−1)/k · |C_j| is the size of a cross-validation training set, and ρ the required ratio of data points to network components). Training these networks is analogous to the training of the global RBF network described in Section 3.1. The only difference is in the training data: only the data of the individual cluster are used for each network.

FitTheModel(s_min, D, e)
Arguments:  s_min – min. size of clusters,
            D – database,
            e – type of error estimate: MSE, AIC, or BIC
Steps of the procedure:
 (1) (fˆ_RBF, rbf_GLOB) ← fit the global RBF
 (2) (fˆ_GLM, glm) ← fit the GLM
 (3) {diff_i}_{i=1}^{N} ← differences (fˆ_RBF − fˆ_GLM) on D
 (4) {C_j}_{j=1}^{m} ← cluster D into clusters of size
     at least s_min according to {diff_i}_{i=1}^{N}
 (5) for each cluster C_j, j = 1, ..., m
 (6)   for g_j = 1, ..., g_j^max
 (7)     mse[j, g_j] ← average MSE_CV from
         fitting an RBF with g_j components
 (8)   g_j⋆ ← the number of components of the best RBF
 (9)   rbf_j ← retrained RBF network with g_j⋆ components
(10)   mse_j ← mse[j, g_j⋆]
Output: {rbf_GLOB, glm, (rbf_j, mse_j, diff_j)_{j=1}^{m}}

Fig. 1. Pseudo-code of the fitting procedure.

3.2 Evaluation with the surrogate model

Once the surrogate model is built, it can be used for evaluating the individuals resulting from the evolution. The parameters of the model can be summarized as {rbf_GLOB, glm, (rbf_j, mse_j, diff_j)_{j=1}^{m}}. Here, rbf_GLOB are the parameters of the global RBF network, glm = (β_0, ..., β_r) are the parameters of the GLM, rbf_j are the parameters of the j-th cluster's RBF network, mse_j is the MSE_CV obtained from cross-validation, and diff_j is the difference (5) averaged over the j-th cluster's data.

Given a new individual (x̃^(C), x̃^(D)), evaluation with the surrogate model starts with computing the difference between the responses of the global RBF network and the GLM with the corresponding parameters rbf_GLOB and glm,

    diff~ = fˆ_RBF(x̃^(C); rbf_GLOB) − fˆ_GLM(dummy(x̃^(D)); glm).     (6)

Based on this value, the index c of the cluster whose average difference is most similar to the individual's difference is obtained:

    c = arg min_{j=1,...,m} |diff_j − diff~|.                         (7)

Finally, the response of the c-th final RBF network is used as the return value of the surrogate model:

    ỹ = fˆ(x̃^(C); rbf_c) = Σ_{i=1}^{g_c⋆} π_ic f_ic(||x̃^(C) − c_ic||).   (8)

If more than one cluster is at the same distance from the individual, the RBF network with the lowest MSE_CV is chosen.
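The evaluation of Eqs. (6)–(8), including the tie-breaking by MSE_CV, can be sketched as follows (a Python illustration reusing the hypothetical helpers rbf_response() and dummy() from the sketches above; the model container and its field names are our own assumption):

    # Sketch of surrogate evaluation per Eqs. (6)-(8); reuses the hypothetical
    # helpers rbf_response() and dummy() from the previous sketches.  The
    # 'model' dict mirrors {rbf_GLOB, glm, (rbf_j, mse_j, diff_j)}.
    import numpy as np

    def surrogate_evaluate(x_cont, x_disc, model, value_sets):
        # Eq. (6): difference between the global RBF network and the GLM
        glm_input = np.concatenate(([1.0], dummy(x_disc, value_sets)))
        diff = (rbf_response(x_cont, *model["rbf_glob"])
                - float(model["glm"] @ glm_input))
        # Eq. (7): cluster with the most similar average difference;
        # ties are broken by the lowest cross-validation MSE
        dists = np.abs(np.asarray(model["diff_j"]) - diff)
        tied = np.flatnonzero(dists == dists.min())
        c = tied[np.argmin(np.asarray(model["mse_j"])[tied])]
        # Eq. (8): response of the c-th cluster's RBF network
        return rbf_response(x_cont, *model["rbf_j"][c])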
4 Implementation and results of testing

Our algorithms were implemented in the MATLAB environment. We have been utilizing the Global Optimization Toolbox, which provided us with a platform for testing the model on a benchmark optimization task. Similarly, our hierarchical clustering method extends the cluster analysis from the Statistics Toolbox, which also provides the GLM fitting procedure, and we employ nonlinear curve-fitting from the Optimization Toolbox for fitting the RBF networks.

4.1 Model fitting

Our models have been tested on three different kinds of data. The first two datasets (Valero and HCN) are the same as in our last article [1]; the third is the building1 dataset from the Proben1 collection [18].

Valero's [20] benchmark fitness function was constructed to resemble empirical fitness functions from chemical engineering. The surrogate models have been trained ten times on a dataset of 2000 randomly generated data. Using the same settings for fitting, the average root mean squared error (RMSE) of the new RBF/GLM model was only slightly lower than that of the previous model (see Table 1 and the top graphs in Fig. 2).

              model      RMSE
    Valero    RBF/GLM    14.046  ± 1.0435
              RBF/DSCL   14.499  ± 1.518
    HCN       RBF/GLM    10.340  ± 1.866
              RBF/DSCL   15.620  ± 1.519
    building1 RBF/GLM    0.06407 ± 0.00496
              RBF/DSCL   0.13618 ± 0.00455

Table 1. Surrogate models' average regression RMSE on Valero's benchmark, the HCN data and the building1 dataset.

[Fig. 2. Scatter plots of the model fitness against the original fitness on testing data: RBF/GLM (left column) and RBF/DSCL (right column) on the Valero, HCN synthesis and building1 datasets.]

The second dataset is from a real application in chemical engineering (cf. the use of RBF networks in this application area, e.g., in the work of Qiao [11]): the optimization of chemical catalysts for hydrocyanic acid (HCN) synthesis [14]. Solutions of this task are composed of two discrete and 11 continuous variables; the whole dataset has 696 items. The fitting results are substantially different from the benchmark problem: considering the average response in the dataset, ȳ = 31.17, the measured RMSEs are relatively much higher (see the middle row of graphs in Fig. 2). The RMSE of the new RBF/GLM model decreased by nearly 35 % compared to the previous model's error.

Prechelt's Proben1 [18] is a popular collection of datasets for data mining, originally intended for benchmarking neural networks. We have tested our models on the building1 dataset using the first response variable, which indicates the electrical energy consumption in a building; the models were repeatedly trained on the first 3156 and tested on the remaining 1052 data, as suggested by Prechelt. Average results from ten trainings show that the former RBF/DSCL model is not able to sufficiently express the relation between the discrete variables and the output. Conversely, the results of the RBF/GLM model are at least comparable to the results reported elsewhere [12, 2, 15].

4.2 Genetic algorithm performance on the benchmark fitness

The benchmark fitness enabled us to test the model with the GA [1]. As shown in Table 2, the GA with the surrogate model reaches the same fitness values on this function as the non-surrogate GA while using less than 30 per cent of the original fitness function evaluations (generation-based EC), or it is able to find a 1.1-times better solution with 80 per cent of the original fitness evaluations (individual-based EC).

    EC settings of the    fitness of the best   number of orig.
    surrogate model       found individual      fitness evaluations
    without model         486.38 ± 56.5         4130 ± 1546
    individual-based      544.73 ± 3.9          3241 ± 926
    generation-based      490.28 ± 44.9         1185 ± 358

Table 2. GA performance without a surrogate model and with the RBF/DSCL-based model; average results from 100 runs of the algorithm.
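A generation-based evolution control of the kind used in these experiments can be organized as in the following schematic sketch (the loop structure and all routine names, e.g. fit_surrogate and ga_step, are our own placeholders, not the authors' MATLAB code):

    # Schematic sketch of generation-based evolution control: whole
    # generations are evaluated either by the expensive original fitness or
    # by the surrogate.  fit_surrogate (returning a callable model) and
    # ga_step (selection + variation) are caller-supplied placeholders.
    def surrogate_assisted_ga(population, original_fitness, fit_surrogate,
                              ga_step, n_generations, model_period=3):
        archive, model = [], None
        for gen in range(n_generations):
            if model is None or gen % model_period == 0:
                # control generation: evaluate by the original fitness
                fitness = [original_fitness(ind) for ind in population]
                archive.extend(zip(population, fitness))
                model = fit_surrogate(archive)         # (re)train the model
            else:
                # model generation: evaluate cheaply by the surrogate
                fitness = [model(ind) for ind in population]
            population = ga_step(population, fitness)
        return population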
5 Conclusion

Two kinds of surrogate models of expensive objective functions for mixed-variable continuous and discrete optimization were presented in this paper. Both of them make use of RBF networks: the first model focuses the training of the RBF networks by clustering on the discrete part of the data, while the second builds a GLM on the discrete input variables. Detailed algorithms for training the models were provided. The results of testing on three different datasets showed that especially the second model is a competitive kind of regression for costly objective functions. Using the model on the benchmark fitness function resulted in saving up to 70 per cent of the original evaluations, or in a 10 per cent increase of the final solution quality.

One of the most similar works dealing with surrogate models is the paper of Zhou [22], who uses RBF networks as a local surrogate model in combination with a global model based on Gaussian processes. Other literature employs polynomials [8], Gaussian processes [4], or multilayer perceptron networks [10], but most publications consider only continuous, or continuous and integer, optimization.

References

1. L. Bajer, M. Holeňa: Surrogate model for continuous and discrete genetic optimization based on RBF networks. In C. Fyfe et al. (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2010, vol. 6283 of Lecture Notes in Computer Science, pp. 251–258. Springer, September 2010.
2. J. L. Bernier, J. Ortega, I. Rojas, E. Ros, A. Prieto: Obtaining fault tolerant multilayer perceptrons using an explicit regularization. Neural Processing Letters 12(2), October 2000, 107–113.
3. A. J. Booker, J. Dennis, P. D. Frank, D. B. Serafini, V. Torczon, M. Trosset: A rigorous framework for optimization by surrogates. Structural and Multidisciplinary Optimization 17, 1999, 1–13.
4. D. Büche, N. N. Schraudolph, P. Koumoutsakos: Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35(2), 2005, 183–194.
5. M. D. Buhmann: Radial basis functions: theory and implementations. Cambridge Univ. Press, 2003.
6. D. Gorissen, I. Couckuyt, P. Demeester, T. Dhaene, K. Crombecq: A surrogate modeling and adaptive sampling toolbox for computer based design. Journal of Machine Learning Research 11, 2010, 2051–2055.
7. M. Holeňa, T. Cukic, U. Rodemerck, D. Linke: Optimization of catalysts using specific, description-based genetic algorithms. Journal of Chemical Information and Modeling 48(2), 2008, 274–282.
8. S. Hosder, L. T. Watson, B. Grossman: Polynomial response surface approximations for the multidisciplinary design optimization of a high speed civil transport. Optimization and Engineering 2(4), 2001, 431–452.
9. Y. Jin: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing – A Fusion of Foundations, Methodologies and Applications 9(1), 2005, 3–12.
10. Y. Jin, M. Hüsken, M. Olhofer, B. Sendhoff: Neural networks for fitness approximation in evolutionary optimization. In Knowledge Incorporation in Evolutionary Computation, 2005, 281.
11. Jun-Fei Qiao, Hong-Gui Han: A repair algorithm for radial basis function neural network and its application to chemical oxygen demand modeling. International Journal of Neural Systems 20(1), 2010, 63–74.
12. Cheol-Taek Kim, Ju-Jang Lee: Training two-layered feedforward networks with variable projection method. IEEE Transactions on Neural Networks 19(2), 2008, 371–375.
13. P. McCullagh, J. A. Nelder: Generalized linear models. Chapman & Hall, 1989.
14. S. Möhmel, N. Steinfeldt, S. Engelschalt, M. Holeňa, S. Kolf, U. Dingerdissen, D. Wolf, R. Weber, M. Bewersdorf: New catalytic materials for the high-temperature synthesis of hydrocyanic acid from methane and ammonia by high-throughput approach. Applied Catalysis A: General 334, 2008, 73–83.
15. P. Narasimha, W. H. Delashmit, M. T. Manry, Jiang Li, F. Maldonado: An integrated growing-pruning method for feedforward network training. Neurocomputing 71(13–15), August 2008, 2831–2847.
16. M. Olhofer, T. Arima, T. Sonoda, B. Sendhoff: Optimisation of a stator blade used in a transonic compressor cascade with evolution strategies. In Evolutionary Design and Manufacture, 2000, 45–54.
17. Y. S. Ong, P. B. Nair, A. J. Keane, K. W. Wong: Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In Knowledge Incorporation in Evolutionary Computation, 2004, 307–332.
18. L. Prechelt: Proben1: A set of neural network benchmark problems and benchmarking rules. Technical Report 21/94, Universität Karlsruhe, 1994.
19. G. Singh, R. V. Grandhi: Mixed-variable optimization strategy employing multifidelity simulation and surrogate models. AIAA Journal 48(1), 2010, 215–223.
20. S. Valero, E. Argente, V. Botti: DoE framework for catalyst development based on soft computing techniques. Computers & Chemical Engineering 33(1), 2009, 225–238.
21. A. Younis, Z. M. Dong: Global optimization using mixed surrogate models for computation intensive designs. In 2nd International Symposium on Computational Mechanics, Hong Kong, vol. 1233, 2010, 1600–1605.
22. Z. Zhou, Y. S. Ong, P. B. Nair, A. J. Keane: Combining global and local surrogate models to accelerate evolutionary optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 37(1), 2007, 66–76.