RBF-based surrogate model for evolutionary optimization⋆

Lukáš Bajer1,2 and Martin Holeňa2

1 Faculty of Mathematics and Physics, Charles University,
Malostranské nám. 25, Prague 1, Czech Republic
bajer@cs.cas.cz
2 Institute of Computer Science, Academy of Sciences of the Czech Republic,
Pod Vodárenskou věží 2, Prague 8, Czech Republic
martin@cs.cas.cz

⋆ This work was supported by the Grant Agency of the Charles University (GA UK), grant number 278511/2011 (Lukáš Bajer), and by the Czech Science Foundation (GA ČR), grant number 201/08/0802 (Martin Holeňa).

Abstract. Many of today's engineering tasks rely on approximations of their expensive objective functions. Surrogate models, which are frequently used for this purpose, can save significant costs by substituting some of the experimental evaluations or simulations needed to reach an optimal or near-optimal solution. This paper presents a surrogate model based on RBF networks. In contrast to most surrogate models in the current literature, it can be directly used for problems with mixed continuous and discrete variables – clustering and generalized linear models are employed to deal with the discrete covariates. The model has been tested on a benchmark optimization problem, and its approximation properties are demonstrated on real-world application data.

1 Introduction

Optimization of various kinds of empirical objective functions is part of many of today's engineering or industrial applications – in situations where the value of the objective function is obtained through some measurement, experiment or simulation. High costs or extensive time demands of evaluating such functions motivate engineers to reduce the number of these evaluations.

Surrogate modelling [3, 6] is a popular approach which substitutes an approximating model for some of the original function evaluations. This concept is widely used in connection with evolutionary algorithms (EAs). Here, some of the individuals are assessed with a not necessarily accurate, but much faster model. This brings an important benefit: a notably larger population can be evolved in parallel. Even though the precise evaluation can be made only on a limited number of individuals, the EA can explore a larger part of the input space.

Much of the current literature covers optimization either in continuous or in discrete domains. However, the area of industrial optimization is often characterized by both continuous and discrete variables [16, 7].

This paper describes a particular surrogate model based on radial basis function (RBF) networks and generalized linear models (GLMs). Most of the existing works [22, 17, 9] deal only with continuous domains, or with their combination with integer variables; works dealing with mixed-variable surrogate models are rather few [21, 19]. In our model, multiple RBF networks are trained, and the discrete variables are used either for focusing the training of the networks on the most appropriate data, or for constructing a generalized linear model on this part of the data.

The paper is organized as follows: in the next section, we recall the principles of surrogate modelling, RBF networks and GLMs. Section 3 describes our approach to constructing a surrogate model and using it in optimization. Finally, Section 4 provides the results of testing on a benchmark function and on real-world data.

2 Problem description

For any given objective function f : S → IR, we consider the mixed-variable optimization problem (maximization) as finding the global optimum x⋆ = (x_1^(C), ..., x_n^(C), x_1^(D), ..., x_d^(D)) ∈ S such that

    f(x⋆) = max_{x ∈ S} f(x).                                        (1)

The search space S consists of n continuous and d discrete variables, forming the corresponding subspaces S^(C) and S^(D). In addition, we suppose that the value sets Vs(X_i^(D)), i = 1, ..., d, of the discrete variables are finite, and we do not distinguish between ordinal and nominal categorical variables – we assume no ordering on any of the Vs(X_i^(D)).
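To illustrate the setting, a candidate solution with n = 2 continuous and d = 2 discrete variables can be represented as follows (a minimal Python sketch; the value sets and the toy objective are purely illustrative assumptions of ours, not the benchmark used later):

    # Minimal sketch of a mixed-variable candidate solution (x^(C), x^(D)) and
    # an objective to be maximized over S = S^(C) x S^(D); the value sets and
    # the toy objective below are illustrative assumptions only.
    import numpy as np

    VALUE_SETS = [("A", "B", "C"), ("low", "high")]  # Vs(X1^(D)), Vs(X2^(D))

    def objective(x_cont, x_disc):
        base = -float(np.sum((np.asarray(x_cont) - 0.5) ** 2))  # continuous part
        bonus = 1.0 if x_disc == ("B", "high") else 0.0         # discrete part
        return base + bonus

    print(objective((0.4, 0.6), ("B", "high")))  # 0.98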
2.1 Involved methods

Surrogate modelling. Approximating the fitness function with a regression model is a common remedy for tasks in which an empirical objective function has to be used. Surrogate models simulate the behaviour of the original function while being much cheaper and much less time consuming to evaluate.

As surrogate models, mainly nonlinear regression models are used, for example Gaussian processes [4] or artificial neural networks. In connection with evolutionary optimization, neural networks of the multilayer perceptron type [10] and networks with radial basis functions [22, 17] have been particularly popular. The last-mentioned kind of neural networks also underlies the model reported in this paper.

How the original fitness function and the surrogate model are combined is determined by evolution control (EC). In the literature [10], individual-based and generation-based approaches are distinguished. While individual-based EC chooses only part of an enlarged population for evaluation by the original fitness, the generation-based approach evaluates the whole population in different generations by either the original or the model fitness.

RBF networks compute a mapping from the input space (typically a subspace of IR^n) to IR (for simplicity, we will focus on versions with scalar output) [5]. The mapping can be expressed as

    f(x) = Σ_{i=1}^{g} π_i f_i(||x − c_i||)                           (2)

where x ∈ IR^n is the input, g ∈ IN the number of components, f_i : IR^n → IR are radial basis functions, π_i ∈ IR their weights, c_i ∈ IR^n the radial functions' centres, and ||.|| is a norm. As the functions f_i, Gaussian functions f_i(x; c_i, δ_i) = e^{−δ_i ||x−c_i||²} with scalar width δ_i and the Euclidean norm are used most commonly.
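As an illustration of the mapping (2) with Gaussian components, the response of such a network can be computed as follows (a sketch with our own parameter names; training of the centres, widths and weights is not shown):

    # Sketch of an RBF network response per Eq. (2) with Gaussian components
    # f_i(x; c_i, delta_i) = exp(-delta_i * ||x - c_i||^2); names are ours.
    import numpy as np

    def rbf_response(x, centres, widths, weights):
        """x: (n,); centres: (g, n); widths, weights: (g,)."""
        sq_dists = np.sum((centres - np.asarray(x)) ** 2, axis=1)  # ||x - c_i||^2
        return float(weights @ np.exp(-widths * sq_dists))

    # Example: a network with g = 2 components in IR^2
    print(rbf_response([0.0, 0.0],
                       centres=np.array([[0.0, 0.0], [1.0, 1.0]]),
                       widths=np.array([1.0, 1.0]),
                       weights=np.array([0.5, 0.5])))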
Generalized linear models are a natural generalization of classical linear regression models [13]. They consist of three parts: (1) the random component – independent observed values Y following a distribution from the exponential family with mean E(Y) = µ and constant variance σ²; (2) the systematic component, which relates the values of the explanatory (input) variables (x_1, x_2, ..., x_d) through a linear model with parameters β_1, ..., β_d to a linear predictor η,

    η = Σ_{j=0}^{d} x_j β_j ;                                         (3)

and (3) a link function g that connects the random and systematic components together: η = g(µ). The explanatory variables are usually supplemented with the constant vector of ones corresponding to an intercept parameter β_0.

GLMs are particularly useful for our work because they are able to express a relation between discrete (integer, or after a transformation of values even nominal) input variables and a continuous response.

3 Our strategy for using surrogate-assisted genetic optimization

Our version of the surrogate-assisted genetic algorithm, including detailed pseudo-code, has been introduced in a previous article [1]. This section describes the construction and use of surrogate models based on RBF networks.

3.1 Model construction

RBF networks, as defined in Section 2.1, allow only continuous variables to be used for their fitting. The construction of our first surrogate model [1] therefore starts with clustering the available training data according to their discrete values into several clusters, in order to focus the training of the RBF networks on the most similar datapoints. Let us call this model RBF/discrete clustering, or shortly the RBF/DSCL model. Subsequently, a separate RBF network is fitted to the data of each such cluster using the datapoints' continuous variables. The algorithm is the same as that described in Fig. 1, except that steps (1)–(3) are omitted and the clustering in step (4) is made using the discrete values from the training database D.

This approach does not utilize the relation between the values of the discrete input variables and the response variable. As stated in Section 2.1, such a relation can be expressed by generalized linear models, and these models form an important part of our new RBF/GLM surrogate model.

Training the RBF/GLM model starts with the construction of two auxiliary models: the first, a global RBF network fˆ_RBF : S^(C) → IR, is fitted on the continuous input variables, while the second, a GLM fˆ_GLM : S^(D) → IR, is built using the discrete variables. Both of them make use of all the available training data and regress the response-variable values.

Global RBF network. Training of the auxiliary RBF network works similarly to the training of the RBF networks in the previous RBF/DSCL model [1] – the same starting values for centres and weights, and cross-validation for choosing the best number of components g, are used. However, instead of clusters, all the data in the database D are used at once.

GLM model. The generalized linear model is used in its continuous-response form, and the responses are supposed to follow a normal distribution Y ∼ N(µ, σ²). Even though the latter assumption generally does not hold, GLMs still provide a useful means of regression expressed on the basis of the discrete values.

Before the GLM is fitted or used, the discrete values must be converted to a proper representation. Since we do not expect any ordering on the discrete values, we have chosen dummy coding [13], which establishes one binary indicator variable I_ij ∈ {0, 1} for each nominal value from the value sets Vs(X_i^(D)), i = 1, ..., d, j = 1, ..., |Vs(X_i^(D))|, of the original discrete variables. The assignment between the original discrete values and the dummy coding,

    dummy : S^(D) → {0, 1}^{|Vs(X_1)| + ... + |Vs(X_d)|},             (4)

has to be recorded for later evaluation with the surrogate model.
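For illustration, the coding (4) and the fitting of the normal-identity GLM – which in this special case reduces to ordinary least squares on the indicator variables – might be sketched as follows (our own Python illustration, not the authors' MATLAB implementation):

    # Sketch of the dummy coding of Eq. (4) and of fitting the GLM with normal
    # response and identity link, which amounts to ordinary least squares on
    # the binary indicators; all names here are our own illustration.
    import numpy as np

    def dummy(x_disc, value_sets):
        """Map discrete values to one binary indicator per nominal value."""
        code = []
        for value, values in zip(x_disc, value_sets):
            code.extend(1.0 if value == v else 0.0 for v in values)
        return np.array(code)

    def fit_glm(X_disc, y, value_sets):
        """Return GLM parameters glm = (beta_0, ..., beta_r)."""
        D = np.array([dummy(x, value_sets) for x in X_disc])
        D = np.hstack([np.ones((len(D), 1)), D])   # intercept column of ones
        beta, *_ = np.linalg.lstsq(D, np.asarray(y, dtype=float), rcond=None)
        return beta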
Final RBF clustered model. Having created the global RBF network fˆ_RBF and the GLM fˆ_GLM, we can proceed with the construction of the final RBF clustered surrogate model fˆ : S^(C) → IR. The process starts with clustering the training data from the database D = {(x_i^(D), x_i^(C), y_i)}_{i=1}^{N} according to the difference between the responses of the two auxiliary models on the corresponding input variables (for i = 1, ..., N):

    diff_i = fˆ_RBF(x_i^(C)) − fˆ_GLM(dummy(x_i^(D))).                (5)

The sizes of the clusters have to be at least s_min – the minimal number of data needed for fitting one RBF network. This number is provided by the user, and its best value depends on the particular task. The higher s_min is, the more components each RBF network can have, but the more distinct discrete values are usually grouped together in one cluster.

One separate RBF network rbf_j is trained on the data of each cluster C_j, j = 1, ..., m. The maximal number of components of each network is upper-bounded by g_j^max = ⌊((k−1)/k · |C_j|)/ρ⌋ (with k the number of cross-validation folds, so that (k−1)/k · |C_j| is the size of a cross-validation training set, and ρ the required ratio of data points to network components). Training these networks is analogous to the training of the global RBF network described in Section 3.1. The only difference is in the training data: only the data of the individual cluster are used for each network.

FitTheModel(s_min, D, e)
Arguments:  s_min – min. size of clusters,
            D – database,
            e – type of error estimate: MSE, AIC, or BIC
Steps of the procedure:
 (1) (fˆ_RBF, rbf_GLOB) ← fit the global RBF
 (2) (fˆ_GLM, glm) ← fit the GLM
 (3) {diff_i}_{i=1}^{N} ← differences (fˆ_RBF − fˆ_GLM) on D
 (4) {C_j}_{j=1}^{m} ← cluster D into clusters of size
     at least s_min according to {diff_i}_{i=1}^{N}
 (5) for each cluster C_j, j = 1, ..., m
 (6)   for g_j = 1, ..., g_j^max
 (7)     mse[j, g_j] ← average MSE_CV from
         fitting an RBF with g_j components
 (8)   g_j⋆ ← the number of components of the best RBF
 (9)   rbf_j ← retrained RBF network with g_j⋆ components
(10)   mse_j ← mse[j, g_j⋆]
Output: {rbf_GLOB, glm, (rbf_j, mse_j, diff_j)_{j=1}^{m}}

Fig. 1. Pseudo-code of the fitting procedure.

3.2 Evaluation with the surrogate model

Once the surrogate model is built, it can be used for evaluating the individuals resulting from the evolution. The parameters of the model can be summarized as {rbf_GLOB, glm, (rbf_j, mse_j, diff_j)_{j=1}^{m}}. Here, rbf_GLOB are the parameters of the global RBF network, glm = (β_0, ..., β_r) are the parameters of the GLM, rbf_j are the parameters of the j-th cluster's RBF network, mse_j is the MSE_CV obtained from cross-validation, and diff_j is the difference (5) averaged over the j-th cluster's data.

Given a new individual (x̃^(C), x̃^(D)), evaluation with the surrogate model starts with computing the difference between the responses of the global RBF network and the GLM with the corresponding parameters rbf_GLOB and glm,

    diff~ = fˆ_RBF(x̃^(C); rbf_GLOB) − fˆ_GLM(dummy(x̃^(D)); glm).     (6)

Based on this value, the index c of the cluster whose average difference is most similar to the individual's difference is obtained:

    c = arg min_{j=1,...,m} |diff_j − diff~|.                         (7)

Finally, the response of the c-th final RBF network is used as the return value of the surrogate model:

    ỹ = fˆ(x̃^(C); rbf_c) = Σ_{i=1}^{g_c⋆} π_ic f_ic(||x̃^(C) − c_ic||).   (8)

If more than one cluster is at the same distance from the individual, the RBF network with the lowest MSE_CV is chosen.
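The evaluation of Eqs. (6)–(8), including the tie-breaking by MSE_CV, can be sketched as follows (a Python illustration reusing the hypothetical helpers rbf_response() and dummy() from the sketches above; the model container and its field names are our own assumption):

    # Sketch of surrogate evaluation per Eqs. (6)-(8); reuses the hypothetical
    # helpers rbf_response() and dummy() from the previous sketches.  The
    # 'model' dict mirrors {rbf_GLOB, glm, (rbf_j, mse_j, diff_j)}.
    import numpy as np

    def surrogate_evaluate(x_cont, x_disc, model, value_sets):
        # Eq. (6): difference between the global RBF network and the GLM
        glm_input = np.concatenate(([1.0], dummy(x_disc, value_sets)))
        diff = (rbf_response(x_cont, *model["rbf_glob"])
                - float(model["glm"] @ glm_input))
        # Eq. (7): cluster with the most similar average difference;
        # ties are broken by the lowest cross-validation MSE
        dists = np.abs(np.asarray(model["diff_j"]) - diff)
        tied = np.flatnonzero(dists == dists.min())
        c = tied[np.argmin(np.asarray(model["mse_j"])[tied])]
        # Eq. (8): response of the c-th cluster's RBF network
        return rbf_response(x_cont, *model["rbf_j"][c])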
4 Implementation and results of testing

Our algorithms were implemented in the MATLAB environment. We have been utilizing the Global Optimization Toolbox, which provided us with a platform for testing the model on a benchmark optimization task. Similarly, our hierarchical clustering method extends the cluster analysis from the Statistics Toolbox, which also provides the GLM fitting procedure, and we employ nonlinear curve-fitting from the Optimization Toolbox for fitting the RBF networks.

4.1 Model fitting

Our models have been tested on three different kinds of data. The first two datasets (Valero and HCN) are the same as in our last article [1]; the third is the building1 dataset from the Proben1 collection [18].

Valero's [20] benchmark fitness function was constructed to resemble empirical fitness functions from chemical engineering. The surrogate models have been trained ten times on a dataset of 2000 randomly generated data. Using the same settings for fitting, the average root mean squared error (RMSE) of the new RBF/GLM model was only slightly lower than that of the previous model (see Table 1 and the top graphs in Fig. 2).

              model      RMSE
    Valero    RBF/GLM    14.046  ± 1.0435
              RBF/DSCL   14.499  ± 1.518
    HCN       RBF/GLM    10.340  ± 1.866
              RBF/DSCL   15.620  ± 1.519
    building1 RBF/GLM    0.06407 ± 0.00496
              RBF/DSCL   0.13618 ± 0.00455

Table 1. Surrogate models' average regression RMSE on Valero's benchmark, the HCN data and the building1 dataset.

[Fig. 2. Scatter plots of the model fitness against the original fitness on testing data: RBF/GLM (left column) and RBF/DSCL (right column) on the Valero, HCN synthesis and building1 datasets.]

The second dataset is from a real application in chemical engineering (cf. the use of RBF networks in this application area, e.g., in the work of Qiao [11]): the optimization of chemical catalysts for hydrocyanic acid (HCN) synthesis [14]. Solutions of this task are composed of two discrete and 11 continuous variables; the whole dataset has 696 items. The fitting results are substantially different from the benchmark problem: considering the average response in the dataset, ȳ = 31.17, the measured RMSEs are relatively much higher (see the middle row of graphs in Fig. 2). The RMSE of the new RBF/GLM model decreased by nearly 35 % compared to the previous model's error.

Prechelt's Proben1 [18] is a popular collection of datasets for data mining, originally intended for benchmarking neural networks. We have tested our models on the building1 dataset using the first response variable, which indicates the electrical energy consumption in a building; the models were repeatedly trained on the first 3156 and tested on the remaining 1052 data, as suggested by Prechelt. Average results from ten trainings show that the former RBF/DSCL model is not able to sufficiently express the relation between the discrete variables and the output. Conversely, the results of the RBF/GLM model are at least comparable to the results reported elsewhere [12, 2, 15].

4.2 Genetic algorithm performance on the benchmark fitness

The benchmark fitness enabled us to test the model with the GA [1]. As shown in Table 2, the GA with the surrogate model reaches the same fitness values on this function as the non-surrogate GA while using less than 30 per cent of the original fitness function evaluations (generation-based EC), or it is able to find a 1.1-times better solution with 80 per cent of the original fitness evaluations (individual-based EC).

    EC settings of the    fitness of the best   number of orig.
    surrogate model       found individual      fitness evaluations
    without model         486.38 ± 56.5         4130 ± 1546
    individual-based      544.73 ± 3.9          3241 ± 926
    generation-based      490.28 ± 44.9         1185 ± 358

Table 2. GA performance without a surrogate model and with the RBF/DSCL-based model; average results from 100 runs of the algorithm.
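A generation-based evolution control of the kind used in these experiments can be organized as in the following schematic sketch (the loop structure and all routine names, e.g. fit_surrogate and ga_step, are our own placeholders, not the authors' MATLAB code):

    # Schematic sketch of generation-based evolution control: whole
    # generations are evaluated either by the expensive original fitness or
    # by the surrogate.  fit_surrogate (returning a callable model) and
    # ga_step (selection + variation) are caller-supplied placeholders.
    def surrogate_assisted_ga(population, original_fitness, fit_surrogate,
                              ga_step, n_generations, model_period=3):
        archive, model = [], None
        for gen in range(n_generations):
            if model is None or gen % model_period == 0:
                # control generation: evaluate by the original fitness
                fitness = [original_fitness(ind) for ind in population]
                archive.extend(zip(population, fitness))
                model = fit_surrogate(archive)         # (re)train the model
            else:
                # model generation: evaluate cheaply by the surrogate
                fitness = [model(ind) for ind in population]
            population = ga_step(population, fitness)
        return population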
5 Conclusion

Two kinds of surrogate models of expensive objective functions for mixed-variable continuous and discrete optimization were presented in this paper. Both of them make use of RBF networks: the first model focuses the training of the RBF networks by clustering on the discrete part of the data, while the second builds a GLM on the discrete input variables. Detailed algorithms for training the models were provided. The results of testing on three different datasets showed that especially the second model is a competitive kind of regression for costly objective functions. Using the model on the benchmark fitness function resulted in saving up to 70 per cent of the original evaluations, or in a 10 per cent increase of the final solution quality.

One of the most similar works dealing with surrogate models is the paper of Zhou [22], who uses RBF networks as a local surrogate model in combination with a global model based on Gaussian processes. Other literature employs polynomials [8], Gaussian processes [4], or multilayer perceptron networks [10], but most publications consider only continuous, or continuous and integer, optimization.

References

1. L. Bajer, M. Holeňa: Surrogate model for continuous and discrete genetic optimization based on RBF networks. In C. Fyfe et al. (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2010, vol. 6283 of Lecture Notes in Computer Science, pp. 251–258. Springer, September 2010.
2. J. L. Bernier, J. Ortega, I. Rojas, E. Ros, A. Prieto: Obtaining fault tolerant multilayer perceptrons using an explicit regularization. Neural Processing Letters 12(2), October 2000, 107–113.
3. A. J. Booker, J. Dennis, P. D. Frank, D. B. Serafini, V. Torczon, M. Trosset: A rigorous framework for optimization by surrogates. Structural and Multidisciplinary Optimization 17, 1999, 1–13.
4. D. Büche, N. N. Schraudolph, P. Koumoutsakos: Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35(2), 2005, 183–194.
5. M. D. Buhmann: Radial basis functions: theory and implementations. Cambridge Univ. Press, 2003.
6. D. Gorissen, I. Couckuyt, P. Demeester, T. Dhaene, K. Crombecq: A surrogate modeling and adaptive sampling toolbox for computer based design. Journal of Machine Learning Research 11, 2010, 2051–2055.
7. M. Holeňa, T. Cukic, U. Rodemerck, D. Linke: Optimization of catalysts using specific, description-based genetic algorithms. Journal of Chemical Information and Modeling 48(2), 2008, 274–282.
8. S. Hosder, L. T. Watson, B. Grossman: Polynomial response surface approximations for the multidisciplinary design optimization of a high speed civil transport. Optimization and Engineering 2(4), 2001, 431–452.
9. Y. Jin: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing – A Fusion of Foundations, Methodologies and Applications 9(1), 2005, 3–12.
10. Y. Jin, M. Hüsken, M. Olhofer, B. Sendhoff: Neural networks for fitness approximation in evolutionary optimization. In Knowledge Incorporation in Evolutionary Computation, 2005, 281.
11. Jun-Fei Qiao, Hong-Gui Han: A repair algorithm for radial basis function neural network and its application to chemical oxygen demand modeling. International Journal of Neural Systems 20(1), 2010, 63–74.
12. Cheol-Taek Kim, Ju-Jang Lee: Training two-layered feedforward networks with variable projection method. IEEE Transactions on Neural Networks 19(2), 2008, 371–375.
13. P. McCullagh, J. A. Nelder: Generalized linear models. Chapman & Hall, 1989.
14. S. Möhmel, N. Steinfeldt, S. Engelschalt, M. Holeňa, S. Kolf, U. Dingerdissen, D. Wolf, R. Weber, M. Bewersdorf: New catalytic materials for the high-temperature synthesis of hydrocyanic acid from methane and ammonia by high-throughput approach. Applied Catalysis A: General 334, 2008, 73–83.
15. P. Narasimha, W. H. Delashmit, M. T. Manry, Jiang Li, F. Maldonado: An integrated growing-pruning method for feedforward network training. Neurocomputing 71(13–15), August 2008, 2831–2847.
16. M. Olhofer, T. Arima, T. Sonoda, B. Sendhoff: Optimisation of a stator blade used in a transonic compressor cascade with evolution strategies. In Evolutionary Design and Manufacture, 2000, 45–54.
17. Y. S. Ong, P. B. Nair, A. J. Keane, K. W. Wong: Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In Knowledge Incorporation in Evolutionary Computation, 2004, 307–332.
18. L. Prechelt: Proben1: A set of neural network benchmark problems and benchmarking rules. Technical Report 21/94, Universität Karlsruhe, 1994.
19. G. Singh, R. V. Grandhi: Mixed-variable optimization strategy employing multifidelity simulation and surrogate models. AIAA Journal 48(1), 2010, 215–223.
20. S. Valero, E. Argente, V. Botti: DoE framework for catalyst development based on soft computing techniques. Computers & Chemical Engineering 33(1), 2009, 225–238.
21. A. Younis, Z. M. Dong: Global optimization using mixed surrogate models for computation intensive designs. In 2nd International Symposium on Computational Mechanics, Hong Kong, vol. 1233, 2010, 1600–1605.
22. Z. Zhou, Y. S. Ong, P. B. Nair, A. J. Keane: Combining global and local surrogate models to accelerate evolutionary optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 37(1), 2007, 66–76.