Boosted surrogate models in evolutionary optimization⋆

Martin Holeňa

Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 18207 Praha 8, Czech Republic, martin@cs.cas.cz, web: cs.cas.cz/~martin

⋆ The research reported in this paper has been supported by the grant No. 201/08/1744 of the Grant Agency of the Czech Republic and partially supported by the Institutional Research Plan AV0Z10300504.

Abstract. The paper deals with surrogate modelling, a modern approach to the optimization of empirical objective functions. The approach leads to a substantial decrease of the time and costs of evaluation of the objective function, a property that is particularly attractive in evolutionary optimization. In the paper, an extension of surrogate modelling with regression boosting is proposed. Such an extension increases the accuracy of surrogate models, and thus also the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function. The proposed extension is illustrated on a case study in the area of searching for catalytic materials optimal with respect to their behaviour in a particular chemical reaction. A genetic algorithm developed specifically for this application area is employed for optimization, multilayer perceptrons serve as surrogate models, and a method called AdaBoost.R2 is used for boosting. Results of the case study clearly confirm the usefulness of boosting for surrogate modelling.

1 Introduction

For more than two decades, evolutionary algorithms, especially their most frequently encountered representative – genetic algorithms, have been among the most successful methods for solving difficult optimization tasks [3, 11, 31, 32, 42]. The popularity of evolutionary algorithms is to some extent due to their biological inspiration, which increases their comprehensibility outside computer science. Nevertheless, they share several purely mathematical properties of all stochastic optimization methods, most importantly the valuable ability to escape a local optimum and continue the search for a global one, and the restriction of the information on which they rely to function values only. Consequently, they do not need information about gradients or second-order partial derivatives, in contrast to smooth optimization methods (such as steepest descent, conjugate gradient methods, the popular Levenberg-Marquardt method, etc.). This makes them particularly attractive for the optimization of empirical objective functions, the values of which cannot be computed analytically but have to be obtained experimentally, through some measurement or testing. Indeed, the impossibility to compute the function values of such a function analytically makes an analytical computation of its gradient and second-order derivatives impossible as well, whereas measurement errors usually hinder obtaining sufficiently accurate estimates of the derivatives.

Like other methods relying solely on function values, evolutionary algorithms need the objective function to be evaluated in quite a large number of points. In the context of optimization of empirical objective functions, this can be quite disadvantageous because the evaluation of such a function in the points forming one generation of an evolutionary algorithm is often costly and time-consuming. Hence, the above mentioned advantages of using evolutionary algorithms for the optimization of empirical objective functions are frequently counterbalanced by the considerably high costs and time needed for the evaluation of such functions.

An area where the trade-off between successful optimization and costly objective function evaluations plays a crucial role is the computer-aided search for new materials and chemicals optimal with respect to certain properties [2]. Here, evolutionary algorithms are used in more than 90 % of optimization tasks, and the rarely encountered alternatives are simulated annealing [9, 22, 23], the simplex method [17], and the holographic search strategy [37, 38, 41], which also use solely function values and therefore need a similarly high number of objective function evaluations as evolutionary algorithms. Testing a generation of materials or chemicals typically needs hours to days of time and costs hundreds to thousands of euros. Therefore, the evolutionary optimization rarely runs for more than ten generations.
The usual approach to decreasing the cost and time of optimization of empirical objective functions is to evaluate the objective function only sometimes and to evaluate a suitable regression model of that function otherwise. The employed model is termed a surrogate model of the empirical objective function, and the approach is referred to as surrogate modelling. Needless to say, the time and costs needed to evaluate a regression model are negligible compared to an empirical objective function. However, it must not be forgotten that the finally optimized function coincides with the original empirical objective function only in some points, whereas in the remaining points it coincides only with its surrogate model. Consequently, the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function depends on the accuracy of the approximation of the original objective function by the surrogate model.

This paper suggests to increase the accuracy of surrogate models by means of boosting. Boosting is a popular approach to increasing the accuracy of classification, and due to the success of classification boosting, several methods of regression boosting have already been proposed as well. However, so far no attempt has been reported to combine regression boosting with surrogate modelling. Hence, the purpose of the research reported in the paper is basically a proof of concept: to extend surrogate modelling through the incorporation of regression boosting, and to validate that extension on several sufficiently complex case studies. One of those case studies is described in the paper.

In the following section, the basic principles of surrogate modelling and its strategies in evolutionary optimization are recalled, and important surrogate models are listed. Section 3 recalls the principles of boosting and explains a particular method of regression boosting that will be employed later in a case study in materials science. That case study is sketched and its main results are presented in Section 4.

2 Surrogate modelling

Surrogate modelling is a general approach to the optimization of costly objective functions in which the evaluation of the objective function is restricted to points that are considered to be most important for the progress of the employed optimization method [5, 25, 27, 30, 39, 40]. It is most frequently encountered in connection with the optimization of empirical objective functions, but it has been equally successfully applied to expensive optimization tasks in engineering design, in which the objective function is not empirical but its evaluation requires intensive computations [25]. In the context of the computer-aided search for new materials and chemicals optimal with respect to certain properties, surrogate modelling can be viewed as replacing real experiments with simulated virtual experiments in a computer; such virtual experiments are sometimes referred to as virtual screening [2].
Although surrogate modelling is a general optimization approach (cf. its application in the context of conventional optimization in [5]), it is most frequently encountered in connection with evolutionary algorithms. The reason is that in evolutionary optimization, the approach leads to the approximation of the landscape of the fitness function, i.e., to a method that is known to be useful in general [19, 20, 29]. In evolutionary algorithms, most important for the progress of the method are on the one hand points that best indicate the global optimum (typically through the highest values of the fitness function), and on the other hand points that most contribute to the diversity of the population.

In the context of evolutionary optimization, surrogate modelling has the following main steps:

(i) Collecting an initial set of points in which the objective function has already been empirically evaluated. This can be the first generation or several first generations of the evolutionary algorithm, but such points are frequently available in advance.
(ii) Approximating the objective function by a surrogate model, with the use of the set of all points in which it has been empirically evaluated.
(iii) Running the evolutionary algorithm for a population considerably larger than the desired population size, with the empirical objective function replaced by the surrogate model.
(iv) Forming the next generation of the desired size as a subset of the large population obtained in the preceding step that includes the points most important according to the considered criteria for the progress of optimization (such as indication of the global optimum, diversity).
(v) Empirically evaluating the objective function in all points that belong to the next generation of the desired size, and returning to step (ii).

Actually, the above steps (ii)–(v) correspond to only one possible strategy of surrogate modelling in evolutionary optimization: the individual-based control, sometimes also referred to as pre-selection [40]. An alternative strategy to the steps (ii)–(v) is to run the algorithm for only the desired population size, interleaving one generation or several generations in which the original objective function is empirically evaluated with a certain number of generations in which the surrogate model is evaluated. This is the generation-based control of surrogate modelling in evolutionary optimization.
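The individual-based control just described can be summarized by a short control loop. The sketch below is only a schematic rendering of steps (ii)–(v) in Python: the four callables passed as arguments (the surrogate trainer, the evolutionary search on the surrogate, the selection of the most important points, and the costly empirical evaluation) are hypothetical placeholders, not functions defined in the paper.

def individual_based_control(points, values, n_generations, desired_size,
                             enlarged_size, train_surrogate,
                             evolve_on_surrogate, select_subset,
                             evaluate_empirically):
    """Pre-selection (individual-based control) of surrogate modelling.

    All four callables are hypothetical placeholders:
      train_surrogate(X, y)               -> fitted regression model
      evolve_on_surrogate(model, X, n)    -> n candidate points found by the
                                             evolutionary algorithm with the
                                             surrogate as fitness function
      select_subset(candidates, model, n) -> the n points most important for
                                             the optimization progress
      evaluate_empirically(points)        -> costly measured objective values
    """
    X, y = list(points), list(values)                               # step (i)
    for _ in range(n_generations):
        surrogate = train_surrogate(X, y)                           # step (ii)
        candidates = evolve_on_surrogate(surrogate, X, enlarged_size)   # step (iii)
        next_gen = select_subset(candidates, surrogate, desired_size)   # step (iv)
        new_values = evaluate_empirically(next_gen)                 # step (v)
        X.extend(next_gen)
        y.extend(new_values)
    return X, y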
Empirical objective functions are typically highly nonlinear. Therefore, nonlinear regression models should be used as surrogate models. They can be basically divided into two large groups according to whether the set of functions among which the surrogate model has to be chosen has an explicit finite parametrization.

1. So far, mostly parametric models have been used for surrogate modelling. From the point of view of their role in this context and/or their overall importance, the following kinds of parametric nonlinear regression models are most worth mentioning:

(i) Multilayer feed-forward neural networks, more precisely, the nonlinear mappings computed by such networks. Their attractiveness for nonlinear regression in general and for surrogate modelling in particular [20] is due to their universal approximation capability, which actually means that linear spaces of functions computed by certain families of multilayer feed-forward neural networks are dense in some general function spaces [18, 21, 26]. For example, considering the most common representative of such networks – multilayer perceptrons, the linear space formed by all functions computed by the family of perceptrons with one hidden layer and infinitely smooth activation functions is dense in the space Lp(µ) of functions with the p-th power of the absolute value finitely integrable with respect to a finite measure µ, in the space C(X) of functions continuous on a compact X, and in Sobolev spaces generalizing Lp(µ) to functions that are differentiable up to a given order. In the application domain of catalytic materials, from which the case study presented in Section 4 is taken, nearly all examples of regression analysis published since the mid 1990s rely on multilayer feed-forward neural networks, typically on multilayer perceptrons (Figure 1). In the last edition of the "Handbook of Heterogeneous Catalysis", more than 20 such examples are listed, as well as several additional ones based on other kinds of such networks – radial basis function networks and piecewise-linear neural networks [16]. Therefore, these three kinds of neural networks are now briefly recalled:
– Multilayer perceptrons (MLPs) can have an arbitrary number of hidden layers, and the basis functions of their linear space of computed functions are constructed by means of sigmoidal activation functions, such as the logistic sigmoid, hyperbolic tangent, or arctangent [13, 43].
– Radial basis function (RBF) networks always have only one hidden layer, and the basis functions of their space of computed functions are radial, i.e., the function value depends only on the distance of the vector of input values from some centre specific to the function [7].
– Piecewise-linear neural networks are simply MLPs with piecewise-linear activation functions. Their linear space of computed functions is dense only in C(X), but on the other hand, they allow a straightforward extraction of logical rules describing the relationships between the input and output values of the network [15].

Fig. 1. Example MLP architecture with two hidden layers, used in the case study presented in Section 4.

(ii) Support vector regression based on positive semi-definite kernels [34, 36]. It is worth mentioning that they generalize the above recalled RBF networks, and also the historically first kind of nonlinear regression – polynomial regression.
(iii) Gaussian process regression [28] is listed here also due to its relationship to radial basis function networks, but most importantly due to the fact that it has already been successfully employed in surrogate modelling [6].

2. Nonparametric regression models are, in general, more flexible than parametric models, but the flexibility is typically paid for by more extensive computations. Therefore, their importance has been increasing only during the last two decades, following the increasing power of available computers [12, 14]. Nevertheless, there is one noteworthy exception:

(v) Regression trees have been successfully used already since the early 1980s [4]. They are actually a modification of a classification method, therefore the regression function is piecewise-constant. That property accounts for the relatively low computational requirements of regression trees, but it also decreases their flexibility, which is otherwise the main advantage of nonparametric methods.
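Before turning to boosting, a minimal illustration of fitting such a parametric surrogate model may be useful. The sketch below is not taken from the paper: it uses scikit-learn's MLPRegressor as a generic stand-in for a multilayer perceptron with two hidden layers and sigmoidal activations (the model type employed in the case study of Section 4), and the training data are random placeholders for points in which the objective function has already been empirically evaluated.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder data: 14 input variables and 3 output variables, as in the case
# study of Section 4 (support coding + 10 metal proportions; two conversions
# and the HCN yield). Real data would come from measurements.
rng = np.random.default_rng(0)
X = rng.random((604, 14))
y = rng.random((604, 3))

# An MLP surrogate with two hidden layers (architecture (14,10,6,3) in the
# notation of Section 4) and logistic sigmoid activations.
surrogate = MLPRegressor(hidden_layer_sizes=(10, 6), activation="logistic",
                         max_iter=2000, random_state=0)
surrogate.fit(X, y)

# The fitted model can now be evaluated instead of the costly objective.
predictions = surrogate.predict(X[:5])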
3 Boosting regression models

Boosting is a method of improving classification accuracy that consists in developing the classifier iteratively, and increasing the relative influence of the training data that most contributed to errors in the previous iterations on its development in the subsequent iterations [33]. The usefulness of boosting for classification has incited its extension to regression [8]. Both for classification and for regression, the basic approach to increasing the relative influence of particular training data is re-sampling the training data according to a distribution that gives them a higher probability of occurrence. This is equivalent to re-weighting the contributions of the individual training pairs (xj, yj), with higher weights corresponding to higher values of the error measure.

Since surrogate models are regression models, any method for regression boosting (such as [8, 10, 35]) is suitable for them. In the following, the method AdaBoost.R2, proposed in [8], will be explained in detail.

Similarly to other adaptive boosting methods, each of the available pairs (x1, y1), ..., (xp, yp) of input and output data is used exactly once in the first iteration of AdaBoost.R2. This corresponds to re-sampling them according to the uniform probability distribution P1 with P1(xj) = 1/p for j = 1, ..., p. In addition, the weighted average error of the 1st iteration is set to zero, Ē1 = 0.

In the subsequent iterations (i ≥ 2), the following sequence of steps is performed:

1. A sample (ξ1, η1), ..., (ξp, ηp) is obtained through re-sampling (x1, y1), ..., (xp, yp) according to the distribution Pi−1.
2. Using (ξ1, η1), ..., (ξp, ηp) as training data, a regression model Fi is constructed.
3. A [0,1]-valued squared error vector Ei of Fi with respect to (x1, y1), ..., (xp, yp) is calculated as

   E_i = (E_i(1), \dots, E_i(p)) = \frac{\left( (F_i(x_1) - y_1)^2, \dots, (F_i(x_p) - y_p)^2 \right)}{\max_{k=1,\dots,p} (F_i(x_k) - y_k)^2}.   (1)

4. The weighted average error of the i-th iteration is calculated as

   \bar{E}_i = \frac{1}{p} \sum_{k=1}^{p} P_i(x_k, y_k) \, E_i(k).   (2)

5. Provided Ēi < 0.5, the probability distribution for re-sampling (x1, y1), ..., (xp, yp) is updated for k = 1, ..., p according to

   P_i(x_k, y_k) = \frac{P_{i-1}(x_k, y_k) \left( \frac{\bar{E}_i}{1 - \bar{E}_i} \right)^{1 - E_i(k)}}{\sum_{j=1}^{p} P_{i-1}(x_j, y_j) \left( \frac{\bar{E}_i}{1 - \bar{E}_i} \right)^{1 - E_i(j)}}.   (3)

6. The boosting approximation in the i-th iteration is set to the median of the approximations F1, ..., Fi with respect to the probability distribution

   \left( \frac{\bar{E}_1}{1 - \bar{E}_1}, \dots, \frac{\bar{E}_i}{1 - \bar{E}_i} \right).   (4)
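To make the method concrete, a minimal Python sketch of this AdaBoost.R2-style loop for a single real-valued output follows. It is not taken from the paper: fit_model is a hypothetical placeholder for training one regression model (e.g., an MLP), the average error is computed as the weighted mean of the E_i(k) under the current sampling distribution, and the case Ēi ≥ 0.5 is handled by reinitialization (an alternative discussed later in this section) rather than by stopping.

import numpy as np

def weighted_median(values, weights):
    """Weighted median of a 1-D array of values with non-negative weights."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def adaboost_r2(X, y, fit_model, n_iterations, rng=None):
    """Sketch of AdaBoost.R2-style regression boosting (steps 1-6 above).

    X is a 2-D numpy array, y a 1-D numpy array (a single objective), and
    fit_model(X, y) is a hypothetical placeholder returning a trained model
    as a callable model(X) -> predictions. Returns the models F_1, ..., F_n
    together with the weights E_bar/(1 - E_bar) of distribution (4).
    """
    rng = rng or np.random.default_rng()
    p = len(X)
    P = np.full(p, 1.0 / p)                    # uniform distribution P_1
    models = [fit_model(X, y)]                 # first iteration: no re-sampling
    weights = [0.0]                            # E_bar_1 = 0 by convention
    for _ in range(1, n_iterations):
        idx = rng.choice(p, size=p, p=P)                   # step 1: re-sample
        model = fit_model(X[idx], y[idx])                  # step 2: fit F_i
        sq_err = (model(X) - y) ** 2
        E = sq_err / sq_err.max()                          # step 3: eq. (1)
        E_bar = float(np.sum(P * E))                       # step 4: weighted mean
        if E_bar >= 0.5:                                   # reinitialize [1]
            P = np.full(p, 1.0 / p)
            continue
        beta = E_bar / (1.0 - E_bar)
        P = P * beta ** (1.0 - E)                          # step 5: eq. (3)
        P = P / P.sum()
        models.append(model)
        weights.append(beta)
    return models, np.asarray(weights)

def boosting_approximation(models, weights, X_new):
    """Step 6: point-wise weighted median of the individual approximations."""
    preds = np.column_stack([m(X_new) for m in models])    # (n_points, n_models)
    return np.array([weighted_median(row, weights) for row in preds])

The weights collected during the loop are exactly the terms of distribution (4), so the weighted median in boosting_approximation realizes step 6.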
The errors used to assess the quality of the boosting approximation are then called boosting errors, e.g., the boosting MSE or the boosting MAE, where MSE refers to the mean squared error between the computed and measured values, whereas MAE refers to the mean absolute error, i.e., to the mean Euclidean distance between them. For simplicity, also the approximation in the first iteration, F1, is called a boosting approximation if boosting is performed, and the respective errors are then called boosting errors, although boosting actually does not introduce any modifications in the first iteration.

The above formulation of the method deals only with the case Ēi < 0.5. For Ēi ≥ 0.5, the original formulation of the method in [8] proposes to stop the boosting. However, that is not allowed if the stopping criterion should be based on an independent set of validation data. Indeed, the calculation of Ēi does not rely on any such independent data set, but solely on the data employed to construct the regression model. A possible alternative for the case Ēi ≥ 0.5 is reinitialization, i.e., proceeding as in the 1st iteration [1].

In connection with using feed-forward neural networks as surrogate models, it is important to be aware of the difference between the iterations of boosting and the iterations of neural network training. Boosting iterates on a higher level: one iteration of boosting includes a complete training of an ANN, which can itself proceed for many hundreds of iterations. Nevertheless, both kinds of iterations are similar in the sense that, starting with some iteration, over-training is present. Therefore, also over-training due to boosting can be reduced through stopping in the iteration after which the error on an independent set of data increases for the first time. Moreover, cross-validation can be used to find the iteration most appropriate for stopping.
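With this stopping rule, the final boosting iteration can be read off a history of errors measured on an independent data set. A minimal sketch, assuming val_errors is such a hypothetical history with val_errors[i-1] holding the error of the boosting approximation after iteration i:

def final_boosting_iteration(val_errors):
    """Last iteration before the validation error increases for the first time."""
    for i in range(1, len(val_errors)):
        if val_errors[i] > val_errors[i - 1]:
            return i                # iterations 1, ..., i are kept
    return len(val_errors)          # the error never increased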
To validate the most of all active compounds sum up to 100 %); promising results of the investigation of the useful- – dependent variables, i.e., objective functions: con- ness of boosting in our case study, the data from the versions of CH4 and NH3 , and yield of HCN. 7th generation of the genetic algorithm were used. The As the surrogate model, MLPs were employed, in validation included the 5 architectures that were most accordance with their leading role among nonlinear promising for boosting from the point of view of the regression models in the area of catalytic ma- lowest boosting MSE on test data in the final iteration. terials [2, 16]. Each considered neural network had These were the architectures (14,10,6,3), (14,14,8,3), 14 input neurons: 4 of them coding the material used (14,13,5,3), (14,10,4,3) and (14,11,3), for which the fi- as support, the other 10 corresponding to the propor- nal iterations of boosting were 32, 29, 31, 19 and 3, re- tions of the 10 metal additives belonging to indepen- spectively. For each of them, the validation proceeded dent variables; output neurons were 3, corresponding as follows: to the possible objective functions (Figure 1). 1. In each iteration up to the final boosting iteration The most appropriate MLP architectures were corresponding to the respective architecture, a sin- searched by means of cross-validation, using only data gle MLP was trained with data about the 604 cat- about catalysts from the 1.–6. generation of the ge- alytic materials considered during the architecture netic algorithm and about the 52 catalysts with manu- search. ally designed composition, thus altogether data about 2. Each of those MLPs was employed to approximate 604 catalytic materials. Data about catalysts from the the conversions of CH4 and NH3 and the yield of 7. generation were completely excluded and left out HCN for the 92 materials from the 7. generation for validating the search results. To use as much in- of the genetic algorithm. formation as possible from the available data, cross- 3. In each iteration, the medians with respect to the validation was applied as the extreme 604-fold vari- probability distribution (4) of the approximations ant, i.e., leave-1-out validation. The set of architec- of the two conversions and of the HCN yield ob- tures within which the search was performed was de- tained up to that iteration were used as the boost- limited by means of the heuristic pyramidal condition: ing approximations. the number of neurons in a subsequent layer must not 4. From the conversions and the yield predicted by increase the number of neurons in a previous layer. the boosting approximations, and from the mea- Denote nI , nh and nO the numbers of input, hidden sured values, the boosting MSE and MAE were and output neurons, respectively, and nH1 and nH2 calculated for each MLP. the numbers of neurons in the first and second hid- den layer, respectively. Then the pyramidal condition The boosting errors (MSE and MAE) are sum- reads: marized in Figure 2, whereas Figure 3 compares the (i) for MLPs with 1 hidden layer: nI ≥ nH ≥ nO , in boosting approximations of the conversions of CH4 our case 14 ≥ nH ≥ 3 (12 architectures); and NH3 and of the yield of HCN in the 1st and final (ii) for MLPs with 2 hidden layers: nI ≥ nH1 ≥ iteration with their measured values. The presented nH2 ≥ nO , in our case 14 ≥ nH1 ≥ nH2 ≥ 3 results clearly confirm the usefulness of boosting for (78 architectures). the five considered architectures. 
To investigate the usefulness of boosting in our case study, the same data were used and the same set of architectures was considered as for the architecture search. In each iteration, a leave-one-out validation was performed, in the way briefly outlined in the preceding section: the mean squared error of the predicted performance of the catalytic materials serving in the individual folds as test data was calculated, and averaged over all the 604 folds. The criterion according to which boosting is considered useful to an architecture was: the average boosting MSE in the 2nd iteration has to be lower than in the 1st iteration. The iteration till which the average boosting MSE continuously decreased was then taken as the final iteration of boosting.

According to that criterion, boosting was useful to 9 of the 12 considered architectures with one hidden layer and to 65 of the 78 considered architectures with two hidden layers. To validate the most promising results of the investigation of the usefulness of boosting in our case study, the data from the 7th generation of the genetic algorithm were used. The validation included the 5 architectures that were most promising for boosting from the point of view of the lowest boosting MSE on test data in the final iteration. These were the architectures (14,10,6,3), (14,14,8,3), (14,13,5,3), (14,10,4,3) and (14,11,3), for which the final iterations of boosting were 32, 29, 31, 19 and 3, respectively. For each of them, the validation proceeded as follows:

1. In each iteration up to the final boosting iteration corresponding to the respective architecture, a single MLP was trained with data about the 604 catalytic materials considered during the architecture search.
2. Each of those MLPs was employed to approximate the conversions of CH4 and NH3 and the yield of HCN for the 92 materials from the 7th generation of the genetic algorithm.
3. In each iteration, the medians with respect to the probability distribution (4) of the approximations of the two conversions and of the HCN yield obtained up to that iteration were used as the boosting approximations.
4. From the conversions and the yield predicted by the boosting approximations, and from the measured values, the boosting MSE and MAE were calculated for each MLP.
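A schematic version of this validation computation is sketched below. The prediction arrays, the weights of distribution (4), and the measured values are hypothetical placeholders (in the case study they would have shape (92, 3): one row per 7th-generation material, one column per objective function); the weighted median of step 3 is taken component-wise.

import numpy as np

def weighted_median(values, weights):
    """Weighted median of a 1-D array of values with non-negative weights."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def validation_error_history(per_iteration_predictions, weights, measured):
    """Boosting MSE and MAE on validation data after each boosting iteration.

    per_iteration_predictions: list of (n_materials, n_objectives) arrays with
        the predictions of the single MLPs trained in iterations 1, 2, ...;
    weights: one weight per iteration, from the probability distribution (4);
    measured: (n_materials, n_objectives) array of measured values.
    """
    history = []
    for i in range(1, len(per_iteration_predictions) + 1):
        preds = np.stack(per_iteration_predictions[:i])   # (i, materials, objectives)
        w = np.asarray(weights[:i], dtype=float)
        median = np.empty_like(measured, dtype=float)
        for m in range(measured.shape[0]):                # step 3: component-wise
            for o in range(measured.shape[1]):            #   weighted medians
                median[m, o] = weighted_median(preds[:, m, o], w)
        diff = median - measured                          # step 4: boosting errors
        mse = float(np.mean(diff ** 2))
        mae = float(np.mean(np.linalg.norm(diff, axis=1)))
        history.append((mse, mae))
    return history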
The boosting errors (MSE and MAE) are summarized in Figure 2, whereas Figure 3 compares the boosting approximations of the conversions of CH4 and NH3 and of the yield of HCN in the 1st and final iteration with their measured values. The presented results clearly confirm the usefulness of boosting for the five considered architectures. For each of them, boosting led to an overall decrease of both considered error measures, the MSE and MAE, on new data from the 7th generation of the genetic algorithm. Moreover, the decrease of the MSE (which is the measure employed during the investigation of the usefulness of boosting) is uninterrupted or nearly uninterrupted till the final boosting iteration. On the other hand, the scatter plots in Figure 3 do not indicate any apparent difference between the effect of boosting on the three properties employed as catalyst performance measures in our case study – conversion of CH4, conversion of NH3, and yield of HCN. Hence, the performed validation confirms the usefulness of boosting irrespectively of which of those performance measures is considered.

Fig. 2. History of the boosting MSE and MAE on the data from the 7th generation of the genetic algorithm for MLPs with the five architectures included in the validation of boosting.

5 Conclusions

The paper dealt with surrogate modelling, a modern approach to the optimization of empirical objective functions, which is particularly attractive in evolutionary optimization. It proposed to extend surrogate modelling with regression boosting, to increase the accuracy of surrogate models, and thus also the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function. Needless to say, regression boosting is not new, though it is less common than the popular classification boosting. However, novel is its combination with surrogate models, which adds the advantage of increased accuracy to the main advantage of surrogate modelling – decreasing the time and costs of optimization of empirical objective functions.

Theoretical principles of both surrogate modelling and boosting are known, therefore the main purpose of the reported research was to validate the feasibility of the proposed extension of surrogate modelling on several sufficiently complex case studies, one of which was sketched in this paper. The presented case study results clearly confirm the usefulness of boosting. For the five most promising architectures, boosting leads to an overall decrease of both considered error measures, MSE and MAE, on new data from the 7th generation of the genetic algorithm. Moreover, the decrease of the MSE (which is the boosting error employed during the investigation of the usefulness of boosting) is uninterrupted or nearly uninterrupted till the final boosting iteration. On the other hand, the scatter plots in Figure 3 do not indicate any apparent difference between the effect of boosting on the three catalyst properties considered as possible objective functions in our case study – conversion of CH4, conversion of NH3, and yield of HCN. Hence, the performed validation confirms the usefulness of boosting irrespectively of which of these objective functions is selected.

References

1. H. Altinçay: Optimal resampling and classifier prototype selection in classifier ensembles using genetic algorithms. Pattern Analysis and Applications 7, 2004, 285–295.
2. M. Baerns and M. Holeňa: Combinatorial Development of Solid Catalytic Materials. Design of High-Throughput Experiments, Data Analysis, Data Mining. World Scientific, Singapore, 2009.
3. T. Bartz-Beielstein: Experimental Research in Evolutionary Computation. Springer Verlag, Berlin, 2006.
4. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone: Classification and Regression Trees. Wadsworth, Belmont, 1984.
5. A.J. Brooker, J. Dennis, P.D. Frank, D.B. Serafini, V. Torczon, and M. Trosset: A rigorous framework for optimization by surrogates. Structural and Multidisciplinary Optimization 17, 1998, 1–13.
6. D. Büche, N.N. Schraudolph, and P. Koumoutsakos: Accelerating evolutionary algorithms with gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35, 2005, 183–194.
7. M.D. Buhmann: Radial Basis Functions: Theory and Implementations. Cambridge University Press, Cambridge, 2003.
8. H. Drucker: Improving regression using boosting techniques. In A.J.C. Sharkey, editor, Proceedings of the 14th International Conference on Machine Learning, Springer Verlag, London, 1997, 107–115.
9. A. Eftaxias, J. Font, A. Fortuny, J. Giralt, A. Fabregat, and F. Stüber: Kinetic modelling of catalytic wet air oxidation of phenol by simulated annealing. Applied Catalysis B: Environmental 33, 2001, 175–190.
10. J. Friedman: Greedy function approximation: A gradient boosting machine. Annals of Statistics 29, 2001, 1189–1232.
11. D. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, 1989.
12. L. Györfi, M. Kohler, A. Krzyzak, and H. Walk: A Distribution-Free Theory of Nonparametric Regression. Springer Verlag, Berlin, 2002.
13. M.T. Hagan, H.B. Demuth, and M.H. Beale: Neural Network Design. PWS Publishing, Boston, 1996.
14. T.J. Hastie and R.J. Tibshirani: Generalized Additive Models. Chapman & Hall, Boca Raton, 1990.
15. M. Holeňa: Piecewise-linear neural networks and their relationship to rule extraction from data. Neural Computation 18, 2006, 2813–2853.
16. M. Holeňa and M. Baerns: Computer-aided strategies for catalyst development. In G. Ertl, H. Knözinger, F. Schüth, and J. Weitkamp, editors, Handbook of Heterogeneous Catalysis, Wiley-VCH, Weinheim, 2008.
17. A. Holzwarth, P. Denton, H. Zanthoff, and C. Mirodatos: Combinatorial approaches to heterogeneous catalysis: Strategies and perspectives for academic research. Catalysis Today 67, 2001, 309–318.
18. K. Hornik: Approximation capabilities of multilayer neural networks. Neural Networks 4, 1991, 251–257.
19. Y. Jin: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9, 2005, 3–12.
20. Y. Jin, M. Hüsken, M. Olhofer, and B. Sendhoff: Neural networks for fitness approximation in evolutionary optimization. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 281–306.
21. P.C. Kainen, V. Kůrková, and M. Sanguineti: Estimates of approximation rates by gaussian radial-basis functions. In Adaptive and Natural Computing Algorithms, Springer Verlag, Berlin, 2007, 11–18.
22. B. Li, P. Sun, Q. Jin, J. Wang, and D. Ding: A simulated annealing study of Si, Al distribution in the omega framework. Journal of Molecular Catalysis A: Chemical 148, 1999, 189–195.
23. A.S. McLeod and L.F. Gladden: Heterogeneous catalyst design using stochastic optimization algorithms. Journal of Chemical Information and Computer Science 40, 2000, 981–987.
24. S. Möhmel, N. Steinfeldt, S. Engelschalt, M. Holeňa, S. Kolf, U. Dingerdissen, D. Wolf, R. Weber, and M. Bewersdorf: New catalytic materials for the high-temperature synthesis of hydrocyanic acid from methane and ammonia by high-throughput approach. Applied Catalysis A: General 334, 2008, 73–83.
25. Y.S. Ong, P.B. Nair, A.J. Keane, and K.W. Wong: Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 307–331.
26. A. Pinkus: Approximation theory of the MLP model in neural networks. Acta Numerica 8, 1998, 277–283.
27. K. Rasheed, X. Ni, and S. Vattam: Methods for using surrogate models to speed up genetic algorithm optimization: Informed operators and genetic engineering. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 103–123.
28. C.E. Rasmussen and C. Williams: Gaussian Processes for Machine Learning. MIT Press, Cambridge, 2006.
29. A. Ratle: Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In A.E. Eiben, T. Bäck, M. Schoenauer, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, Springer Verlag, Berlin, 1998, 87–96.
30. A. Ratle: Kriging as a surrogate fitness landscape in evolutionary optimization. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 15, 2001, 37–49.
31. C.R. Reeves and J.E. Rowe: Genetic Algorithms: Principles and Perspectives. Kluwer Academic Publishers, Boston, 2003.
32. R. Schaefer: Foundation of Global Genetic Optimization. Springer Verlag, Berlin, 2007.
33. R. Schapire: The strength of weak learnability. Machine Learning 5, 1990, 197–227.
34. B. Schölkopf and A.J. Smola: Learning with Kernels. MIT Press, Cambridge, 2002.
35. D.L. Shrestha: Experiments with AdaBoost.RT, an improved boosting scheme for regression. Neural Computation 18, 2006, 1678–1710.
36. I. Steinwart and A. Christmann: Support Vector Machines. Springer Verlag, New York, 2008.
37. A. Tompos, J.L. Margitfalvi, E. Tfirst, L. Végvári, M.A. Jaloull, H.A. Khalfalla, and M.M. Elgarni: Development of catalyst libraries for total oxidation of methane: A case study for combined application of "holographic research strategy and artificial neural networks" in catalyst library design. Applied Catalysis A: General 285, 2005, 65–78.
38. A. Tompos, L. Végvári, E. Tfirst, and J.L. Margitfalvi: Assessment of predictive ability of artificial neural networks using holographic mapping. Combinatorial Chemistry and High Throughput Screening 10, 2007, 121–134.
39. H. Ulmer, F. Streichert, and A. Zell: Model-assisted steady state evolution strategies. In GECCO 2003: Genetic and Evolutionary Computation, Springer Verlag, Berlin, 2003, 610–621.
40. H. Ulmer, F. Streichert, and A. Zell: Model assisted evolution strategies. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 333–355.
41. L. Végvári, A. Tompos, S. Göbölös, and J.F. Margitfalvi: Holographic research strategy for catalyst library design: Description of a new powerful optimisation method. Catalysis Today 81, 2003, 517–527.
42. M.D. Vose: The Simple Genetic Algorithm. Foundations and Theory. MIT Press, Cambridge, 1999.
43. H. White: Artificial Neural Networks: Approximation and Learning Theory. Blackwell Publishers, Cambridge, 1992.
44. D. Wolf, O.V. Buyevskaya, and M. Baerns: An evolutionary approach in the combinatorial selection and optimization of catalytic materials. Applied Catalysis A: General 200, 2000, 63–77.

Fig. 3. Comparison of the boosting approximations of the conversions of CH4 and NH3 and of the yield of HCN in the 1st and final iteration with their measured values for the 92 catalytic materials from the 7th generation of the genetic algorithm.