Boosted surrogate models in evolutionary optimization

Martin Holeňa

Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vod
martin@cs.cas.cz

The paper deals with surrogate modelling, a modern approach to the optimization of empirical objective functions. The approach leads to a substantial decrease of time and costs of evaluation of the objective function, a property that is particularly attractive in evolutionary optimization. In the paper, an extension of surrogate modelling with regression boosting is proposed. Such an extension increases the accuracy of surrogate models, and thus also the agreement between results of surrogate modelling and results of the intended optimization of the original objective function. The proposed extension is illustrated on a case study in the area of searching for catalytic materials optimal with respect to their behaviour in a particular chemical reaction. A genetic algorithm developed specifically for this application area is employed for optimization, multilayer perceptrons serve as surrogate models, and a method called AdaBoost.R2 is used for boosting. Results of the case study clearly confirm the usefulness of boosting for surrogate modelling.

The research reported in this paper has been supported by the grant No. 201/08/1744 of the Grant Agency of the Czech Republic and partially supported by the Institutional Research Plan AV0Z10300504.

1 Introduction

For more than two decades, evolutionary algorithms, especially their most frequently encountered representative, genetic algorithms, have belonged to the most successful methods for solving difficult optimization tasks [3, 11, 31, 32, 42]. The popularity of evolutionary algorithms is to some extent due to their biological inspiration, which increases their comprehensibility outside computer science. Nevertheless, they share several purely mathematical properties of all stochastic optimization methods, most importantly, the valuable ability to escape a local optimum and continue the search for a global one, and the restriction of the information on which they rely to function values only. Consequently, they do not need information about gradients or second-order partial derivatives, in contrast to smooth optimization methods (such as steepest descent, conjugate gradient methods, the popular Levenberg-Marquardt method, etc.). This makes them particularly attractive for the optimization of empirical objective functions, the values of which cannot be analytically computed, but have to be obtained experimentally, through some measurement or testing. Indeed, the impossibility to compute analytically the function values of such a function makes also an analytical computation of its gradient and second-order derivatives impossible, whereas measurement errors usually hinder obtaining sufficiently accurate estimates of the derivatives.

Like other methods relying solely on function values, evolutionary algorithms need the objective function to be evaluated in quite a large number of points. In the context of optimization of empirical objective functions, this can be quite disadvantageous because the evaluation of such a function in the points forming one generation of an evolutionary algorithm is often costly and time-consuming. Hence, the above mentioned advantages of using evolutionary algorithms for the optimization of empirical objective functions are frequently counterbalanced by considerably high costs and time needed for the evaluation of such functions.

An area where the trade-off between successful optimization and costly objective function evaluations plays a crucial role is the computer-aided search for new materials and chemicals optimal with respect to certain properties [2]. Here, evolutionary algorithms are used in more than 90 % of optimization tasks, and the rarely encountered alternatives are simulated annealing [9, 22, 23], the simplex method [17], and the holographic search strategy [37, 38, 41], which also use solely function values, therefore needing a similarly high number of objective function evaluations as evolutionary algorithms. Testing a generation of materials or chemicals typically needs hours to days of time and costs hundreds to thousands of euros. Therefore, the evolutionary optimization rarely runs for more than ten generations.

The usual approach to decreasing the cost and time of optimization of empirical objective functions is to evaluate the objective function only sometimes and to evaluate a suitable regression model of that function otherwise. The employed model is termed surrogate model of the empirical objective function, and the approach is referred to as surrogate modelling. Needless to say, the time and costs needed to evaluate a regression model are negligible compared to an empirical objective function. However, it must not be forgotten that the final optimized function coincides with the original empirical objective function only in some points, whereas in the remaining points it coincides only with its surrogate model. Consequently, the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function depends on the accuracy of the approximation of the original objective function by the surrogate model.

This paper suggests increasing the accuracy of surrogate models by means of boosting. Boosting is a popular approach to increasing the accuracy of classification, and due to the success of classification boosting, also several methods of regression boosting have already been proposed. However, so far no attempt has been reported to combine regression boosting with surrogate modelling. Hence, the purpose of the research reported in the paper is basically a proof of concept: to extend surrogate modelling through the incorporation of regression boosting, and to validate that extension on several sufficiently complex case studies. One of those case studies is described in the paper.

In the following section, basic principles of surrogate modelling and its strategies in evolutionary optimization are recalled, and important surrogate models are listed. Section 3 recalls the principles of boosting and explains a particular method of regression boosting that will be employed later in a case study in materials science. That case study is sketched and its main results are presented in Section 4.

2 Surrogate modelling

Surrogate modelling is a general approach to the optimization of costly objective functions in which the evaluation of the objective function is restricted to points that are considered to be most important for the progress of the employed optimization method [5, 25, 27, 30, 39, 40]. It is most frequently encountered in connection with the optimization of empirical objective functions, but has been equally successfully applied also to expensive optimization tasks in engineering design in which the objective function is not empirical, but its evaluation is connected with intensive computations [25]. In the context of computer-aided search for new materials and chemicals optimal with respect to certain properties, surrogate modelling can be viewed as replacing real experiments with simulated virtual experiments in a computer: such virtual experiments are sometimes referred to as virtual screening [2].

Although surrogate modelling is a general optimization approach (cf. its application in the context of conventional optimization in [5]), it is most frequently encountered in connection with evolutionary algorithms. The reason is that in evolutionary optimization, the approach leads to the approximation of the landscape of the fitness function, i.e., to a method that is known to be useful in general [19, 20, 29]. In evolutionary algorithms, most important for the progress of the method are on the one hand points that best indicate the global optimum (typically through highest values of the fitness function), on the other hand points that most contribute to the diversity of the population.

In the context of evolutionary optimization, surrogate modelling has the following main steps:
(i) Collecting an initial set of points in which the objective function has already been empirically evaluated. This can be the first generation or several first generations of the evolutionary algorithm, but such points are frequently available in advance.
(ii) Approximating the objective function by a surrogate model, with the use of the set of all points in which it has been empirically evaluated.
(iii) Running the evolutionary algorithm for a population considerably larger than the desired population size, with the empirical objective function replaced by the surrogate model.
(iv) Forming the next generation of the desired size as a subset of the large population obtained in the preceding step that includes points most important according to considered criteria for the progress of optimization (such as indication of global optimum, diversity).
(v) Empirically evaluating the objective function in all points that belong to the next generation of the desired size, and returning to step (ii).

Actually, the above steps (ii)-(v) correspond to only one possible strategy of surrogate modelling in evolutionary optimization: the individual-based control, sometimes also referred to as pre-selection [40].

An alternative strategy to the steps (ii)-(v) is to run the algorithm for only the desired population size, interleaving one generation or several generations in which the original objective function is empirically evaluated with a certain number of generations in which the surrogate model is evaluated. This is the generation-based control of surrogate modelling in evolutionary optimization.

Empirical objective functions are typically highly nonlinear. Therefore, nonlinear regression models should be used as surrogate models. They can be basically divided into two large groups according to whether the set of functions among which the surrogate model has to be chosen has an explicit finite parametrization.

1. So far, mostly parametric models have been used for surrogate modelling. From the point of view of their role in this context and/or their overall importance, the following kinds of parametric nonlinear regression models are most worth mentioning:

(i) Multilayer feed-forward neural networks, more precisely, the nonlinear mappings computed by such networks. Their attractiveness for nonlinear regression in general and for surrogate modelling in particular [20] is due to their universal approximation capability, which actually means that linear spaces of functions computed by certain families of multilayer feed-forward neural networks are dense in some general function spaces [18, 21, 26]. For example, considering the most common representative of such networks, multilayer perceptrons, the linear space formed by all functions computed by the family of perceptrons with one hidden layer and infinitely smooth activation functions is dense in the space $L_p(\mu)$ of functions with the p-th power of absolute value finitely integrable with respect to a finite measure $\mu$, in the space $C(X)$ of functions continuous on a compact $X$, and in Sobolev spaces generalizing $L_p(\mu)$ to functions that are differentiable up to a given order. In the application domain of catalytic materials, from which the case study presented in Section 4 is taken, nearly all examples of regression analysis published since the mid 1990s rely on multilayer feed-forward neural networks, typically on multilayer perceptrons (Figure 1). In the last edition of the "Handbook of Heterogeneous Catalysis", more than 20 such examples are listed, as well as several additional ones based on other kinds of such networks: radial basis function networks and piecewise-linear neural networks [16]. Therefore, these three kinds of neural networks are now briefly recalled:
- Multilayer perceptrons (MLPs) can have an arbitrary number of hidden layers, and the basis functions of their linear space of computed functions are constructed by means of sigmoidal activation functions, such as logistic sigmoid, hyperbolic tangent, or arctangent [13, 43].
- Radial basis function (RBF) networks always have only one hidden layer, and the basis functions of their space of computed functions are radial, i.e., the function value depends only on the distance of the vector of input values from some centre, specific to the function [7].
- Piecewise-linear neural networks are simply MLPs with piecewise-linear activation functions. Their linear space of computed functions is dense only in $C(X)$, but on the other hand, they allow a straightforward extraction of logical rules describing the relationships between input and output values of the network [15].

(ii) Support vector regression based on positive semi-definite kernels [34, 36]. It is worth mentioning that these kernels generalize the above recalled RBF networks, and also the historically first kind of nonlinear regression, polynomial regression.

(iii) Gaussian process regression [28] is listed here also due to a relationship to radial basis function networks, but most importantly due to the fact that it has already been successfully employed in surrogate modelling [6].

2. Nonparametric regression models are, in general, more flexible than parametric models, but the flexibility is typically paid for by more extensive computations. Therefore, their importance has been increasing only during the last two decades, following the increasing power of available computers [12, 14]. Nevertheless, there is one noteworthy exception:

(iv) Regression trees have been successfully used already since the early 1980s [4]. They are actually a modification of a classification method, therefore the regression function is piecewise-constant. That property accounts for relatively low computational requirements of regression trees, but also decreases their flexibility, otherwise the main advantage of nonparametric methods.
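The individual-based control strategy of this section, steps (i)-(v), can be illustrated by the following minimal sketch. It is not the genetic algorithm used in the case study: the objective `empirical_objective`, the nearest-neighbour surrogate `fit_surrogate`, the mutation-only variation operator, and all parameter values are hypothetical stand-ins chosen only to keep the example self-contained. Pre-selection in step (iv) uses the predicted fitness alone; a diversity criterion, which the text also mentions, is omitted.

```python
import random

def empirical_objective(x):
    # Hypothetical stand-in for a costly experiment (higher is better).
    return -sum((xi - 0.3) ** 2 for xi in x)

def fit_surrogate(points, values, k=3):
    # Toy surrogate: mean value of the k nearest evaluated points.
    # It stands in for a regression model such as an MLP.
    def predict(x):
        ranked = sorted(
            range(len(points)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[j], x)),
        )
        nearest = ranked[:k]
        return sum(values[j] for j in nearest) / len(nearest)
    return predict

def mutate(x, rng, sigma=0.1):
    # Gaussian mutation, clipped to the unit cube.
    return tuple(min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) for xi in x)

def surrogate_assisted_ga(dim=3, pop_size=8, oversize=5, generations=5, seed=0):
    rng = random.Random(seed)
    # (i) initial set of empirically evaluated points
    archive_x = [tuple(rng.random() for _ in range(dim)) for _ in range(pop_size)]
    archive_y = [empirical_objective(x) for x in archive_x]
    evaluations = pop_size
    for _ in range(generations):
        # (ii) surrogate model built from all evaluated points
        surrogate = fit_surrogate(archive_x, archive_y)
        # (iii) oversized population, assessed by the surrogate only
        parents = sorted(zip(archive_y, archive_x), reverse=True)[:pop_size]
        candidates = [
            mutate(rng.choice(parents)[1], rng)
            for _ in range(pop_size * oversize)
        ]
        # (iv) pre-selection of the next generation of the desired size
        candidates.sort(key=surrogate, reverse=True)
        next_generation = candidates[:pop_size]
        # (v) empirical evaluation of the selected points only
        for x in next_generation:
            archive_x.append(x)
            archive_y.append(empirical_objective(x))
        evaluations += pop_size
    best = max(zip(archive_y, archive_x))
    return best, evaluations
```

With the default values, each generation considers 40 candidates but spends only 8 empirical evaluations, which is the saving that motivates pre-selection.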

3 Boosting regression models

Boosting is a method of improving classification accuracy that consists in developing the classifier iteratively, and increasing the relative influence of the training data that most contributed to errors in the previous iterations on its development in the subsequent iterations [33]. The usefulness of boosting for classification has incited its extension to regression [8].

Both for classification and for regression, the basic approach to increasing the relative influence of particular training data is re-sampling the training data according to a distribution that gives them a higher probability of occurrence. This is equivalent to re-weighting the contributions of the individual training pairs $(x_j, y_j)$, with higher weights corresponding to higher values of the error measure.

Since surrogate models are regression models, any method for regression boosting (such as [8, 10, 35]) is suitable for them. In the following, the method AdaBoost.R2, proposed in [8], will be explained in detail.

Similarly to other adaptive boosting methods, each of the available pairs $(x_1, y_1), \dots, (x_p, y_p)$ of input and output data is in the first iteration of AdaBoost.R2 used exactly once. This corresponds to re-sampling them according to the uniform probability distribution $P_1$ with $P_1(x_j, y_j) = \frac{1}{p}$ for $j = 1, \dots, p$. In addition, the weighted average error of the 1st iteration is set to zero, $\bar{E}_1 = 0$.

In the subsequent iterations ($i \ge 2$), the following sequence of steps is performed:

1. A sample $(\xi_1, \eta_1), \dots, (\xi_p, \eta_p)$ is obtained through re-sampling $(x_1, y_1), \dots, (x_p, y_p)$ according to the distribution $P_{i-1}$.
2. Using $(\xi_1, \eta_1), \dots, (\xi_p, \eta_p)$ as training data, a regression model $F_i$ is constructed.
3. A [0,1]-valued squared error vector $E_i$ of $F_i$ with respect to $(x_1, y_1), \dots, (x_p, y_p)$ is calculated as

$$E_i = (E_i(1), \dots, E_i(p)) = \frac{\left((F_i(x_1) - y_1)^2, \dots, (F_i(x_p) - y_p)^2\right)}{\max_{k=1,\dots,p} (F_i(x_k) - y_k)^2}. \tag{1}$$

4. The weighted average error of the i-th iteration is calculated as

$$\bar{E}_i = \frac{1}{p} \sum_{k=1}^{p} P_i(x_k, y_k) E_i(k). \tag{2}$$

5. Provided $\bar{E}_i < 0.5$, the probability distribution for re-sampling $(x_1, y_1), \dots, (x_p, y_p)$ is for $k = 1, \dots, p$ updated according to

$$P_i(x_k, y_k) = \frac{P_{i-1}(x_k, y_k) \left(\frac{\bar{E}_i}{1 - \bar{E}_i}\right)^{(1 - E_i(k))}}{\sum_{j=1}^{p} P_{i-1}(x_j, y_j) \left(\frac{\bar{E}_i}{1 - \bar{E}_i}\right)^{(1 - E_i(j))}}. \tag{3}$$

6. The boosting approximation in the i-th iteration is set to the median of the approximations $F_1, \dots, F_i$ with respect to the probability distribution

$$\left(\frac{1 - \bar{E}_1}{\bar{E}_1}, \dots, \frac{1 - \bar{E}_i}{\bar{E}_i}\right). \tag{4}$$

The errors used to assess the quality of the boosting approximation are then called boosting errors, e.g., boosting MSE or boosting MAE, where MSE refers to the mean squared error between the computed and measured values, whereas MAE refers to the mean absolute error, i.e., to the mean Euclidean distance between them. For simplicity, also the approximation in the first iteration, $F_1$, is called boosting approximation if boosting is performed, and the respective errors are then called boosting errors, although boosting actually does not introduce any modifications in the first iteration.

The above formulation of the method deals only with the case $\bar{E}_i < 0.5$. For $\bar{E}_i \ge 0.5$, the original formulation of the method in [8] proposes to stop the boosting. However, that is not allowed if the stopping criterion should be based on an independent set of validation data. Indeed, the calculation of $\bar{E}_i$ does not rely on any such independent data set, but it relies solely on the data employed to construct the regression model. A possible alternative for the case $\bar{E}_i \ge 0.5$ is reinitialization, i.e., proceeding as in the 1st iteration [1].

In connection with using feed-forward neural networks as surrogate models, it is important to be aware of the difference between the iterations of boosting and the iterations of neural network training. Boosting iterates on a higher level: one iteration of boosting includes a complete training of an ANN, which can proceed for many hundreds of iterations. Nevertheless, both kinds of iterations are similar in the sense that starting with some iteration, over-training is present. Therefore, also over-training due to boosting can be reduced through stopping in the iteration after which the error for an independent set of data increases for the first time. Moreover, cross-validation can be used to find the iteration most appropriate for stopping.

4 Case study in materials science

The extension of surrogate modelling with boosting will now be illustrated on a case study using data from the investigation of catalytic materials for the high-temperature synthesis of hydrocyanic acid. That investigation and its results have been recently described in [24]. It has been performed through high-throughput experiments in a circular 48-channel reactor. In most of those experiments, the composition of the materials was designed by means of a genetic algorithm developed specifically for heterogeneous catalysis [44]. More precisely, the algorithm was running for 7 generations of population size 92, and in addition 52 other catalysts with manually designed composition were investigated. Consequently, data about altogether 696 catalytic materials were gathered.
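Before turning to the details of the case study, the AdaBoost.R2 procedure recalled in Section 3 can be made concrete with a minimal sketch. It follows the original formulation by Drucker [8] and rests on several assumptions that should be read as such: the base model is a hypothetical one-dimensional regression stump (`fit_stump`) rather than an MLP, the weighted average error is computed with the distribution that was used to draw the current sample, the models are combined by a weighted median with weights $\log\frac{1 - \bar{E}}{\bar{E}}$, and boosting simply stops when the average error reaches 0.5 instead of being reinitialized.

```python
import math
import random

def fit_stump(xs, ys):
    """Least-squares regression stump on 1-D inputs (hypothetical weak learner)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right) if right else lm
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    total = sum(weights)
    acc = 0.0
    for v, w in sorted(zip(values, weights)):
        acc += w
        if acc >= 0.5 * total:
            return v
    return max(values)

def adaboost_r2(xs, ys, n_iter=10, seed=0):
    rng = random.Random(seed)
    p = len(xs)
    dist = [1.0 / p] * p              # uniform distribution in the 1st iteration
    models, weights = [], []
    for _ in range(n_iter):
        # re-sample the training pairs according to the current distribution
        idx = rng.choices(range(p), weights=dist, k=p)
        model = fit_stump([xs[j] for j in idx], [ys[j] for j in idx])
        # [0,1]-valued squared errors on the original pairs
        sq = [(model(x) - y) ** 2 for x, y in zip(xs, ys)]
        worst = max(sq)
        if worst == 0.0:              # perfect fit: keep only this model
            models, weights = [model], [1.0]
            break
        err = [e / worst for e in sq]
        avg = sum(d * e for d, e in zip(dist, err))
        if avg >= 0.5:                # stopping rule of the original method
            if not models:
                models, weights = [model], [1.0]
            break
        avg = max(avg, 1e-12)         # guard against log(1/0)
        beta = avg / (1.0 - avg)
        models.append(model)
        weights.append(math.log(1.0 / beta))
        # pairs with small error get their probability reduced
        new = [d * beta ** (1.0 - e) for d, e in zip(dist, err)]
        s = sum(new)
        dist = [w / s for w in new]

    def predict(x):
        return weighted_median([m(x) for m in models], weights)

    return predict, len(models)
```

A call such as `predict, n_models = adaboost_r2(xs, ys)` returns the boosted predictor together with the number of models actually combined; replacing the stopping branch by a reset of `dist` to the uniform distribution would give the reinitialization variant mentioned in Section 3.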

The composition and preparation of the investigated catalytic materials and the conditions in which they had been tested have been described in detail in [24]. Here, only the independent and dependent variables are recalled, the latter corresponding to the considered possible objective functions:

- independent variables: material used as support, and proportions of the 10 metal additives Y, La, Mo, Re, Ir, Ni, Pt, Zn, Ag, Au (an 11th metal, Zr, was left out due to the fact that the proportions of all active compounds sum up to 100 %);
- dependent variables, i.e., objective functions: conversions of CH4 and NH3, and yield of HCN.

As the surrogate model, MLPs were employed, in accordance with their leading role among nonlinear regression models in the area of catalytic materials [2, 16]. Each considered neural network had 14 input neurons: 4 of them coding the material used as support, the other 10 corresponding to the proportions of the 10 metal additives belonging to independent variables; there were 3 output neurons, corresponding to the possible objective functions (Figure 1).

The most appropriate MLP architectures were searched by means of cross-validation, using only data about catalysts from the 1st to 6th generation of the genetic algorithm and about the 52 catalysts with manually designed composition, thus altogether data about 604 catalytic materials. Data about catalysts from the 7th generation were completely excluded and left out for validating the search results. To use as much information as possible from the available data, cross-validation was applied as the extreme 604-fold variant, i.e., leave-1-out validation. The set of architectures within which the search was performed was delimited by means of the heuristic pyramidal condition: the number of neurons in a subsequent layer must not exceed the number of neurons in a previous layer. Denote $n_I$, $n_H$ and $n_O$ the numbers of input, hidden and output neurons, respectively, and $n_{H1}$ and $n_{H2}$ the numbers of neurons in the first and second hidden layer, respectively. Then the pyramidal condition reads:
(i) for MLPs with 1 hidden layer: $n_I \ge n_H \ge n_O$, in our case $14 \ge n_H \ge 3$ (12 architectures);
(ii) for MLPs with 2 hidden layers: $n_I \ge n_{H1} \ge n_{H2} \ge n_O$, in our case $14 \ge n_{H1} \ge n_{H2} \ge 3$ (78 architectures).

To investigate the usefulness of boosting in our case study, the same data were used and the same set of architectures was considered as for architecture search. In each iteration, a leave-1-out validation was performed, in the way briefly outlined in the preceding section: the mean squared error of the performance of the catalytic materials serving in the individual folds as test data was calculated, and averaged over all the 604 folds. The criterion according to which boosting is considered useful to an architecture was: the average boosting MSE in the 2nd iteration has to be lower than in the 1st iteration. The iteration till which the average boosting MSE continuously decreased was then taken as the final iteration of boosting.

According to that criterion, boosting was useful to 9 of the 12 considered architectures with one hidden layer and to 65 of the 78 considered architectures with two hidden layers. To validate the most promising results of the investigation of the usefulness of boosting in our case study, the data from the 7th generation of the genetic algorithm were used. The validation included the 5 architectures that were most promising for boosting from the point of view of the lowest boosting MSE on test data in the final iteration. These were the architectures (14,10,6,3), (14,14,8,3), (14,13,5,3), (14,10,4,3) and (14,11,3), for which the final iterations of boosting were 32, 29, 31, 19 and 3, respectively. For each of them, the validation proceeded as follows:

1. In each iteration up to the final boosting iteration corresponding to the respective architecture, a single MLP was trained with data about the 604 catalytic materials considered during the architecture search.
2. Each of those MLPs was employed to approximate the conversions of CH4 and NH3 and the yield of HCN for the 92 materials from the 7th generation of the genetic algorithm.
3. In each iteration, the medians with respect to the probability distribution (4) of the approximations of the two conversions and of the HCN yield obtained up to that iteration were used as the boosting approximations.
4. From the conversions and the yield predicted by the boosting approximations, and from the measured values, the boosting MSE and MAE were calculated for each MLP.

The boosting errors (MSE and MAE) are summarized in Figure 2, whereas Figure 3 compares the boosting approximations of the conversions of CH4 and NH3 and of the yield of HCN in the 1st and final iteration with their measured values. The presented results clearly confirm the usefulness of boosting for the five considered architectures. For each of them, boosting led to an overall decrease of both considered error measures, the MSE and MAE, on new data from the 7th generation of the genetic algorithm. Moreover, the decrease of the MSE (which is the measure employed during the investigation of the usefulness of boosting) is uninterrupted or nearly uninterrupted till the final boosting iteration. On the other hand, the scatter plots in Figure 3 do not indicate any apparent difference between the effect of boosting on the three properties employed as catalyst performance measures in our case study: conversion of CH4, conversion of NH3, and yield of HCN. Hence, the performed validation confirms the usefulness of boosting irrespectively of which of those performance measures is considered.
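The architecture counts quoted in this section (12 architectures with one hidden layer, 78 with two) and the rule for choosing the final boosting iteration can be checked with a short sketch. The function names and the example error sequence are illustrative only and are not taken from the case study.

```python
def pyramidal_architectures(n_in=14, n_out=3):
    """Enumerate MLP architectures that satisfy the pyramidal condition."""
    one_hidden = [(n_in, h, n_out) for h in range(n_out, n_in + 1)]
    two_hidden = [
        (n_in, h1, h2, n_out)
        for h1 in range(n_out, n_in + 1)
        for h2 in range(n_out, h1 + 1)
    ]
    return one_hidden, two_hidden

def final_boosting_iteration(avg_mse):
    """1-based index of the last iteration of the initial decreasing run.

    avg_mse[0] is the average boosting MSE of the 1st iteration. A result
    of 1 means that the 2nd iteration did not improve on the 1st one, i.e.
    boosting is not considered useful for that architecture.
    """
    last = 1
    for i in range(1, len(avg_mse)):
        if avg_mse[i] < avg_mse[i - 1]:
            last = i + 1
        else:
            break
    return last

one_hidden, two_hidden = pyramidal_architectures()
print(len(one_hidden), len(two_hidden))                     # 12 78
print(final_boosting_iteration([0.9, 0.7, 0.6, 0.65, 0.5]))  # 3
```

The count of 78 arises as 1 + 2 + ... + 12, one term for each admissible size of the first hidden layer.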

5 Conclusions

The paper dealt with surrogate modelling, a modern approach to the optimization of empirical objective functions, which is particularly attractive in evolutionary optimization. It proposed to extend surrogate modelling with regression boosting, to increase the accuracy of surrogate models, and thus also the agreement between results of surrogate modelling and results of the intended optimization of the original objective function. Needless to say, regression boosting is not new, though it is less common than the popular classification boosting. What is novel, however, is its combination with surrogate models, which adds the advantage of increased accuracy to the main advantage of surrogate modelling: decreasing the time and costs of optimization of empirical objective functions.

Theoretical principles of both surrogate modelling and boosting are known, therefore the main purpose of the reported research was to validate the feasibility of the proposed extension of surrogate modelling on several sufficiently complex case studies, one of which was sketched in this paper. The presented case study results clearly confirm the usefulness of boosting. For the five most promising architectures, boosting leads to an overall decrease of both considered error measures, MSE and MAE, on new data from the 7th generation of the genetic algorithm. Moreover, the decrease of the MSE (which is the boosting error employed during the investigation of the usefulness of boosting) is uninterrupted or nearly uninterrupted till the final boosting iteration. On the other hand, the scatter plots in Figure 3 do not indicate any apparent difference between the effect of boosting on the three catalyst properties considered as possible objective functions in our case study: conversion of CH4, conversion of NH3, and yield of HCN. Hence, the performed validation confirms the usefulness of boosting irrespectively of which of these objective functions is selected.

References

1. Altınçay: Optimal resampling and classifier prototype selection in classifier ensembles using genetic algorithms. Pattern Analysis and Applications 7, 2004, 285–295.
2. Baerns and M. Holeňa: Combinatorial Development of Solid Catalytic Materials. Design of High-Throughput Experiments, Data Analysis, Data Mining. World Scientific, Singapore, 2009.
3. T. Bartz-Beielstein: Experimental Research in Evolutionary Computation. Springer Verlag, Berlin, 2006.
4. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone: Classification and Regression Trees. Wadsworth, Belmont, 1984.
5. A.J. Brooker, J. Dennis, P.D. Frank, D.B. Serafini, Torczon, and M. Trosset: A rigorous framework for optimization by surrogates. Structural and Multidisciplinary Optimization 17, 1998, 1–13.
6. D. Büche, N.N. Schraudolph, and P. Koumoutsakos: Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35, 2005, 183–194.
7. M.D. Buhmann: Radial Basis Functions: Theory and Implementations. Cambridge University Press, Cambridge, 2003.
8. Drucker: Improving regression using boosting techniques. In A.J.C. Sharkey, editor, Proceedings of the 14th International Conference on Machine Learning, Springer Verlag, London, 1997, 107–115.
9. Eftaxias, Font, Fortuny, Giralt, Fabregat, and Stüber: Kinetic modelling of catalytic wet air oxidation of phenol by simulated annealing. Applied Catalysis B: Environmental 33, 2001, 175–190.
10. J. Friedman: Greedy function approximation: A gradient boosting machine. Annals of Statistics 29, 2001, 1189–1232.
11. D. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, 1989.
12. L. Györfi, Kohler, Krzyzak, and H. Walk: A Distribution-Free Theory of Nonparametric Regression. Springer Verlag, Berlin, 2002.
13. M.T. Hagan, H.B. Demuth, and M.H. Beale: Neural Network Design. PWS Publishing, Boston, 1996.
14. T.J. Hastie and R.J. Tibshirani: Generalized Additive Models. Chapman & Hall, Boca Raton, 1990.
15. M. Holeňa: Piecewise-linear neural networks and their relationship to rule extraction from data. Neural Computation 18, 2006, 2813–2853.
16. M. Holeňa and M. Baerns: Computer-aided strategies for catalyst development. In G. Ertl, H. Knözinger, F. Schüth, and J. Weitkamp, editors, Handbook of Heterogeneous Catalysis, Wiley-VCH, Weinheim, 2008.
17. Holzwarth, Denton, H. Zanthoff, and C. Mirodatos: Combinatorial approaches to heterogeneous catalysis: Strategies and perspectives for academic research. Catalysis Today 67, 2001, 309–318.
18. K. Hornik: Approximation capabilities of multilayer neural networks. Neural Networks 4, 1991, 251–257.
19. Jin: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9, 2005, 3–12.
20. Jin, M. Hüsken, M. Olhofer, and Sendhoff: Neural networks for fitness approximation in evolutionary optimization. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 281–306.
21. P.C. Kainen, V. Kůrková, and M. Sanguineti: Estimates of approximation rates by Gaussian radial-basis functions. In Adaptive and Natural Computing Algorithms, Springer Verlag, Berlin, 2007, 11–18.
22. Li, Sun, Jin, Wang, and Ding: A simulated annealing study of Si, Al distribution in the omega framework. Journal of Molecular Catalysis A: Chemical 148, 1999, 189–195.
23. A.S. McLeod and L.F. Gladden: Heterogeneous catalyst design using stochastic optimization algorithms. Journal of Chemical Information and Computer Science 40, 2000, 981–987.
24. S. Möhmel, Steinfeldt, Endgelschalt, M. Holeňa, Kolf, Dingerdissen, Wolf, Weber, and M. Bewersdorf: New catalytic materials for the high-temperature synthesis of hydrocyanic acid from methane and ammonia by high-throughput approach. Applied Catalysis A: General 334, 2008, 73–83.
25. Y.S. Ong, P.B. Nair, A.J. Keane, and K.W. Wong: Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 307–331.
26. A. Pinkus: Approximation theory of the MLP model in neural networks. Acta Numerica 8, 1998, 277–283.
27. Rasheed, Ni, and S. Vattam: Methods for using surrogate models to speed up genetic algorithm optimization: Informed operators and genetic engineering. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 103–123.
28. E. Rasmussen and Williams: Gaussian Processes for Machine Learning. MIT Press, Cambridge, 2006.
29. A. Ratle: Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In A.E. Eiben, T. Bäck, M. Schoenauer, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, Springer Verlag, Berlin, 1998, 87–96.
30. A. Ratle: Kriging as a surrogate fitness landscape in evolutionary optimization. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 15, 2001, 37–49.
31. C.R. Reeves and J.E. Rowe: Genetic Algorithms: Principles and Perspectives. Kluwer Academic Publishers, Boston, 2003.
32. R. Schaefer: Foundation of Global Genetic Optimization. Springer Verlag, Berlin, 2007.
33. R. Schapire: The strength of weak learnability. Machine Learning 5, 1990, 197–227.
34. B. Schölkopf and A.J. Smola: Learning with Kernels. MIT Press, Cambridge, 2002.
35. D.L. Shrestha: Experiments with AdaBoost.RT, an improved boosting scheme for regression. Neural Computation 18, 2006, 1678–1710.
36. I. Steinwart and A. Christmann: Support Vector Machines. Springer Verlag, New York, 2008.
37. A. Tompos, J.L. Margitfalvi, E. Tfirst, L. Végvári, M.A. Jaloull, H.A. Khalfalla, and M.M. Elgarni: Development of catalyst libraries for total oxidation of methane: A case study for combined application of "holographic research strategy and artificial neural networks" in catalyst library design. Applied Catalysis A: General 285, 2005, 65–78.
38. A. Tompos, L. Végvári, E. Tfirst, and J.L. Margitfalvi: Assessment of predictive ability of artificial neural networks using holographic mapping. Combinatorial Chemistry and High Throughput Screening 10, 2007, 121–134.
39. H. Ulmer, F. Streichert, and Zell: Model-assisted steady state evolution strategies. In GECCO 2003: Genetic and Evolutionary Computation, Springer Verlag, Berlin, 2003, 610–621.
40. H. Ulmer, F. Streichert, and Zell: Model assisted evolution strategies. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 333–355.
41. L. Végvári, A. Tompos, S. Göbölös, and J.F. Margitfalvi: Holographic research strategy for catalyst library design: Description of a new powerful optimisation method. Catalysis Today 81, 2003, 517–527.
42. M.D. Vose: The Simple Genetic Algorithm. Foundations and Theory. MIT Press, Cambridge, 1999.
43. H. White: Artificial Neural Networks: Approximation and Learning Theory. Blackwell Publishers, Cambridge, 1992.
44. D. Wolf, O.V. Buyevskaya, and M. Baerns: An evolutionary approach in the combinatorial selection and optimization of catalytic materials. Applied Catalysis A: General 200, 2000, 63–77.