Boosted surrogate models in evolutionary optimization⋆

Martin Holeňa

Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 18207 Praha 8, Czech Republic, martin@cs.cas.cz, web: cs.cas.cz/~martin

⋆ The research reported in this paper has been supported by the grant No. 201/08/1744 of the Grant Agency of the Czech Republic and partially supported by the Institutional Research Plan AV0Z10300504.

Abstract. The paper deals with surrogate modelling, a modern approach to the optimization of empirical objective functions. The approach leads to a substantial decrease of the time and costs of evaluation of the objective function, a property that is particularly attractive in evolutionary optimization. In the paper, an extension of surrogate modelling with regression boosting is proposed. Such an extension increases the accuracy of surrogate models, and thus also the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function. The proposed extension is illustrated on a case study in the area of searching for catalytic materials optimal with respect to their behaviour in a particular chemical reaction. A genetic algorithm developed specifically for this application area is employed for optimization, multilayer perceptrons serve as surrogate models, and a method called AdaBoost.R2 is used for boosting. Results of the case study clearly confirm the usefulness of boosting for surrogate modelling.

1 Introduction

For more than two decades, evolutionary algorithms, especially their most frequently encountered representative – genetic algorithms, have been among the most successful methods for solving difficult optimization tasks [3, 11, 31, 32, 42]. The popularity of evolutionary algorithms is to some extent due to their biological inspiration, which increases their comprehensibility outside computer science. Nevertheless, they share several purely mathematical properties of all stochastic optimization methods, most importantly the valuable ability to escape a local optimum and continue the search for a global one, and the restriction of the information on which they rely to function values only. Consequently, they do not need information about gradients or second-order partial derivatives, in contrast to smooth optimization methods (such as steepest descent, conjugate gradient methods, the popular Levenberg-Marquardt method, etc.). This makes them particularly attractive for the optimization of empirical objective functions, the values of which cannot be computed analytically but have to be obtained experimentally, through some measurement or testing. Indeed, the impossibility to compute the function values of such a function analytically makes an analytical computation of its gradient and second-order derivatives impossible as well, whereas measurement errors usually hinder obtaining sufficiently accurate estimates of the derivatives.

Like other methods relying solely on function values, evolutionary algorithms need the objective function to be evaluated in quite a large number of points. In the context of optimization of empirical objective functions, this can be quite disadvantageous because the evaluation of such a function in the points forming one generation of an evolutionary algorithm is often costly and time-consuming. Hence, the above mentioned advantages of using evolutionary algorithms for the optimization of empirical objective functions are frequently counterbalanced by the considerably high costs and time needed for the evaluation of such functions.

An area where the trade-off between successful optimization and costly objective function evaluations plays a crucial role is the computer-aided search for new materials and chemicals optimal with respect to certain properties [2]. Here, evolutionary algorithms are used in more than 90 % of optimization tasks, and the rarely encountered alternatives are simulated annealing [9, 22, 23], the simplex method [17], and the holographic search strategy [37, 38, 41], which also use solely function values and therefore need a similarly high number of objective function evaluations as evolutionary algorithms. Testing a generation of materials or chemicals typically needs hours to days of time and costs hundreds to thousands of euros. Therefore, the evolutionary optimization rarely runs for more than ten generations.
The usual approach to decreasing the cost and time of optimization of empirical objective functions is to evaluate the objective function only sometimes and to evaluate a suitable regression model of that function otherwise. The employed model is termed a surrogate model of the empirical objective function, and the approach is referred to as surrogate modelling. Needless to say, the time and costs needed to evaluate a regression model are negligible compared to an empirical objective function. However, it must not be forgotten that the finally optimized function coincides with the original empirical objective function only in some points, whereas in the remaining points it coincides only with its surrogate model. Consequently, the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function depends on the accuracy of the approximation of the original objective function by the surrogate model.

This paper suggests to increase the accuracy of surrogate models by means of boosting. Boosting is a popular approach to increasing the accuracy of classification, and due to the success of classification boosting, several methods of regression boosting have already been proposed as well. However, so far no attempt has been reported to combine regression boosting with surrogate modelling. Hence, the purpose of the research reported in the paper is basically a proof of concept: to extend surrogate modelling through the incorporation of regression boosting, and to validate that extension on several sufficiently complex case studies. One of those case studies is described in the paper.

In the following section, the basic principles of surrogate modelling and its strategies in evolutionary optimization are recalled, and important surrogate models are listed. Section 3 recalls the principles of boosting and explains a particular method of regression boosting that will be employed later in a case study in materials science. That case study is sketched and its main results are presented in Section 4.

2 Surrogate modelling

Surrogate modelling is a general approach to the optimization of costly objective functions in which the evaluation of the objective function is restricted to points that are considered to be most important for the progress of the employed optimization method [5, 25, 27, 30, 39, 40]. It is most frequently encountered in connection with the optimization of empirical objective functions, but it has been equally successfully applied to expensive optimization tasks in engineering design, in which the objective function is not empirical but its evaluation requires intensive computations [25]. In the context of the computer-aided search for new materials and chemicals optimal with respect to certain properties, surrogate modelling can be viewed as replacing real experiments with simulated virtual experiments in a computer; such virtual experiments are sometimes referred to as virtual screening [2].
Although surrogate modelling is a general optimization approach (cf. its application in the context of conventional optimization in [5]), it is most frequently encountered in connection with evolutionary algorithms. The reason is that in evolutionary optimization, the approach leads to the approximation of the landscape of the fitness function, i.e., to a method that is known to be useful in general [19, 20, 29]. In evolutionary algorithms, most important for the progress of the method are on the one hand points that best indicate the global optimum (typically through the highest values of the fitness function), and on the other hand points that most contribute to the diversity of the population.

In the context of evolutionary optimization, surrogate modelling has the following main steps:

(i) Collecting an initial set of points in which the objective function has already been empirically evaluated. This can be the first generation or several first generations of the evolutionary algorithm, but such points are frequently available in advance.
(ii) Approximating the objective function by a surrogate model, with the use of the set of all points in which it has been empirically evaluated.
(iii) Running the evolutionary algorithm for a population considerably larger than the desired population size, with the empirical objective function replaced by the surrogate model.
(iv) Forming the next generation of the desired size as a subset of the large population obtained in the preceding step that includes the points most important according to the considered criteria for the progress of optimization (such as indication of the global optimum, diversity).
(v) Empirically evaluating the objective function in all points that belong to the next generation of the desired size, and returning to step (ii).

Actually, the above steps (ii)–(v) correspond to only one possible strategy of surrogate modelling in evolutionary optimization: the individual-based control, sometimes also referred to as pre-selection [40]. An alternative strategy to the steps (ii)–(v) is to run the algorithm for only the desired population size, interleaving one generation or several generations in which the original objective function is empirically evaluated with a certain number of generations in which the surrogate model is evaluated. This is the generation-based control of surrogate modelling in evolutionary optimization.
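The individual-based control just described can be summarized by a short control loop. The sketch below is only a schematic rendering of steps (ii)–(v) in Python: the four callables passed as arguments (the surrogate trainer, the evolutionary search on the surrogate, the selection of the most important points, and the costly empirical evaluation) are hypothetical placeholders, not functions defined in the paper.

def individual_based_control(points, values, n_generations, desired_size,
                             enlarged_size, train_surrogate,
                             evolve_on_surrogate, select_subset,
                             evaluate_empirically):
    """Pre-selection (individual-based control) of surrogate modelling.

    All four callables are hypothetical placeholders:
      train_surrogate(X, y)               -> fitted regression model
      evolve_on_surrogate(model, X, n)    -> n candidate points found by the
                                             evolutionary algorithm with the
                                             surrogate as fitness function
      select_subset(candidates, model, n) -> the n points most important for
                                             the optimization progress
      evaluate_empirically(points)        -> costly measured objective values
    """
    X, y = list(points), list(values)                               # step (i)
    for _ in range(n_generations):
        surrogate = train_surrogate(X, y)                           # step (ii)
        candidates = evolve_on_surrogate(surrogate, X, enlarged_size)   # step (iii)
        next_gen = select_subset(candidates, surrogate, desired_size)   # step (iv)
        new_values = evaluate_empirically(next_gen)                 # step (v)
        X.extend(next_gen)
        y.extend(new_values)
    return X, y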
Empirical objective functions are typically highly nonlinear. Therefore, nonlinear regression models should be used as surrogate models. They can be basically divided into two large groups according to whether the set of functions among which the surrogate model has to be chosen has an explicit finite parametrization.

1. So far, mostly parametric models have been used for surrogate modelling. From the point of view of their role in this context and/or their overall importance, the following kinds of parametric nonlinear regression models are most worth mentioning:

(i) Multilayer feed-forward neural networks, more precisely, the nonlinear mappings computed by such networks. Their attractiveness for nonlinear regression in general and for surrogate modelling in particular [20] is due to their universal approximation capability, which actually means that linear spaces of functions computed by certain families of multilayer feed-forward neural networks are dense in some general function spaces [18, 21, 26]. For example, considering the most common representative of such networks – multilayer perceptrons, the linear space formed by all functions computed by the family of perceptrons with one hidden layer and infinitely smooth activation functions is dense in the space Lp(µ) of functions with the p-th power of the absolute value finitely integrable with respect to a finite measure µ, in the space C(X) of functions continuous on a compact X, and in Sobolev spaces generalizing Lp(µ) to functions that are differentiable up to a given order. In the application domain of catalytic materials, from which the case study presented in Section 4 is taken, nearly all examples of regression analysis published since the mid 1990s rely on multilayer feed-forward neural networks, typically on multilayer perceptrons (Figure 1). In the last edition of the "Handbook of Heterogeneous Catalysis", more than 20 such examples are listed, as well as several additional ones based on other kinds of such networks – radial basis function networks and piecewise-linear neural networks [16]. Therefore, these three kinds of neural networks are now briefly recalled:
– Multilayer perceptrons (MLPs) can have an arbitrary number of hidden layers, and the basis functions of their linear space of computed functions are constructed by means of sigmoidal activation functions, such as the logistic sigmoid, hyperbolic tangent, or arctangent [13, 43].
– Radial basis function (RBF) networks always have only one hidden layer, and the basis functions of their space of computed functions are radial, i.e., the function value depends only on the distance of the vector of input values from some centre specific to the function [7].
– Piecewise-linear neural networks are simply MLPs with piecewise-linear activation functions. Their linear space of computed functions is dense only in C(X), but on the other hand, they allow a straightforward extraction of logical rules describing the relationships between the input and output values of the network [15].

Fig. 1. Example MLP architecture with two hidden layers, used in the case study presented in Section 4.

(ii) Support vector regression based on positive semi-definite kernels [34, 36]. It is worth mentioning that they generalize the above recalled RBF networks, and also the historically first kind of nonlinear regression – polynomial regression.
(iii) Gaussian process regression [28] is listed here also due to its relationship to radial basis function networks, but most importantly due to the fact that it has already been successfully employed in surrogate modelling [6].

2. Nonparametric regression models are, in general, more flexible than parametric models, but the flexibility is typically paid for by more extensive computations. Therefore, their importance has been increasing only during the last two decades, following the increasing power of available computers [12, 14]. Nevertheless, there is one noteworthy exception:

(v) Regression trees have been successfully used already since the early 1980s [4]. They are actually a modification of a classification method, therefore the regression function is piecewise-constant. That property accounts for the relatively low computational requirements of regression trees, but it also decreases their flexibility, which is otherwise the main advantage of nonparametric methods.
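Before turning to boosting, a minimal illustration of fitting such a parametric surrogate model may be useful. The sketch below is not taken from the paper: it uses scikit-learn's MLPRegressor as a generic stand-in for a multilayer perceptron with two hidden layers and sigmoidal activations (the model type employed in the case study of Section 4), and the training data are random placeholders for points in which the objective function has already been empirically evaluated.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder data: 14 input variables and 3 output variables, as in the case
# study of Section 4 (support coding + 10 metal proportions; two conversions
# and the HCN yield). Real data would come from measurements.
rng = np.random.default_rng(0)
X = rng.random((604, 14))
y = rng.random((604, 3))

# An MLP surrogate with two hidden layers (architecture (14,10,6,3) in the
# notation of Section 4) and logistic sigmoid activations.
surrogate = MLPRegressor(hidden_layer_sizes=(10, 6), activation="logistic",
                         max_iter=2000, random_state=0)
surrogate.fit(X, y)

# The fitted model can now be evaluated instead of the costly objective.
predictions = surrogate.predict(X[:5])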
3 Boosting regression models

Boosting is a method of improving classification accuracy that consists in developing the classifier iteratively, and increasing the relative influence of the training data that most contributed to errors in the previous iterations on its development in the subsequent iterations [33]. The usefulness of boosting for classification has incited its extension to regression [8]. Both for classification and for regression, the basic approach to increasing the relative influence of particular training data is re-sampling the training data according to a distribution that gives them a higher probability of occurrence. This is equivalent to re-weighting the contributions of the individual training pairs (xj, yj), with higher weights corresponding to higher values of the error measure.

Since surrogate models are regression models, any method for regression boosting (such as [8, 10, 35]) is suitable for them. In the following, the method AdaBoost.R2, proposed in [8], will be explained in detail.

Similarly to other adaptive boosting methods, each of the available pairs (x1, y1), ..., (xp, yp) of input and output data is used exactly once in the first iteration of AdaBoost.R2. This corresponds to re-sampling them according to the uniform probability distribution P1 with P1(xj) = 1/p for j = 1, ..., p. In addition, the weighted average error of the 1st iteration is set to zero, Ē1 = 0.

In the subsequent iterations (i ≥ 2), the following sequence of steps is performed:

1. A sample (ξ1, η1), ..., (ξp, ηp) is obtained through re-sampling (x1, y1), ..., (xp, yp) according to the distribution Pi−1.
2. Using (ξ1, η1), ..., (ξp, ηp) as training data, a regression model Fi is constructed.
3. A [0,1]-valued squared error vector Ei of Fi with respect to (x1, y1), ..., (xp, yp) is calculated as

   E_i = (E_i(1), \dots, E_i(p)) = \frac{\left( (F_i(x_1) - y_1)^2, \dots, (F_i(x_p) - y_p)^2 \right)}{\max_{k=1,\dots,p} (F_i(x_k) - y_k)^2}.   (1)

4. The weighted average error of the i-th iteration is calculated as

   \bar{E}_i = \frac{1}{p} \sum_{k=1}^{p} P_i(x_k, y_k) \, E_i(k).   (2)

5. Provided Ēi < 0.5, the probability distribution for re-sampling (x1, y1), ..., (xp, yp) is updated for k = 1, ..., p according to

   P_i(x_k, y_k) = \frac{P_{i-1}(x_k, y_k) \left( \frac{\bar{E}_i}{1 - \bar{E}_i} \right)^{1 - E_i(k)}}{\sum_{j=1}^{p} P_{i-1}(x_j, y_j) \left( \frac{\bar{E}_i}{1 - \bar{E}_i} \right)^{1 - E_i(j)}}.   (3)

6. The boosting approximation in the i-th iteration is set to the median of the approximations F1, ..., Fi with respect to the probability distribution

   \left( \frac{\bar{E}_1}{1 - \bar{E}_1}, \dots, \frac{\bar{E}_i}{1 - \bar{E}_i} \right).   (4)
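To make the method concrete, a minimal Python sketch of this AdaBoost.R2-style loop for a single real-valued output follows. It is not taken from the paper: fit_model is a hypothetical placeholder for training one regression model (e.g., an MLP), the average error is computed as the weighted mean of the E_i(k) under the current sampling distribution, and the case Ēi ≥ 0.5 is handled by reinitialization (an alternative discussed later in this section) rather than by stopping.

import numpy as np

def weighted_median(values, weights):
    """Weighted median of a 1-D array of values with non-negative weights."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def adaboost_r2(X, y, fit_model, n_iterations, rng=None):
    """Sketch of AdaBoost.R2-style regression boosting (steps 1-6 above).

    X is a 2-D numpy array, y a 1-D numpy array (a single objective), and
    fit_model(X, y) is a hypothetical placeholder returning a trained model
    as a callable model(X) -> predictions. Returns the models F_1, ..., F_n
    together with the weights E_bar/(1 - E_bar) of distribution (4).
    """
    rng = rng or np.random.default_rng()
    p = len(X)
    P = np.full(p, 1.0 / p)                    # uniform distribution P_1
    models = [fit_model(X, y)]                 # first iteration: no re-sampling
    weights = [0.0]                            # E_bar_1 = 0 by convention
    for _ in range(1, n_iterations):
        idx = rng.choice(p, size=p, p=P)                   # step 1: re-sample
        model = fit_model(X[idx], y[idx])                  # step 2: fit F_i
        sq_err = (model(X) - y) ** 2
        E = sq_err / sq_err.max()                          # step 3: eq. (1)
        E_bar = float(np.sum(P * E))                       # step 4: weighted mean
        if E_bar >= 0.5:                                   # reinitialize [1]
            P = np.full(p, 1.0 / p)
            continue
        beta = E_bar / (1.0 - E_bar)
        P = P * beta ** (1.0 - E)                          # step 5: eq. (3)
        P = P / P.sum()
        models.append(model)
        weights.append(beta)
    return models, np.asarray(weights)

def boosting_approximation(models, weights, X_new):
    """Step 6: point-wise weighted median of the individual approximations."""
    preds = np.column_stack([m(X_new) for m in models])    # (n_points, n_models)
    return np.array([weighted_median(row, weights) for row in preds])

The weights collected during the loop are exactly the terms of distribution (4), so the weighted median in boosting_approximation realizes step 6.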
The errors used to assess the quality of the boosting approximation are then called boosting errors, e.g., the boosting MSE or the boosting MAE, where MSE refers to the mean squared error between the computed and measured values, whereas MAE refers to the mean absolute error, i.e., to the mean Euclidean distance between them. For simplicity, also the approximation in the first iteration, F1, is called a boosting approximation if boosting is performed, and the respective errors are then called boosting errors, although boosting actually does not introduce any modifications in the first iteration.

The above formulation of the method deals only with the case Ēi < 0.5. For Ēi ≥ 0.5, the original formulation of the method in [8] proposes to stop the boosting. However, that is not allowed if the stopping criterion should be based on an independent set of validation data. Indeed, the calculation of Ēi does not rely on any such independent data set, but solely on the data employed to construct the regression model. A possible alternative for the case Ēi ≥ 0.5 is reinitialization, i.e., proceeding as in the 1st iteration [1].

In connection with using feed-forward neural networks as surrogate models, it is important to be aware of the difference between the iterations of boosting and the iterations of neural network training. Boosting iterates on a higher level: one iteration of boosting includes a complete training of an ANN, which can itself proceed for many hundreds of iterations. Nevertheless, both kinds of iterations are similar in the sense that, starting with some iteration, over-training is present. Therefore, also over-training due to boosting can be reduced through stopping in the iteration after which the error on an independent set of data increases for the first time. Moreover, cross-validation can be used to find the iteration most appropriate for stopping.
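With this stopping rule, the final boosting iteration can be read off a history of errors measured on an independent data set. A minimal sketch, assuming val_errors is such a hypothetical history with val_errors[i-1] holding the error of the boosting approximation after iteration i:

def final_boosting_iteration(val_errors):
    """Last iteration before the validation error increases for the first time."""
    for i in range(1, len(val_errors)):
        if val_errors[i] > val_errors[i - 1]:
            return i                # iterations 1, ..., i are kept
    return len(val_errors)          # the error never increased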
To validate the most of all active compounds sum up to 100 %); promising results of the investigation of the useful- – dependent variables, i.e., objective functions: con- ness of boosting in our case study, the data from the versions of CH4 and NH3 , and yield of HCN. 7th generation of the genetic algorithm were used. The As the surrogate model, MLPs were employed, in validation included the 5 architectures that were most accordance with their leading role among nonlinear promising for boosting from the point of view of the regression models in the area of catalytic ma- lowest boosting MSE on test data in the final iteration. terials [2, 16]. Each considered neural network had These were the architectures (14,10,6,3), (14,14,8,3), 14 input neurons: 4 of them coding the material used (14,13,5,3), (14,10,4,3) and (14,11,3), for which the fi- as support, the other 10 corresponding to the propor- nal iterations of boosting were 32, 29, 31, 19 and 3, re- tions of the 10 metal additives belonging to indepen- spectively. For each of them, the validation proceeded dent variables; output neurons were 3, corresponding as follows: to the possible objective functions (Figure 1). 1. In each iteration up to the final boosting iteration The most appropriate MLP architectures were corresponding to the respective architecture, a sin- searched by means of cross-validation, using only data gle MLP was trained with data about the 604 cat- about catalysts from the 1.–6. generation of the ge- alytic materials considered during the architecture netic algorithm and about the 52 catalysts with manu- search. ally designed composition, thus altogether data about 2. Each of those MLPs was employed to approximate 604 catalytic materials. Data about catalysts from the the conversions of CH4 and NH3 and the yield of 7. generation were completely excluded and left out HCN for the 92 materials from the 7. generation for validating the search results. To use as much in- of the genetic algorithm. formation as possible from the available data, cross- 3. In each iteration, the medians with respect to the validation was applied as the extreme 604-fold vari- probability distribution (4) of the approximations ant, i.e., leave-1-out validation. The set of architec- of the two conversions and of the HCN yield ob- tures within which the search was performed was de- tained up to that iteration were used as the boost- limited by means of the heuristic pyramidal condition: ing approximations. the number of neurons in a subsequent layer must not 4. From the conversions and the yield predicted by increase the number of neurons in a previous layer. the boosting approximations, and from the mea- Denote nI , nh and nO the numbers of input, hidden sured values, the boosting MSE and MAE were and output neurons, respectively, and nH1 and nH2 calculated for each MLP. the numbers of neurons in the first and second hid- den layer, respectively. Then the pyramidal condition The boosting errors (MSE and MAE) are sum- reads: marized in Figure 2, whereas Figure 3 compares the (i) for MLPs with 1 hidden layer: nI ≥ nH ≥ nO , in boosting approximations of the conversions of CH4 our case 14 ≥ nH ≥ 3 (12 architectures); and NH3 and of the yield of HCN in the 1st and final (ii) for MLPs with 2 hidden layers: nI ≥ nH1 ≥ iteration with their measured values. The presented nH2 ≥ nO , in our case 14 ≥ nH1 ≥ nH2 ≥ 3 results clearly confirm the usefulness of boosting for (78 architectures). the five considered architectures. 
To investigate the usefulness of boosting in our case study, the same data were used and the same set of architectures was considered as for the architecture search. In each iteration, a leave-one-out validation was performed, in the way briefly outlined in the preceding section: the mean squared error of the predicted performance of the catalytic materials serving in the individual folds as test data was calculated, and averaged over all the 604 folds. The criterion according to which boosting is considered useful to an architecture was: the average boosting MSE in the 2nd iteration has to be lower than in the 1st iteration. The iteration till which the average boosting MSE continuously decreased was then taken as the final iteration of boosting.

According to that criterion, boosting was useful to 9 of the 12 considered architectures with one hidden layer and to 65 of the 78 considered architectures with two hidden layers. To validate the most promising results of the investigation of the usefulness of boosting in our case study, the data from the 7th generation of the genetic algorithm were used. The validation included the 5 architectures that were most promising for boosting from the point of view of the lowest boosting MSE on test data in the final iteration. These were the architectures (14,10,6,3), (14,14,8,3), (14,13,5,3), (14,10,4,3) and (14,11,3), for which the final iterations of boosting were 32, 29, 31, 19 and 3, respectively. For each of them, the validation proceeded as follows:

1. In each iteration up to the final boosting iteration corresponding to the respective architecture, a single MLP was trained with data about the 604 catalytic materials considered during the architecture search.
2. Each of those MLPs was employed to approximate the conversions of CH4 and NH3 and the yield of HCN for the 92 materials from the 7th generation of the genetic algorithm.
3. In each iteration, the medians with respect to the probability distribution (4) of the approximations of the two conversions and of the HCN yield obtained up to that iteration were used as the boosting approximations.
4. From the conversions and the yield predicted by the boosting approximations, and from the measured values, the boosting MSE and MAE were calculated for each MLP.
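A schematic version of this validation computation is sketched below. The prediction arrays, the weights of distribution (4), and the measured values are hypothetical placeholders (in the case study they would have shape (92, 3): one row per 7th-generation material, one column per objective function); the weighted median of step 3 is taken component-wise.

import numpy as np

def weighted_median(values, weights):
    """Weighted median of a 1-D array of values with non-negative weights."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def validation_error_history(per_iteration_predictions, weights, measured):
    """Boosting MSE and MAE on validation data after each boosting iteration.

    per_iteration_predictions: list of (n_materials, n_objectives) arrays with
        the predictions of the single MLPs trained in iterations 1, 2, ...;
    weights: one weight per iteration, from the probability distribution (4);
    measured: (n_materials, n_objectives) array of measured values.
    """
    history = []
    for i in range(1, len(per_iteration_predictions) + 1):
        preds = np.stack(per_iteration_predictions[:i])   # (i, materials, objectives)
        w = np.asarray(weights[:i], dtype=float)
        median = np.empty_like(measured, dtype=float)
        for m in range(measured.shape[0]):                # step 3: component-wise
            for o in range(measured.shape[1]):            #   weighted medians
                median[m, o] = weighted_median(preds[:, m, o], w)
        diff = median - measured                          # step 4: boosting errors
        mse = float(np.mean(diff ** 2))
        mae = float(np.mean(np.linalg.norm(diff, axis=1)))
        history.append((mse, mae))
    return history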
The boosting errors (MSE and MAE) are summarized in Figure 2, whereas Figure 3 compares the boosting approximations of the conversions of CH4 and NH3 and of the yield of HCN in the 1st and final iteration with their measured values. The presented results clearly confirm the usefulness of boosting for the five considered architectures. For each of them, boosting led to an overall decrease of both considered error measures, the MSE and MAE, on new data from the 7th generation of the genetic algorithm. Moreover, the decrease of the MSE (which is the measure employed during the investigation of the usefulness of boosting) is uninterrupted or nearly uninterrupted till the final boosting iteration. On the other hand, the scatter plots in Figure 3 do not indicate any apparent difference between the effect of boosting on the three properties employed as catalyst performance measures in our case study – conversion of CH4, conversion of NH3, and yield of HCN. Hence, the performed validation confirms the usefulness of boosting irrespectively of which of those performance measures is considered.

Fig. 2. History of the boosting MSE and MAE on the data from the 7th generation of the genetic algorithm for MLPs with the five architectures included in the validation of boosting.

5 Conclusions

The paper dealt with surrogate modelling, a modern approach to the optimization of empirical objective functions, which is particularly attractive in evolutionary optimization. It proposed to extend surrogate modelling with regression boosting, to increase the accuracy of surrogate models, and thus also the agreement between the results of surrogate modelling and the results of the intended optimization of the original objective function. Needless to say, regression boosting is not new, though it is less common than the popular classification boosting. However, novel is its combination with surrogate models, which adds the advantage of increased accuracy to the main advantage of surrogate modelling – decreasing the time and costs of optimization of empirical objective functions.

Theoretical principles of both surrogate modelling and boosting are known, therefore the main purpose of the reported research was to validate the feasibility of the proposed extension of surrogate modelling on several sufficiently complex case studies, one of which was sketched in this paper. The presented case study results clearly confirm the usefulness of boosting. For the five most promising architectures, boosting leads to an overall decrease of both considered error measures, MSE and MAE, on new data from the 7th generation of the genetic algorithm. Moreover, the decrease of the MSE (which is the boosting error employed during the investigation of the usefulness of boosting) is uninterrupted or nearly uninterrupted till the final boosting iteration. On the other hand, the scatter plots in Figure 3 do not indicate any apparent difference between the effect of boosting on the three catalyst properties considered as possible objective functions in our case study – conversion of CH4, conversion of NH3, and yield of HCN. Hence, the performed validation confirms the usefulness of boosting irrespectively of which of these objective functions is selected.

References

1. H. Altinçay: Optimal resampling and classifier prototype selection in classifier ensembles using genetic algorithms. Pattern Analysis and Applications 7, 2004, 285–295.
2. M. Baerns and M. Holeňa: Combinatorial Development of Solid Catalytic Materials. Design of High-Throughput Experiments, Data Analysis, Data Mining. World Scientific, Singapore, 2009.
3. T. Bartz-Beielstein: Experimental Research in Evolutionary Computation. Springer Verlag, Berlin, 2006.
4. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone: Classification and Regression Trees. Wadsworth, Belmont, 1984.
5. A.J. Brooker, J. Dennis, P.D. Frank, D.B. Serafini, V. Torczon, and M. Trosset: A rigorous framework for optimization by surrogates. Structural and Multidisciplinary Optimization 17, 1998, 1–13.
6. D. Büche, N.N. Schraudolph, and P. Koumoutsakos: Accelerating evolutionary algorithms with gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35, 2005, 183–194.
7. M.D. Buhmann: Radial Basis Functions: Theory and Implementations. Cambridge University Press, Cambridge, 2003.
8. H. Drucker: Improving regression using boosting techniques. In A.J.C. Sharkey, editor, Proceedings of the 14th International Conference on Machine Learning, Springer Verlag, London, 1997, 107–115.
9. A. Eftaxias, J. Font, A. Fortuny, J. Giralt, A. Fabregat, and F. Stüber: Kinetic modelling of catalytic wet air oxidation of phenol by simulated annealing. Applied Catalysis B: Environmental 33, 2001, 175–190.
10. J. Friedman: Greedy function approximation: A gradient boosting machine. Annals of Statistics 29, 2001, 1189–1232.
11. D. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, 1989.
12. L. Györfi, M. Kohler, A. Krzyzak, and H. Walk: A Distribution-Free Theory of Nonparametric Regression. Springer Verlag, Berlin, 2002.
13. M.T. Hagan, H.B. Demuth, and M.H. Beale: Neural Network Design. PWS Publishing, Boston, 1996.
14. T.J. Hastie and R.J. Tibshirani: Generalized Additive Models. Chapman & Hall, Boca Raton, 1990.
15. M. Holeňa: Piecewise-linear neural networks and their relationship to rule extraction from data. Neural Computation 18, 2006, 2813–2853.
16. M. Holeňa and M. Baerns: Computer-aided strategies for catalyst development. In G. Ertl, H. Knözinger, F. Schüth, and J. Weitkamp, editors, Handbook of Heterogeneous Catalysis, Wiley-VCH, Weinheim, 2008.
17. A. Holzwarth, P. Denton, H. Zanthoff, and C. Mirodatos: Combinatorial approaches to heterogeneous catalysis: Strategies and perspectives for academic research. Catalysis Today 67, 2001, 309–318.
18. K. Hornik: Approximation capabilities of multilayer neural networks. Neural Networks 4, 1991, 251–257.
19. Y. Jin: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9, 2005, 3–12.
20. Y. Jin, M. Hüsken, M. Olhofer, and B. Sendhoff: Neural networks for fitness approximation in evolutionary optimization. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 281–306.
21. P.C. Kainen, V. Kůrková, and M. Sanguineti: Estimates of approximation rates by gaussian radial-basis functions. In Adaptive and Natural Computing Algorithms, Springer Verlag, Berlin, 2007, 11–18.
22. B. Li, P. Sun, Q. Jin, J. Wang, and D. Ding: A simulated annealing study of Si, Al distribution in the omega framework. Journal of Molecular Catalysis A: Chemical 148, 1999, 189–195.
23. A.S. McLeod and L.F. Gladden: Heterogeneous catalyst design using stochastic optimization algorithms. Journal of Chemical Information and Computer Science 40, 2000, 981–987.
24. S. Möhmel, N. Steinfeldt, S. Engelschalt, M. Holeňa, S. Kolf, U. Dingerdissen, D. Wolf, R. Weber, and M. Bewersdorf: New catalytic materials for the high-temperature synthesis of hydrocyanic acid from methane and ammonia by high-throughput approach. Applied Catalysis A: General 334, 2008, 73–83.
25. Y.S. Ong, P.B. Nair, A.J. Keane, and K.W. Wong: Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 307–331.
26. A. Pinkus: Approximation theory of the MLP model in neural networks. Acta Numerica 8, 1998, 277–283.
27. K. Rasheed, X. Ni, and S. Vattam: Methods for using surrogate models to speed up genetic algorithm optimization: Informed operators and genetic engineering. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 103–123.
28. C.E. Rasmussen and C. Williams: Gaussian Processes for Machine Learning. MIT Press, Cambridge, 2006.
29. A. Ratle: Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In A.E. Eiben, T. Bäck, M. Schoenauer, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, Springer Verlag, Berlin, 1998, 87–96.
30. A. Ratle: Kriging as a surrogate fitness landscape in evolutionary optimization. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 15, 2001, 37–49.
31. C.R. Reeves and J.E. Rowe: Genetic Algorithms: Principles and Perspectives. Kluwer Academic Publishers, Boston, 2003.
32. R. Schaefer: Foundation of Global Genetic Optimization. Springer Verlag, Berlin, 2007.
33. R. Schapire: The strength of weak learnability. Machine Learning 5, 1990, 197–227.
34. B. Schölkopf and A.J. Smola: Learning with Kernels. MIT Press, Cambridge, 2002.
35. D.L. Shrestha: Experiments with AdaBoost.RT, an improved boosting scheme for regression. Neural Computation 18, 2006, 1678–1710.
36. I. Steinwart and A. Christmann: Support Vector Machines. Springer Verlag, New York, 2008.
37. A. Tompos, J.L. Margitfalvi, E. Tfirst, L. Végvári, M.A. Jaloull, H.A. Khalfalla, and M.M. Elgarni: Development of catalyst libraries for total oxidation of methane: A case study for combined application of "holographic research strategy and artificial neural networks" in catalyst library design. Applied Catalysis A: General 285, 2005, 65–78.
38. A. Tompos, L. Végvári, E. Tfirst, and J.L. Margitfalvi: Assessment of predictive ability of artificial neural networks using holographic mapping. Combinatorial Chemistry and High Throughput Screening 10, 2007, 121–134.
39. H. Ulmer, F. Streichert, and A. Zell: Model-assisted steady state evolution strategies. In GECCO 2003: Genetic and Evolutionary Computation, Springer Verlag, Berlin, 2003, 610–621.
40. H. Ulmer, F. Streichert, and A. Zell: Model assisted evolution strategies. In Y. Jin, editor, Knowledge Incorporation in Evolutionary Computation, Springer Verlag, Berlin, 2005, 333–355.
41. L. Végvári, A. Tompos, S. Göbölös, and J.F. Margitfalvi: Holographic research strategy for catalyst library design: Description of a new powerful optimisation method. Catalysis Today 81, 2003, 517–527.
42. M.D. Vose: The Simple Genetic Algorithm. Foundations and Theory. MIT Press, Cambridge, 1999.
43. H. White: Artificial Neural Networks: Approximation and Learning Theory. Blackwell Publishers, Cambridge, 1992.
44. D. Wolf, O.V. Buyevskaya, and M. Baerns: An evolutionary approach in the combinatorial selection and optimization of catalytic materials. Applied Catalysis A: General 200, 2000, 63–77.

Fig. 3. Comparison of the boosting approximations of the conversions of CH4 and NH3 and of the yield of HCN in the 1st and final iteration with their measured values for the 92 catalytic materials from the 7th generation of the genetic algorithm.