1. Introduction

Improving Optimization With Gaussian Processes in the Covariance Matrix Adaptation Evolution Strategy

Jiří Tumpach

Jan Koza

Martin Holeňa

Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic

Czech Academy of Sciences, Institute of Computer Science, Prague, Czech Republic

Czech Technical University, Faculty of Information Technology, Prague, Czech Republic

This paper explores the use of Gaussian processes (GPs) in the covariance matrix adaptation evolution strategy (CMA-ES) for black-box optimization. GPs are powerful probabilistic models that capture complex relationships, making them suitable for modeling uncertain objective functions. Integrating GPs into the CMA-ES improves exploration and adaptation in the search space, enhancing convergence speed and solution quality. The paper describes a novel implementation framework that allows GPs to be used as surrogate models for the CMA-ES. That framework’s findings encourage further research to advance the application of GPs in black-box optimization.

1. Introduction

Black-box optimization is an optimization of objective functions for which no analytical description is provided. It employs optimization methods that need as input only points in the search space paired with respective values of the objective function obtained in a non-analytical way, e.g. from sensors, in experiments or through numerical simulations. The most frequently used approaches are evolutionary optimization, such as evolution strategies, genetic algorithms, and differential evolution, or other metaheuristics, such as particle swarm optimization.

Because black-box optimization methods receive only information about values of the objective function, they typically need many such values. This is a problem in situations when evaluating the black-box objective function is time-consuming and/or expensive. That is frequently the case if it is evaluated empirically in experiments. For example, for the evolutionary optimization tasks described in the book [1], the evaluation of a comparatively small generation of a genetic algorithm can sometimes take more than a week and cost more than 10000 €. To tackle such problems, an approach called surrogate modeling has emerged more than 20 years ago.

In particular in continuous optimization, surrogate modeling consists in evaluating the true, black-box objective function only in some points and evaluating a suitable regression model in all remaining points. Such a regression model is called surrogate model or metamodel of the objective function. It is trained on points where the true objective function has been evaluated and approximates it in the search space.

The earliest kinds of surrogate models in continuous black-box optimization were low-order polynomials and artificial neural networks (ANNs), specifically multilayer perceptron (MLP). The former have always remained a suitable choice in situations when enough evaluations of the true, black-box objective function are affordable for the approximation properties of polynomials to be in effect. On the other hand, surrogate modeling for substantially fewer evaluations of the true objective function has undergone further development during the last two decades. MLPs were soon replaced with another kind of ANNs, radial basis function networks (RBFs), which better fit the local peculiarities of an objective function landscape. Those networks, however, have since the late 2000s been superseded by other kinds of surrogate models, primarily Gaussian processes (GPs), but also ranking support vector machines (RSVMs) and random forests (RFs). GPs are currently the most successful kind of surrogate models for black-box optimization with a small evaluation budget of functions with complicated multimodal landscapes, mainly due to their ability to estimate the probability distribution of the true objective function in a given point.

2. Surrogate Modeling in Black-Box Optimization

Surrogate modeling for black-box optimization relies on the combination and interaction of three components: a regression model serving as a surrogate of the true, black-box objective function, a black-box optimization method seeking the optimum of that objective function, and a strategy determining when to evaluate the true objective function and when its surrogate model. In the context of evolutionary black-box optimization, that strategy is usually called evolution control [2, 3, 4, 5, 6].

The regression models that are the most suitable kind of surrogate models if sufficiently many evaluations of the true, black-box objective function are affordable, are low-order polynomials, typically quadratic functions [7, 8, 9, 10, 11]. The sufficient number of evaluations depends, according to these cited research works, on the black-box function and on the dimension. For substantially fewer evaluations, the most traditional kind of surrogate models were MLPs [5, 12], soon replaced with RBFs [13, 14, 11, 15, 10], and since the late 2000s with Gaussian processes (GPs) a.k.a. kriging [2, 4, 16, 17, 18, 19, 20, 21, 22]. Occasionally, RBFs were used as local models in combination with GP-based global models [23]. Other kinds of surrogate models employed during the last decade include decision trees [24], random forests [25, 26, 24] and ranking support vector machines [27, 28]. The last one has an exceptional property of invariance with respect to order-preserving transformations of the objective function. This is important in situations when the black-box optimization algorithm possesses such invariance, a frequently encountered property of evolutionary algorithms. On the other hand, the surrogate modeling methods proposed in [4] and [22] use GPs to perform preselection based on a partial ordering that is also invariant with respect to order-preserving transformations. More importantly, the adaptive function value warping approach recently proposed in [29] aims to provide such invariance to any surrogate model.

As to the black-box optimization methods, surrogate models are most often combined with evolutionary optimizers. Their combinations with the most important among them, the state-of-the-art black-box optimization algorithm CMA-ES, will be surveyed in some detail below, in Subsection 2.1. GPs were combined also with other evolutionary optimization methods [18, 30], and GPs, polynomials and RBFs were combined with particle swarm optimization [11] and with memetic optimization [14]. Moreover, GPs are used in black-box optimization in two different ways. In connection with evolutionary and similar black-box optimization methods, they serve as a regression model evaluated instead of the true objective function. In addition, they also play a key role in Bayesian optimization. That kind of optimization relies on GP-estimates of probability distributions of values of the true objective function. Those probability distributions enable several ways of searching for optima of that objective function, each of them governed by a specific acquisition function [31, 32, 33].

The surrogate-assisted black-box optimization methods constructing several surrogate models simultaneously either aggregate them to a team [14, 11] or complement the evolution control by a classifier selecting the most appropriate among those models. Important examples of classifiers used in this context are ANNs [34, 35, 36] and classification trees [37, 20]. Their learning can be viewed as metalearning because it is based on metafeatures, i.e. properties empirically characterizing the objective function landscape and the black-box optimization method [35, 24, 38, 10]. Apart from classification according to the appropriateness of the surrogate model for the considered data, metalearning can also be used for regression of model error on the combination of values of metafeatures [39].

Finally, evolution control has, since the first surrogate-assisted black-box optimization methods, been performed basically in two ways, generation-based and individual-based. In generation-based evolution control, all points are in some generations evaluated with the true objective function and in the remaining generations with the model. On the other hand, in every generation of the individual-based evolution control, based on the evaluation of all points with the model, a preselection of points to be evaluated with the true objective function is performed [5]. In most of the surrogate-assisted methods, however, the evolution control is specifically tailored to the respective method.

2.1. Surrogate Modeling in Connection With the CMA-ES

Not only the two most important kinds of surrogate models, i.e. low-order polynomials [7, 8, 9] and GPs [13, 4, 17, 21, 22], but also the less common RBFs, RFs and RSVMs [25, 27, 26, 15] are most often combined with the covariance matrix adaptation evolution strategy (CMA-ES). That is not surprising because CMA-ES has already in the 2000s become a state-of-the-art approach to single-objective unconstrained continuous black-box optimization [40, 41]. Occasionally, also Bayesian optimization is combined with CMA-ES. For example in [42], optimization switches from the most traditional Bayesian optimization method, EGO (Efficient Global Optimization) [32], to CMA-ES. Finally, CMA-ES has also been combined with a team of surrogate models and the choice of the most appropriate among them based on landscape analysis [37, 20].

As to the evolution control of surrogate-assisted variants of CMA-ES, the authors of the present paper have been involved in an investigation of the evolution control of two important polynomial-assisted CMA-ES variants, lmm-CMA [7, 9] and lq-CMA-ES [8], and of two variants of the GP-assisted variant DTS-CMA-ES [2, 19]. Notably, that investigation included mutually replacing the evolution control of each variant with the evolution control of the others. According to its findings, the success of those important surrogate-assisted CMA-ES variants is definitely not limited to using the respective specifically tailored evolution control [6].

3. New Framework for a Surrogate-Assisted CMA-ES

The most widely used implementation of the CMA-ES algorithm is the official code written by the author of the algorithm, Nikolaus Hansen, and his team [43]. It is available in multiple programming languages, including C, C++, Matlab, R, Python, and others. It is being actively developed, and it contains various versions and extensions of the algorithm and extensive parameterization options. While the C and C++ versions are the most performant for solving real problems in practice, the most suitable for experimentation with the algorithm itself is nowadays the Python version. However, the Python CMA-ES version is still based on the original Matlab legacy code rewritten into Python. It contains very long function definitions with multiple nested if statements for different algorithm variants and parameter handling, which makes it highly inconvenient to experiment with modifications of the core parts of the algorithm.

Therefore we decided to base our code on a different implementation by Jacob de Nobel and his colleagues called Modular CMA-ES [44], which is written in a modern, modular, object-oriented way that makes it easy to create different variants of the CMA-ES algorithm.

3.1. Modular CMA-ES

The starting point of our implementation is the library Modular CMA-ES. Each optimization technique is encapsulated within a modular component, providing independence and flexibility in selecting and combining different modules. This modularity enables users to construct tailored optimization strategies by combining multiple modules, thereby expanding the exploration space and enhancing the search capabilities of the CMA-ES algorithm. By integrating previously distant optimization techniques, the library enables combinatorial exploration of different strategies within the CMA-ES framework. Users can effortlessly combine modules representing various optimization methods such as population sampling techniques, surrogate modeling, elitism, step size adaptation, restart strategies, and constraint handling mechanisms. This combinatorial exploration empowers researchers to exploit the strengths of different techniques, leading to more effective and efficient optimization processes. The Modular CMA-ES library prioritizes ease of use and customization. Moreover, the modular architecture allows for the activation and deactivation of modules during runtime, facilitating dynamic exploration and adaptation during the optimization process.

A general scheme of an evolution strategy can be expressed in the following steps:

1. Generate a new population

For the CMA-ES algorithm in particular, the steps are these:

1. Sample points
2. Evaluate the objective function
3. Select the points with the lowest objective function values
4. Update the population mean and covariance matrix
5. Repeat until optimum reached

These steps correspond to the methods implemented in the main class of the framework, ModularCMAES, as shown in the diagram in Figure 1. The diagram also depicts the so-called ask-and-tell interface provided by the framework.

However, this library does not provide support for surrogate models on its own. That is why we have been developing the framework described in this paper.

3.2. Incorporating Gaussian Processes

We added to the Modular CMA-ES package popular covariance functions such as Matérn, RBF, periodic, and many others [45]. In addition to these individual kernels, the package also provides the flexibility to explore additive and multiplicative combinations of them, cf. Subsection 3.3. This allows users to create more complex and customized GP-based surrogate models by combining multiple kernels together. Furthermore, the framework offers a search within these kernels. A list of Gaussian process covariance functions that are available in the framework follows.

Included covariance functions [45]

• Polynomial kernels
• Parabolic
• RBF
• Exponential curve
• Periodic kernel
• Matérn 1/2, Matérn 3/2 and Matérn 5/2

Included covariance function modifications

• Learnable scaling of features
• Exponential mapping
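To illustrate how additive and multiplicative combinations of covariance functions work, the following minimal sketch defines three of the listed kernels for scalar inputs in plain Python and combines them. The function names (`rbf`, `periodic`, `matern52`, `add`, `mul`, `gram`) and the default hyperparameters are assumptions made for this example only; they are not the interface of the framework.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) covariance of two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def periodic(a, b, period=1.0, length=1.0):
    """Periodic covariance of two scalars."""
    return math.exp(-2.0 * math.sin(math.pi * abs(a - b) / period) ** 2 / length ** 2)

def matern52(a, b, length=1.0):
    """Matérn covariance with smoothness 5/2."""
    r = math.sqrt(5.0) * abs(a - b) / length
    return (1.0 + r + r * r / 3.0) * math.exp(-r)

def add(k1, k2):
    """Additive combination: the sum of two kernels is again a kernel."""
    return lambda a, b: k1(a, b) + k2(a, b)

def mul(k1, k2):
    """Multiplicative combination: the product of two kernels is again a kernel."""
    return lambda a, b: k1(a, b) * k2(a, b)

def gram(kernel, xs):
    """Covariance (Gram) matrix of a kernel on a list of scalar inputs."""
    return [[kernel(a, b) for b in xs] for a in xs]

# Hypothetical composite kernel: a smooth trend plus a locally periodic part.
locally_periodic = mul(rbf, periodic)
trend_plus_season = add(matern52, locally_periodic)

K = gram(trend_plus_season, [0.0, 0.25, 0.5])
```

Because sums and products of valid covariance functions are valid covariance functions, such combinators can be nested freely to build richer surrogate models.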

3.3. A Systematic Approach to Combining Incorporated Covariance Functions

The works [46] and in more detail [47] present a systematic approach to automating the construction of GP covariance functions. Compositional kernels enable flexible and automatic discovery of the appropriate structure and complexity of a model by allowing the composition of multiple simpler kernels. By combining these kernels, the model can capture a wide range of patterns and structures, adapting to the complexity of the underlying data. Our framework evaluates the performance of each kernel through cross-validated regression, ensuring its effectiveness in capturing the underlying data patterns. Additionally, a complexity-based penalization approach is employed to assess the complexity of each kernel. By incorporating these evaluation methods, the framework enables the automatic selection of the most suitable kernels for optimizing complex problems.
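The selection procedure described above can be sketched as follows. This is a toy illustration under several assumptions: inputs are scalars, kernel hyperparameters are fixed rather than learned, the GP posterior mean is computed with a small hand-written linear solver, and the penalty is simply proportional to the number of base kernels in a composition. All names (`cv_error`, `penalized_score`, `candidates`, the `penalty` weight) are hypothetical and do not reflect the framework's actual interface.

```python
import math
from itertools import combinations

def rbf(a, b):
    return math.exp(-0.5 * (a - b) ** 2)

def periodic(a, b):
    return math.exp(-2.0 * math.sin(math.pi * abs(a - b)) ** 2)

def linear(a, b):
    return a * b

BASE = {"rbf": rbf, "periodic": periodic, "linear": linear}

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_mean(kernel, X, y, X_new, noise=1e-4):
    """Posterior mean of a zero-mean GP with fixed hyperparameters."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    weights = solve(K, y)
    return [sum(kernel(x, xi) * w for xi, w in zip(X, weights)) for x in X_new]

def cv_error(kernel, X, y, folds=3):
    """Mean squared error of the GP mean under k-fold cross-validation."""
    total = 0.0
    for f in range(folds):
        test = [i for i in range(len(X)) if i % folds == f]
        train = [i for i in range(len(X)) if i % folds != f]
        pred = gp_mean(kernel, [X[i] for i in train], [y[i] for i in train],
                       [X[i] for i in test])
        total += sum((p - y[i]) ** 2 for p, i in zip(pred, test))
    return total / len(X)

def candidates():
    """Base kernels plus all pairwise sums and products, with their sizes."""
    for name, k in BASE.items():
        yield name, k, 1
    for (n1, k1), (n2, k2) in combinations(BASE.items(), 2):
        yield f"{n1}+{n2}", (lambda a, b, k1=k1, k2=k2: k1(a, b) + k2(a, b)), 2
        yield f"{n1}*{n2}", (lambda a, b, k1=k1, k2=k2: k1(a, b) * k2(a, b)), 2

def penalized_score(kernel, size, X, y, penalty=0.01):
    """Cross-validated error plus a penalty growing with kernel complexity."""
    return cv_error(kernel, X, y) + penalty * size

# Toy data: a periodic signal with a linear trend.
X = [0.13 * i for i in range(15)]
y = [math.sin(2.0 * math.pi * x) + 0.5 * x for x in X]
scores = {name: penalized_score(k, size, X, y) for name, k, size in candidates()}
best = min(scores, key=scores.get)
```

A full compositional search in the spirit of [46, 47] would expand the best candidate repeatedly and optimize the hyperparameters of each candidate; the sketch stops after one level of composition to stay short.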

3.4. Included Evolution Control

Evolution control in surrogate CMA-ES involves the management of the surrogate model and the decision-making process of how to update it. The key idea is to balance the exploration of the search space and the exploitation of promising regions guided by the surrogate model’s predictions. Evolution control thus plays a crucial role in leveraging the surrogate model to guide the search. We briefly outline the two different evolution controls we implemented in the framework.

Doubly Trained S-CMA-ES

The DTS-CMA-ES published in [2, 19] is a successor to the S-CMA-ES algorithm, which it extends with a second round of surrogate model training. The algorithm involves sampling a new population, training a surrogate model on original-evaluated points, selecting points based on the model’s prediction, evaluating those points, retraining the model, and predicting fitness for non-original evaluated points. The key features include sampling from the CMA-ES distribution, utilizing Gaussian process uncertainty estimation for point selection, using recent points for fitness prediction, and maintaining a training set near the CMA-ES distribution mean.

Each generation of this evolution control (EC) can be summarized in the following steps:

1. sample a new population of size λ (standard CMA-ES offspring),
2. train the first surrogate model on the original-evaluated points from the archive A,
3. select ⌈αλ⌉ point(s) wrt. a criterion C, which is based on the first model’s prediction,
4. evaluate these point(s) with the original fitness,
5. retrain the surrogate model also using these new point(s), and
6. predict the fitness of the non-original evaluated points with this second model.
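The control flow of these six steps can be sketched in plain Python as follows. The sketch makes several simplifying assumptions that are not part of DTS-CMA-ES itself: the population is sampled from an isotropic Gaussian instead of the full CMA-ES distribution, the GP is replaced by a hypothetical nearest-neighbour stand-in (`NearestNeighbourSurrogate`) whose "uncertainty" is the distance to the nearest archived point, the selection criterion is simply the largest uncertainty, and `alpha` denotes an assumed fraction of the population that is evaluated with the original fitness.

```python
import math
import random

class NearestNeighbourSurrogate:
    """Stand-in for a GP: predicts the value of the nearest archived point
    and reports the distance to it as a crude uncertainty estimate."""

    def fit(self, archive):
        self.archive = list(archive)
        return self

    def predict(self, x):
        point, value = min(self.archive, key=lambda pv: math.dist(pv[0], x))
        return value, math.dist(point, x)

def dts_generation(f, mean, sigma, lam, alpha, archive, rng):
    """One generation of doubly trained evolution control (simplified)."""
    # 1. Sample a new population around the current mean.
    population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                  for _ in range(lam)]
    # 2. Train the first model on the archive of original-evaluated points.
    model = NearestNeighbourSurrogate().fit(archive)
    # 3. Select ceil(alpha * lam) points; the criterion here is the
    #    largest model uncertainty.
    n_original = math.ceil(alpha * lam)
    by_uncertainty = sorted(range(lam),
                            key=lambda i: model.predict(population[i])[1],
                            reverse=True)
    chosen = by_uncertainty[:n_original]
    # 4. Evaluate the chosen points with the original fitness and archive them.
    fitness = {i: f(population[i]) for i in chosen}
    archive.extend((population[i], fitness[i]) for i in chosen)
    # 5. Retrain the model on the enlarged archive.
    model = NearestNeighbourSurrogate().fit(archive)
    # 6. Predict the fitness of the remaining points with the second model.
    for i in range(lam):
        if i not in fitness:
            fitness[i] = model.predict(population[i])[0]
    return population, [fitness[i] for i in range(lam)], sorted(chosen)

def sphere(x):
    return sum(v * v for v in x)

rng = random.Random(0)
archive = [([1.0, 1.0], sphere([1.0, 1.0])), ([-1.0, 0.5], sphere([-1.0, 0.5]))]
population, fitness, chosen = dts_generation(
    sphere, [0.0, 0.0], 1.0, 8, 0.25, archive, rng)
```

The returned fitness list mixes true values (for the chosen points) and model predictions (for the rest); it is this mixed list that would be passed on to the CMA-ES update.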

Kendall-τ Rank Test Strategy From lq-CMA-ES

In this evolution control, developed for the surrogate-assisted CMA-ES variant lq-CMA-ES [8], which is based on quadratic polynomials, a queue is utilized to store all evaluated solutions for model building. During each iteration, a limited number of the solutions ranked best by the model are chosen from the population. These selected solutions are then evaluated using the true objective function f, sorted, and added to the end of the queue (with the best solution being enqueued last). To maintain the queue’s size, the oldest elements are dropped when the maximum capacity is reached. This process continues until the Kendall-τ rank correlation coefficient between the rankings of function f and the model’s rankings exceeds a threshold of 0.85, or until the entire population has been evaluated. At the end of the process, the population is ranked based on surrogate fitness unless all population members have been evaluated using function f, in which case the rankings based on function f are used. By relying on the rank correlation coefficient, this approach avoids a direct comparison of the values of the model and of the true objective function.
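A minimal sketch of this rank-test loop is given below. It is an illustration rather than the lq-CMA-ES implementation: `fit_model` is a hypothetical callback that turns the queue of evaluated solutions into a callable surrogate, `batch` is an assumed fixed number of solutions evaluated per iteration, and Kendall's τ is computed in its simple tau-a form without tie correction.

```python
from collections import deque
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau-a of two equally long sequences; tied pairs count as zero."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    score = 0
    for i, j in pairs:
        product = (a[i] - a[j]) * (b[i] - b[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)

def rank_test_generation(f, fit_model, population, queue,
                         batch=2, threshold=0.85):
    """Evaluate the population in batches until the model ranks well enough.

    Returns the fitness values used for ranking and a flag telling whether
    they are true values (entire population evaluated) or model values."""
    true_values = {}
    while True:
        model = fit_model(queue)
        order = sorted(range(len(population)),
                       key=lambda i: model(population[i]))
        batch_idx = [i for i in order if i not in true_values][:batch]
        # Sort worst-first so that the best solution is enqueued last.
        for value, i in sorted(((f(population[i]), i) for i in batch_idx),
                               reverse=True):
            true_values[i] = value
            queue.append((population[i], value))  # bounded deque drops the oldest
        if len(true_values) == len(population):
            return [true_values[i] for i in range(len(population))], True
        model = fit_model(queue)
        idx = sorted(true_values)
        tau = kendall_tau([true_values[i] for i in idx],
                          [model(population[i]) for i in idx])
        if tau > threshold:
            return [model(x) for x in population], False

def sphere(x):
    return sum(v * v for v in x)

calls = []

def counted_sphere(x):
    calls.append(x)
    return sphere(x)

population = [[0.5], [1.0], [-2.0], [3.0]]
queue = deque(maxlen=20)
# Hypothetical model factory that ignores the queue and returns a perfect model.
values, all_evaluated = rank_test_generation(
    counted_sphere, lambda q: sphere, population, queue)
```

With a surrogate that ranks perfectly, the loop stops after the first batch; with a misleading surrogate, it falls back to evaluating the entire population and returns the true values.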

3.5. IOHprofiler Integration

The use of Modular CMA-ES in conjunction with IOHprofiler [48, 49] offers a powerful approach for analyzing and comparing iterative optimization heuristics. IOHprofiler, a versatile tool for evaluating algorithm performance, provides statistical assessments by analyzing the distribution of fixed-target running time and fixed-budget function values. By integrating Modular CMA-ES with IOHprofiler, researchers can gain insights into the algorithm’s behavior, assess its adaptability, and compare its performance against other optimization heuristics. The combination allows for tracking the evolution of algorithm parameters, facilitating the analysis, comparison, and design of self-adaptive algorithms. With IOHprofiler’s experimental and post-processing capabilities, researchers can generate and evaluate running time data for benchmark problems, adjust the precision and range of displayed data, and make informed decisions based on the statistical evaluations produced.

4. Conclusion

This paper presented a new framework for support of the state-of-the-art black-box optimization method CMA-ES through GP-based modeling. It is a work-in-progress paper: not all intended functionality described in Section 3 has been implemented yet, and some of the implemented functionality does not yet work properly. However, we hope that the situation will be much better at the time of the workshop. Still, we are not aware of any other system that provides such comprehensive functionality for combining CMA-ES with Gaussian processes.

We have concentrated on Gaussian processes because we consider them to be the most suitable kind of surrogate models for difficult multimodal black-box functions if only a small number of evaluations of the true objective function is available. In the future, however, we intend to extend the developed framework also to other kinds of surrogate models. Most importantly, to low-order polynomials, which are a surrogate-modeling continuation of traditional response surface models [50], and which have always been the most successful kind of surrogate models if a large number of evaluations of the true objective function is available or if that function is easy to fit. In addition, we intend to include also some other of the models recalled above in Section 2, as well as several models that have not yet been employed for surrogate modeling, but which we believe are worth investigating to this end. Over various time horizons, we are considering the following models:

• Deep Gaussian processes, in which an ANN architecture connects individual GPs, similarly to connecting individual recurrence cells in a long short-term memory [51, 52].
• MLPs in the neural tangent kernel parametrization [53, 54, 55], which at a sufficient width have an ability to mimic GP sampling and to replace traditional acquisition functions in Bayesian optimization. Such behaviour of this kind of ANNs is, according to [53] and [55], a consequence of their asymptotic properties if the number of hidden neurons increases to infinity [56, 54, 57].
• Variational autoencoders, which allow optimization to be performed on a latent space of a substantially lower dimension. Such use of a low-dimensional latent space has already been investigated in the case of Bayesian optimization [58, 59].
• Generative adversarial networks (GANs), a paradigm that has recently been shown to be applicable to black-box optimization. More precisely, a generator has to propose samples compatible with the distribution of low values or directly with the distribution of the optimum of the considered black-box function, whereas one or more discriminators have to classify samples according to whether they are governed by that distribution [60, 61].

Acknowledgments

This work was supported by the Czech Technical University grant SGS23/205/OHK3/3T/18 and by the SVV project number 260 575 of the Charles University. Computational resources were supplied by the project "eInfrastruktura CZ" (e-INFRA LM2018140).

References

[1] Baerns, Holeňa, Combinatorial Development of Solid Catalytic Materials. Design of High-Throughput Experiments, Data Analysis, Data Mining, Imperial College Press / World Scientific, London, 2009.
[2] Bajer, Pitra, Repický, Holeňa, Gaussian process surrogate models for the CMA evolution strategy, Evolutionary Computation 27 (2019) 665–697.
[3] Büche, Schraudolph, Koumoutsakos, Accelerating evolutionary algorithms with Gaussian process fitness function models, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35 (2005) 183–194.
[4] Huang, Radi, El Hami, Bai, CMA evolution strategy assisted by kriging model and approximate ranking, Applied Intelligence 48 (2018) 4288–4204.
[5] Y. Jin, M. Olhofer, B. Sendhoff, A framework for evolutionary optimization with approximate fitness functions, IEEE Transactions on Evolutionary Computation 6 (2002) 481–494.
[6] Z. Pitra, M. Hanuš, J. Koza, J. Tumpach, M. Holeňa, Interaction between model and its evolution control in surrogate-assisted CMA evolution strategy, in: GECCO, 2021, paper no. 358.
[7] A. Auger, D. Brockhoff, N. Hansen, Benchmarking the local metamodel CMA-ES on the noiseless BBOB’2013 test bed, in: GECCO, 2013, pp. 1225–1232.
[8] N. Hansen, A global surrogate assisted CMA-ES, in: GECCO, 2019, pp. 664–672.
[9] S. Kern, N. Hansen, P. Koumoutsakos, Local metamodels for optimization using evolution strategies, in: PPSN, 2006, pp. 939–948.
[10] H. Yu, C. Sun, Y. Tan, J. Zeng, Y. Jin, An adaptive model selection strategy for surrogate-assisted particle swarm optimization algorithm, in: IEEE SCI, 2016, pp. 1–8.
[11] H. Wang, Y. Jin, J. Doherty, Committee-based active learning for surrogate-assisted particle swarm optimization of expensive problems, IEEE Transactions on Cybernetics 47 (2017) 2664–2677.
[12] M. Papadrakakis, N. Lagaros, Y. Tsompanakis, Structural optimization using evolution strategies and neural networks, Computer Methods in Applied Mechanics and Engineering 156 (1998) 309–333.
[13] L. Bajer, M. Holeňa, Surrogate model for continuous and discrete genetic optimization based on RBF networks, in: Intelligent Data Engineering and Automated Learning, Springer, 2010, pp. 251–258.
[14] D. Lim, Y. Ong, Y. Jin, S. B., A study on metamodeling techniques, ensembles, and multi-surrogates in evolutionary computation, in: GECCO, 2007, pp. 1288–1295.
[15] H. Ulmer, F. Streichert, A. Zell, Model-assisted steady state evolution strategies, in: GECCO, Springer, 2003, pp. 610–621.
[16] O. Krause, Recombination weight based selection in the DTS-CMA-ES, in: PPSN, 2022, pp. 295–308.
[17] Z. Li, T. Gao, B. Wang, Elite-driven surrogate assisted CMA-ES algorithm by improved lower confidence bound method, in: Engineering with Computers, Springer, 2022, doi: 10.1007/s00366-022-01642-5.
[18] L. Na, Q. Feng, Z. Liang, W. Zhong, Gaussian process assisted coevolutionary estimation of distribution algorithm for computationally expensive problems, Journal of Central South University of Technology 19 (2012) 443–452.
[19] Z. Pitra, L. Bajer, M. Holeňa, Doubly trained evolution control for the surrogate CMA-ES, in: PPSN, 2016, pp. 59–68.
[20] Z. Pitra, J. Repický, M. Holeňa, Landscape analysis of Gaussian process surrogates for the covariance matrix adaptation evolution strategy, in: GECCO, ACM, 2019, pp. 691–699.
[21] L. Toal, D. Arnold, Simple surrogate model assisted optimization with covariance matrix adaptation, in: PPSN, 2020, pp. 184–197.
[22] V. Volz, G. Rudolph, B. Naujoks, Investigating uncertainty propagation in surrogate-assisted evolutionary algorithms, in: GECCO, 2017, pp. 881–888.
[23] Z. Zhou, Y. Ong, P. Nair, A. Keane, K. Lum, Combining global and local surrogate models to accelerate evolutionary optimization, IEEE Transactions on Systems, Man and Cybernetics. Part C: Applications and Reviews 37 (2007) 66–76.
[24] B. Saini, M. López-Ibáñez, K. Miettinen, Automatic surrogate modelling technique selection based on features of optimization problems, in: GECCO, 2019, pp. 1765–1772.
[25] N. Belkhir, J. Dréo, P. Savéant, M. Schoenauer, Per instance algorithm configuration of CMA-ES with limited budget, in: GECCO, 2017, pp. 681–688.
[26] Z. Pitra, J. Repický, M. Holeňa, Boosted regression forest for the doubly trained surrogate covariance matrix adaptation evolution strategy, in: ITAT 2018, 2018, pp. 72–79.
[27] I. Loshchilov, M. Schoenauer, M. Sebag, Comparison-based optimizers need comparison-based surrogates, in: PPSN, 2010, pp. 364–373.
[28] T. Runarsson, Ordinal regression in evolutionary computation, in: PPSN, 2006, pp. 1048–1057.
[29] A. Abbasnejad, D. V. Arnold, Adaptive function value warping for surrogate model assisted evolutionary optimization, in: Parallel Problem Solving from Nature – PPSN XVII: 17th International Conference, PPSN 2022, Dortmund, Germany, 2022, pp. 76–89.
[30] M. Wu, A. Karkar, B. Liu, A. Yakovlev, G. Gielen, V. Grout, Network on chip optimization based on surrogate model assisted evolutionary algorithms, in: IEEE CEC, 2014, pp. 3266–3271.
[31] Y. Diouane, V. Picheny, R. Le Riche, A. Di Perrotolo, TREGO: a trust-region framework for efficient global optimization, Journal of Global Optimization 85 (2022), doi: 10.1007/s10898-022-01245-w.
[32] D. Jones, M. Schonlau, W. Welch, Efficient global optimization of expensive black-box functions, Journal of Global Optimization 13 (1998) 455–492.
[33] J. Knowles, ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems, IEEE Transactions on Evolutionary Computation 10 (2006) 50–66.
[34] Y. He, Y. Yuen, Black box algorithm selection by convolutional neural network, in: LOD, 2020, pp. 264–280.
[35] M. Pikalov, V. Mironovich, Automated parameter choice with exploratory landscape analysis and machine learning, in: GECCO, 2021, pp. 1982–1985.
[36] R. Prager, M. Seiler, H. Trautmann, P. Kerschke, Towards feature-free automated algorithm selection for single-objective continuous black box optimization, in: IEEE SCI, 2021, pp. 1–8.
[37] Z. Pitra, L. Bajer, M. Holeňa, Knowledge-based selection of Gaussian process surrogates, in: ECML Workshop IAL, 2019, pp. 48–63.
[38] M. V. Seiler, R. Prager, P. Kerschke, H. Trautmann, A collection of deep learning-based feature-free approaches for characterizing single-objective continuous fitness landscapes, in: GECCO, 2022, pp. 657–665.
[39] A. Jankovic, G. Popovski, T. Eftimov, C. Doerr, The impact of hyper-parameter tuning for landscape-aware performance regression and algorithm selection, in: GECCO, 2021, pp. 687–696.
[40] N. Hansen, A. Ostermaier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation 9 (2001) 159–195.
[41] N. Hansen, The CMA evolution strategy: A comparing review, in: Towards a New Evolutionary Computation, Springer, 2006, pp. 75–102.
[42] H. Mohammadi, R. Riche, E. Touboul, Making EGO and CMA-ES complementary for global optimization, in: Learning and Intelligent Optimization, Springer, 2015, pp. 287–292.
[43] N. Hansen, CMA-ES source code, 2016. https://cma-es.github.io.
[44] J. de Nobel, D. Vermetten, The source code of the modular Python version of CMA-ES, 2020. https://github.com/IOHprofiler/ModularCMAES.
[45] E. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, MIT Press, Cambridge, 2006.
[46] D. Duvenaud, J. Lloyd, R. Grosse, J. Tenenbaum, Z. Ghahramani, Structure discovery in nonparametric regression through compositional kernel search, in: 30th International Conference on Machine Learning, 2013, pp. 1166–1174.
[47] D. Duvenaud, Automatic Model Construction with Gaussian Processes, Ph.D. thesis, University of Cambridge, 2014.
[48] C. Doerr, H. Wang, F. Ye, S. van Rijn, T. Bäck, IOHprofiler: A benchmarking and profiling tool for iterative optimization heuristics, 2018. arXiv 1810.05281.
[49] C. Doerr, F. Ye, N. Horesh, H. Wang, O. Shir, et al., Benchmarking discrete optimization heuristics with IOHprofiler, Applied Soft Computing Journal 88 (2020), paper no. 106027.
[50] R. Myers, D. Montgomery, C. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, John Wiley and Sons, Hoboken, 2009.
[51] T. Bui, D. Hernandez-Lobato, J. Hernandez-Lobato, Y. Li, R. Turner, Deep Gaussian processes for regression using approximate expectation propagation, in: ICML, 2016, pp. 1472–1481.
[52] G. Hernández-Muñoz, C. Villacampa-Calvo, D. Hernández Lobato, Deep Gaussian processes using expectation propagation and Monte Carlo methods, in: ECML PKDD 2020, 2021, pp. 479–494.
[53] B. He, B. Lakshminarayanan, Y. Teh, Bayesian deep ensembles via the neural tangent kernel, in: NIPS, 2020, pp. 1–13.
[54] A. Jacot, F. Gabriel, C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks, in: NIPS, 2018, pp. 1–10.
[55] B. Paria, B. Póczos, K. Ravikumar, S. J., A. Suggala, et al., Be greedy – a simple algorithm for blackbox optimization using neural networks, in: ICML Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2022, pp. 1–27.
[56] S. Arora, S. Du, W. Hu, Z. Li, R. Salakhutdinov, et al., On exact computation with an infinitely wide neural net, in: NIPS, 2019, pp. 1–10.
[57] J. Lee, L. Xiao, S. Schoenholz, Y. Bahri, R. Novak, et al., Wide neural networks of any depth evolve as linear models under gradient descent, in: NIPS, 2019, pp. 1–10.
[58] S. Kim, P. Lu, C. Lob, J. Smith, J. Snoek, et al., Deep learning for Bayesian optimization of scientific problems with high-dimensional structure, Transactions on Machine Learning Research 1 (2023) openreview tPMQ6Je2rB.
[59] A. Tripp, E. Daxberger, J. Hernández-Lobato, Sample-efficient optimization in the latent space of deep generative models via weighted retraining, in: NIPS, 2020, pp. 1–14.
[60] M. Gillhofer, H. Ramsauer, J. Brandstetter, B. Schäfl, S. Hochreiter, A GAN based solver of black-box inverse problems, in: NIPS, 2019, pp. 1–5.
[61] M. Lu, S. Ning, S. Liu, F. Sun, B. Zhang, et al., OPT-GAN: A broad-spectrum global optimizer for black-box problems by learning distribution, 2022. arXiv 2102.03888v5.