ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 163–171, http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 J. Repický, L. Bajer, M. Holeňa

Traditional Gaussian Process Surrogates in the BBOB Framework

Jakub Repický¹, Lukáš Bajer¹, and Martin Holeňa²

¹ Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, Prague, Czech Republic, {j.repicky,bajeluk}@gmail.com
² Institute of Computer Science, Czech Academy of Sciences, Pod Vodárenskou věží 2, Prague, Czech Republic, martin@cs.cas.cz

Abstract: Objective function evaluation in continuous optimization tasks is often the operation that dominates the algorithm's cost. This is particularly true for black-box functions, i.e. functions for which no analytical description is available and whose values can only be obtained empirically. In such a situation, utilizing information from a surrogate model of the objective function is a well-known technique to accelerate the search. In this paper, we review two traditional approaches to surrogate modelling based on Gaussian processes that we have newly reimplemented in MATLAB: the Metamodel-Assisted Evolution Strategy using the probability of improvement, and the Gaussian Process Optimization Procedure. In the research reported in this paper, both approaches have been evaluated for the first time on the Black-Box Optimization Benchmarking framework (BBOB), a comprehensive benchmark for continuous optimizers.

1 Introduction

An analytical definition of the objective function in real-world optimization tasks is sometimes hard to obtain. Consequently, neither information about the function's smoothness nor its derivatives is available. Moreover, evaluation of the function is usually expensive, as it can only be done empirically, e.g. by measurement, testing, or running a computer simulation. Such functions are called black-box.

One class of optimization algorithms successfully applied to black-box optimization are evolution strategies. An evolution strategy (ES) is an optimization method that works on a population of candidate solutions using the evolutionary operators of selection, mutation and recombination [10, 11]. In particular, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [4] is considered to be the state-of-the-art continuous black-box optimizer. It samples each population according to a multivariate normal distribution determined by a covariance matrix. A notable property of the algorithm is the adaptation of the covariance matrix along the path of past successful search steps.
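The sampling mechanism just described can be illustrated with a few lines of code. The following minimal sketch shows only the sampling step; the mean, step size and covariance matrix are illustrative placeholders, and none of the CMA-ES adaptation rules are reproduced here.

```python
import numpy as np

def sample_population(m, sigma, C, lam, rng):
    """Draw lam candidate solutions from N(m, sigma^2 * C).

    In an ES such as the CMA-ES, m, sigma and C would be adapted
    between generations; here they are fixed toy values."""
    return rng.multivariate_normal(mean=m, cov=sigma**2 * C, size=lam)

D = 5
rng = np.random.default_rng(0)
population = sample_population(m=np.zeros(D), sigma=1.0, C=np.eye(D),
                               lam=10, rng=rng)
print(population.shape)  # (10, 5): one candidate solution per row
```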
Still, the whole population must be evaluated, which might make running the algorithm for a sufficient number of generations infeasible due to expensive evaluation. To address this issue, techniques that involve a surrogate regression model of the fitness have been proposed.

Two major concerns when incorporating a surrogate model (also called a metamodel in the literature) into evolution strategies are model management and model selection.

Model management is the task of controlling the surrogate model's impact on the algorithm's convergence by using the original fitness alongside its surrogate model in the course of the search.

In evolution control, a certain fraction of individuals or generations is controlled, i.e. evaluated with the fitness function, while the remainder is evaluated with the surrogate model [8]. For example, the Metamodel-Assisted Evolution Strategy (MAES) uses a surrogate model to pre-select the most promising individuals before they enter the selection procedure of a standard ES [3].

In contrast to evolution control, the surrogate approach [2] directly optimizes the model output in an iterative procedure, thus avoiding the issue of determining the correct fraction of controlled individuals. In each iteration, a fixed number of candidate solutions is found by minimizing the model with an evolution strategy. These solutions are thereafter evaluated on the real fitness and the model is updated.

Regarding model selection, Gaussian processes (GPs) are a non-parametric regression model that is appealing for the task, as it gives its prediction in terms of a Gaussian distribution. The variance of this prediction can be utilized as a confidence measure that promotes exploration of insufficiently modelled areas.

This paper reviews two traditional algorithms interconnecting Gaussian process-based surrogate models with the CMA-ES: the Metamodel-Assisted Evolution Strategy with improved pre-selection criterion by Ulmer, Streichert and Zell [13] and the Gaussian Process Optimization Procedure (GPOP) by Büche, Schraudolph and Koumoutsakos [2].

The former is a GP-based MAES with the probability of improvement (POI) as a pre-selection criterion. The latter represents the surrogate approach: in each iteration, a local GP model is built and four functions designed to balance predicted value and variance are optimized. While both algorithms are GP-based, they differ in the model-management approach as well as in the way they utilize the GP's confidence.

The framework COCO/BBOB (Comparing Continuous Optimizers / Black-Box Optimization Benchmarking) [7] provides an experimental setup for benchmarking black-box optimizers. In particular, its noiseless testbed [6] comprises 24 functions with properties that are to different extents challenging for continuous optimizers.

Both tested methods had been proposed before the BBOB framework originated. In the research reported in this paper, we evaluated both methods on the noiseless part of the BBOB framework. For that purpose a new implementation¹ was required, as the original source codes were not available to us.

In the following, we first briefly introduce Gaussian processes as a suitable surrogate fitness model in Section 2. An exposition of the tested methods is given in Section 3 and the experimental setup in Section 4. Section 5 presents experimental results and Section 6 concludes the paper.

¹ The sources are available at https://github.com/repjak/surrogate-cmaes/

2 Gaussian processes

Both algorithms under review feature the choice of Gaussian processes as a surrogate fitness function model. GPs are a probabilistic model with several properties that make it well suited for fitness function modelling: its hyperparameters are comprehensible and limited in number, and it provides a confidence measure given by the standard deviation of the predicted value at new data points.

In the following, we define a GP using the notation and equations from Büche et al. [2]. Consider f: R^D → R, an unknown real-parameter function to be approximated. A GP model is specified by a set X_N = {x_i | x_i ∈ R^D}_{i=1}^N of N training data points with known function values t_N = {t_i | f(x_i) = t_i}_{i=1}^N. The data are assumed to be a sample of a zero-mean multivariate Gaussian distribution with joint probability density

$$p(\mathbf{t}_N \mid X_N) = \frac{\exp\left(-\tfrac{1}{2}\,\mathbf{t}_N^T C_N^{-1} \mathbf{t}_N\right)}{\sqrt{(2\pi)^N \det(C_N)}}, \qquad (1)$$

where the covariance matrix C_N is defined by means of a covariance function C(x_i, x_j, Θ), i, j ∈ {1, ..., N}, with a fixed set of hyperparameters Θ.

For a set t_{N+1} that includes a new observation t_{N+1} = f(x_{N+1}), we obtain

$$p(\mathbf{t}_{N+1} \mid X_{N+1}) = \frac{\exp\left(-\tfrac{1}{2}\,\mathbf{t}_{N+1}^T C_{N+1}^{-1} \mathbf{t}_{N+1}\right)}{\sqrt{(2\pi)^{N+1} \det(C_{N+1})}}. \qquad (2)$$

Using the Bayes rule for conditional probabilities, the prediction at a new data point has the density function

$$p(t_{N+1} \mid X_{N+1}, \mathbf{t}_N) = \frac{p(\mathbf{t}_{N+1} \mid X_{N+1})}{p(\mathbf{t}_N \mid X_N)}. \qquad (3)$$

The covariance matrix C_{N+1} can be written with the use of the covariance matrix C_N as

$$C_{N+1} = \begin{pmatrix} C_N & \mathbf{k} \\ \mathbf{k}^T & \kappa \end{pmatrix}, \qquad (4)$$

where k = (C(x_i, x_{N+1}))_{i=1}^N is the vector of covariances between the new point and X_N, and κ = C(x_{N+1}, x_{N+1}) is the new point's variance.

Using (1) and (2) together with the fact that the inverse C_{N+1}^{-1} can also be expressed by means of C_N^{-1}, (3) can be simplified to a univariate Gaussian [2]

$$p(t_{N+1} \mid X_{N+1}, \mathbf{t}_N) \propto \exp\left(-\frac{1}{2}\,\frac{(t_{N+1} - \hat{t}_{N+1})^2}{\sigma_{t_{N+1}}^2}\right) \qquad (5)$$

with mean and variance given by

$$\hat{t}_{N+1} = \mathbf{k}^T C_N^{-1} \mathbf{t}_N, \qquad \sigma_{t_{N+1}}^2 = \kappa - \mathbf{k}^T C_N^{-1} \mathbf{k}.$$
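The posterior mean and variance above translate directly into a few lines of linear algebra. The following sketch is illustrative only: it assumes a plain squared exponential covariance function with fixed hyperparameters (introduced below as (6)), and the helper names sq_exp_cov and gp_predict are ours, not part of the reviewed implementations.

```python
import numpy as np

def sq_exp_cov(xp, xq, theta=1.0, r=1.0):
    """Squared exponential covariance of two points, cf. eq. (6) below."""
    d = xp - xq
    return theta * np.exp(-0.5 * np.dot(d, d) / r**2)

def gp_predict(X, t, x_new, cov=sq_exp_cov):
    """Posterior mean and variance of a zero-mean GP at x_new, eq. (5)."""
    N = len(X)
    C = np.array([[cov(X[i], X[j]) for j in range(N)] for i in range(N)])
    k = np.array([cov(X[i], x_new) for i in range(N)])
    kappa = cov(x_new, x_new)
    mean = k @ np.linalg.solve(C, t)        # \hat{t}_{N+1} = k^T C_N^{-1} t_N
    var = kappa - k @ np.linalg.solve(C, k)  # sigma^2 = kappa - k^T C_N^{-1} k
    return mean, var

# toy data: five samples of a 1-D quadratic
X = np.linspace(-2, 2, 5).reshape(-1, 1)
t = (X**2).ravel()
print(gp_predict(X, t, np.array([0.5])))
```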
A comprehensive exposition of Gaussian processes can be found in [9].

The covariance function plays an important role, as it expresses prior assumptions about the shape of the modelled function. The vector Θ of the model's hyperparameters is usually fitted by the maximum likelihood method.

One of the most commonly used covariance functions is the squared exponential covariance function

$$C(x_p, x_q) = \theta \exp\left(-\frac{1}{2}\,\frac{(x_p - x_q)^T (x_p - x_q)}{r^2}\right), \qquad (6)$$

where the parameter θ scales the covariance between two points and the radius r is the characteristic length scale of the process. When two points are close to each other compared to the characteristic length scale r, their covariance is close to one, while it decays exponentially to zero as their distance grows.

The squared exponential covariance function can be improved with automatic relevance determination (ARD):

$$C(x_p, x_q) = \theta \exp\left(-\frac{1}{2}\sum_{i=1}^{D}\frac{(x_{p,i} - x_{q,i})^2}{r_i^2}\right), \qquad (7)$$

where the radii r_i scale the impact of the distance of two points on their correlation separately in each dimension.

To address the need to balance the exploitation of a fitted model's prediction and the exploration of regions where the model's confidence is low, the standard deviation of the GP's prediction can be utilized.

One possibility are the merit functions proposed in [12] for the purpose of balancing exploration and exploitation in engineering design optimization. Consider M to be a trained Gaussian process model. A merit function combines the goals of finding a minimum predicted by the model M and improving the accuracy of M into a single objective function

$$f_{M,\alpha}(x) = \hat{t}(x) - \alpha\,\sigma(x), \qquad (8)$$

where t̂(x) is the mean of the predicted function value at x, σ(x) is the standard deviation of the prediction and α ≥ 0 is a balancing parameter.

Another option is the probability of improvement (POI). Let us assume the Gaussian process prediction to be a random variable Y(x) with mean t̂(x) and standard deviation σ(x). For a chosen threshold T less than or equal to the so-far best obtained fitness value f_min, the probability of improvement at a point x is defined as

$$\mathrm{POI}_T(x) = p(Y(x) \le T) = \Phi\left(\frac{T - \hat{t}(x)}{\sigma(x)}\right), \qquad (9)$$

where Φ is the cumulative distribution function of the distribution N(0, 1).
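Given the GP prediction at a point, both criteria are a one-line computation. A minimal sketch with illustrative names and toy values (Φ is expressed through the error function):

```python
from math import erf, sqrt

def merit(mean, std, alpha):
    """Merit function f_{M,alpha}(x) from eq. (8); lower is more promising."""
    return mean - alpha * std

def poi(mean, std, threshold):
    """Probability of improvement, eq. (9); Phi is the standard normal CDF."""
    return 0.5 * (1.0 + erf((threshold - mean) / (std * sqrt(2.0))))

# toy GP prediction at one candidate point
mean, std = 1.2, 0.4
f_min = 1.0                       # best fitness value observed so far
print(merit(mean, std, alpha=2))  # exploration-biased merit value
print(poi(mean, std, f_min))      # probability of improving on f_min
```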
3 Tested Methods

3.1 GP Model-Assisted Evolution Control

A standard ES operates on λ offspring generated from µ parents by the evolutionary operators of recombination and mutation. After the fitness of the offspring has been evaluated, a population of the µ best individuals is selected as the parents of the next generation. In a (µ, λ) ES, the µ best individuals are selected from the λ offspring, whereas in a (µ + λ) ES, the µ best individuals are selected from the union of the offspring and their parents.

MAES [3] modifies the standard evolution strategy by producing λ_Pre > λ instead of λ individuals from the µ parents by the same operators of recombination and mutation (steps 4 and 5 in Algorithm 1). Given a fitted Gaussian process M, the individuals x_i, i = 1, ..., λ_Pre, are then pre-selected according to a criterion χ_M defined for the model M to create λ offspring (step 6).

The GP model is trained in every generation on the set of N_tr most recently evaluated individuals (step 8).

Algorithm 1 GP Model-Assisted Evolution Strategy
Input: f – fitness function
       µ – number of parents
       λ – population size
       λ_Pre – size of the pre-selected population
       N_tr – size of the training dataset
       χ_M – the pre-selection criterion that depends on a GP model M, e.g. mean model prediction or POI
 1: Pop ← generate and evaluate λ initial samples
 2: M ← a GP model trained on points from Pop
 3: while termination criteria not reached do
 4:   Offspring ← reproduce Pop into λ_Pre new points
 5:   Offspring ← mutate Offspring
 6:   Offspring ← select best λ points according to the pre-selection criterion χ_M
 7:   evaluate Offspring with f
 8:   M ← update model M on the N_tr points most recently evaluated with f
 9:   Pop ← select µ best points according to f
10: end while

In this paper we consider two pre-selection criteria, in accordance with [13]: the mean model prediction (MMP) (5), which selects, among the λ_Pre candidates, the points with the best mean predicted fitness t̂(x), and the POI (9). The authors of [13] prefer the POI to the merit functions (8), as it does not depend on finding an appropriate value of the scale parameter α.
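The pre-selection step 6 of Algorithm 1 can be sketched as follows; the model_predict argument stands for any routine returning the GP posterior mean and variance (e.g. the gp_predict sketch above), and all names are illustrative rather than taken from the authors' MATLAB code.

```python
import numpy as np
from math import erf, sqrt

def preselect(candidates, model_predict, lam, criterion="MMP", f_min=None):
    """Keep the lam most promising of the lambda_Pre candidates (step 6)."""
    scores = []
    for x in candidates:
        mean, var = model_predict(x)
        std = sqrt(max(var, 1e-12))
        if criterion == "MMP":
            scores.append(mean)                    # lowest predicted value first
        else:  # "POI" with threshold f_min, eq. (9)
            p_improve = 0.5 * (1.0 + erf((f_min - mean) / (std * sqrt(2.0))))
            scores.append(-p_improve)              # highest POI first
    order = np.argsort(scores)
    return [candidates[i] for i in order[:lam]]

# toy usage with a fake model that "predicts" the sphere function
fake_model = lambda x: (float(np.dot(x, x)), 0.1)
rng = np.random.default_rng(1)
cands = [rng.normal(size=2) for _ in range(12)]    # lambda_Pre = 12 candidates
offspring = preselect(cands, fake_model, lam=4, criterion="POI", f_min=1.0)
print(len(offspring))  # 4
```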
3.2 Gaussian Process Optimization Procedure

The Gaussian Process Optimization Procedure is due to Büche, Schraudolph and Koumoutsakos [2]. A Gaussian process is trained on a subset of the already evaluated data and optimized by an ES instead of the original fitness. As optimization of the surrogate is cheaper than optimization of the original fitness, this can be repeated until some termination criteria are reached.

The CMA-ES is used as the evolution strategy, with the number of parents µ set to 2 and the population size λ set to 10.

The pseudocode is given in Algorithm 2. After generating an initial population, a local GP model is built and utilized in an iterative procedure. Considering the possibly low accuracy and the computational infeasibility of global models, the training dataset is restricted to the N_C points closest to the current best known solution x_best (step 6) and the N_R most recently evaluated points (step 7).

If the GP model M has been successfully trained, the CMA-ES optimizes the four merit functions f_{M,α} (8) for each α ∈ {0, 1, 2, 4}. Areas that might be approximated inaccurately are avoided by bounding the ES to the hypercube spanning the set of N_C points selected from x_best's neighbourhood (steps 10 and 12). The points that are optima of the considered merit functions are evaluated by the original fitness and added to the dataset of known points. In the case that no new solution is found, a random perturbation of x_best (step 23) is evaluated and added to the dataset. Unfortunately, the authors do not specify the parameter m that occurs in step 23; we set it to the value m = 1.

The authors used the following covariance function in their GP model:

$$C(x_p, x_q) = \theta_1 \exp\left(-\frac{1}{2}\sum_{i=1}^{D}\frac{(x_{p,i} - x_{q,i})^2}{r_i^2}\right) + \theta_2 + \delta_{pq}\,\theta_3, \qquad (10)$$

where r_i, θ_1, θ_2, θ_3 > 0 and δ_pq is the Kronecker delta. The function is the sum of the squared exponential covariance function with ARD (7), a constant shift θ_2 and white noise scaled by θ_3.

Algorithm 2 Gaussian Process Optimization Procedure
Input: N_C – number of training points selected according to their distance to x_best
       N_R – number of training points selected according to the most recent time of evaluation
       f – fitness function
       µ – the number of parents for the CMA-ES
       λ – population size for the CMA-ES
 1: {x_1, ..., x_{N_C/2}} ← a set of N_C/2 points generated by the CMA-ES
 2: y_i ← f(x_i), i ∈ {1, ..., N_C/2}
 3: A ← {(x_i, y_i) | i ∈ {1, ..., N_C/2}}
 4: x_best ← arg min_{x ∈ {x_i | i ∈ {1,...,N_C/2}}} f(x)
 5: while stopping criteria not reached do
 6:   T_C ← N_C points from {u | ∃z: (u, z) ∈ A} closest to x_best
 7:   T_R ← N_R points most recently evaluated
 8:   M ← a GP model trained on T_C ∪ T_R
 9:   S ← ∅
10:   d ← (d_i)_{i=1}^D, d_i = max_{x ∈ T_C}(x_i) − min_{x ∈ T_C}(x_i)
11:   for all α ∈ {0, 1, 2, 4} do
12:     B ← {x | x_best − d/2 ≤ x ≤ x_best + d/2}
13:     x ← optimize f_{M,α} within B by the (µ, λ)-CMA-ES
14:     if ∄z: (x, z) ∈ A then
15:       y ← f(x)
16:       S ← S ∪ {(x, y)}
17:     end if
18:   end for
19:   if S ≠ ∅ then
20:     A ← A ∪ S
21:     x_best ← arg min_{x ∈ {u | ∃z: (u,z) ∈ A}} f(x)
22:   else
23:     x_R ← (x_best,i + (d_i m / 100) z_i)_{i=1}^D, z_i ∼ N(0, 1)
24:     y_R ← f(x_R)
25:     A ← A ∪ {(x_R, y_R)}
26:   end if
27: end while
28: return x_best
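Steps 6, 7, 10 and 12 of Algorithm 2 amount to simple set and vector operations. The following sketch assumes the archive A is stored as an array of points in evaluation order; names and data layout are illustrative.

```python
import numpy as np

def gpop_training_set_and_bounds(X, x_best, n_c, n_r):
    """Training-set selection and search bounds for one GPOP iteration.

    X      -- archive of evaluated points in evaluation order, shape (N, D)
    x_best -- currently best known solution, shape (D,)
    n_c    -- number of points closest to x_best        (step 6)
    n_r    -- number of most recently evaluated points  (step 7)
    Returns the training set T_C union T_R and the box bounds (steps 10, 12).
    """
    dist = np.linalg.norm(X - x_best, axis=1)
    T_C = X[np.argsort(dist)[:n_c]]            # N_C closest points
    T_R = X[-n_r:]                             # N_R most recent points
    train = np.unique(np.vstack([T_C, T_R]), axis=0)
    d = T_C.max(axis=0) - T_C.min(axis=0)      # edge lengths of the spanning box
    return train, x_best - d / 2, x_best + d / 2

# toy usage in D = 2 with N_C = N_R = 5 * D
rng = np.random.default_rng(2)
X = rng.uniform(-5, 5, size=(40, 2))
x_best = X[np.argmin((X**2).sum(axis=1))]
train, lo, hi = gpop_training_set_and_bounds(X, x_best, n_c=10, n_r=10)
print(train.shape, lo, hi)
```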
4 Experimental Setup

The framework COCO/BBOB (Comparing Continuous Optimizers / Black-Box Optimization Benchmarking) [6, 7] is intended for systematic experimentation with black-box optimizers. We use the noiseless testbed of the BBOB framework, which comprises 24 real-parameter benchmark functions with different characteristics such as non-separability, multi-modality, ill-conditioning, etc.

Each function is defined on [−5, 5]^D for all D ≥ 2. For every function and every dimensionality, 15 trials of the optimizer are run, each on a different instance of the original function. We used dimensionalities 2, 3, 5 and 10; thus 1440 trials in total were run for each parameter setting and each method.

Since source codes were available for neither of the tested methods, we implemented them in MATLAB. For Gaussian processes, we chose MATLAB's default implementation fitrgp, a part of the Statistics and Machine Learning Toolbox. The GP hyperparameters were fitted with a quasi-Newton optimizer, which is the default in fitrgp.

Parameters of the benchmarked algorithms were set as follows:

CMA-ES. A multi-start version with the population size doubled after each restart was used in the tests (MATLAB code v. 3.62.beta). The number of restarts was set to 4, while the other parameter settings were left at their defaults: λ = 4 + ⌊3 log₁₀ D⌋, µ = ⌊λ/2⌋, σ_start = 8/3, IncPopSize = 2.

MAES. We implemented the GP Model-Assisted Evolution Strategy on top of a framework developed for the S-CMA-ES algorithm, which employs a GP model in conjunction with a generation-based evolution control [1]. The S-CMA-ES implementation allows the population sampling step of the CMA-ES to be conveniently replaced with a custom procedure. In this case, a population intended for pre-selection is sampled and processed as described in Subsection 3.1, and control is then handed back to the CMA-ES.

The number of parents and the population size were set to correspond with the CMA-ES settings. Two pre-selection criteria were tested: the MMP and the POI with the threshold equal to the so-far best sampled fitness value f_min. In both cases λ_Pre was set to 3λ. The training set consisted of the 2λ most recently evaluated points. We used the same covariance function as in the GPOP case, i.e. (10).

GPOP. We adhered to [2] in the usage of the proposed covariance function (10). The termination criteria were chosen as follows:

• the number of consecutive iterations with no new solution found is larger than 2, where the tolerance on the Euclidean distance of two points for them to be considered equal is 10^{-8};
• the overall change of fitness values during the last 10 iterations is lower than 10^{-9};
• the target fitness value is reached.

The training set size parameters N_C and N_R were chosen in accordance with the empirical results reported in [2], namely N_C = N_R = 5D.

Although the performance is measured in terms of the number of fitness evaluations, other operations, such as optimizing a surrogate model, may also be costly in benchmarking scenarios. Considering functions in 10-D, a run of GPOP on one core of a computational grid took approximately 27.8 real hours per function on average. For those reasons, we limited the maximum number of fitness evaluations to 100·D for all tested methods.

5 Experimental Results

Results of experiments on all 24 noiseless BBOB benchmark functions are presented in Figures 1–3.

The expected running time (ERT) depicted in Figure 1 depends on a given target value f_t = f_opt + ∆f, i.e. the true optimum f_opt of the respective benchmark function raised by a small value ∆f. The ERT is computed over all relevant trials as the number of original function evaluations (FE) executed during each trial until the target function value f_t is reached, summed over all trials and divided by the number of successful trials, i.e. trials that actually reached f_t [7]:

$$\mathrm{ERT}(f_t) = \frac{\#\mathrm{FE}(f_{\mathrm{best}} \ge f_t)}{\#\mathrm{succ}}. \qquad (11)$$
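For readers who want to recompute it, the ERT defined in (11) can be obtained from per-trial records as in the sketch below; the data layout is illustrative, and the BBOB post-processing tools perform this computation themselves.

```python
def ert(evals_to_target, budgets):
    """Expected running time (11) for one target value.

    evals_to_target -- per-trial number of f-evaluations needed to reach the
                       target, or None if the trial never reached it
    budgets         -- per-trial total number of f-evaluations performed
    """
    n_success = sum(e is not None for e in evals_to_target)
    if n_success == 0:
        return float("inf")
    total = sum(e if e is not None else b
                for e, b in zip(evals_to_target, budgets))
    return total / n_success

# 15 trials: 3 successes, the rest exhausted a 1000-evaluation budget
reached = [120, None, 300, None, 450] + [None] * 10
budgets = [1000] * 15
print(ert(reached, budgets))
```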
In the graphs for functions 1, 2, 5, 8, 9, 12 and 14, GPOP achieved significantly better results than all other algorithms for some dimensions. In contrast, the differences between MAES and the CMA-ES are rather small, regardless of the pre-selection criterion.

The graphs in Figure 2 summarize the performance over subgroups of the benchmark functions for the highest tested dimension, 10. The graphs show the proportion of algorithm runs that reached a target value f_t ∈ 10^{[−1..2]} (see the figure caption for further details).

The GPOP speedup is most evident on the group of separable functions (functions 1–5), i.e. functions whose optimization can be reduced to D one-dimensional problems [6]. On the other hand, GPOP has the worst results of all methods on the multi-modal functions (functions 15–19).

Similar results may be observed in Figure 3, which for each function shows the dependence of the relative best fitness value on the number of evaluations in 10-D. As can be seen in the graphs for functions 13 and 24, GPOP in some cases outperforms all other algorithms in the early stages, but then gets trapped in a local minimum. In the graph for function 21 in Figure 3, MAES with the MMP as the pre-selection criterion visibly outperforms all other algorithms.

Figure 1: Expected running time (ERT as log10 of the number of f-evaluations) for target function value 0.1, divided by dimension, versus dimension, for each of the 24 benchmark functions. Different symbols correspond to different algorithms (CMA-ES, MAES-MMP, MAES-POI, GPOP) given in the legend of f1 and f24. Values are plotted up to the highest dimensionality with at least one successful trial. Light symbols give the maximum number of function evaluations from the longest trial divided by dimension. Black stars indicate a statistically better result compared to all other algorithms with p < 0.05 using the Hommel correction.
Figure 2: Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by dimension (FE/D) for 61 targets with target precision in 10^{[−1..2]}, for different subgroups of the BBOB benchmark functions (separable f1–f5, moderate f6–f9, ill-conditioned f10–f14, multi-modal f15–f19, weakly structured multi-modal f20–f24, and all functions f1–f24) in 10-D.

Figure 3: Relative fitness of the best individual as log10 value versus the number of evaluations divided by dimension, for all functions in 10-D. The best fitness value at each number of evaluations is computed as the median of all trials.

6 Conclusion

In this paper, we compared the CMA-ES and our implementation of two traditional methods which improve upon the CMA-ES with Gaussian process-based surrogate models: MAES, which extends the CMA-ES with a pre-selection step, and GPOP, which iteratively optimizes the GP model.

The benchmarks on the BBOB framework did not show any significant speedup of MAES compared to the CMA-ES. On the other hand, GPOP in many cases outperforms all the other methods, especially in the early optimization stages. On some functions, though, it tends to get trapped in local minima. This might be explained by the fact that GPOP requires considerably fewer function evaluations per iteration than the other methods; however, the model is built locally and might not be sufficient for exploration of the search space in later phases.

Our MAES implementation relies on a modification of the sampling step of the CMA-ES, thereby changing the distribution of the sampled population. This might somewhat mislead the CMA-ES and requires further research, in particular considering the proposal in [5].

Another area for further research is the exploration of various confidence measures across the tested methods, especially in connection with GPOP.

7 Acknowledgments

Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CESNET LM2015042), is greatly appreciated.

The research performed by Lukáš Bajer has been supported by the Czech Health Research Council project NV15-33250A. The research performed by Jakub Repický was supported by SVV project number 260 224.

References

[1] L. Bajer, Z. Pitra, and M. Holeňa. Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO Companion '15, 2015.
[2] D. Büche, N. N. Schraudolph, and P. Koumoutsakos. Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Transactions on Systems, Man and Cybernetics, 35(2):183–194, 2005.
[3] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, and K. Giannakoglou. Metamodel-assisted evolution strategies. In Parallel Problem Solving from Nature VII, pages 361–370. Springer, 2002.
[4] N. Hansen. The CMA evolution strategy: a comparing review. In J. A. Lozano, P. Larrañaga, I. Inza, and E. Bengoetxea, editors, Towards a New Evolutionary Computation. Advances on Estimation of Distribution Algorithms, pages 75–102. Springer, 2006.
[5] N. Hansen. Injecting external solutions into CMA-ES. CoRR, abs/1110.4181, 2011.
[6] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009. Updated February 2010.
[7] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.
[8] Y. Jin and B. Sendhoff. Fitness approximation in evolutionary computation – a survey. In GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages 1105–1111, 2002.
[9] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning series. MIT Press, 2006.
[10] I. Rechenberg. Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart, 1973.
[11] H.-P. Schwefel. Numerische Optimierung von Computer-Modellen. Birkhäuser, Basel, 1977.
[12] V. Torczon and M. W. Trosset. Using approximations to accelerate engineering design optimization. In Proceedings of the 7th AIAA/USAF/NASA/ISSMO Multidisciplinary Analysis & Optimization Symposium (held at Saint Louis, Missouri), paper 98-4800, pages 507–512, 1998.
[13] H. Ulmer, F. Streichert, and A. Zell. Evolution strategies assisted by Gaussian processes with improved pre-selection criterion. In IEEE Congress on Evolutionary Computation, pages 692–699, 2003.