ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 163–171
http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 J. Repický, L. Bajer, M. Holeňa



Traditional Gaussian Process Surrogates in the BBOB Framework

Jakub Repický¹, Lukáš Bajer¹, and Martin Holeňa²

¹ Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, Prague, Czech Republic
{j.repicky,bajeluk}@gmail.com

² Institute of Computer Science, Czech Academy of Sciences, Pod Vodárenskou věží 2, Prague, Czech Republic
martin@cs.cas.cz

Abstract: Objective function evaluation in continuous optimization tasks is often the operation that dominates the algorithm's cost, in particular in the case of black-box functions, i.e. when no analytical description is available and the function has to be evaluated empirically. In such a situation, utilizing information from a surrogate model of the objective function is a well-known technique to accelerate the search. In this paper, we review two traditional approaches to surrogate modelling based on Gaussian processes that we have newly reimplemented in MATLAB: the Metamodel Assisted Evolution Strategy using probability of improvement, and the Gaussian Process Optimization Procedure. In the research reported in this paper, both approaches have been evaluated for the first time on the Black-Box Optimization Benchmarking framework (BBOB), a comprehensive benchmark for continuous optimizers.

1 Introduction

An analytical definition of the objective function in real-world optimization tasks is sometimes hard to obtain. Therefore, neither information about the function's smoothness nor its derivatives is available. Moreover, evaluation of the function is usually expensive, as it can only be done empirically, e.g. by measurement, testing, or running a computer simulation. Such functions are called black-box.

One class of optimization algorithms successfully applied to black-box optimization is evolution strategies. An evolution strategy (ES) is an optimization method that works on a population of candidate solutions using the evolutionary operators of selection, mutation and recombination [10, 11]. In particular, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [4] is considered to be the state-of-the-art continuous black-box optimizer. It samples each population according to a multivariate normal distribution determined by a covariance matrix. A notable property of the algorithm is the adaptation of the covariance matrix along the path of past successful search steps.

Still, the whole population must be evaluated, which might make running the algorithm for a sufficient number of generations infeasible due to expensive evaluation. To address this issue, techniques that involve a surrogate regression model of the fitness have been proposed.

Two major requirements of incorporating a surrogate model (also called a metamodel in the literature) into evolution strategies are model management and model selection.

Model management is the task of controlling the surrogate model's impact on the algorithm's convergence by using the original fitness alongside its surrogate model in the course of the search. In evolution control, a certain fraction of individuals or generations is controlled, i.e. evaluated with the fitness function, while the remainder is evaluated with the surrogate model [8]. For example, the Metamodel-Assisted Evolution Strategy (MAES) uses a surrogate model to pre-select the most promising individuals before they enter the selection procedure of a standard ES [3].

In contrast to evolution control, the surrogate approach [2] directly optimizes the model output in an iterative procedure, thus avoiding the issue of determining the correct fraction of controlled individuals. In each iteration, a fixed number of candidate solutions is found by minimizing the model with an evolution strategy. These solutions are thereafter evaluated on the real fitness and the model is updated.

Regarding model selection, Gaussian processes (GPs) are a non-parametric regression model that is appealing for the task, as it gives its prediction in terms of a Gaussian distribution. The variance of this prediction can be utilized as a confidence measure that promotes exploration of insufficiently modelled areas.

This paper reviews two traditional algorithms interconnecting Gaussian process-based surrogate models with the CMA-ES: the Metamodel-Assisted Evolution Strategy with improved pre-selection criterion by Ulmer, Streichert and Zell [13], and the Gaussian Process Optimization Procedure (GPOP) by Büche, Schraudolph and Koumoutsakos [2].

The former is a GP-based MAES with probability of improvement (POI) as a pre-selection criterion. The latter represents the surrogate approach: in each iteration, a local GP model is built and four functions designed to balance predicted value and variance are optimized. While both algorithms are GP-based, they differ both in the model-management approach and in how they utilize the GP's confidence.

The framework COCO/BBOB (Comparing Continuous Optimizers / Black-Box Optimization Benchmarking) [7] provides an experimental setup for benchmarking black-box optimizers. In particular, its noiseless testbed [6] comprises 24 functions with properties that are to different extents challenging for continuous optimizers.

Both tested methods had been proposed before the BBOB framework originated. In the research reported in this paper, we evaluated both methods on the noiseless part of the BBOB framework. For that purpose, a new implementation was required, as the original source codes were not available to us. (Our sources are available at https://github.com/repjak/surrogate-cmaes/.)

In the following, we first briefly introduce Gaussian processes as a suitable surrogate fitness model in Section 2. An exposition of the tested methods is given in Section 3 and the experimental setup in Section 4. Section 5 presents experimental results, and finally Section 6 concludes the paper.

2 Gaussian processes

Both algorithms under review feature the choice of Gaussian processes as a surrogate fitness function model.

GPs are a probabilistic model with several properties that make it well suited for fitness function modelling: its hyperparameters are comprehensible and limited in number, and it provides a confidence measure given by the standard deviation of the predicted value at new data points.

In the following, we define a GP using notation and equations from Büche [2]. Consider f: ℝ^D → ℝ an unknown real-parameter function to be approximated. A GP model is specified by a set X_N = {x_i | x_i ∈ ℝ^D, i = 1, …, N} of N training data points with known function values t_N = {t_i | t_i = f(x_i)}_{i=1}^N. The data are assumed to be a sample of a zero-mean multivariate Gaussian distribution with joint probability density

$$p(\mathbf{t}_N \mid X_N) = \frac{\exp\!\left(-\tfrac{1}{2}\mathbf{t}_N^T C_N^{-1} \mathbf{t}_N\right)}{\sqrt{(2\pi)^N \det(C_N)}} \tag{1}$$

where the covariance matrix C_N is defined by means of a covariance function C(x_i, x_j, Θ), i, j ∈ {1, …, N}, with a fixed set of hyperparameters Θ.

For a set t_{N+1} that includes a new observation t_{N+1} = f(x_{N+1}), we obtain

$$p(\mathbf{t}_{N+1} \mid X_{N+1}) = \frac{\exp\!\left(-\tfrac{1}{2}\mathbf{t}_{N+1}^T C_{N+1}^{-1} \mathbf{t}_{N+1}\right)}{\sqrt{(2\pi)^{N+1} \det(C_{N+1})}}. \tag{2}$$

Using the Bayes rule for conditional probabilities, the prediction at a new data point has the density function

$$p(t_{N+1} \mid X_{N+1}, \mathbf{t}_N) = \frac{p(\mathbf{t}_{N+1} \mid X_{N+1})}{p(\mathbf{t}_N \mid X_N)}. \tag{3}$$

The covariance matrix C_{N+1} can be written with the use of the covariance matrix C_N as

$$C_{N+1} = \begin{pmatrix} C_N & \mathbf{k} \\ \mathbf{k}^T & \kappa \end{pmatrix} \tag{4}$$

where k = (C(x_i, x_{N+1}))_{i=1}^N is a vector of covariances between the new point and X_N, and κ = C(x_{N+1}, x_{N+1}) is the new point's variance.

Using (1) and (2) together with the fact that the inverse C_{N+1}^{-1} can also be expressed by means of C_N^{-1}, (3) can be simplified to a univariate Gaussian [2]

$$p(t_{N+1} \mid X_{N+1}, \mathbf{t}_N) \propto \exp\!\left(-\frac{1}{2}\,\frac{(t_{N+1} - \hat{t}_{N+1})^2}{\sigma_{t_{N+1}}^2}\right) \tag{5}$$

with mean and variance given by

$$\hat{t}_{N+1} = \mathbf{k}^T C_N^{-1} \mathbf{t}_N, \qquad \sigma_{t_{N+1}}^2 = \kappa - \mathbf{k}^T C_N^{-1} \mathbf{k}.$$

A comprehensive exposition of Gaussian processes can be found in [9].

The covariance function plays an important role, as it expresses prior assumptions about the shape of the modelled function. The vector Θ of the model's hyperparameters is usually fitted with the maximum likelihood method.

One of the most commonly used covariance functions is the squared exponential covariance function:

$$C(x_p, x_q) = \theta \exp\!\left(-\frac{1}{2}\,\frac{(x_p - x_q)^T (x_p - x_q)}{r^2}\right) \tag{6}$$

where the parameter θ scales the covariance between two points and the radius r is the characteristic length scale of the process. When two points are close to each other compared to the characteristic length scale r, the covariance is close to one, while it exponentially decays to zero as their distance grows.

The squared exponential covariance function can be improved with automatic relevance determination (ARD):

$$C(x_p, x_q) = \theta \exp\!\left(-\frac{1}{2}\sum_{i=1}^{D}\frac{(x_{p,i} - x_{q,i})^2}{r_i^2}\right) \tag{7}$$
where the radii r_i scale the impact of the two points' distance on their correlation separately in each dimension.
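To make the preceding formulas concrete, the following MATLAB sketch computes the predictive mean and variance of (5) under the ARD squared exponential covariance (7). It is only an illustration: the function names are our own, and the small jitter term added for numerical stability is not part of the equations above.

```matlab
function [mu, s2] = gpPredict(X, t, xNew, theta, r)
% GP predictive mean and variance at a single point xNew, eq. (5):
% mu = k' * inv(C) * t and s2 = kappa - k' * inv(C) * k.
%   X     ... N-by-D matrix of training points
%   t     ... N-by-1 vector of observed function values
%   xNew  ... 1-by-D query point
%   theta ... signal variance, r ... 1-by-D ARD length scales
  N = size(X, 1);
  C = zeros(N);                     % covariance matrix C_N
  for p = 1:N
    for q = 1:N
      C(p, q) = seArd(X(p,:), X(q,:), theta, r);
    end
  end
  C = C + 1e-10 * eye(N);           % jitter for numerical stability (our addition)
  k = zeros(N, 1);                  % covariances between X and the new point
  for p = 1:N
    k(p) = seArd(X(p,:), xNew, theta, r);
  end
  kappa = seArd(xNew, xNew, theta, r);
  L = chol(C, 'lower');             % Cholesky factor instead of an explicit inverse
  mu = k' * (L' \ (L \ t));         % predictive mean, eq. (5)
  v = L \ k;
  s2 = kappa - v' * v;              % predictive variance, eq. (5)
end

function c = seArd(xp, xq, theta, r)
% Squared exponential covariance with ARD, eq. (7)
  c = theta * exp(-0.5 * sum(((xp - xq) ./ r).^2));
end
```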
To address the need to balance the exploitation of a fitted model's prediction and the exploration of regions where the model's confidence is low, the standard deviation of the GP's prediction can be utilized.
One possibility is to use the merit functions proposed in [12] for the purpose of balancing exploration and exploitation in engineering design optimization. Consider M to be a trained Gaussian process model. A merit function combines the goals of finding a minimum predicted by the model M and improving the accuracy of M into a single objective function:

$$f_{M,\alpha}(x) = \hat{t}(x) - \alpha\,\sigma(x) \tag{8}$$

where t̂(x) is the mean of the prediction of the function value in x, σ(x) is the prediction standard deviation and α ≥ 0 is a balancing parameter.
Another option is the probability of improvement (POI). Let us assume the Gaussian process prediction to be a random variable Y(x) with mean t̂(x) and standard deviation σ(x). For a chosen threshold T less than or equal to the so-far best obtained fitness value f_min, the probability of improvement at point x is defined as:

$$\mathrm{POI}_T(x) = p(Y(x) \leq T) = \Phi\!\left(\frac{T - \hat{t}(x)}{\sigma(x)}\right) \tag{9}$$

where Φ is the cumulative distribution function of the distribution N(0, 1).
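For illustration, both criteria are one-liners on top of a model's predictive mean and standard deviation. The sketch below uses MATLAB's normcdf for Φ; the function name and signature are our own.

```matlab
function [merit, poi] = criteria(tHat, sigma, alpha, T)
% Merit function (8) and probability of improvement (9) at a point x,
% given the predictive mean tHat and standard deviation sigma of a GP
% model at x. A sketch with our own naming; T <= fmin and alpha >= 0.
  merit = tHat - alpha * sigma;        % f_{M,alpha}(x), eq. (8)
  poi = normcdf((T - tHat) / sigma);   % POI_T(x), eq. (9)
end
```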
3 Tested Methods

3.1 GP Model Assisted Evolution Control

A standard ES operates on λ offspring generated from µ parents by the evolutionary operators of recombination and mutation. After the fitness of the offspring is evaluated, a population of the µ best individuals is selected to reproduce new offspring in the next iteration. In a (µ, λ)-ES, the µ best individuals are selected from the λ offspring, whereas in a (µ + λ)-ES, the µ best individuals are selected from the union of the offspring and their parents.

MAES [3] modifies the standard evolution strategy by producing λ_Pre > λ instead of λ individuals from µ parents by the same operators of recombination and mutation (steps 4 and 5 in Algorithm 1). Given a fitted Gaussian process M, individuals x_i, i = 1, …, λ_Pre are then pre-selected according to a criterion χ_M defined for the model M to create λ offspring (step 6).

The GP model is trained in every generation on the set of N_tr most recently evaluated individuals (step 8).

In this paper we consider two pre-selection criteria, in accordance with [13]: mean model prediction (MMP) (5), which selects the λ points with the best mean predicted fitness t̂(x), and POI (9). The authors of [13] prefer POI to the merit functions (8), as it does not depend on finding an appropriate value of the scale parameter α.

Algorithm 1 GP Model-Assisted Evolution Strategy
Input: f – fitness function
    µ – number of parents
    λ – population size
    λ_Pre – size of the pre-selected population
    N_tr – size of the training dataset
    χ_M – the pre-selection criterion that depends on a GP model M, e.g. mean model prediction or POI
 1: Pop ← generate and evaluate λ initial samples
 2: M ← a GP model trained on points from Pop
 3: while termination criteria not reached do
 4:   Offspring ← reproduce Pop into λ_Pre new points
 5:   Offspring ← mutate Offspring
 6:   Offspring ← select the best λ points according to the pre-selection criterion χ_M
 7:   evaluate Offspring with f
 8:   M ← update model M on the N_tr points most recently evaluated with f
 9:   Pop ← select the µ points best according to f
10: end while
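A possible MATLAB rendering of the pre-selection step (step 6 of Algorithm 1) is sketched below. The function name and signature are our own, and the model is assumed to expose predictions via MATLAB's predict interface, as a fitrgp model does.

```matlab
function selected = preselect(model, offspring, lambda, criterion, fmin)
% Pre-selection step 6 of Algorithm 1 (a sketch; naming is ours).
%   model     ... trained GP model, e.g. a fitrgp object
%   offspring ... lambdaPre-by-D matrix of candidate points
%   lambda    ... number of points to keep
%   criterion ... 'MMP' or 'POI'
%   fmin      ... best fitness value sampled so far (POI threshold)
  [mu, sd] = predict(model, offspring);   % predictive mean and std. dev.
  switch criterion
    case 'MMP'    % lowest mean predicted fitness, cf. (5)
      [~, order] = sort(mu, 'ascend');
    case 'POI'    % highest probability of improvement, cf. (9)
      [~, order] = sort(normcdf((fmin - mu) ./ sd), 'descend');
    otherwise
      error('Unknown pre-selection criterion: %s', criterion);
  end
  selected = offspring(order(1:lambda), :);
end
```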
3.2 Gaussian Process Optimization Procedure

The Gaussian Process Optimization Procedure is due to Büche, Schraudolph and Koumoutsakos [2]. A Gaussian process is trained on a subset of the already evaluated data and optimized by an ES instead of the original fitness. As optimization of the surrogate is cheaper than optimization of the original fitness, this can be repeated until some termination criteria are reached.

CMA-ES is used as the evolution strategy, with the number of parents µ set to 2 and the population size λ set to 10.

The pseudocode is given in Algorithm 2. After generating an initial population, a local GP model is built and utilized in an iterative procedure. Considering the possibly low accuracy and computational infeasibility of global models, the training dataset is restricted to the N_C points closest to the current best known solution x_best (step 6) and the N_R most recently evaluated points (step 7).

If the GP model M has been successfully trained, then the CMA-ES optimizes the four merit functions f_{M,α} (8), one for each α ∈ {0, 1, 2, 4}. Areas that might be approximated inaccurately are avoided by bounding the ES to the hypercube spanning the set of N_C points selected from x_best's neighborhood (steps 10 and 12). The points that are optima of the considered merit functions are evaluated by the original fitness and added to the dataset of known points.

In the case that no new solution is found, a random perturbation (step 23) is evaluated and added to the dataset. Unfortunately, the authors do not specify the parameter m that occurs in step 23. We set it to the value m = 1.

The authors used the following covariance function in their GP model:

$$C(x_p, x_q) = \theta_1 \exp\!\left(-\frac{1}{2}\sum_{i=1}^{D}\frac{(x_{p,i} - x_{q,i})^2}{r_i^2}\right) + \theta_2 + \delta_{pq}\,\theta_3 \tag{10}$$

where r_i, θ_1, θ_2, θ_3 > 0 and δ_pq is the Kronecker delta. The function is the sum of the squared exponential covariance function with ARD (7), a constant shift θ_2 and white noise scaled by θ_3.
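A direct MATLAB transcription of (10) might look as follows; the function name and argument order are our own choices. The Kronecker delta compares the indices p and q of the two training points, so the noise term contributes only to the diagonal of the covariance matrix.

```matlab
function c = gpopCov(xp, xq, p, q, theta1, theta2, theta3, r)
% Covariance function (10): ARD squared exponential (7) plus a constant
% shift theta2 and white noise scaled by theta3 (a sketch; naming is ours).
%   xp, xq ... 1-by-D points, p, q ... their indices in the training set
%   r      ... 1-by-D vector of ARD length scales, theta1..theta3 > 0
  c = theta1 * exp(-0.5 * sum(((xp - xq) ./ r).^2)) ...
      + theta2 ...
      + theta3 * (p == q);   % Kronecker delta term
end
```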
Algorithm 2 Gaussian Process Optimization Procedure
Input: N_C – number of training points selected according to their distance to x_best
    N_R – number of training points selected according to the most recent time of evaluation
    f – fitness function
    µ – the number of parents for CMA-ES
    λ – population size for CMA-ES
 1: {x_1, …, x_{N_C/2}} ← a set of N_C/2 points generated by CMA-ES
 2: y_i ← f(x_i), i ∈ {1, …, N_C/2}
 3: A ← {(x_i, y_i) | i ∈ {1, …, N_C/2}}
 4: x_best ← arg min_{x ∈ {x_i | i ∈ {1,…,N_C/2}}} f(x)
 5: while stopping criteria not reached do
 6:   T_C ← the N_C points from {u | ∃z: (u, z) ∈ A} closest to x_best
 7:   T_R ← the N_R points most recently evaluated
 8:   M ← a GP model trained on T_C ∪ T_R
 9:   S ← ∅
10:   d ← (d_i)_{i=1}^D, d_i = max_{x ∈ T_C}(x_i) − min_{x ∈ T_C}(x_i)
11:   for all α ∈ {0, 1, 2, 4} do
12:     B ← {x | x_best − d/2 ≤ x ≤ x_best + d/2}
13:     x ← optimize f_{M,α} within B by (µ, λ)-CMA-ES
14:     if ∄z: (x, z) ∈ A then
15:       y ← f(x)
16:       S ← S ∪ {(x, y)}
17:     end if
18:   end for
19:   if S ≠ ∅ then
20:     A ← A ∪ S
21:     x_best ← arg min_{x ∈ {u | ∃z: (u,z) ∈ A}} f(x)
22:   else
23:     x_R ← (x_best,i + (z_i d_i)/(100 m))_{i=1}^D, z_i ∼ N(0, 1)
24:     y_R ← f(x_R)
25:     A ← A ∪ {(x_R, y_R)}
26:   end if
27: end while
28: return x_best
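The core of one GPOP iteration (steps 6–12) can be sketched in MATLAB as follows. The variable names are ours, and fitrgp merely stands in for a GP trained with covariance (10), which fitrgp does not provide out of the box.

```matlab
% A sketch of steps 6-12 of Algorithm 2; variable names are ours.
%   Xall, yall ... all evaluated points (rows of Xall) in evaluation order
%   xbest      ... current best point; NC, NR ... training set sizes

[~, byDist] = sort(sum((Xall - xbest).^2, 2));  % step 6: squared distances
TC = Xall(byDist(1:NC), :);                     % NC points closest to xbest
TR = Xall(end-NR+1:end, :);                     % step 7: NR most recent points

% step 8: GP trained on the union of both sets (duplicates not removed here);
% fitrgp stands in for a GP with covariance (10)
Xtr = [TC; TR];
ytr = [yall(byDist(1:NC)); yall(end-NR+1:end)];
model = fitrgp(Xtr, ytr, 'KernelFunction', 'ardsquaredexponential');

% steps 10 and 12: hypercube B spanned by T_C, centred at xbest
d  = max(TC, [], 1) - min(TC, [], 1);
lb = xbest - d/2;   % lower bound of B for the bounded CMA-ES run
ub = xbest + d/2;   % upper bound of B
```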
4 Experimental Setup

The framework COCO/BBOB (Comparing Continuous Optimizers / Black-Box Optimization Benchmarking) [6, 7] is intended for systematic experimentation with black-box optimizers. We use the noiseless testbed of the BBOB framework, which is comprised of 24 real-parameter benchmark functions with different characteristics such as non-separability, multi-modality, ill-conditioning, etc.

Each function is defined on [−5, 5]^D for all D ≥ 2. For every function and every dimensionality, 15 trials of the optimizer are run, with a different instance of the original function used for each trial. We used dimensionalities 2, 3, 5 and 10; thus 1440 trials in total were run for each set of parameters and each method.

Since source codes were available for neither of the tested methods, we implemented them in MATLAB. For Gaussian processes, we chose MATLAB's default implementation, fitrgp, a part of the Statistics and Machine Learning Toolbox. The fitting of the GP hyperparameters was done with a quasi-Newton optimizer, which is the default in fitrgp.
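A minimal example of fitting and querying a GP surrogate with fitrgp is given below; the data are synthetic, and the options shown are only the ones relevant to this paper, not our complete experimental configuration.

```matlab
% Minimal fitrgp usage sketch (synthetic data, not our full configuration)
X = 10 * rand(40, 5) - 5;                % 40 sample points in [-5, 5]^5
y = sum(X.^2, 2);                        % e.g. the sphere function
gpr = fitrgp(X, y, ...
    'KernelFunction', 'ardsquaredexponential', ...  % cf. eq. (7)
    'FitMethod', 'exact', ...            % exact maximum likelihood fit
    'Optimizer', 'quasinewton');         % the default hyperparameter optimizer
[tHat, sigma] = predict(gpr, zeros(1, 5));  % predictive mean and std. dev.
```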
Parameters of the benchmarked algorithms were set as follows:

CMA-ES. A multi-start version with the population size doubled after each restart was used in the tests (MATLAB code v. 3.62.beta). The number of restarts was set to 4, while the other parameter settings were left at their defaults: λ = 4 + ⌊3 log D⌋, σ_start = 8/3, µ = ⌊λ/2⌋.

MAES. We implemented the GP Model Assisted Evolution Strategy on top of a framework developed for the S-CMA-ES algorithm, which employs a GP model in conjunction with a generation evolution control [1]. The S-CMA-ES implementation allows the population sampling step of the CMA-ES to be conveniently replaced with a custom procedure. In this case, a population intended for pre-selection is sampled and processed as described in Subsection 3.1. The control is then handed back to the CMA-ES.

The number of parents and the population size were set to correspond with the CMA-ES settings. Two pre-selection criteria were tested: MMP, and POI with the threshold equal to the so-far best sampled fitness value f_min. In both cases λ_Pre was set to 3λ. The training set was comprised of the 2λ most recently evaluated points. We used the same covariance function as in the GPOP case, i.e. (10).

GPOP. We adhered to [2] in the usage of the proposed covariance function (10). The termination criteria were chosen as follows:

• the number of consecutive iterations with no new solution found is larger than 2, where the tolerance on the Euclidean distance of two points for them to be considered equal is 10^−8,

• the overall change of fitness values during the last 10 iterations is lower than 10^−9,

• the target fitness value is reached.

Training set size parameters N_C and N_R were chosen in accordance with empirical results reported in [2], particularly N_C = N_R = 5·D.

Although the performance is measured in terms of the number of fitness evaluations, other operations, such as optimizing a surrogate model, may also be costly in benchmarking scenarios. If we consider functions in 10-D, a run of GPOP on one core of a computational grid took approximately 27.8 real hours per function on average. For those reasons, we limited the maximum number of fitness evaluations to 100·D for all tested methods.
5 Experimental Results

Results from the experiments on all 24 noiseless BBOB benchmark functions are presented in Figures 1–3.

The expected running time (ERT) depicted in Figure 1 depends on a given target value, f_t = f_opt + ∆f, i.e. the true optimum f_opt of the respective benchmark function raised by a small value ∆f. The ERT is computed over all relevant trials as the number of the original function evaluations (FE) executed during each trial until the target function value f_t is reached, summed over all trials and divided by the number of successful trials, i.e. trials that actually reached f_t [7]:

$$\mathrm{ERT}(f_t) = \frac{\#\mathrm{FE}(f_{best} \geq f_t)}{\#\mathrm{succ}} \tag{11}$$
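For clarity, (11) amounts to the following computation (a sketch with our own naming):

```matlab
function ert = computeERT(fevals, success)
% Expected running time, eq. (11): function evaluations spent across all
% trials, divided by the number of trials that reached the target f_t.
%   fevals  ... evaluations used by each trial (until f_t or the budget)
%   success ... logical vector, true where the trial reached f_t
  ert = sum(fevals) / sum(success);
end
```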
                                                                       frastructure MetaCentrum, provided under the programme
        In the graphs for functions 1, 2, 5, 8, 9, 12, 14, GPOP        "Projects of Large Research, Development, and Innova-
     achieved significantly better results compared to all other       tions Infrastructures" (CESNET LM2015042), is greatly
     algorithms for some dimensions. In contrast, the differ-          appreciated.
     ences between MAES and CMA-ES are rather small, re-                  The research performed by Lukáš Bajer has been sup-
     gardless of the pre-selection criterion.                          ported by the Czech Health Research Council project
        The graphs in Figure 2 summarize the performance               NV15-33250A.
     over subgroups of the benchmark functions for the high-              The research performed by Jakub Repický was sup-
     est tested dimension 10. The graphs show the proportion           ported by SVV project number 260 224.
     of algorithm runs that reached a target value ft ∈ 10[−1..2]
     (see the figure caption for further details).
        The GPOP speedup is most eminent on the group of               References
     separable functions (functions 1–5), that is functions, op-
     timization of which can be reduced to D one-dimensional            [1] L. Bajer, Z. Pitra, and M. Holeňa. Benchmarking Gaus-
     problems [6]. On the other hand, GPOP has the worst re-                sian processes and random forests surrogate models on the
                                                                            BBOB noiseless testbed. In Proceedings of the Compan-
     sults of all methods on multi-modal functions (functions
                                                                            ion Publication of the 2015 Annual Conference on Genetic
     15–19).
                                                                            and Evolutionary Computation, GECCO Companion ’15,
        Similar results may be observed on Figure 3 that for                2015.
     each function shows the dependence of the relative best fit-
                                                                        [2] D. Büche, N. N. Schraudolph, and P. Koumoutsakos. Ac-
     ness value on the number of evaluations in 10 D. As can be             celerating evolutionary algorithms with Gaussian process
     seen on the graphs for functions 13 and 24, GPOP in some               fitness function models. IEEE Transactions on Systems,
     cases outperforms all other algorithms in early stages, but            Man and Cybernetics, 35(2):183–194, 2005.
     then gets trapped in a local minimum.                              [3] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, and
        On the graph for functions 21 on Figure 3, MAES with                K. Giannakoglou. Metamodel-assisted evolution strategies.
     MMP as the pre-selection criterion visibly outperforms all             In In Parallel Problem Solving from Nature VII, pages 361–
     other algorithms.                                                      370. Springer, 2002.
[Figure 1 plots omitted: one panel per benchmark function f1–f24, comparing CMAES, MAES-MMP, MAES-POI and GPOP over dimensions 2, 3, 5 and 10.]

Figure 1: Expected running time (ERT as log10 of the number of f-evaluations) for target function value 0.1 divided by dimension, versus dimension. Different symbols correspond to different algorithms given in the legend of f1 and f24. Values are plotted up to the highest dimensionality with at least one successful trial. Light symbols give the maximum number of function evaluations from the longest trial divided by dimension. Black stars indicate a statistically better result compared to all other algorithms with p < 0.05 using the Hommel correction.
[Figure 2 plots omitted: panels for the separable (f1–f5), moderate (f6–f9), ill-conditioned (f10–f14), multi-modal (f15–f19) and weakly structured multi-modal (f20–f24) function groups, and for all functions (f1–f24), each in 10-D.]

Figure 2: Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by dimension (FE/D) for 61 targets with target precision in 10^[−1..2] for different subgroups of BBOB benchmark functions in 10-D.
[Figure 3 plots omitted: one panel per benchmark function f1–f24, comparing CMA-ES, MAES-MMP, MAES-POI and GPOP.]

Figure 3: Relative fitness of the best individual as log10 value versus the number of evaluations divided by dimension for all functions in 10-D. The best fitness value at each number of evaluations is computed as the median of all trials.
References

[1] L. Bajer, Z. Pitra, and M. Holeňa. Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO Companion '15, 2015.
[2] D. Büche, N. N. Schraudolph, and P. Koumoutsakos. Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, 35(2):183–194, 2005.
[3] M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, and K. Giannakoglou. Metamodel-assisted evolution strategies. In Parallel Problem Solving from Nature VII, pages 361–370. Springer, 2002.
[4] N. Hansen. The CMA evolution strategy: a comparing review. In J. A. Lozano, P. Larrañaga, I. Inza, and E. Bengoetxea, editors, Towards a New Evolutionary Computation. Advances on Estimation of Distribution Algorithms, pages 75–102. Springer, 2006.
[5] N. Hansen. Injecting external solutions into CMA-ES. CoRR, abs/1110.4181, 2011.
[6] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009. Updated February 2010.
[7] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.
[8] Y. Jin and B. Sendhoff. Fitness approximation in evolutionary computation - a survey. In GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages 1105–1111, 2002.
[9] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning series. MIT Press, 2006.
[10] I. Rechenberg. Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart, 1973.
[11] H.-P. Schwefel. Numerische Optimierung von Computer-Modellen. Birkhäuser, Basel, 1977.
[12] V. Torczon and M. W. Trosset. Using approximations to accelerate engineering design optimization. In Proceedings of the 7th AIAA/USAF/NASA/ISSMO Multidisciplinary Analysis & Optimization Symposium (held at Saint Louis, Missouri), paper 98-4800, pages 507–512, 1998.
[13] H. Ulmer, F. Streichert, and A. Zell. Evolution strategies assisted by Gaussian processes with improved pre-selection criterion. In Proceedings of the 2003 IEEE Congress on Evolutionary Computation, pages 692–699, 2003.