=Paper=
{{Paper
|id=None
|storemode=property
|title=Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES
|pdfUrl=https://ceur-ws.org/Vol-1422/186.pdf
|volume=Vol-1422
|dblpUrl=https://dblp.org/rec/conf/itat/PitraBH15
}}
==Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES==
J. Yaghob (Ed.): ITAT 2015, pp. 186–193, Charles University in Prague, Prague, 2015
Zbyněk Pitra¹,², Lukáš Bajer³,⁴, and Martin Holeňa³

¹ National Institute of Mental Health, Topolová 748, 250 67 Klecany, Czech Republic, z.pitra@gmail.com
² Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Břehová 7, 115 19 Prague 1, Czech Republic
³ Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic, {bajer,holena}@cs.cas.cz
⁴ Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
Abstract: In practical optimization tasks, it is more and more frequent that the objective function is black-box, which means that it cannot be described mathematically. Such functions can be evaluated only empirically, usually through some costly or time-consuming measurement, numerical simulation or experimental testing. Therefore, an important direction of research is the approximation of these objective functions with a suitable regression model, also called a surrogate model of the objective function. This paper evaluates two different approaches to continuous black-box optimization which both integrate surrogate models with the state-of-the-art optimizer CMA-ES. The first, the Ranking SVM surrogate model, estimates the ordering of the sampled points, since the CMA-ES utilizes only the ranking of the fitness values. However, we show that a continuous Gaussian process model provides comparable results in the early stages of the optimization.

1 Introduction

Optimization of an expensive objective or fitness function plays an important role in many engineering and research tasks. For such functions, it is sometimes difficult to find an exact analytical formula, or to obtain any derivatives or information about smoothness. Instead, values for a given input can be obtained only through expensive and time-consuming measurements and experiments. Such functions are called black-box, and because of the evaluation costs, the primary criterion for assessment of black-box optimizers is the number of fitness function evaluations necessary to achieve the optimal value.

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [5] is considered to be the state of the art of black-box continuous optimization. An important property of the CMA-ES is that it advances through the search space only according to the ordering of the function values in the current population. Hence, the search of the algorithm is rather local, which predisposes it to premature convergence in local optima if not used with a sufficiently large population size. This issue resulted in the development of several restart strategies [12], such as IPOP-CMA-ES [1] and BIPOP-CMA-ES [6], performing restarts with successively increased population size, or aCMA-ES [9], using also unsuccessful individuals for covariance matrix adaptation.

Furthermore, the CMA-ES often requires more fitness function evaluations to find the optimum than many real-world experiments can offer. In order to decrease the number of evaluations in evolutionary algorithms, it is convenient to periodically train a surrogate model of the fitness function and use it for evaluation of new points instead of the original function. The second option is to use the model for selection of the most promising points to be evaluated by the original fitness.

Loshchilov's surrogate-model-based algorithm s∗ACM-ES [13] utilizes the former approach: it estimates the ordering of the fitness values required by the CMA-ES using Ranking Support Vector Machines (SVM) as an ordinal regression model. Moreover, it has been shown [13] that the model parameters (hyperparameters) used to construct the Ranking SVM model can be optimized during the search by the pure CMA-ES algorithm. Later proposed s∗ACM-ES extensions, referred to as s∗ACM-ES-k [15] and BIPOP-s∗ACM-ES-k [14], use a more intensive exploitation of the surrogate model by increasing the population size in generations evaluated by the model.

More recently, a similar algorithm based on a regression surrogate model, called S-CMA-ES [3], has been presented. As opposed to the former algorithm, S-CMA-ES performs continuous regression by Gaussian processes (GP) [17] and random forests (RF) [4].

This paper compares the two mentioned surrogate CMA-ES algorithms, s∗ACM-ES-k and S-CMA-ES, and the original CMA-ES itself. We benchmark these algorithms on the BBOB/COCO testing set [7, 8] not only in their one-population IPOP-CMA-ES version, but also in combination with the two-population-size BIPOP-CMA-ES.
The remainder of the paper is organized as follows. The next section briefly describes the tested algorithms: the CMA-ES, the BIPOP-CMA-ES, the s∗ACM-ES-k, and the S-CMA-ES. Section 3 contains the experimental setup and results, and Section 4 concludes the paper and suggests further research directions.

2 Algorithms

2.1 The CMA-ES

In each generation g, the CMA-ES [5] generates λ new candidate solutions x_k ∈ R^D, k = 1,...,λ, from a multivariate normal distribution N(m^(g), (σ^(g))² C^(g)), where m^(g) is the mean, interpretable as the current best estimate of the optimum, σ^(g) is the step size, representing the overall standard deviation, and C^(g) is the D × D covariance matrix. The algorithm selects the µ points with the lowest function values from the λ generated candidates to adjust the distribution parameters for the next generation.

The CMA-ES uses restart strategies to deal with multimodal fitness landscapes and to avoid being trapped in local optima. A multi-start strategy where the population size is doubled in each restart is referred to as IPOP-CMA-ES [1].
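To make the sampling–selection step concrete, here is a minimal Python sketch of one CMA-ES generation. It is an illustration only (the experiments in this paper use Matlab implementations), it omits the actual adaptation of m, σ and C, and `cma_es_generation` with its arguments is a hypothetical name.

```python
import numpy as np

def cma_es_generation(m, sigma, C, f, lam, mu, rng):
    """One CMA-ES sampling/selection step: draw lambda candidates from
    N(m, sigma^2 C) and keep the mu points with the lowest f-values."""
    X = rng.multivariate_normal(m, sigma**2 * C, size=lam)  # candidate solutions
    y = np.apply_along_axis(f, 1, X)                        # fitness evaluation
    best = np.argsort(y)[:mu]   # only the ordering of the f-values is used
    return X[best], y[best]     # basis for the next distribution update

# usage: 5-D sphere function, default population size lambda = 4 + floor(3 ln D)
rng = np.random.default_rng(0)
D = 5
lam = 4 + int(3 * np.log(D))
X_sel, y_sel = cma_es_generation(np.zeros(D), 1.0, np.eye(D),
                                 lambda x: np.sum(x**2), lam, lam // 2, rng)
```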
2.2 BIPOP-CMA-ES

The BIPOP-CMA-ES [6], unlike the IPOP-CMA-ES, considers two different restart strategies. In the first one, corresponding to the IPOP-CMA-ES, the population size is doubled in each restart i_restart, using a constant initial step-size σ⁰_large = σ⁰_default:

  λ_large = 2^(i_restart) λ_default.  (1)

In the second one, the smaller population size λ_small is computed as

  λ_small = ⌊λ_default ((1/2) λ_large/λ_default)^(U[0,1]²)⌋,  (2)

where U[0,1] denotes the uniform distribution on [0,1]. The initial step-size is also randomly drawn, as

  σ⁰_small = σ⁰_default × 10^(−2 U[0,1]).  (3)

The BIPOP-CMA-ES performs the first run using the default population size λ_default and the initial step-size σ⁰_default. In the following restarts, the strategy with fewer function evaluations summed over all algorithm runs is selected.
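As a worked illustration of equations (1)–(3), the following Python sketch draws the parameters of the two restart strategies; the bookkeeping that selects between the strategies according to the consumed evaluations is omitted, and the function name is hypothetical.

```python
import numpy as np

def bipop_restart_params(i_restart, lam_default, sigma_default, rng):
    """Restart parameters of the two BIPOP-CMA-ES strategies, eqs. (1)-(3)."""
    # first strategy: population doubled per restart, constant initial step-size
    lam_large = 2**i_restart * lam_default                     # eq. (1)
    sigma_large = sigma_default
    # second strategy: small randomized population and random initial step-size
    lam_small = int(lam_default *
                    (0.5 * lam_large / lam_default) ** rng.uniform()**2)  # eq. (2)
    sigma_small = sigma_default * 10.0**(-2 * rng.uniform())   # eq. (3)
    return (lam_large, sigma_large), (lam_small, sigma_small)

large, small = bipop_restart_params(3, 10, 2.0, np.random.default_rng(1))
```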
2.3 s∗ACM-ES-k

Loshchilov's version of the CMA-ES, which uses ordinal regression by Ranking SVM as a surrogate model in specific generations instead of the original function, is referred to as s∗ACM-ES [13], and its extension using a more intensive exploitation of the model is called s∗ACM-ES-k [15].

Before the main loop starts, the s∗ACM-ES-k evaluates g_start generations by the original function; then it repeats the following steps. First, the surrogate model is constructed using hyperparameters θ and the original-function-evaluated points from previous generations. Second, the surrogate model is optimized by the CMA-ES for g_m generations with population size λ = k_λ λ_default and number of best points µ = k_µ µ_default, where k_λ, k_µ ≥ 1. Third, the following generation is evaluated by the original function using λ = λ_default and µ = µ_default. To avoid a potential divergence when g_m fluctuates between 0 and 1, k_λ > 1 is used only in the case g_m ≥ g_m^λ, where g_m^λ denotes the number of generations suitable for effective exploitation using the model. Then the model error is calculated by comparing the rankings given by the original and the model evaluation of the last generation. After that, g_m is adjusted in accordance with the model error. As the last step, the s∗ACM-ES-k searches the hyperparameter space by one generation of the CMA-ES minimizing the model error, to find the most suitable hyperparameter settings θ_new for the next model-evaluated generations.

The s∗ACM-ES-k version using the BIPOP-CMA-ES, proposed in [14], is called BIPOP-s∗ACM-ES-k.
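The rank-based model error that drives the adaptation of g_m is not spelled out above; as one hedged illustration, the sketch below measures the fraction of point pairs that the original fitness and the model order differently (a normalized Kendall distance — the exact error measure used in [15] may differ).

```python
import numpy as np
from itertools import combinations

def rank_error(y_true, y_model):
    """A sketch of a rank-based surrogate error: the fraction of point pairs
    ordered differently by the original fitness and by the model."""
    n = len(y_true)
    discordant = sum(
        (y_true[i] - y_true[j]) * (y_model[i] - y_model[j]) < 0
        for i, j in combinations(range(n), 2)
    )
    return discordant / (n * (n - 1) / 2)

y = np.array([3.0, 1.0, 2.0, 5.0])
print(rank_error(y, y**3))   # 0.0 -- a monotone model preserves the ordering
print(rank_error(y, -y))     # 1.0 -- a reversed ordering is maximally wrong
```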
2.4 S-CMA-ES

As opposed to the former algorithms, a different approach to surrogate model usage is incorporated in the S-CMA-ES [3]. The algorithm is a modification of the CMA-ES in which the original sampling and evaluating phases are substituted by Algorithm 1 at the beginning of each CMA-ES generation.

In order to avoid false convergence of the algorithm in the BBOB benchmarking toolbox, the model-predicted values are adapted to never be lower than the so-far minimum of the original function (see step 17 in the pseudocode).

The main difference between the S-CMA-ES and the s∗ACM-ES-k is in the manner in which the CMA-ES is utilized. In the S-CMA-ES, the model prediction or training is performed within each generation of the CMA-ES. On the contrary, in the s∗ACM-ES-k, individual generations of the CMA-ES are started to optimize either the original fitness, the surrogate fitness, or the model itself.
3 Experimental Evaluation

The core of this paper lies in a systematic comparison of the two mentioned approaches to using surrogate models with the CMA-ES, and of the original CMA-ES algorithm itself. The first group of surrogate-based algorithms is formed by the S-CMA-ES algorithms using Gaussian process and random forest models, and the other group is formed by the s∗ACM-ES algorithm. These four algorithms (CMA-ES, GP-CMA-ES, RF-CMA-ES, s∗ACM-ES) are tested in their IPOP version (based on IPOP-CMA-ES) [1] and in the bi-population restart strategy version (based on BIPOP-CMA-ES and its derivatives) [6].

3.1 Experimental Setup

The experimental evaluation is performed through the noiseless part of the COCO/BBOB framework (COmparing Continuous Optimizers / Black-Box Optimization Benchmarking) [7, 8]. It is a collection of 24 benchmark functions with different degrees of smoothness, uni-/multi-modality, separability, conditioning, etc. Each function is defined for any dimension D ≥ 2; the dimensions used for our tests are 2, 5, 10, and 20. The set of functions comprises, among others, well-known continuous optimization benchmarks like the ellipsoid, Rosenbrock's, Rastrigin's, Schwefel's or Weierstrass' function.

The framework calls the optimizers on 15 different instances for each function and dimension, meaning that 1440 optimization runs were called for each of the eight considered algorithms. The graphs at the end of the paper show detailed results in a per-function and per-group-of-functions manner. The following paragraphs summarize the parameters of the algorithms.

The CMA-ES. The original CMA-ES was used in its IPOP-CMA-ES version (Matlab code v. 3.61) with number of restarts = 4, IncPopSize = 2, σstart = 8/3, λ = 4 + ⌊3 log D⌋. The remaining settings were left at their defaults.

s∗ACM-ES. We have used Loshchilov's GECCO 2013 Matlab code xacmes.m [14] in its s∗ACM-ES version, setting the parameters CMAactive = 1, newRestartRules = 0, withSurr = 1, modelType = 1, withModelEnsembles = 0, withModelOptimization = 1, hyper_lambda = 20, λMult = 1, µMult = 1 and ΛminIter = 4.
Algorithm 1 Surrogate CMA-ES Algorithm [3]

Input: g (generation), g_m (number of model generations),
       σ, λ, m, C (CMA-ES internal variables),
       r (maximal distance between training points and m),
       n_REQ (minimal number of points for model training),
       n_MAX (maximal number of points for model training),
       A (archive), f_M (model), f (original fitness function)
 1: x_k ∼ N(m, σ²C), k = 1,...,λ   {CMA-ES sampling}
 2: if g is original-evaluated then
 3:   y_k ← f(x_k), k = 1,...,λ   {fitness evaluation}
 4:   A ← A ∪ {(x_k, y_k)}, k = 1,...,λ
 5:   (X_tr, y_tr) ← {(x, y) ∈ A | (m − x)ᵀ σC^(−1/2) (m − x) ≤ r}
 6:   if |X_tr| ≥ n_REQ then
 7:     (X_tr, y_tr) ← choose n_MAX points if |X_tr| > n_MAX
 8:     X_tr ← {(σC^(−1/2))ᵀ x_tr for each x_tr ∈ X_tr}   {transformation to the eigenvector basis}
 9:     f_M ← trainModel(X_tr, y_tr)
10:     mark (g + 1) as model-evaluated
11:   else
12:     mark (g + 1) as original-evaluated
13:   end if
14: else
15:   x_k ← (σC^(−1/2))ᵀ x_k, k = 1,...,λ
16:   y_k ← f_M(x_k), k = 1,...,λ   {model evaluation}
17:   y_k ← y_k + max{0, min_A y − min_k y_k}, k = 1,...,λ   {shift y_k values if min_k y_k < best y from A}
18:   if g_m model generations passed then
19:     mark (g + 1) as original-evaluated
20:   end if
21: end if
Output: f_M, A, (y_k), k = 1,...,λ
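The guard in step 17 only lifts the surrogate's predictions so that they never undercut the best original-fitness value seen so far. A minimal Python sketch of steps 16–17, assuming a generic trained model with a `predict` method (a stand-in for the GP or RF, not the authors' implementation):

```python
import numpy as np

def model_evaluate(model, X, archive_y):
    """Steps 16-17 of Algorithm 1 (a sketch): predict with the surrogate and
    shift the predictions so they never fall below the best original value."""
    y = model.predict(X)                        # surrogate predictions
    shift = max(0.0, np.min(archive_y) - np.min(y))
    return y + shift                            # step 17: guard against false convergence

class _ConstModel:                              # dummy stand-in for a trained GP/RF
    def predict(self, X):
        return np.full(len(X), -1.0)

y = model_evaluate(_ConstModel(), np.zeros((4, 2)), archive_y=np.array([0.5, 2.0]))
# the predictions of -1.0 are lifted to 0.5, the best original value seen so far
```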
S-CMA-ES: GP5-CMA-ES and RF5-CMA-ES. The number after GP/RF in the names of the algorithms denotes the number of model-evaluated generations g_m, i.e. how many consecutive generations are evaluated by the model. All considered S-CMA-ES versions use the distance r = 8 (see Algorithm 1). For the GP model, the Matérn covariance function K_Matérn^(ν=5/2) with starting values (σ_n², l, σ_f²) = log(0.01, 2, 0.5) has been used (see [3] for details). We have tested an RF comprising 100 regression trees, each containing at least two training points in each leaf. The CMA-ES parameters (IPOP version, σstart, λ, IncPopSize, etc.) were the same as in the pure CMA-ES experiments. All S-CMA-ES parameter values were chosen according to preliminary testing on several functions from the COCO/BBOB framework.
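For reference, a sketch of the Matérn ν = 5/2 covariance function follows, in the standard form of [17]; the hyperparameter defaults below are only the starting values quoted above (before the tuning described in [3]), and the function name is illustrative.

```python
import numpy as np

def matern52(X1, X2, ell=2.0, sf2=0.5, sn2=0.01):
    """Matern nu=5/2 covariance (Rasmussen & Williams [17]):
    k(d) = sf2 * (1 + sqrt(5) d/ell + 5 d^2/(3 ell^2)) * exp(-sqrt(5) d/ell),
    with noise variance sn2 added on the diagonal for identical inputs."""
    d = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)  # pairwise distances
    s = np.sqrt(5.0) * d / ell
    K = sf2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)
    if X1 is X2:                                  # same point set: add noise term
        K = K + sn2 * np.eye(len(X1))
    return K

X = np.random.default_rng(2).normal(size=(5, 3))
K = matern52(X, X)   # a 5x5 positive-definite covariance matrix
```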
BIPOP versions of the algorithms. The bi-population versions BIPOP-CMA-ES and BIPOP-s∗ACM-ES use the same Loshchilov Matlab code xacmes.m with the parameter BIPOP = 1. The BIPOP-GP5-CMA-ES and BIPOP-RF5-CMA-ES algorithms are constructed in the same manner as the S-CMA-ES was transformed from the CMA-ES – by integration of Algorithm 1 into every generation of the BIPOP-s∗ACM-ES.
3.2 Results

The performance of the algorithms is compared in the graphs in Figures 1–3. The graphs in Figure 1 depict the expected running time (ERT), which depends on a given target function value f_t = f_opt + ∆f – the true optimum f_opt of the respective benchmark function raised by a small value ∆f. The ERT is computed over all relevant trials as the number of original function evaluations (FEs) executed during each trial until the best function value reached f_t, summed over all trials and divided by the number of trials that actually reached f_t [7].
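Expressed in code, the ERT of one algorithm on one function–target pair is simply the total number of evaluations spent across all trials divided by the number of successful trials; a small sketch of this definition:

```python
import numpy as np

def expected_running_time(fevals, reached):
    """ERT: evaluations spent in all trials (successful or not), divided by
    the number of trials that reached the target f_t."""
    fevals = np.asarray(fevals, dtype=float)
    reached = np.asarray(reached, dtype=bool)
    if not reached.any():
        return np.inf                      # the target was never reached
    return fevals.sum() / reached.sum()

# three trials: two reach the target, one exhausts its budget of 1000 evaluations
print(expected_running_time([120, 300, 1000], [True, True, False]))  # 710.0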
[Figure 1 consists of 24 panels, one per BBOB function: 1 Sphere, 2 Ellipsoid separable, 3 Rastrigin separable, 4 Skew Rastrigin-Bueche separable, 5 Linear slope, 6 Attractive sector, 7 Step-ellipsoid, 8 Rosenbrock original, 9 Rosenbrock rotated, 10 Ellipsoid, 11 Discus, 12 Bent cigar, 13 Sharp ridge, 14 Sum of different powers, 15 Rastrigin, 16 Weierstrass, 17 Schaffer F7 condition 10, 18 Schaffer F7 condition 1000, 19 Griewank-Rosenbrock F8F2, 20 Schwefel x·sin(x), 21 Gallagher 101 peaks, 22 Gallagher 21 peaks, 23 Katsuuras, 24 Lunacek bi-Rastrigin; each panel plots log10(ERT/dimension) versus dimension (2, 3, 5, 10, 20, 40) for target RL/dim = 10.]

Figure 1: Expected running time (ERT in number of f-evaluations as log10 value) divided by dimension, versus dimension. The target function value is chosen such that the bestGECCO2009 artificial algorithm just failed to achieve an ERT of 10 × DIM. Different symbols correspond to different algorithms given in the legend of f1 and f24. Light symbols give the maximum number of function evaluations from the longest trial divided by dimension. Black stars indicate a statistically better result compared to all other algorithms with p < 0.01 and Bonferroni correction for the number of dimensions (six). Legend: ○: BIPOP-CMAES, ▽: BIPOP-GP5, ⋆: BIPOP-RF5, ◻: BIPOP-saACMES, △: CMA-ES, ♢: GP5-CMAES, 9: RF5-CMAES, D: saACMES.
[Figure 2 consists of six panels of empirical cumulative distribution functions in 5-D: separable fcts (f1-5), moderate fcts (f6-9), ill-conditioned fcts (f10-14), multi-modal fcts (f15-19), weakly structured multi-modal fcts (f20-24), and all functions (f1-24). Each panel plots the proportion of function+target pairs versus log10 of (# f-evals / dimension) for BIPOP-CMAES, BIPOP-GP5, BIPOP-RF5, BIPOP-saACMES, CMA-ES, GP5-CMAES, RF5-CMAES, saACMES, and the best 2009 reference.]

Figure 2: Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by dimension (FEvals/DIM) for all functions and subgroups in 5-D. The targets are chosen from 10^[−8..2] such that the bestGECCO2009 artificial algorithm just failed to reach them within a given budget of k × DIM, with k ∈ {0.5, 1.2, 3, 10, 50}. The "best 2009" line corresponds to the best ERT observed during BBOB 2009 for each selected target.
[Figure 3 consists of six panels of empirical cumulative distribution functions in 20-D: separable fcts (f1-5), moderate fcts (f6-9), ill-conditioned fcts (f10-14), multi-modal fcts (f15-19), weakly structured multi-modal fcts (f20-24), and all functions (f1-24). Each panel plots the proportion of function+target pairs versus log10 of (# f-evals / dimension) for the same algorithms as in Figure 2.]

Figure 3: Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by dimension (FEvals/DIM) for all functions and subgroups in 20-D. The targets are chosen from 10^[−8..2] such that the bestGECCO2009 artificial algorithm just failed to reach them within a given budget of k × DIM, with k ∈ {0.5, 1.2, 3, 10, 50}. The "best 2009" line corresponds to the best ERT observed during BBOB 2009 for each selected target.
As we can see in Figure 1, the 24 functions can be roughly divided into two groups according to the algorithm which performed best (at least in 10D and 20D). The first group, where the CMA-ES performed best, consists of functions 1, 3, 4, 6, and 20, while on functions 2, 5, 7, 10, 11, 13–16, 18, 21, 23, and 24, GP5-CMA-ES is usually better. The usage of the BIPOP versions generally leads to no improvement or even to a performance decrease.

The graphs in Figures 2 and 3 summarize the performance over subgroups of the benchmark functions and show the proportion of algorithm runs that indeed reached the target value f_t ∈ 10^[−8..2] (f_t was actually different for each respective function, see the figure captions). Roughly speaking, the higher the colored line, the better the performance of the algorithm for the number of original evaluations given on the horizontal axis.

Thus we can see that our GP5-CMA-ES usually outperforms the other algorithms when we consider an evaluation budget FEs ≤ 10^1.5 D, i.e. FEs ≤ 150 for 5D and FEs ≤ 600 for 20D. However, as the number of considered original evaluations rises, the original CMA-ES or the s∗ACM-ES usually performs better. This can be summarized by saying that our GP5-CMA-ES is convenient especially for applications where only a very low number of function evaluations is available, such as in [2].

4 Conclusions & Future Work

In this paper, we have compared the surrogate-assisted S-CMA-ES, which uses GP and RF continuous regression models, with the s∗ACM-ES-k algorithm, based on ordinal regression by Ranking SVM, and with the original CMA-ES, all in their IPOP and BIPOP versions. The comparison shows that the Gaussian process S-CMA-ES usually outperforms the ordinal-based s∗ACM-ES-k in the early stages of the search, especially on multimodal functions (BBOB functions 15–24). However, the algorithms and surrogate models should be further analyzed and compared since, for example, the NEWUOA [16] or SMAC [10, 11] algorithms spend a considerably lower number of function evaluations than the CMA-ES in these early optimization phases. The BIPOP versions of the algorithms did not increase the performance of the corresponding IPOP versions, except on BBOB function 5.

A natural perspective for improving the S-CMA-ES is to make the number of model-evaluated generations self-adaptive. We will additionally investigate different properties of continuous and ordinal regression in view of their applicability as regression models. Different cases and benchmarks where ordinal regression is clearly superior to continuous regression will be further identified. For example, hybrid surrogate models combining both kinds of regression will be attempted.

Acknowledgements

This work was supported by the Czech Science Foundation (GAČR) grant P103/13-17187S, by the Grant Agency of the Czech Technical University in Prague with its grant No. SGS14/205/OHK4/3T/14, and by the project "National Institute of Mental Health (NIMH-CZ)", grant number CZ.1.05/2.1.00/03.0078, and the European Regional Development Fund. Further, access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Infrastructure for Research, Development, and Innovations" (LM2010005), is greatly appreciated.

References

[1] Auger, A., Hansen, N.: A restart CMA evolution strategy with increasing population size. In: The 2005 IEEE Congress on Evolutionary Computation 2, 1769–1776, IEEE, Sept. 2005
[2] Baerns, M., Holeňa, M.: Combinatorial development of solid catalytic materials. Design of high-throughput experiments, data analysis, data mining. Imperial College Press / World Scientific, London, 2009
[3] Bajer, L., Pitra, Z., Holeňa, M.: Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed. In: Proceedings of the 17th GECCO Conference Companion, Madrid, July 2015, ACM, New York
[4] Breiman, L.: Classification and regression trees. Chapman & Hall/CRC, 1984
[5] Hansen, N.: The CMA evolution strategy: A comparing review. In: J. A. Lozano, P. Larranaga, I. Inza, E. Bengoetxea (eds), Towards a New Evolutionary Computation, 192 in Studies in Fuzziness and Soft Computing, 75–102, Springer Berlin Heidelberg, Jan. 2006
[6] Hansen, N.: Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed. In: Proceedings of the 11th Annual GECCO Conference Companion: Late Breaking Papers, GECCO'09, 2389–2396, New York, NY, USA, 2009, ACM
[7] Hansen, N., Auger, A., Finck, S., Ros, R.: Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical Report, INRIA, 2012
[8] Hansen, N., Finck, S., Ros, R., Auger, A.: Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009, updated February 2010
[9] Hansen, N., Ros, R.: Benchmarking a weighted negative covariance matrix update on the BBOB-2010 noiseless testbed. In: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO'10, 1673–1680, New York, NY, USA, 2010, ACM
[10] Hutter, F., Hoos, H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: C. Coello (ed.), Learning and Intelligent Optimization 2011, volume 6683 of Lecture Notes in Computer Science, 507–523, Springer Berlin Heidelberg, 2011
[11] Hutter, F., Hoos, H., Leyton-Brown, K.: An evaluation of sequential model-based optimization for expensive black-box functions. In: Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO'13 Companion, 1209–1216, New York, NY, USA, 2013, ACM
[12] Loshchilov, I., Schoenauer, M., Sebag, M.: Alternative restart strategies for CMA-ES. In: C. A. C. Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, M. Pavone (eds), PPSN (1), volume 7491 of Lecture Notes in Computer Science, 296–305, Springer, 2012
[13] Loshchilov, I., Schoenauer, M., Sebag, M.: Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the 14th GECCO, GECCO'12, 321–328, New York, NY, USA, 2012, ACM
[14] Loshchilov, I., Schoenauer, M., Sebag, M.: BI-population CMA-ES algorithms with surrogate models and line searches. In: Genetic and Evolutionary Computation Conference (GECCO Companion), 1177–1184, ACM Press, July 2013
[15] Loshchilov, I., Schoenauer, M., Sebag, M.: Intensive surrogate model exploitation in self-adaptive surrogate-assisted CMA-ES (saACM-ES). In: Genetic and Evolutionary Computation Conference (GECCO), 439–446, ACM Press, July 2013
[16] Powell, M. J. D.: The NEWUOA software for unconstrained optimization without derivatives. In: G. D. Pillo, M. Roma (eds), Large-Scale Nonlinear Optimization, number 83 in Nonconvex Optimization and Its Applications, 255–297, Springer US, 2006
[17] Rasmussen, C. E., Williams, C. K. I.: Gaussian processes for machine learning. Adaptive Computation and Machine Learning Series, MIT Press, 2006