=Paper= {{Paper |id=None |storemode=property |title=Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES |pdfUrl=https://ceur-ws.org/Vol-1422/186.pdf |volume=Vol-1422 |dblpUrl=https://dblp.org/rec/conf/itat/PitraBH15 }} ==Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES== https://ceur-ws.org/Vol-1422/186.pdf
J. Yaghob (Ed.): ITAT 2015 pp. 186–193
Charles University in Prague, Prague, 2015



                    Comparing SVM, Gaussian Process and Random Forest
                            Surrogate Models for the CMA-ES

                                       Zbyněk Pitra1,2 , Lukáš Bajer3,4 , and Martin Holeňa3
                                                1
                                                    National Institute of Mental Health
                                             Topolová 748, 250 67 Klecany, Czech Republic
                                                          z.pitra@gmail.com
                    2
                      Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
                                              Břehová 7, 115 19 Prague 1, Czech Republic
                            3
                               Institute of Computer Science, Academy of Sciences of the Czech Republic
                                       Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic
                                                     {bajer,holena}@cs.cas.cz
                                 4
                                    Faculty of Mathematics and Physics, Charles University in Prague
                                        Malostranské nám. 25, 118 00 Prague 1, Czech Republic

Abstract: In practical optimization tasks, it is more and            population size. This issue resulted in development of sev-
more frequent that the objective function is black-box               eral restart strategies [12], such as IPOP-CMA-ES [1] and
which means that it cannot be described mathematically.              BIPOP-CMA-ES [6] performing restarts with population
Such functions can be evaluated only empirically, usually            size successively increased, or aCMA-ES [9] using also
through some costly or time-consuming measurement, nu-               unsuccessful individuals for covariance matrix adaptation.
merical simulation or experimental testing. Therefore, an               Furthermore, the CMA-ES often requires more fitness
important direction of research is the approximation of              function evaluations to find the optimum than many real-
these objective functions with a suitable regression model,          world experiments can offer. In order to decrease the num-
also called surrogate model of the objective functions.              ber of evaluations in evolutionary algorithms, it is conve-
This paper evaluates two different approaches to the con-            nient to periodically train a surrogate model of the fitness
tinuous black-box optimization which both integrates sur-            function and use it for evaluation of new points instead
rogate models with the state-of-the-art optimizer CMA-               of the original function. The second option is to use the
ES. The first Ranking SVM surrogate model estimates the              model for selection of the most promising points to be
ordering of the sampled points as the CMA-ES utilizes                evaluated by the original fitness.
only the ranking of the fitness values. However, we show                Loshchilov’s        surrogate-model-based      algorithm
                                                                     s∗
that continuous Gaussian processes model provides in the               ACM-ES [13] utilizes the former approach: it esti-
early states of the optimization comparable results.                 mates the ordering of the fitness values required by the
                                                                     CMA-ES using Ranking Support Vector Machines (SVM)
                                                                     as an ordinal regression model. Moreover, it has been
1    Introduction                                                    shown [13] that model parameters (hyperparametres)
                                                                     used to construct Ranking SVM model can be optimized
Optimization of an expensive objective or fitness function           during the search by the pure CMA-ES algorithm.
plays an important role in many engineering and research             Later proposed s∗ACM-ES extensions, referred to as
                                                                     s∗
tasks. For such functions, it is sometimes difficult to find           ACM-ES-k [15] and BIPOP-s∗ACM-ES-k [14], use
an exact analytical formula, or to obtain any derivatives or         a more intensive exploitation of the surrogate model by
information about smoothness. Instead, values for a given            increasing population size in generations evaluated by the
input are possible to be obtained only through expen-                model.
sive and time-consuming measurements and experiments.                   More recently, a similar algorithm based on regres-
Those functions are called black-box, and because of the             sion surrogate model called S-CMA-ES [3] has been pre-
evaluation costs, the primary criterion for assessment of            sented. As opposed to the former algorithm, S-CMA-
the black-box optimizers is the number of fitness function           ES is performing continuous regression by Gaussian pro-
evaluations necessary to achieve the optimal value.                  cesses (GP) [17] and random forests (RF) [4].
   The Covariance Matrix Adaptation Evolution Strategy                  This paper compares the two mentioned surrogate
(CMA-ES) [5] is considered to be the state-of-the-art of             CMA-ES algorithms, s∗ACM-ES-k and S-CMA-ES, and
the black-box continuous optimization. The important                 the original CMA-ES itself. We benchmark these al-
property of the CMA-ES is that it advances through the               gorithms on the BBOB/COCO testing set [7, 8] not
search space only according to the ordering of the function          only in their one population IPOP-CMA-ES version,
values in current population. Hence, the search of the algo-         but also in combination with the two-population-size
rithm is rather local which predisposes it to premature con-         BIPOP-CMA-ES.
vergence in local optima if not used with sufficiently large
Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES                                            187

                                                                        s∗
   The remainder of the paper is organized as follows.            2.3     ACM-ES-k
The next chapter briefly describes tested algorithms: the
CMA-ES, the BIPOP-CMA-ES, the s∗ACM-ES-k, and the                 Loshchilov’s version of the CMA-ES using the ordinal re-
S-CMA-ES. Section 3 contains experimental setup and               gression by Ranking SVM as surrogate model in specific
results, and Section 4 concludes the paper and suggests           generations instead of the original function is referred to as
                                                                  s∗
further research directions.                                        ACM-ES [13], and its extension using a more intensive
                                                                  exploitation is called s∗ACM-ES-k [15].
                                                                     Before the main loop starts, the s∗ACM-ES-k evalu-
                                                                  ates gstart generations by the original function, then it
2     Algorithms                                                  repeats the following steps: First, the surrogate model
                                                                  is constructed using hyperparameters θ , and the original
2.1 The CMA-ES                                                    function-evaluated points from previous generations. Sec-
                                                                  ond, the surrogate model is optimized by the CMA-ES
In each generation g, the CMA-ES [5] generates λ new              for gm generations with population size λ = kλ λdefault and
candidate solutions xk ∈ RD , where k = 1,...,λ , from            the number of best points µ = kµ µdefault , where kλ ,kµ ≥ 1.
a multivariate normal distribution N(m(g) ,σ 2 (g) C(g) ),        Third, the following generation is evaluated by the origi-
where m(g) is the mean interpretable as the current best          nal function using λ = λdefault and µ = µdefault . To avoid a
estimate of the optimum, σ 2 (g) the step size, representing      potential divergence when gm fluctuate between 0 and 1,
the overall standard deviation, and C(g) the D × D covari-        kλ > 1 is used only in the case of gm ≥ gm λ , where gm λ de-
ance matrix. The algorithm selects the µ points with the          notes the number of generations suitable for effective ex-
lowest function value from λ generated candidates to ad-          ploitation using the model. Then the model error is calcu-
just distribution parameters for the next generation.             lated according to the comparison of ranking between the
                                                                  original and model evaluation of the last generation. After
   The CMA-ES uses restart strategies to deal with mul-
                                                                  that, the gm is adjusted in accordance with the model error.
timodal fitness landscapes and to avoid being trapped
                                                                  As the last step, the s∗ACM-ES-k searches a hyperparam-
in local optima. A multi-start strategy where the pop-
                                                                  eter space by one generation of the CMA-ES minimizing
ulation size is doubled in each restart is referred to as
                                                                  the model error to find the most suitable hyperparameter
IPOP-CMA-ES [1].
                                                                  settings θnew for the next model-evaluated generations.
                                                                     The s∗ACM-ES-k version using BIPOP-CMA-ES pro-
                                                                  posed in [14] is called BIPOP-s∗ACM-ES-k.
2.2   BIPOP-CMA-ES

The BIPOP-CMA-ES [6], unlike IPOP-CMA-ES, consid-                 2.4   S-CMA-ES
ers two different restart strategies. In the first one, cor-
                                                                  As opposed to the former algorithms, a different ap-
responding to the IPOP-CMA-ES, the population size is
                                                                  proach to surrogate model usage is incorporated in the
doubled in each restart irestart using a constant initial step-
            = σdefault
       0       0                                                  S-CMA-ES [3]. The algorithm is a modification of
size σlarge            :
                                                                  CMA-ES where the original evaluating and sampling
                                                                  phases are substituted by the Algorithm 1 at the beginning
                    λlarge = 2irestart λdefault .          (1)    of each CMA-ES generation.
                                                                     In order to avoid the false convergence of the algorithm
  In the second one, the smaller population size λsmall is        in the BBOB benchmarking toolbox, the model-predicted
computed as                                                       values are adapted to never be lower then the so far mini-
                                                                  mum of the original function (see the step 17 in the pseu-
                    ⎢                       U[0,1]2 ⎥
                    ⎢                               ⎥
                    ⎢                               ⎥
                                1 λlarge                          docode).
           λsmall = ⎢λdefault (            )        ⎥,
                    ⎢                               ⎥
                                                           (2)       The main difference between the S-CMA-ES and the
                    ⎢           2 λ                 ⎥
                    ⎣                               ⎦
                                   default                        s∗
                                                                    ACM-ES-k is in the manner how the CMA-ES is uti-
                                                                  lized. Considering S-CMA-ES, the model prediction
where U[0,1] denotes the uniform distribution in [0,1].           or training is performed within each generation of the
The initial step-size is also randomly drawn as                   CMA-ES. On the contrary in the s∗ACM-ES-k, individual
                                                                  generations of the CMA-ES are started to optimize either
                 0
                σsmall = σdefault
                          0
                                  × 10−2U[0,1] .           (3)    original fitness, surrogate fitness, or model itself.

   The BIPOP-CMA-ES performs the first run using
the default population size λdefault and the initial step-
       0
size σdefault . In the following restarts, the strategy with
less function evaluations summed over all algorithm runs
is selected.
188                                                                                                     Z. Pitra, L. Bajer, M. Holeňa


3     Experimental Evaluation                                      defined for any dimension D ≥ 2; the dimensions used for
                                                                   our tests are 2, 5, 10, and 20. The set of functions com-
The core of this paper lies in a systematic comparison of          prises, among others, well-known continuous optimiza-
the two mentioned approaches to using surrogate mod-               tion benchmarks like ellipsoid, Rosenbrock’s, Rastrigin’s,
els with the CMA-ES and the original CMA-ES algo-                  Schweffel’s or Weierstrass’ function.
rithm itself. The first group of surrogate-based algo-                The framework calls the optimizers on 15 different in-
rithms is formed by the S-CMA-ES algorithms using                  stances for each function and dimension, meaning that
Gaussian processes and random forests models, and the              1440 optimization runs were called for each of the eight
other group is formed by the s∗ACM-ES algorithm. These             considered algorithms. The graphs at the end of the pa-
four algorithms (CMA-ES, GP-CMA-ES, RF-CMA-ES,                     per show detailed results in a per-function and per-group-
s∗                                                                 of-function manner. The following paragraphs summarize
  ACM-ES) are tested in their IPOP version (based on
IPOP-CMA-ES) [1] and in the bi-population restart strat-           the parameters of the algorithms.
egy version (based on BIPOP-CMA-ES and its deriva-
tives) [6].                                                        The CMA-ES. The original CMA-ES was used in its
                                                                   IPOP-CMA-ES version (Matlab code v. 3.61) with num-
                                                                   ber of restarts = 4, IncPopSize = 2, σstart = 83 , λ = 4 +
3.1   Experimental Setup                                           ⌊3logD⌋. The remainder settings were left default.
The experimental evaluation is performed through the               s∗
                                                                     ACM-ES. We have used Loshchilov’s GECCO 2013
noiseless part of the COCO/BBOB framework (COm-                    Matlab code xacmes.m [14] in its s∗ACM-ES version, set-
paring Continuous Optimizers / Black-Box Optimization              ting the parameters CMAactive = 1, newRestartRules = 0
Benchmarking) [7, 8]. It is a collection of 24 benchmark           and withSurr = 1, modelType = 1, withModelEnsembles =
functions with different degree of smoothness, uni-/multi-         0, withModelOptimization = 1, hyper_lambda = 20, λMult
modality, separability, conditionality etc. Each function is       = 1, µMult = 1 and ΛminIter = 4.

Algorithm 1 Surrogate CMA-ES Algorithm [3]                         S-CMA-ES: GP5-CMA-ES and RF5-CMA-ES. The num-
                                                                   ber after the GP/RF in the names of the algorithms denotes
Input: g (generation), gm (number of model generations),
                                                                   the number of model-evaluated generations gm , which are
    σ , λ , m, C (CMA-ES internal variables),
                                                                   evaluated by the model in row. All considered S-CMA-ES
    r (maximal distance between training points and m),
                                                                   versions use the distance r = 8 (see algorithm 1). For the
    nREQ (minimal number of points for model training),                             ν=5/2
    nMAX (maximal number of points for model training),            GP model, KMatérn covariance function with starting val-
    A (archive), fM (model), f (original fitness function)         ues (σn2 ,l,σ 2f ) = log(0.01,2,0.5) has been used (see [3]
 1: xk ∼ N(m,σ 2 C) k = 1,...,λ {CMA-ES sampling}                  for the details). We have tested RF comprising 100 regres-
 2: if g is original-evaluated then                                sion trees, each containing at least two training points in
 3:    yk ← f (xk )       k = 1,...,λ      {fitness evaluation}    each leaf. The CMA-ES parameters (IPOP version, σstart ,
 4:    A = A ∪ {(xk ,yk )}λk=1                                     λ , IncPopSize etc.) were used the same as in the pure
 5:    (Xtr ,ytr ) ← {(x,y)∈A∣(m−x)⊺ σ C−1/2 (m−x) ≤ r}            CMA-ES experiments. All S-CMA-ES parameter values
 6:    if ∣Xtr ∣ ≥ nREQ then                                       were chosen according to preliminary testing on several
 7:        (Xtr ,ytr ) ← choose nMAX points if ∣Xtr ∣ > nMAX       functions from the COCO/BBOB framework.
 8:        {transformation to the eigenvector basis:}
           Xtr ← {(σ C−1/2 )⊺ xtr for each xtr ∈ Xtr }             BIPOP version of the algorithms. The bi-population ver-
 9:        fM ← trainModel(Xtr ,ytr )                              sions BIPOP-CMA-ES and BIPOP-s∗ACM-ES use the
10:        mark (g + 1) as model-evaluated                         same Loshchilov’s Matlab code xacmes.m with the pa-
11:    else                                                        rameter BIPOP = 1. The BIPOP-GP5-CMA-ES and
12:        mark (g + 1) as original-evaluated                      BIPOP-RF5-CMA-ES algorithms are constructed in the
13:    end if                                                      same manner as the S-CMA-ES was transformed from the
14: else
                                                                   CMA-ES – by integration of the Algorithm 1 into every
15:    xk ← (σ C−1/2 )⊺ xk         k = 1,...,λ                     generation of the BIPOP-s∗ACM-ES.
16:    yk ← fM (xk ) k = 1,...,λ           {model evaluation}
17:    {shift yk values if (minyk ) < best y from A}               3.2 Results
       yk = yk + max{0, minA y − minyk }             k = 1,...,λ
18:    if gm model generations passed then                         The performance of the algorithms is compared in the
19:        mark (g + 1) as original-evaluated                      graphs placed in Figures 1–3. The graphs in Figure 1 de-
       end if                                                      pict the expected running time (ERT), which depends on
                                                                   a given target function value ft = fopt + ∆ f – the true opti-
20:
21: end if
Output: fM , A, (yk )λk=1                                          mum fopt of the respective benchmark function raised by
                                                                   a small value ∆ f . The ERT is computed over all relevant
Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES                                                                                                                     189




    4                       1 Sphere                4               2 Ellipsoid separable             4               3 Rastrigin separable            4         4 Skew Rastrigin-Bueche separ

    3                                               3                                                 3                                                3

    2                                               2                                                 2                                                2

    1                                               1                                                 1                                                1

    0 target RL/dim: 10                             0 target RL/dim: 10                               0 target RL/dim: 10                              0 target RL/dim: 10
       2      3       5         10       20    40      2      3       5           10        20   40      2      3       5           10       20   40      2      3       5       10        20    40
    3                   5 Linear slope              4                 6 Attractive sector             4                  7 Step-ellipsoid              4              8 Rosenbrock original

                                                    3                                                 3                                                3
    2

                                                    2                                                 2                                                2
    1
                                                    1                                                 1                                                1

    0                                               0 target RL/dim: 10                               0 target RL/dim: 10                              0 target RL/dim: 10
        target RL/dim: 10
         2      3       5       10       20    40      2      3       5           10        20   40      2      3       5           10       20   40      2      3       5       10        20    40
    4               9 Rosenbrock rotated            4                       10 Ellipsoid              4                        11 Discus               4                   12 Bent cigar

    3                                               3                                                 3                                                3

    2                                               2                                                 2                                                2

    1                                               1                                                 1                                                1

    0 target RL/dim: 10                             0 target RL/dim: 10                               0 target RL/dim: 10                              0 target RL/dim: 10
       2      3       5         10       20    40      2      3       5           10        20   40      2      3       5           10       20   40      2      3       5       10        20    40
    4                   13 Sharp ridge              4           14 Sum of different powers            3                       15 Rastrigin             3                  16 Weierstrass

    3                                               3
                                                                                                      2                                                2

    2                                               2
                                                                                                      1                                                1
    1                                               1

    0 target RL/dim: 10                             0 target RL/dim: 10                               0                                                0
                                                                                                          target RL/dim: 10                                target RL/dim: 10
       2      3       5         10       20    40      2      3       5           10        20   40        2      3       5         10       20   40        2      3       5     10        20    40
    3           17 Schaffer F7, condition 10        3         18 Schaffer F7, condition 1000          5         19 Griewank-Rosenbrock F8F2            4               20 Schwefel x*sin(x)

                                                                                                      4
                                                                                                                                                       3
    2                                               2
                                                                                                      3
                                                                                                                                                       2
                                                                                                      2
    1                                               1
                                                                                                                                                       1
                                                                                                      1

    0                                               0                                                 0 target RL/dim: 10                              0 target RL/dim: 10
        target RL/dim: 10                               target RL/dim: 10
         2      3       5       10       20    40        2      3       5         10        20   40      2      3       5           10       20   40      2      3       5       10        20    40
    3             21 Gallagher 101 peaks            3              22 Gallagher 21 peaks              4                   23 Katsuuras                 4             24 Lunacek bi-Rastrigin

                                                                                                      3                                                3
    2                                               2

                                                                                                      2                                                2
    1                                               1
                                                                                                      1                                                1

    0                                               0                                                 0 target RL/dim: 10                              0 target RL/dim: 10
        target RL/dim: 10                               target RL/dim: 10
         2      3       5       10       20    40        2      3       5         10        20   40      2      3       5           10       20   40      2      3       5       10        20    40


Figure 1: Expected running time (ERT in number of f -evaluations as log10 value) divided by dimension versus dimension.
The target function value is chosen such that the bestGECCO2009 artificial algorithm just failed to achieve an ERT of
10 × DIM. Different symbols correspond to different algorithms given in the legend of f1 and f24 . Light symbols give the
maximum number of function evaluations from the longest trial divided by dimension. Black stars indicate a statistically
better result compared to all other algorithms with p < 0.01 and Bonferroni correction number of dimensions (six). Legend:
○:BIPOP-CMAES, ▽:BIPOP-GP5, ⋆:BIPOP-RF5, ◻:BIPOP-saACMES, △:CMA-ES, ♢:GP5-CMAES, 9:RF5-CMAES,
D:saACMES
190                                                                                                                                                                                Z. Pitra, L. Bajer, M. Holeňa




                                                              separable fcts                                                                                          moderate fcts
                                        1.0 f1-5,5-D                                      best 2009
                                                                                          best 2009                                             1.0 f6-9,5-D                                         best 2009
                                                                                                                                                                                                     best 2009
  Proportion of function+target pairs




                                                                                                          Proportion of function+target pairs
                                                                                          BIPOP-s
                                                                                          BIPOP-saACMES                                                                                              BIPOP-s
                                                                                                                                                                                                     BIPOP-saACMES
                                        0.8                                                                                                     0.8
                                                                                          saACMES
                                                                                          saACMES                                                                                                    saACMES
                                                                                                                                                                                                     saACMES
                                                                                          GP5-CMA
                                                                                          GP5-CMAES                                                                                                  BIPOP-C
                                                                                                                                                                                                     BIPOP-CMAES
                                        0.6                                                                                                     0.6
                                                                                          BIPOP-C
                                                                                          BIPOP-CMAES                                                                                                CMA-ES
                                                                                                                                                                                                     CMA-ES
                                        0.4                                               CMA-ES
                                                                                          CMA-ES                                                0.4                                                  GP5-CMA
                                                                                                                                                                                                     GP5-CMAES
                                                                                          BIPOP-G
                                                                                          BIPOP-GP5                                                                                                  BIPOP-G
                                                                                                                                                                                                     BIPOP-GP5
                                        0.2                                               RF5-CMA
                                                                                                                                                0.2                                                  RF5-CMA
                                                                                          RF5-CMAES                                                                                                  RF5-CMAES

                                        0.00                                              BIPOP-RF5
                                                                                          BIPOP-R                                               0.00                                                 BIPOP-RF5
                                                                                                                                                                                                     BIPOP-R
                                                          1              2            3                                                                           1              2            3
                                                       log10 of (# f-evals / dimension)                                                                        log10 of (# f-evals / dimension)
                                                           ill-conditioned fcts                                                                                     multi-modal fcts
                                        1.0 f10-14,5-D                                    best 2009
                                                                                          best 2009                                             1.0 f15-19,5-D                                       best 2009
                                                                                                                                                                                                     best 2009
  Proportion of function+target pairs




                                                                                                          Proportion of function+target pairs
                                                                                          saACMES
                                                                                          saACMES                                                                                                    saACMES
                                                                                                                                                                                                     saACMES
                                        0.8                                                                                                     0.8
                                                                                          BIPOP-s
                                                                                          BIPOP-saACMES                                                                                              BIPOP-s
                                                                                                                                                                                                     BIPOP-saACMES
                                                                                          BIPOP-C
                                                                                          BIPOP-CMAES                                                                                                CMA-ES
                                                                                                                                                                                                     CMA-ES
                                        0.6                                                                                                     0.6
                                                                                          BIPOP-G
                                                                                          BIPOP-GP5                                                                                                  BIPOP-G
                                                                                                                                                                                                     BIPOP-GP5
                                        0.4                                               CMA-ES
                                                                                          CMA-ES                                                0.4                                                  BIPOP-C
                                                                                                                                                                                                     BIPOP-CMAES
                                                                                          GP5-CMA
                                                                                          GP5-CMAES                                                                                                  GP5-CMA
                                                                                                                                                                                                     GP5-CMAES
                                        0.2                                               RF5-CMA
                                                                                                                                                0.2                                                  RF5-CMA
                                                                                          RF5-CMAES                                                                                                  RF5-CMAES

                                        0.00                                              BIPOP-RF5
                                                                                          BIPOP-R                                               0.00                                                 BIPOP-RF5
                                                                                                                                                                                                     BIPOP-R
                                                          1              2            3                                                                           1              2            3
                                                       log10 of (# f-evals / dimension)                                                                        log10 of (# f-evals / dimension)
                                                weakly structured multi-modal fcts                                                                                     all functions
                                        1.0 f20-24,5-D                                    best 2009
                                                                                          best 2009                                             1.0 f1-24,5-D                                        best 2009
                                                                                                                                                                                                     best 2009
  Proportion of function+target pairs




                                                                                                          Proportion of function+target pairs




                                                                                          BIPOP-C
                                                                                          BIPOP-CMAES                                                                                                saACMES
                                                                                                                                                                                                     saACMES
                                        0.8                                                                                                     0.8
                                                                                          BIPOP-s
                                                                                          BIPOP-saACMES                                                                                              BIPOP-s
                                                                                                                                                                                                     BIPOP-saACMES
                                                                                          saACMES
                                                                                          saACMES                                                                                                    BIPOP-C
                                                                                                                                                                                                     BIPOP-CMAES
                                        0.6                                                                                                     0.6
                                                                                          GP5-CMA
                                                                                          GP5-CMAES                                                                                                  CMA-ES
                                                                                                                                                                                                     CMA-ES
                                        0.4                                               CMA-ES
                                                                                          CMA-ES                                                0.4                                                  GP5-CMA
                                                                                                                                                                                                     GP5-CMAES
                                                                                          BIPOP-G
                                                                                          BIPOP-GP5                                                                                                  BIPOP-G
                                                                                                                                                                                                     BIPOP-GP5
                                        0.2                                               RF5-CMA
                                                                                                                                                0.2                                                  RF5-CMA
                                                                                          RF5-CMAES                                                                                                  RF5-CMAES

                                        0.00                                              BIPOP-RF5
                                                                                          BIPOP-R                                               0.00                                                 BIPOP-RF5
                                                                                                                                                                                                     BIPOP-R
                                                          1              2            3                                                                           1              2            3
                                                       log10 of (# f-evals / dimension)                                                                        log10 of (# f-evals / dimension)

Figure 2: Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by
dimension (FEvals/DIM) for all functions and subgroups in 5-D. The targets are chosen from 10[−8..2] such that the
bestGECCO2009 artificial algorithm just not reached them within a given budget of k × DIM, with k ∈ {0.5,1.2,3,10,50}.
The “best 2009” line corresponds to the best ERT observed during BBOB 2009 for each selected target.
Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES                                                                                                                  191




                                                            separable fcts                                                                                      moderate fcts
                                        1.0 f1-5,20-D                                  best 2009
                                                                                       best 2009                                             1.0 f6-9,20-D                                  saACMES
                                                                                                                                                                                            saACMES
  Proportion of function+target pairs




                                                                                                       Proportion of function+target pairs
                                                                                       BIPOP-s
                                                                                       BIPOP-saACMES                                                                                        BIPOP-s
                                                                                                                                                                                            BIPOP-saACMES
                                        0.8                                                                                                  0.8
                                                                                       saACMES
                                                                                       saACMES                                                                                              best 2009
                                                                                                                                                                                            best 2009
                                                                                       BIPOP-C
                                                                                       BIPOP-CMAES                                                                                          BIPOP-C
                                                                                                                                                                                            BIPOP-CMAES
                                        0.6                                                                                                  0.6
                                                                                       CMA-ES
                                                                                       CMA-ES                                                                                               CMA-ES
                                                                                                                                                                                            CMA-ES
                                        0.4                                            BIPOP-G
                                                                                       BIPOP-GP5                                             0.4                                            GP5-CMA
                                                                                                                                                                                            GP5-CMAES
                                                                                       GP5-CMA
                                                                                       GP5-CMAES                                                                                            BIPOP-G
                                                                                                                                                                                            BIPOP-GP5
                                        0.2                                            RF5-CMA
                                                                                                                                             0.2                                            RF5-CMA
                                                                                       RF5-CMAES                                                                                            RF5-CMAES

                                        0.00                                           BIPOP-RF5
                                                                                       BIPOP-R                                               0.00                                           BIPOP-RF5
                                                                                                                                                                                            BIPOP-R
                                                       1              2            3                                                                        1              2            3
                                                    log10 of (# f-evals / dimension)                                                                     log10 of (# f-evals / dimension)
                                                          ill-conditioned fcts                                                                                 multi-modal fcts
                                        1.0 f10-14,20-D                                saACMES
                                                                                       saACMES                                               1.0 f15-19,20-D                                best 2009
                                                                                                                                                                                            best 2009
  Proportion of function+target pairs




                                                                                                       Proportion of function+target pairs
                                                                                       best 2009
                                                                                       best 2009                                                                                            saACMES
                                                                                                                                                                                            saACMES
                                        0.8                                                                                                  0.8
                                                                                       BIPOP-s
                                                                                       BIPOP-saACMES                                                                                        BIPOP-s
                                                                                                                                                                                            BIPOP-saACMES
                                                                                       BIPOP-C
                                                                                       BIPOP-CMAES                                                                                          BIPOP-C
                                                                                                                                                                                            BIPOP-CMAES
                                        0.6                                                                                                  0.6
                                                                                       CMA-ES
                                                                                       CMA-ES                                                                                               GP5-CMA
                                                                                                                                                                                            GP5-CMAES
                                        0.4                                            GP5-CMA
                                                                                       GP5-CMAES                                             0.4                                            BIPOP-G
                                                                                                                                                                                            BIPOP-GP5
                                                                                       BIPOP-G
                                                                                       BIPOP-GP5                                                                                            RF5-CMA
                                                                                                                                                                                            RF5-CMAES
                                        0.2                                            BIPOP-R
                                                                                                                                             0.2                                            CMA-ES
                                                                                       BIPOP-RF5                                                                                            CMA-ES

                                        0.00                                           RF5-CMAES
                                                                                       RF5-CMA                                               0.00                                           BIPOP-RF5
                                                                                                                                                                                            BIPOP-R
                                                       1              2            3                                                                        1              2            3
                                                    log10 of (# f-evals / dimension)                                                                     log10 of (# f-evals / dimension)
                                                weakly structured multi-modal fcts                                                                               all functions
                                        1.0 f20-24,20-D                                best 2009
                                                                                       best 2009                                             1.0 f1-24,20-D                                 best 2009
                                                                                                                                                                                            best 2009
  Proportion of function+target pairs




                                                                                                       Proportion of function+target pairs




                                                                                       GP5-CMA
                                                                                       GP5-CMAES                                                                                            BIPOP-s
                                                                                                                                                                                            BIPOP-saACMES
                                        0.8                                                                                                  0.8
                                                                                       BIPOP-C
                                                                                       BIPOP-CMAES                                                                                          saACMES
                                                                                                                                                                                            saACMES
                                                                                       CMA-ES
                                                                                       CMA-ES                                                                                               BIPOP-C
                                                                                                                                                                                            BIPOP-CMAES
                                        0.6                                                                                                  0.6
                                                                                       BIPOP-R
                                                                                       BIPOP-RF5                                                                                            CMA-ES
                                                                                                                                                                                            CMA-ES
                                        0.4                                            saACMES
                                                                                       saACMES                                               0.4                                            GP5-CMA
                                                                                                                                                                                            GP5-CMAES
                                                                                       BIPOP-s
                                                                                       BIPOP-saACMES                                                                                        BIPOP-G
                                                                                                                                                                                            BIPOP-GP5
                                        0.2                                            BIPOP-G
                                                                                                                                             0.2                                            BIPOP-R
                                                                                       BIPOP-GP5                                                                                            BIPOP-RF5

                                        0.00                                           RF5-CMAES
                                                                                       RF5-CMA                                               0.00                                           RF5-CMAES
                                                                                                                                                                                            RF5-CMA
                                                       1              2            3                                                                        1              2            3
                                                    log10 of (# f-evals / dimension)                                                                     log10 of (# f-evals / dimension)

Figure 3: Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by
dimension (FEvals/DIM) for all functions and subgroups in 20-D. The targets are chosen from 10[−8..2] such that the
bestGECCO2009 artificial algorithm just not reached them within a given budget of k × DIM, with k ∈ {0.5,1.2,3,10,50}.
The “best 2009” line corresponds to the best ERT observed during BBOB 2009 for each selected target.
192                                                                                                  Z. Pitra, L. Bajer, M. Holeňa


trials as the number of the original function evaluations       example, hybrid surrogate models combining both kinds
(FEs) executed during each trial until the best function        of regression will be attempted.
value reached ft , summed over all trials and divided by
the number of trials that actually reached ft [7].
   As we can see in Figure 1, the 24 functions can be
                                                                Acknowledgements
roughly divided into two groups according to the algo-
                                                                This work was supported by the Czech Science Founda-
rithm which performed the best (at least in 10D and 20D).
                                                                tion (GAČR) grant P103/13-17187S, by the Grant Agency
The first group of functions where the CMA-ES performed
                                                                of the Czech Technical University in Prague with its grant
best consists of functions 1, 3, 4, 6, and 20 while on func-
                                                                No. SGS14/205/OHK4/3T/14, and by the project “Na-
tions 2, 5, 7, 10, 11, 13–16, 18, 21, 23, and 24, GP5-CMA-
                                                                tional Institute of Mental Health (NIMH-CZ)”, grant num-
ES is usually better. The usage of the BIPOP versions
                                                                ber CZ.1.05/2.1.00/03.0078 (and the European Regional
generally leads to no improvement or even to performance
                                                                Development Fund.). Further, access to computing and
decrease.
                                                                storage facilities owned by parties and projects contribut-
   The graphs in Figures 2 and 3 summarize the perfor-          ing to the National Grid Infrastructure MetaCentrum, pro-
mance over subgroups of the benchmark functions and             vided under the programme “Projects of Large Infras-
show the proportion of algorithm runs that reached the          tructure for Research, Development, and Innovations”
target value ft ∈ 10[−8..2] indeed ( ft was actually differ-    (LM2010005), is greatly appreciated.
ent for each respective function, see the figures captions).
Roughly speaking, the higher the colored line, the better
the performance of the algorithm is for the number of the       References
original evaluations given on the horizontal axis.
   Thus we can see that our GP5-CMA-ES usually out-              [1] Auger, A., Hansen, N.: A restart CMA evolution strat-
performs the other algorithms when we consider the eval-             egy with increasing population size. In: The 2005 IEEE
uations budget FEs ≤ 101.5 D, i.e. FEs ≤ 150 for 5D and
                                                                     Congress on Evolutionary Computation 2, 1769–1776,
FEs ≤ 600 for 20D. However, as the number of the consid-
                                                                     IEEE, Sept. 2005
                                                                 [2] Baerns, M., Holeňa, M.:. Combinatorial development of
ered original evaluations rises, the original CMA-ES or the
s∗                                                                   solid catalytic materials. Design of high-throughput ex-
  ACM-ES usually performs better. This fact can be sum-              periments, data analysis, data mining. Imperial College
marized that our GP5-CMA-ES is convenient especially                 Press / World Scientific, London, 2009
for the applications where a very low number of function         [3] Bajer, L., Pitra, Z., Holeňa, M.: Benchmarking gaus-
evaluations is available, such as in [2].                            sian processes and random forests surrogate models on
                                                                     the BBOB noiseless testbed. In: Proceedings of the
                                                                     17th GECCO Conference Companion, Madrid, July 2015,
4 Conclusions & Future Work                                          ACM, New York
                                                                 [4] Breiman, L.: Classification and regression trees. Chapman
In this paper, we have compared the surrogate-assisted S-            & Hall/CRC, 1984
CMA-ES, which uses GP and RF continuous regression               [5] Hansen, N.: The CMA evolution strategy: A comparing
models, with s∗ACM-ES-k algorithm based on ordinal re-               review. In: J. A. Lozano, P. Larranaga, I. Inza, E. Ben-
gression by Ranking SVM, and the original CMA-ES, all                goetxea, (eds), Towards a New Evolutionary Computation,
in their IPOP and BIPOP versions. The comparison shows               192 in Studies in Fuzziness and Soft Computing, 75–102,
that Gaussian process S-CMA-ES usually outperforms the               Springer Berlin Heidelberg, Jan. 2006
ordinal-based s∗ACM-ES-k in early stages of the algo-            [6] Hansen, N.: Benchmarking a BI-population CMA-ES on
rithm search, especially on multimodal functions (BBOB               the BBOB-2009 function testbed. In: Proceedings of the
functions 15–24). However, the algorithms and surrogate              11th Annual GECCO Conference Companion: Late Break-
models should be further analyzed and compared since,                ing Papers, GECCO’09, 2389–2396, New York, NY, USA,
                                                                     2009, ACM
for example, the NEWUOA [16] or SMAC [10, 11] al-
gorithms spend a considerably lower number of function           [7] Hansen, N., Auger, A., Finck, S., Ros, R.: Real-parameter
                                                                     black-box optimization benchmarking 2012: Experimental
evaluations than the CMA-ES in these early optimization
                                                                     setup. Technical Report, INRIA, 2012
phases. The BIPOP versions of the algorithms did not in-
                                                                 [8] Hansen, N., Finck, S., Ros, R., Auger, A.: Real-parameter
creased performances of appropriate IPOP versions except
                                                                     black-box optimization benchmarking 2009: Noiseless
BBOB function 5.
                                                                     functions definitions. Technical Report RR-6829, INRIA,
   A natural perspective of improving S-CMA-ES is to                 2009, updated February 2010
make the number of model-evaluated generations self-
                                                                 [9] Hansen, N., Ros, R.: Benchmarking a weighted nega-
adaptive. We will additionally investigate different prop-           tive covariance matrix update on the BBOB-2010 noise-
erties of continuous and ordinal regression in view of their         less testbed. In: Proceedings of the 12th Annual Confer-
applicability as regression models. Different cases and              ence Companion on Genetic and Evolutionary Computa-
benchmarks where the ordinal regression is clearly supe-             tion, GECCO’10, 1673–1680, New York, NY, USA, 2010,
rior to continuous regression will be further identified. For        ACM
Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES   193


[10] Hutter, F., Hoos, H., Leyton-Brown, K.: Sequential model-
     based optimization for general algorithm configuration. In:
     C. Coello, (ed.), Learning and Intelligent Optimization
     2011, volume 6683 of Lecture Notes in Computer Science,
     507–523, Springer Berlin Heidelberg, 2011
[11] Hutter, F., Hoos, H., Leyton-Brown, K.: An evaluation of
     sequential model-based optimization for expensive black-
     box functions. In: Proceedings of the 15th Annual Confer-
     ence Companion on Genetic and Evolutionary Computa-
     tion, GECCO’13 Companion, 1209–1216, New York, NY,
     USA, 2013, ACM
[12] Loshchilov, I., Schoenauer, M., Sebag, M.: Alterna-
     tive restart strategies for CMA-ES. In: C. A. C. Coello,
     V. Cutello, K. Deb, S. Forrest, G. Nicosia, and M. Pavone,
     (eds), PPSN (1), volume 7491 of Lecture Notes in Com-
     puter Science, 296–305, Springer, 2012
[13] Loshchilov, I., Schoenauer, M., Sebag, M.: Self-adaptive
     surrogate-assisted covariance matrix adaptation evolution
     strategy. In: Proceedings of the 14th GECCO, GECCO ’12,
     321–328, New York, NY, USA, 2012, ACM
[14] Loshchilov, I., Schoenauer, M., Sebag, M.: BI-population
     CMA-ES algorithms with surrogate models and line
     searches. In: Genetic and Evolutionary Computation Con-
     ference (GECCO Companion), 1177–1184, ACM Press,
     July 2013
[15] Loshchilov, I., Schoenauer, M., Sebag, M.: Intensive surro-
     gate model exploitation in self-adaptive surrogate-assisted
     CMA-ES (saACM-ES). In: Genetic and Evolutionary
     Computation Conference (GECCO), 439–446, ACM Press,
     July 2013
[16] Powell, M. J. D.: The NEWUOA software for uncon-
     strained optimization without derivatives. In: G. D. Pillo,
     M. Roma, (eds), Large-Scale Nonlinear Optimization,
     number 83 in Nonconvex Optimization and Its Applica-
     tions, 255–297, Springer US, 2006
[17] Rasmussen, C. E., Williams, C. K. I.: Gaussian processes
     for machine learning. Adaptative Computation and Ma-
     chine Learning Series, MIT Press, 2006