<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing SVM, Gaussian Process and Random Forest Surrogate Models for the CMA-ES</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zbyněk Pitra</string-name>
          <email>z.pitra@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukáš Bajer</string-name>
          <email>bajer@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Physics, Charles University in Prague</institution>
          <addr-line>Malostranské nám. 25, 118 00 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague</institution>
          <addr-line>Břehová 7, 115 19 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          <addr-line>Pod Vodárenskou věží 2, 182 07 Prague 8</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Institute of Mental Health</institution>
          <addr-line>Topolová 748, 250 67 Klecany</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>186</fpage>
      <lpage>193</lpage>
      <abstract>
        <p>In practical optimization tasks, it is increasingly frequent that the objective function is black-box, which means that it cannot be described mathematically. Such functions can be evaluated only empirically, usually through some costly or time-consuming measurement, numerical simulation, or experimental testing. Therefore, an important direction of research is the approximation of these objective functions with a suitable regression model, also called a surrogate model of the objective function. This paper evaluates two different approaches to continuous black-box optimization which both integrate surrogate models with the state-of-the-art optimizer CMA-ES. The first, a Ranking SVM surrogate model, estimates the ordering of the sampled points, as the CMA-ES utilizes only the ranking of the fitness values. However, we show that a continuous Gaussian process model provides comparable results in the early stages of the optimization.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Optimization of an expensive objective or fitness function
plays an important role in many engineering and research
tasks. For such functions, it is sometimes difficult to find
an exact analytical formula, or to obtain any derivatives or
information about smoothness. Instead, values for a given
input can be obtained only through expensive
and time-consuming measurements and experiments.
Those functions are called black-box, and because of the
evaluation costs, the primary criterion for assessment of
the black-box optimizers is the number of fitness function
evaluations necessary to achieve the optimal value.</p>
      <p>
        The Covariance Matrix Adaptation Evolution Strategy
(CMA-ES) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is considered the state of the art in
black-box continuous optimization. An important
property of the CMA-ES is that it advances through the
search space only according to the ordering of the function
values in the current population. Hence, its search is
rather local, which predisposes it to premature
convergence to local optima if not used with a sufficiently large
population size. This issue resulted in the development of
several restart strategies [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], such as IPOP-CMA-ES [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
BIPOP-CMA-ES [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] performing restarts with population
size successively increased, or aCMA-ES [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] using also
unsuccessful individuals for covariance matrix adaptation.
      </p>
      <p>Furthermore, the CMA-ES often requires more fitness
function evaluations to find the optimum than many
real-world experiments can afford. In order to decrease the
number of evaluations in evolutionary algorithms, it is
convenient to periodically train a surrogate model of the fitness
function and use it to evaluate new points instead
of the original function. Alternatively, the model can be used
to select the most promising points to be
evaluated by the original fitness.</p>
      <p>
        Loshchilov’s surrogate-model-based algorithm
s∗ACM-ES [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] utilizes the former approach: it
estimates the ordering of the fitness values required by the
CMA-ES using Ranking Support Vector Machines (SVM)
as an ordinal regression model. Moreover, it has been
shown [
        <xref ref-type="bibr" rid="ref13">13</xref>
          ] that the model parameters (hyperparameters)
used to construct Ranking SVM model can be optimized
during the search by the pure CMA-ES algorithm.
Later proposed s∗ACM-ES extensions, referred to as
s∗ACM-ES-k [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and BIPOP-s∗ACM-ES-k [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], use
a more intensive exploitation of the surrogate model by
increasing population size in generations evaluated by the
model.
      </p>
      <p>
          More recently, a similar algorithm based on a
regression surrogate model, called S-CMA-ES [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], has been
presented. As opposed to the former algorithm, the
S-CMA-ES performs continuous regression by Gaussian
processes (GP) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and random forests (RF) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        This paper compares the two mentioned surrogate
CMA-ES algorithms, s∗ACM-ES-k and S-CMA-ES, and
the original CMA-ES itself. We benchmark these
algorithms on the BBOB/COCO testing set [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ] not
only in their one-population IPOP-CMA-ES version, but
also in combination with the two-population restart strategy
BIPOP-CMA-ES.
      </p>
      <p>The remainder of the paper is organized as follows.
The next section briefly describes the tested algorithms: the
CMA-ES, the BIPOP-CMA-ES, the s∗ACM-ES-k, and the
S-CMA-ES. Section 3 contains the experimental setup and
results, and Section 4 concludes the paper and suggests
further research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Algorithms</title>
      <sec id="sec-2-1">
        <title>2.1 The CMA-ES</title>
        <p>
          In each generation g, the CMA-ES [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] generates λ new
candidate solutions xk ∈ R^D, k = 1, . . . , λ, from
a multivariate normal distribution N(m(g), (σ(g))² C(g)),
where m(g) is the mean, interpretable as the current best
estimate of the optimum, σ(g) is the step size, representing
the overall standard deviation, and C(g) is the D × D
covariance matrix. The algorithm selects the μ points with the
lowest function values from the λ generated candidates to
adjust the distribution parameters for the next generation.
        </p>
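        <p>A minimal sketch of this sampling-and-selection step follows
(Python with NumPy; the subsequent update of m, σ and C from the
selected points is omitted, and all function and variable names are ours):</p>
        <preformat>
import numpy as np

def cma_es_generation(fitness, m, sigma, C, lam, mu, rng):
    """One sampling/selection step of the CMA-ES: sample lam candidates
    from N(m, sigma^2 C) and return the mu best of them."""
    # Sample lam candidates from the multivariate normal distribution.
    X = rng.multivariate_normal(m, sigma**2 * C, size=lam)
    y = np.array([fitness(x) for x in X])
    # Keep the mu points with the lowest fitness values; only this
    # ranking information is used to adapt m, sigma and C.
    best = np.argsort(y)[:mu]
    return X[best], y[best]

rng = np.random.default_rng(42)
D = 5
sphere = lambda x: float(np.dot(x, x))
X_mu, y_mu = cma_es_generation(sphere, m=np.zeros(D), sigma=0.5,
                               C=np.eye(D), lam=4 + int(3 * np.log(D)),
                               mu=4, rng=rng)
        </preformat>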
        <p>The CMA-ES uses restart strategies to deal with
multimodal fitness landscapes and to avoid being trapped
in local optima. A multi-start strategy where the
population size is doubled in each restart is referred to as
IPOP-CMA-ES [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].</p>
      </sec>
      <sec id="sec-3-1">
        <title>2.2 BIPOP-CMA-ES</title>
        <p>
          The BIPOP-CMA-ES [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], unlike the IPOP-CMA-ES,
considers two different restart strategies. In the first one,
corresponding to the IPOP-CMA-ES, the population size is
doubled in each restart irestart, using a constant initial step
size σ⁰large = σ⁰default:

          λlarge = 2^irestart λdefault .          (1)
        </p>
        <p>In the second one, the smaller population size λsmall is
computed as

          λsmall = ⌊ λdefault ( (1/2) λlarge / λdefault )^(U[0,1]²) ⌋ ,          (2)

where U[0, 1] denotes the uniform distribution on [0, 1].
The initial step size is also drawn randomly, as

          σ⁰small = σ⁰default × 10^(−2 U[0,1]) .          (3)
        </p>
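        <p>A minimal sketch of drawing a restart's population size and
initial step size according to Eqs. (1)–(3) follows (Python; the
function and variable names are ours):</p>
        <preformat>
import numpy as np

def bipop_restart_parameters(i_restart, lam_default, sigma_default, rng):
    """Draw the restart parameters of the BIPOP-CMA-ES, Eqs. (1)-(3)."""
    # Eq. (1): large population, doubled in each restart.
    lam_large = 2**i_restart * lam_default
    # Eq. (2): small population, randomly interpolated between
    # lam_default and lam_large / 2 on a logarithmic scale.
    u = rng.uniform(0.0, 1.0)
    lam_small = int(np.floor(
        lam_default * (0.5 * lam_large / lam_default)**(u**2)))
    # Eq. (3): random initial step size for the small-population regime.
    sigma_small = sigma_default * 10**(-2 * rng.uniform(0.0, 1.0))
    return lam_large, lam_small, sigma_small

rng = np.random.default_rng(1)
print(bipop_restart_parameters(3, lam_default=8, sigma_default=2.0, rng=rng))
        </preformat>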
        <p>The BIPOP-CMA-ES performs the first run using
the default population size λdefault and the initial step
size σ⁰default. In each of the following restarts, the strategy with
fewer function evaluations summed over all algorithm runs
so far is selected.</p>
      </sec>
      <sec id="sec-4">
        <title>2.3 s∗ACM-ES-k</title>
      <p>
          Loshchilov's version of the CMA-ES, which uses ordinal
regression by Ranking SVM as a surrogate model in specific
generations instead of the original function, is referred to as
s∗ACM-ES [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and its extension using a more intensive
exploitation is called s∗ACM-ES-k [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>Before the main loop starts, the s∗ACM-ES-k
evaluates gstart generations by the original function; then it
repeats the following steps. First, the surrogate model
is constructed using hyperparameters θ and the original
function-evaluated points from previous generations.
Second, the surrogate model is optimized by the CMA-ES
for gm generations with population size λ = kλ λdefault and
number of selected points μ = kμ μdefault, where kλ, kμ ≥ 1.
Third, the following generation is evaluated by the
original function using λ = λdefault and μ = μdefault. To avoid a
potential divergence when gm fluctuates between 0 and 1,
kλ &gt; 1 is used only in the case gm ≥ gmλ, where gmλ
denotes the number of generations suitable for effective
exploitation of the model. Then the model error is
calculated by comparing the rankings produced by the
original and the model evaluation of the last generation. After
that, gm is adjusted in accordance with the model error.
As the last step, the s∗ACM-ES-k searches the
hyperparameter space by one generation of the CMA-ES minimizing
the model error, to find the most suitable hyperparameter
settings θnew for the next model-evaluated generations.</p>
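      <p>The ranking-based model error can be illustrated by the
following sketch (Python). The fraction of discordant pairs is a
simple stand-in for the weighted rank measure actually used by the
s∗ACM-ES family, and adjust_g_m is a hypothetical adaptation rule,
not Loshchilov's exact formula:</p>
      <preformat>
import numpy as np

def rank_error(y_true, y_model):
    """Disagreement between the original and the model ranking of the
    same points: the fraction of discordant pairs."""
    r_true = np.argsort(np.argsort(y_true))
    r_model = np.argsort(np.argsort(y_model))
    n = len(y_true)
    discordant, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            if (r_true[i] - r_true[j]) * (r_model[i] - r_model[j]) >= 0:
                continue  # concordant pair
            discordant += 1
    return discordant / pairs

def adjust_g_m(err, g_m_max=5, err_threshold=0.2):
    """Hypothetical rule: the smaller the ranking error, the more
    consecutive generations the surrogate model is trusted for."""
    return int(round(g_m_max * max(0.0, 1.0 - err / err_threshold)))

y_original = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
y_surrogate = np.array([2.9, 1.2, 2.2, 4.0, 4.5])
print(adjust_g_m(rank_error(y_original, y_surrogate)))
      </preformat>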
      <p>
        The s∗ACM-ES-k version using BIPOP-CMA-ES
proposed in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is called BIPOP-s∗ACM-ES-k.
      </p>
      </sec>
      <sec id="sec-4-2">
        <title>2.4 S-CMA-ES</title>
        <p>
As opposed to the former algorithms, a different
approach to surrogate model usage is incorporated in the
S-CMA-ES [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The algorithm is a modification of the
CMA-ES in which the original sampling and evaluation
phases are replaced by Algorithm 1 at the beginning
of each CMA-ES generation.
      </p>
      <p>In order to avoid false convergence of the algorithm
in the BBOB benchmarking toolbox, the model-predicted
values are adjusted so that they are never lower than the
minimum of the original function found so far (see step 17
in the pseudocode of Algorithm 1).</p>
      <p>The main difference between the S-CMA-ES and the
s∗ACM-ES-k lies in the manner in which the CMA-ES is
utilized. In the S-CMA-ES, model training or prediction
is performed within each generation of the CMA-ES.
In the s∗ACM-ES-k, by contrast, separate runs of
CMA-ES generations are launched to optimize either the
original fitness, the surrogate fitness, or the model
hyperparameters themselves.</p>
      <sec id="sec-4-1">
        <title>Experimental Evaluation</title>
        <p>
          The core of this paper lies in a systematic comparison of
the two mentioned approaches to using surrogate
models with the CMA-ES and the original CMA-ES
algorithm itself. The first group of surrogate-based
algorithms is formed by the S-CMA-ES algorithms using
Gaussian processes and random forests models, and the
other group is formed by the s∗ACM-ES algorithm. These
four algorithms (CMA-ES, GP-CMA-ES, RF-CMA-ES,
s∗ACM-ES) are tested in their IPOP version (based on
IPOP-CMA-ES) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and in the bi-population restart
strategy version (based on BIPOP-CMA-ES and its
derivatives) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <sec id="sec-4-1-1">
          <title>3.1 Experimental Setup</title>
          <p>
            The experimental evaluation is performed through the
noiseless part of the COCO/BBOB framework
(COmparing Continuous Optimizers / Black-Box Optimization
Benchmarking) [
            <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
            ]. It is a collection of 24 benchmark
functions with different degrees of smoothness,
uni-/multimodality, separability, conditioning, etc. Each function is
defined for any dimension D ≥ 2; the dimensions used for
our tests are 2, 5, 10, and 20. The set of functions
comprises, among others, well-known continuous optimization
benchmarks like the ellipsoid, Rosenbrock's, Rastrigin's,
Schwefel's, or Weierstrass' function.
          </p>
        <sec id="sec-5">
          <title>Algorithm 1: Surrogate CMA-ES Algorithm [3]</title>
          <p>Input: g (generation), gm (number of model generations),
σ, λ, m, C (CMA-ES internal variables),
r (maximal distance between training points and m),
nREQ (minimal number of points for model training),
nMAX (maximal number of points for model training),
A (archive), fM (model), f (original fitness function)
 1: xk ∼ N(m, σ²C), k = 1, . . . , λ  {CMA-ES sampling}
 2: if g is original-evaluated then
 3:   yk ← f(xk), k = 1, . . . , λ  {fitness evaluation}
 4:   A ← A ∪ {(xk, yk)}, k = 1, . . . , λ
 5:   (Xtr, ytr) ← {(x, y) ∈ A | (m − x)ᵀ σ C^(−1/2) (m − x) ≤ r}
 6:   if |Xtr| ≥ nREQ then
 7:     (Xtr, ytr) ← choose nMAX points if |Xtr| &gt; nMAX
 8:     {transformation to the eigenvector basis:}
        Xtr ← {(σ C^(−1/2))ᵀ xtr for each xtr ∈ Xtr}
 9:     fM ← trainModel(Xtr, ytr)
10:     mark (g + 1) as model-evaluated
11:   else
12:     mark (g + 1) as original-evaluated
13:   end if
14: else
15:   xk ← (σ C^(−1/2))ᵀ xk, k = 1, . . . , λ
16:   yk ← fM(xk), k = 1, . . . , λ  {model evaluation}
17:   {shift yk values if (min yk) &lt; best y from A:}
      yk ← yk + max{0, minA y − min yk}, k = 1, . . . , λ
18:   if gm model generations have passed then
19:     mark (g + 1) as original-evaluated
20:   end if
21: end if</p>
          <p>Output: fM, A, (yk), k = 1, . . . , λ</p>
        </sec>
      <p>The framework calls the optimizers on 15 different
instances for each function and dimension, meaning that
1440 optimization runs were called for each of the eight
considered algorithms. The graphs at the end of the
paper show detailed results in a per-function and
per-group-of-functions manner. The following paragraphs summarize
the parameters of the algorithms.</p>
      <p>
        The CMA-ES. The original CMA-ES was used in its
IPOP-CMA-ES version (Matlab code v. 3.61) with
number of restarts = 4, IncPopSize = 2, σstart = 38 , λ = 4 +
⌊3 log D⌋. The remaining settings were left at their default values.
s∗ACM-ES. We have used Loshchilov’s GECCO 2013
Matlab code xacmes.m [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] in its s∗ACM-ES version,
setting the parameters CMAactive = 1, newRestartRules = 0
and withSurr = 1, modelType = 1, withModelEnsembles =
0, withModelOptimization = 1, hyper_lambda = 20, λMult
= 1, μMult = 1 and ΛminIter = 4.
      </p>
      <p>
        S-CMA-ES: GP5-CMA-ES and RF5-CMA-ES. The
number after GP/RF in the names of the algorithms denotes
the number of model-evaluated generations gm, i.e. the number
of consecutive generations evaluated by the model. All considered
S-CMA-ES versions use the distance r = 8 (see Algorithm 1). For the
GP model, the Matérn ν = 5/2 covariance function with starting
values (σn², ℓ, σf²) = log(0.01, 2, 0.5) has been used (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
for the details). We have tested an RF comprising 100
regression trees, each with at least two training points per
leaf. The CMA-ES parameters (IPOP version, σstart,
λ, IncPopSize, etc.) were the same as in the pure
CMA-ES experiments. All S-CMA-ES parameter values
were chosen according to preliminary testing on several
functions from the COCO/BBOB framework.
      </p>
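      <p>For illustration, analogous GP and RF surrogates can be set
up as follows (a sketch in Python with scikit-learn rather than the
Matlab code used in [3]; the kernel mirrors the Matérn ν = 5/2
setting above, and the training data are hypothetical):</p>
      <preformat>
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (ConstantKernel, Matern,
                                              WhiteKernel)

# Matern 5/2 covariance with signal variance 0.5, length scale 2 and
# noise variance 0.01 as starting values; they are further optimized
# when the model is fitted.
kernel = (ConstantKernel(constant_value=0.5)
          * Matern(length_scale=2.0, nu=2.5)
          + WhiteKernel(noise_level=0.01))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# RF with 100 regression trees and at least two training points per leaf.
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=2)

rng = np.random.default_rng(0)
X_tr = rng.standard_normal((30, 5))       # hypothetical archive points
y_tr = np.sum(X_tr**2, axis=1)            # and their fitness values
gp.fit(X_tr, y_tr)
rf.fit(X_tr, y_tr)
y_pred = gp.predict(rng.standard_normal((8, 5)))
      </preformat>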
      <p>BIPOP version of the algorithms. The bi-population
versions BIPOP-CMA-ES and BIPOP-s∗ACM-ES use the
same Loshchilov’s Matlab code xacmes.m with the
parameter BIPOP = 1. The BIPOP-GP5-CMA-ES and
BIPOP-RF5-CMA-ES algorithms are constructed in the
same manner as the S-CMA-ES was transformed from the
CMA-ES: by integrating Algorithm 1 into every
generation of the BIPOP-CMA-ES.</p>
        </sec>
        <sec id="sec-6-1">
          <title>3.2 Results</title>
          <p>The performance of the algorithms is compared in the
graphs in Figures 1–3. The graphs in Figure 1
depict the expected running time (ERT), which depends on
a given target function value ft = fopt + Δf – the true
optimum fopt of the respective benchmark function raised by
a small value Δf. The ERT is computed over all relevant
trials as the number of the original function evaluations
(FEs) executed during each trial until the best function
value reached ft, summed over all trials and divided by
the number of trials that actually reached ft [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].</p>
          <p>[Figure 1: ERT graphs per function; recoverable panel
titles: 1 Sphere, 2 Ellipsoid separable, 3 Rastrigin separable,
4 Skew Rastrigin-Bueche separable, 5 Linear slope, 6 Attractive
sector, 7 Step-ellipsoid, 8 Rosenbrock original, 9 Rosenbrock
rotated, 10 Ellipsoid, . . . ]</p>
          <p>[Figure 2: proportion of reached targets against log10 of
(# f-evals / dimension) for all functions and subgroups
(separable fcts, ill-conditioned fcts, . . . ) in 5-D, for the
algorithms saACMES, GP5-CMAES, RF5-CMAES, BIPOP-saACMES,
BIPOP-GP5, and BIPOP-RF5. The targets are chosen from
10^[−8..2] such that the bestGECCO2009 artificial algorithm
just failed to reach them within a given budget of k × DIM.]</p>
          <p>[Figure 3: the same proportions of reached targets for all
functions and subgroups in 20-D.]</p>
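          <p>A minimal sketch of the ERT computation as defined above
(Python; the trial data are hypothetical):</p>
          <preformat>
import numpy as np

def expected_running_time(fes_per_trial, reached):
    """ERT for one target ft: function evaluations summed over all
    trials (successful or not), divided by the number of trials that
    actually reached ft."""
    n_success = int(np.sum(reached))
    if n_success == 0:
        return np.inf
    return float(np.sum(fes_per_trial)) / n_success

# FEs spent per trial until ft was reached (or the budget exhausted).
fes = np.array([120, 500, 340, 500, 210])
hit = np.array([True, False, True, False, True])
print(expected_running_time(fes, hit))  # (120+500+340+500+210) / 3
          </preformat>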
      <p>As we can see in Figure 1, the 24 functions can be
roughly divided into two groups according to the
algorithm which performed best (at least in 10D and 20D).
The first group, where the CMA-ES performed
best, consists of functions 1, 3, 4, 6, and 20, while on
functions 2, 5, 7, 10, 11, 13–16, 18, 21, 23, and 24, the
GP5-CMA-ES is usually better. The usage of the BIPOP versions
generally leads to no improvement, or even to a performance
decrease.</p>
      <p>The graphs in Figures 2 and 3 summarize the
performance over subgroups of the benchmark functions and
show the proportion of algorithm runs that indeed reached the
target value ft ∈ 10^[−8..2] (ft was actually
different for each respective function, see the figure captions).
Roughly speaking, the higher the colored line, the better
the performance of the algorithm for the number of
original evaluations given on the horizontal axis.</p>
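      <p>The plotted proportion can be sketched as follows (Python;
the run data are hypothetical, with np.inf marking runs that never
reached the target):</p>
      <preformat>
import numpy as np

def proportion_reached(evals_to_target, budgets):
    """For each budget, the fraction of runs whose target was reached
    within that many original function evaluations."""
    evals = np.asarray(evals_to_target, dtype=float)
    return np.array([float(np.mean(budget >= evals)) for budget in budgets])

runs = [35, 120, np.inf, 60, 400, np.inf]
print(proportion_reached(runs, budgets=[50, 100, 1000]))
      </preformat>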
      <p>
        Thus we can see that our GP5-CMA-ES usually
outperforms the other algorithms under the evaluation
budget FEs ≤ 10^1.5 D, i.e. FEs ≤ 150 for 5D and
FEs ≤ 600 for 20D. However, as the number of
considered original evaluations rises, the original CMA-ES or the
s∗ACM-ES usually performs better. This can be
summarized as follows: our GP5-CMA-ES is convenient especially
for applications where only a very low number of function
evaluations is available, such as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <sec id="sec-8-1">
        <title>Conclusions &amp; Future Work</title>
        <p>
          In this paper, we have compared the surrogate-assisted
S-CMA-ES, which uses GP and RF continuous regression
models, with the s∗ACM-ES-k algorithm, based on ordinal
regression by Ranking SVM, and with the original CMA-ES, all
in their IPOP and BIPOP versions. The comparison shows
that the Gaussian-process S-CMA-ES usually outperforms the
ordinal-based s∗ACM-ES-k in the early stages of the
search, especially on multimodal functions (BBOB
functions 15–24). However, the algorithms and surrogate
models should be further analyzed and compared since,
for example, the NEWUOA [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or SMAC [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ]
algorithms spend a considerably lower number of function
evaluations than the CMA-ES in these early optimization
phases. The BIPOP versions of the algorithms did not
improve the performance of the corresponding IPOP versions,
except on BBOB function 5.
        </p>
      <p>A natural way of improving the S-CMA-ES is to
make the number of model-evaluated generations
self-adaptive. We will additionally investigate different
properties of continuous and ordinal regression in view of their
applicability as regression models, and further identify cases and
benchmarks where ordinal regression is clearly
superior to continuous regression. Moreover, hybrid surrogate
models combining both kinds of regression will be
attempted.</p>
      <sec id="sec-9-1">
        <title>Acknowledgements</title>
        <p>This work was supported by the Czech Science
Foundation (GAČR) grant P103/13-17187S, by the Grant Agency
of the Czech Technical University in Prague with its grant
No. SGS14/205/OHK4/3T/14, and by the project
“National Institute of Mental Health (NIMH-CZ)”, grant
number CZ.1.05/2.1.00/03.0078, and the European Regional
Development Fund. Further, access to computing and
storage facilities owned by parties and projects
contributing to the National Grid Infrastructure MetaCentrum,
provided under the programme “Projects of Large
Infrastructure for Research, Development, and Innovations”
(LM2010005), is greatly appreciated.</p>
      </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Auger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hansen</surname>
          </string-name>
          , N.:
          <article-title>A restart CMA evolution strategy with increasing population size</article-title>
          .
          <source>In: The 2005 IEEE Congress on Evolutionary Computation</source>
          <volume>2</volume>
          ,
          <fpage>1769</fpage>
          -
          <lpage>1776</lpage>
          , IEEE, Sept. 2005
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Baerns</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holeňa</surname>
          </string-name>
          , M.:
          <article-title>Combinatorial development of solid catalytic materials. Design of high-throughput experiments, data analysis, data mining</article-title>
          . Imperial College Press / World Scientific, London,
          <year>2009</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bajer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitra</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , Holeňa, M.:
          <article-title>Benchmarking gaussian processes and random forests surrogate models on the BBOB noiseless testbed</article-title>
          .
          <source>In: Proceedings of the 17th GECCO Conference Companion</source>
          , Madrid,
          <year>July 2015</year>
          , ACM, New York
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Classification and regression trees</article-title>
          .
          <source>Chapman &amp; Hall/CRC</source>
          , 1984
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>The CMA evolution strategy: A comparing review</article-title>
          . In: J. A.
          <string-name>
            <surname>Lozano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Larranaga</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Inza</surname>
          </string-name>
          , E. Bengoetxea, (eds),
          <source>Towards a New Evolutionary Computation, 192 in Studies in Fuzziness and Soft Computing</source>
          ,
          <fpage>75</fpage>
          -
          <lpage>102</lpage>
          , Springer Berlin Heidelberg, Jan. 2006
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed</article-title>
          .
          <source>In: Proceedings of the 11th Annual GECCO Conference Companion: Late Breaking Papers</source>
          , GECCO'
          <volume>09</volume>
          ,
          <fpage>2389</fpage>
          -
          <lpage>2396</lpage>
          , New York, NY, USA,
          <year>2009</year>
          , ACM
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ros</surname>
          </string-name>
          , R.:
          <article-title>Real-parameter black-box optimization benchmarking 2012: Experimental setup</article-title>
          .
          <source>Technical Report</source>
          , INRIA,
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions</article-title>
          .
          <source>Technical Report RR-6829</source>
          , INRIA,
          <year>2009</year>
          , updated February 2010
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ros</surname>
          </string-name>
          , R.:
          <article-title>Benchmarking a weighted negative covariance matrix update on the BBOB-2010 noiseless testbed</article-title>
          .
          <source>In: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation</source>
          , GECCO'
          <volume>10</volume>
          ,
          <fpage>1673</fpage>
          -
          <lpage>1680</lpage>
          , New York, NY, USA,
          <year>2010</year>
          , ACM
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
          </string-name>
          , K.:
          <article-title>Sequential modelbased optimization for general algorithm configuration</article-title>
          . In: C. Coello, (ed.),
          <source>Learning and Intelligent Optimization</source>
          <year>2011</year>
          , volume
          <volume>6683</volume>
          of Lecture Notes in Computer Science,
          <volume>507</volume>
          -
          <fpage>523</fpage>
          , Springer Berlin Heidelberg,
          <year>2011</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An evaluation of sequential model-based optimization for expensive blackbox functions</article-title>
          .
          <source>In: Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation</source>
          ,
          <source>GECCO'13 Companion</source>
          ,
          <fpage>1209</fpage>
          -
          <lpage>1216</lpage>
          , New York, NY, USA,
          <year>2013</year>
          , ACM
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Alternative restart strategies for CMA-ES</article-title>
          . In: C.
          <string-name>
            <surname>A. C. Coello</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Cutello</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Deb</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Forrest</surname>
            , G. Nicosia, and
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Pavone</surname>
          </string-name>
          , (eds),
          <source>PPSN (1)</source>
          , volume
          <volume>7491</volume>
          of Lecture Notes in Computer Science,
          <volume>296</volume>
          -
          <fpage>305</fpage>
          , Springer,
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy</article-title>
          .
          <source>In: Proceedings of the 14th GECCO</source>
          , GECCO '
          <volume>12</volume>
          ,
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          , New York, NY, USA,
          <year>2012</year>
          , ACM
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>BI-population CMA-ES algorithms with surrogate models and line searches</article-title>
          .
          <source>In: Genetic and Evolutionary Computation Conference (GECCO Companion)</source>
          ,
          <fpage>1177</fpage>
          -
          <lpage>1184</lpage>
          , ACM Press,
          <year>July 2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Intensive surrogate model exploitation in self-adaptive surrogate-assisted CMA-ES (saACM-ES)</article-title>
          .
          <source>In: Genetic and Evolutionary Computation Conference (GECCO)</source>
          ,
          <fpage>439</fpage>
          -
          <lpage>446</lpage>
          , ACM Press,
          <year>July 2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Powell</surname>
            ,
            <given-names>M. J. D.</given-names>
          </string-name>
          :
          <article-title>The NEWUOA software for unconstrained optimization without derivatives</article-title>
          . In: G. D.
          <string-name>
            <surname>Pillo</surname>
          </string-name>
          , M. Roma, (eds),
          <source>Large-Scale Nonlinear Optimization, number 83 in Nonconvex Optimization and Its Applications</source>
          ,
          <fpage>255</fpage>
          -
          <lpage>297</lpage>
          ,
          <string-name>
            <surname>Springer</surname>
            <given-names>US</given-names>
          </string-name>
          ,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Rasmussen</surname>
            ,
            <given-names>C. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C. K. I.</given-names>
          </string-name>
          :
          <article-title>Gaussian processes for machine learning</article-title>
          .
          <source>Adaptative Computation and Machine Learning Series</source>
          , MIT Press,
          <year>2006</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>