<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Testing Gaussian Process Surrogates on CEC'2013 multi-modal benchmark</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikita Orekhov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukáš Bajer</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Martin Holeňa</string-name>
          <email>martin@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Academy of Sciences</institution>
          ,
          <addr-line>Prague</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Czech Technical University</institution>
          ,
          <addr-line>Prague</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Mathematics and Physics, Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>138</fpage>
      <lpage>146</lpage>
      <abstract>
<p>This paper compares several Gaussian-process-based surrogate modeling methods applied to black-box optimization by means of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which is considered state-of-the-art in the area of continuous black-box optimization. Among the compared methods are the Model-assisted CMA-ES, the Robust Kriging Metamodel CMA-ES, and the Surrogate CMA-ES. In addition, a very successful surrogate-assisted self-adaptive CMA-ES, which is not based on Gaussian processes but on ordinal regression by means of support vector machines, has been included in the comparison. These methods have been benchmarked on the CEC'2013 testing functions. We show that the Surrogate CMA-ES achieves the best results in the early and late phases of the optimization process, conceding in the middle to the surrogate-assisted self-adaptive CMA-ES.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Evolutionary computation has been successfully applied
to a wide spectrum of engineering problems. The
randomized exploration guided by a set of candidate solutions
(a population) can be resistant to most optimization
obstacles, such as noise or multi-modality. This makes
evolutionary algorithms (EAs) suitable for black-box
optimization, where an analytic form of the objective function is
not known. In such cases, evaluations of the objective function
are the only information available for optimizing
the function.</p>
      <p>
        The evaluation of the objective function can often be
costly, i.e., it takes a significant amount of time and/or
money. Typically, a black-box objective function is
evaluated empirically during some experiment, such as the
optimization of gas turbine profiles [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or protein folding stability
optimization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Today, the state of the art among evolutionary black-box
optimizers is the Covariance Matrix Adaptation Evolution
Strategy (CMA-ES), which was initially introduced in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
However, like any other evolutionary optimization method,
the CMA-ES may suffer from an insufficient convergence speed
w.r.t. the budget of expensive function evaluations. This
can be alleviated by the introduction of so-called surrogate
models (aka metamodels), which provide regression-based
predictions of the expensive objective function values.
      </p>
      <p>
        Past works on surrogate-assisted optimization showed
that models based on Gaussian processes (GP) can lead to
a significant reduction in the number of evaluations of
the original objective function [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The aim of this
paper is to compare several GP-based surrogate-modeling
techniques against another successful surrogate
modeling method used in connection with CMA-ES. In
particular, we examine the POI-based approach [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the
robust kriging metamodel [
        <xref ref-type="bibr" rid="ref9">9</xref>
          ] and the Surrogate CMA-ES
(S-CMA-ES) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and compare them against the support
vector machines-based method called s∗ACM-ES [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and the
basic CMA-ES without a surrogate model. The
comparison has been performed on the multi-modal benchmark
from CEC’2013.
      </p>
      <p>The remainder of the paper is organized as follows.
Section 2 introduces the necessary background on the
Covariance Matrix Adaptation ES and on Gaussian processes in the
context of black-box optimization. Section 3 briefly
recalls the tested surrogate models, while Section 4 describes the
performed experiments and comments on the obtained results.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Surrogate modeling in black-box optimization
In the context of black-box optimization by CMA-ES, a
surrogate model is built using some points sampled by the
CMA-ES and their objective values. Then, the model is
used to predict the quality of the sampled points in order
to reduce the number of function evaluations. However, the
points should be carefully selected. Generally, there are
two main approaches to selecting them. In the first approach,
proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
          ], each iteration of the overall
optimization algorithm replaces the original objective function
by its surrogate model, estimates the model's global
optimum and evaluates it with the original objective function.
      </p>
      <p>
        The other way consists in selecting a controlled fraction
of individuals that should be evaluated on the original
objective function. Such an approach is thus called evolution
control (EC) [
        <xref ref-type="bibr" rid="ref8">8</xref>
          ]. More specifically, evaluating the costly
original objective for the entire population only at certain
generations is called generation-based EC, while
evaluating it only for a part of the population at each generation
is called individual-based EC.
      </p>
      <p>Using a surrogate model to find the global optimum, one
exploits the model's knowledge of the unknown
objective function. However, this can potentially prevent the
optimization algorithm from converging to an optimal
solution, due to unexplored regions that were incorrectly
modeled. To protect against such predictions,
some mechanism is needed that takes care of the
exploration of new regions of the solution space. Merit
functions were introduced precisely for this reason: they
incorporate both exploration and exploitation patterns into
the decisions performed during the surrogate modeling.</p>
      <p>The variance σ² of the model predictions may be used
as a merit function: the larger the variance, the
higher the uncertainty of the predicted objective function
value. This merit function is expressed as:</p>
      <p>f_var(x) = σ²(x),
where σ²(x) is the variance of the model's prediction at x.</p>
      <p>The lower confidence bound (LCB) is another merit
function, which has the following form:</p>
      <p>f_α(x) = f̂(x) − α σ²(x),
where α ≥ 0 balances between exploration and
exploitation and f̂(x) is the model's prediction of the
objective function value at x.</p>
      <p>The next merit function is the probability of improvement
(POI). For a given solution vector x, the model's
prediction of the function value at x is a realization of a
random variable Y(x). With f_min denoting the current best
obtained value of the original objective function, we define,
for any T ≤ f_min, the probability of improvement as:
f_T(x) = P(Y(x) ≤ T).   (1)</p>
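      <p>To make the three merit functions concrete, the following minimal
Python sketch evaluates them from a surrogate's predictive mean f̂(x) and
variance σ²(x); the Gaussian form of the predictive distribution assumed
for the POI matches the GP models recalled in Section 2.2.</p>
      <preformat>
import numpy as np
from scipy.stats import norm

def f_var(f_hat, sigma2):
    # variance merit function: prefer points with uncertain predictions
    return sigma2

def f_lcb(f_hat, sigma2, alpha=1.0):
    # lower confidence bound: trade predicted value against uncertainty
    return f_hat - alpha * sigma2

def f_poi(f_hat, sigma2, T):
    # probability of improvement P(Y(x) ≤ T) for a Gaussian prediction
    return norm.cdf((T - f_hat) / np.sqrt(sigma2))
      </preformat>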
      <sec id="sec-2-1">
        <title>Gaussian processes</title>
        <p>A Gaussian process is a random process indexed by R^n
such that its marginal corresponding to any finite set of points
X_N = {x_1, . . . , x_N} ⊂ R^n has an N-dimensional Gaussian
distribution. Evaluating the objective function at those
points results in a vector of function values t_N ∈ R^N. If
that vector is viewed as a particular realization of the
respective Gaussian distribution, the GP provides a
prediction of the distribution of the function value t_{N+1} at some
new point x_{N+1}.</p>
        <p>Gaussian process models are determined by their mean
and covariance matrix. This matrix is constructed by
applying a covariance function k : R^n × R^n → R to each pair
of points in X_N. Covariance functions have a certain
set of parameters, usually called hyper-parameters, due to
the fact that the covariance function itself is a parameter of
the Gaussian process. Typically, the hyper-parameters are
obtained through maximum-likelihood estimation.</p>
        <p>
          We recall the covariance functions employed in the
compared methods. Note that the hyper-parameters below are
denoted θ. The first one is the exponential covariance
function. It is used in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and has the following form:
k_E(x_p, x_q) = exp(−r/θ),
where r = ‖x_p − x_q‖ is the distance between the points.
        </p>
        <p>
          The next covariance function is called squared
exponential. This is used in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and its basic variant can be
expressed as:
k_SE(x_p, x_q) = exp(−r²/θ).
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the isotropic Matérn covariance function k_Matérn5/2 is
used in the form:
k_Matérn5/2(x_p, x_q) = θ_1 (1 + √5 r/θ_2 + 5r²/(3θ_2²)) exp(−√5 r/θ_2).
        </p>
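        <p>As a concrete reading of the three covariance functions, the
following sketch implements them under the assumption that r is the
Euclidean distance between the two points.</p>
        <preformat>
import numpy as np

def k_exp(xp, xq, theta):
    # exponential covariance: exp(-r / theta)
    r = np.linalg.norm(xp - xq)
    return np.exp(-r / theta)

def k_sq_exp(xp, xq, theta):
    # squared exponential covariance: exp(-r^2 / theta)
    r = np.linalg.norm(xp - xq)
    return np.exp(-r**2 / theta)

def k_matern52(xp, xq, theta1, theta2):
    # isotropic Matern 5/2 with signal variance theta1, length-scale theta2
    r = np.linalg.norm(xp - xq)
    s = np.sqrt(5.0) * r / theta2
    return theta1 * (1.0 + s + 5.0 * r**2 / (3.0 * theta2**2)) * np.exp(-s)
        </preformat>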
        <p>Having defined the covariance matrix C_N for N points, its
extension after including the (N+1)-th point reads:
C_{N+1} = ( C_N  k ; k^T  κ ),
where k = k(x_{N+1}, X_N) is the N-dimensional real vector of
covariances between the new point and the points from the
training set, while κ = k(x_{N+1}, x_{N+1}) is the variance of the new
point [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Consequently,
t_{N+1} ∼ N(μ_{N+1}, σ²_{N+1}),   (2)
where μ_{N+1} = k^T C_N^{-1} t_N is the predictive mean and
σ²_{N+1} = κ − k^T C_N^{-1} k is the predictive variance.
        </p>
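        <p>A minimal sketch of the resulting prediction, with a kernel
function k as above; the small jitter added to the diagonal is our
assumption for numerical stability, not part of the formulas.</p>
        <preformat>
import numpy as np

def gp_predict(X, t, x_new, k, jitter=1e-10):
    # covariance matrix C_N over the N training points
    N = len(X)
    C = np.array([[k(X[i], X[j]) for j in range(N)] for i in range(N)])
    C += jitter * np.eye(N)
    # covariances between the new point and the training points
    kvec = np.array([k(x_new, X[i]) for i in range(N)])
    kappa = k(x_new, x_new)
    mu = kvec @ np.linalg.solve(C, t)                  # k^T C_N^{-1} t_N
    sigma2 = kappa - kvec @ np.linalg.solve(C, kvec)   # kappa - k^T C_N^{-1} k
    return mu, sigma2
        </preformat>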
        <p>In the context of Gaussian processes, we have f_min =
min(t_1, . . . , t_N). Then, considering the distribution from (2),
the POI criterion (1) can be rewritten as:
f_T(x) = Φ((T − t̂(x)) / σ(x)),
where Φ is the cumulative distribution function of the
normal distribution N(0, 1).</p>
        <p>Areas with a high POI value have a high probability of
yielding a point with an objective value better than f_min.
Regions with model prediction t̂(x) ≫ f_min will have a POI
value close to zero, which encourages the search to move
somewhere else. The POI value becomes large when the
variance σ² is large, which is typical for unexplored
areas. Therefore, the use of POI may be helpful for dealing
with multi-modal functions.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 CMA-ES</title>
        <p>The CMA-ES employs the concept of adapting
internal variables from the data. In particular, it adapts
several components, such as the mutation step-size, the distribution
mean and the covariance matrix, which represents a local
approximation of the function landscape.</p>
        <p>
          In the CMA-ES, a generation of candidate solutions is
usually obtained by sampling a normally distributed
random vector:
x_k^(g+1) ∼ m^(g) + σ^(g) N(0, C^(g)),   k = 1, . . . , λ,
where λ ≥ 2 is the population size, x_k^(g+1) ∈ R^n is the k-th
offspring in the (g+1)-th generation, m^(g) ∈ R^n is the mean of
the sampling distribution, σ^(g) ∈ R_+ is the step-size, and C^(g) ∈
R^{n×n} is a covariance matrix. Below we list the basic equations
for the adaptation of the mean, step-size and covariance
matrix used in CMA-ES. For details we refer to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The mean of the sampling distribution is a weighted
average of the μ best-ranked points from x_1^(g+1), . . . , x_λ^(g+1):
m^(g+1) = ∑_{i=1}^{μ} w_i x_{i:λ}^(g+1),
where ∑_{i=1}^{μ} w_i = 1, w_1 ≥ w_2 ≥ · · · ≥ w_μ &gt; 0 are weight
coefficients and the i:λ notation denotes the i-th best individual
out of x_1^(g+1), . . . , x_λ^(g+1).</p>
        <p>
          The covariance matrix C, initially set to the identity
matrix, is updated by a rank-one update and a rank-μ update
during each iteration. A cumulation strategy [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is applied
to combine consecutive steps in the search space.
        </p>
        <p>C^(g+1) = (1 − c_1 − c_μ) C^(g)
+ c_1 p_c^(g+1) (p_c^(g+1))^T   [rank-one update]
+ c_μ ∑_{i=1}^{μ} w_i y_{i:λ}^(g+1) (y_{i:λ}^(g+1))^T   [rank-μ update],
where c_1 and c_μ are learning rates for the rank-one and
rank-μ updates, p_c^(g+1) is an evolution path and
y_{i:λ}^(g+1) = (x_{i:λ}^(g+1) − m^(g)) / σ^(g).</p>
        <p>The step-size update reads:
σ^(g+1) = σ^(g) exp( (c_σ/d_σ) ( ‖p_σ^(g+1)‖ / E‖N(0, I)‖ − 1 ) ),
where c_σ is a learning rate, d_σ is a damping parameter and
p_σ^(g+1) is an evolution path.</p>
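        <p>As a deliberately simplified sketch of a single generation under
these update rules, the following Python fragment shows only the sampling
step and the weighted-mean update; the log-linear weights are an assumed
(though common) choice, and the evolution-path updates of C and σ given
above are omitted for brevity.</p>
        <preformat>
import numpy as np

def cma_generation(f, m, sigma, C, lam, mu):
    n = m.size
    A = np.linalg.cholesky(C)                       # C = A A^T
    # sample lambda offspring: x_k ~ m + sigma * N(0, C)
    X = np.array([m + sigma * (A @ np.random.randn(n)) for _ in range(lam)])
    order = np.argsort([f(x) for x in X])           # i:lambda ranking
    # positive, decreasing weights summing to one (an assumed choice)
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    m_new = w @ X[order[:mu]]                       # weighted recombination
    return m_new, X[order]
        </preformat>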
        <p>
          The CMA-ES is considered a local optimizer, so in
the case of multi-modal functions it tends to end up in a
local optimum. To reduce the influence of such
behavior, several restart strategies were introduced. Perhaps the
best known one is the IPOP-CMA-ES [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In this
modification, the optimization process is interrupted several
times and independently restarted with the population size
increased by a certain factor (typically 2). The IPOP version
is the default for all of the CMA-ES algorithms throughout
this article; a restart-loop sketch is given below.
        </p>
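        <p>The restart scheme itself can be sketched as follows; run_cmaes
is a hypothetical routine performing one full, independent CMA-ES run
with the given population size and returning the best solution it found.</p>
        <preformat>
def ipop_cma_es(f, run_cmaes, lam0, restarts=4, inc_pop_size=2):
    best_x, best_f = None, float("inf")
    lam = lam0
    for _ in range(restarts + 1):
        x, fx = run_cmaes(f, lam)       # independent run
        if fx &lt; best_f:
            best_x, best_f = x, fx
        lam *= inc_pop_size             # e.g. double the population size
    return best_x, best_f
        </preformat>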
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Tested methods</title>
      <sec id="sec-3-1">
        <title>3.1 POI MAES</title>
        <p>
          Ulmer et al. refer to this algorithm as Probability
of Improvement Model-assisted Evolution Strategy (POI
MAES) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The method uses a modification of the
SE covariance function, which has n + 2 hyper-parameters
obtained by likelihood maximization:
        </p>
        <p>k(x_p, x_q, θ) = θ_1 exp(−∑_{i=1}^{n} (x_{p,i} − x_{q,i})² / (2r_i²)) + θ_2 δ_pq,
where δ_pq = 1 if p = q and 0 otherwise, and the r_i² are
length-scales for the individual dimensions.</p>
        <p>
          The model is incorporated into the CMA-ES via
individual-based EC, which the authors call pre-selection.
In every iteration, λ_pre &gt; λ new individuals are sampled
from the μ parents and evaluated on the surrogate model
using the POI merit function. Then, the λ offspring with
the highest POI are selected and evaluated on the objective
function. In the end, the surrogate model is updated.
        </p>
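        <p>A pre-selection step of this kind might be sketched as follows;
sample_offspring and poi are hypothetical helpers standing in for the
CMA-ES sampling step and the surrogate's POI merit.</p>
        <preformat>
import numpy as np

def preselect(sample_offspring, poi, f, lam_pre, lam):
    # oversample lambda_pre candidates and score them on the cheap surrogate
    candidates = [sample_offspring() for _ in range(lam_pre)]
    scores = np.array([poi(x) for x in candidates])
    keep = np.argsort(-scores)[:lam]            # highest POI first
    selected = [candidates[i] for i in keep]
    values = [f(x) for x in selected]           # expensive true evaluations
    return selected, values
        </preformat>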
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Robust CMA</title>
        <p>
          The approach described in this subsection was designed to provide
robustness approximations of the objective function values. It is also
combined with a special method to select promising
solutions [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>One can view robustness approximation as an
attempt to predict function values in a noisy or imprecise
environment. In expensive optimization, making precise
predictions for noisy problems becomes extremely important,
as it requires a certain trade-off between
limiting the noise and reducing the number of evaluations.</p>
        <p>The robustness approximation is achieved by training a
local model around every point x_k to be
estimated. To this end, the algorithm chooses n_krig
pairs (x, t) from a given archive A, in the specific way described
below, and stores them in a local training set D. If the
number of points does not suffice, the algorithm also returns
a set X_cand = {x_1, . . . , x_l} of length l = n_krig − |D|.
Points from X_cand are then evaluated with the original
objective function and added to both A and D.</p>
        <p>The procedure to select the pairs (x, t) from the archive
A is as follows. First, n_krig points are generated via
Latin hypercube sampling and stored in a
reference set R. Then, to every point in R (subsequently
called a reference point), the closest point x from A is assigned.
An important condition is that the reference point
must also be the closest one to the archive point x. If this
is the case, the archive point with its corresponding
objective function value is added to the training set.
Otherwise, the reference point is considered a suitable candidate
for sampling and is added to X_cand. When all points from R
are assigned, the reference point from the assigned pair
(archive point, reference point) for which the distance
between both points is the largest among all assigned pairs
is selected and evaluated.</p>
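        <p>The following is our illustrative reading of this pairing
procedure, not the authors' code; reference holds the Latin hypercube
set R, archive is a list of (x, t) pairs, and Euclidean distances are
assumed.</p>
        <preformat>
import numpy as np

def select_training_pairs(reference, archive):
    D, X_cand = [], []
    A = np.array([x for x, _ in archive])
    for idx, ref in enumerate(reference):
        # closest archive point to the reference point
        j = int(np.argmin(np.linalg.norm(A - ref, axis=1)))
        # check that ref is also the closest reference point to A[j]
        back = int(np.argmin(np.linalg.norm(reference - A[j], axis=1)))
        if back == idx:
            D.append(archive[j])     # mutually closest: reuse archived pair
        else:
            X_cand.append(ref)       # candidate for expensive sampling
    return D, X_cand
        </preformat>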
        <p>Using the procedure above, a separate training set is
selected for every point x^(g) to be estimated at generation g,
and the model is trained. Then, the fitness value at x^(g) is
estimated according to the so-called multi evaluation method
(MEM). In this method, the approximation is obtained as the
mean value of several approximations at points with
random perturbations in the input space, i.e.:
f̂_MEM(x) = (1/m) ∑_{i=1}^{m} f̂(x + δ_i),
where m is the predefined number of points used for the
approximation and δ_i denotes a random variation.</p>
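        <p>A minimal sketch of the MEM estimate; the scale eps of the
random variations δ_i is an assumption, since their distribution is not
restated here.</p>
        <preformat>
import numpy as np

def f_mem(f_hat, x, m=10, eps=0.01):
    # average m surrogate predictions at randomly perturbed copies of x
    return np.mean([f_hat(x + eps * np.random.randn(x.size))
                    for _ in range(m)])
        </preformat>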
      </sec>
      <sec id="sec-3-3">
        <title>3.3 S-CMA-ES</title>
        <p>
          Another surrogate-assisted approach is called the Surrogate
CMA-ES (S-CMA-ES). It was initially proposed in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
and later extended to the Doubly Trained S-CMA-ES
(DTS-CMA-ES) in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The S-CMA-ES employs the
Matérn covariance function mentioned in Section 2.2, with
hyper-parameters θ = {σ²_f, ℓ, σ²_n} optimized by the
maximum-likelihood approach.
        </p>
        <p>The S-CMA-ES algorithm uses generation-based
evolution control, which means that original-evaluated and
model-evaluated generations interleave. During
the original generations, all the points are evaluated by the
expensive objective function. In every model generation,
at least n_MIN archive points have to be selected to train
the surrogate model. The selection is restricted to
points whose Mahalanobis distance from the CMA-ES
mean value m does not exceed some prescribed limit r:</p>
        <p>D ← {(x, t) ∈ A | √((m − x)^T (σ²C)^{-1} (m − x)) ≤ r},
where C is the CMA-ES covariance matrix and σ is the
step-size at the considered generation.</p>
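        <p>The set D then corresponds to the following filter, sketched
under the assumption that the archive holds (x, t) pairs.</p>
        <preformat>
import numpy as np

def select_near_mean(archive, m, sigma, C, r):
    inv = np.linalg.inv(sigma**2 * C)
    selected = []
    for x, t in archive:
        d = np.sqrt((m - x) @ inv @ (m - x))   # Mahalanobis distance from m
        if d &lt;= r:
            selected.append((x, t))
    return selected
        </preformat>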
        <p>If there are not enough points, building the model is
postponed and all fitness evaluations are performed on the
original objective function. When there are enough points,
at most n_MAX points are selected via k-NN clustering to
train a GP model.</p>
        <p>
          The extension DTS-CMA-ES selects the n_orig &lt; λ most
uncertain points w.r.t. the employed merit function and
evaluates them on the original objective function. Then,
the surrogate model is retrained on the archive
extended with the newly evaluated n_orig points. Finally, the
function values of the remaining λ − n_orig points are predicted
by the model and returned to the CMA-ES.
        </p>
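        <p>One DTS-CMA-ES generation might then be sketched as follows;
train, uncertainty and the model's predict method are hypothetical
stand-ins for the GP fitting, the merit-based uncertainty criterion and
the GP prediction.</p>
        <preformat>
def dts_generation(points, archive, train, uncertainty, f, n_orig):
    model = train(archive)                                   # first training
    scores = [uncertainty(model, x) for x in points]
    orig = sorted(range(len(points)), key=lambda i: -scores[i])[:n_orig]
    true_vals = {i: f(points[i]) for i in orig}              # expensive calls
    archive.extend((points[i], true_vals[i]) for i in orig)
    model = train(archive)                                   # second training
    return [true_vals.get(i, model.predict(points[i]))
            for i in range(len(points))]
        </preformat>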
      </sec>
      <sec id="sec-3-4">
        <title>3.4 s*ACM-ES</title>
        <p>
          This subsection describes a method called the Self-adaptive
Surrogate-assisted CMA-ES (s∗ACM-ES) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which is
not GP-based but is included for comparison.
        </p>
        <p>This method employs ordinal regression based on
ranking support vector machines. Evaluations made by
such a model provide only rankings of the points, in order to
achieve invariance to rank-preserving
transformations of f. The method adapts the model hyper-parameters
in an on-line manner. In addition, it uses a much
larger population size while operating on the surrogate,
preserving the original population size for the original
objective function evaluations.</p>
        <p>A generation-based EC is used; the number n̂ of
model-evaluated generations is adjusted according to the model
error, assessed in the interleaving generations in which the
original objective function is evaluated.</p>
        <p>The algorithm adjusts n̂ by a linear function inversely
proportional to a global model error Err(θ). The global
error is updated with some relaxation from a local error on
the λ most recently evaluated points. Denoting by Λ the set
of those λ points, the local error is estimated as follows:</p>
        <p>Err(θ) = (2 / (|Λ|(|Λ| − 1))) ∑_{i=1}^{|Λ|} ∑_{j=i+1}^{|Λ|} w_{i,j} · 1_{f̂_θ,i,j},
where 1_{f̂_θ,i,j} is the indicator that f̂ violates the ordering of the
pair (i, j) given by the real objective function f, and w_{i,j} defines the
weights of such violations.</p>
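        <p>As an illustrative computation of this pairwise ranking error:
f_true and f_model hold the objective values and the model predictions
on the λ most recently evaluated points, and w is the weight matrix.</p>
        <preformat>
def rank_error(f_true, f_model, w):
    L = len(f_true)
    err = 0.0
    for i in range(L):
        for j in range(i + 1, L):
            # violation: the model orders the pair (i, j) differently than f
            if (f_true[i] - f_true[j]) * (f_model[i] - f_model[j]) &lt; 0:
                err += w[i][j]
    return 2.0 * err / (L * (L - 1))
        </preformat>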
        <p>
          The surrogate error optimization is performed
at the end of every ES generation, where the model
hyper-parameters are optimized by one iteration of an
additional CMA-ES (referred to as CMA-ES #2 in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]).
Here, the algorithm samples λ_hyp different points in the
space of hyper-parameters and builds λ_hyp surrogate
models. Then, those models are evaluated using the Err(θ)
metric and the μ_hyp = ⌊λ_hyp/2⌋ best-performing ones are used to
update the internal variables of CMA-ES #2. The resulting
mean of the hyper-parameter distribution is used to obtain
θ for the next generation of s∗ACM-ES.
        </p>
        <p>Finally, the configuration of the standard CMA-ES differs when
it optimizes the surrogate model f̂. In that case, it uses
larger population sizes λ = k_λ λ_default and μ = k_μ μ_default,
where k_λ, k_μ ≥ 1. In order to prevent degradation when the
model is inaccurate, k_λ is also adjusted w.r.t. Err(θ).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments</title>
      <sec id="sec-4-1">
        <title>4.1 Benchmark functions</title>
        <p>
          The experiments have been conducted on a recently
proposed benchmark, introduced at the special session on
multi-modal function optimization during the Congress
on Evolutionary Computation (CEC) in 2013 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The
benchmark contains 12 noiseless multi-modal functions
defined for dimensions from 1D to 20D. Note that this
test set differs from the one used in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], where noise
was introduced by oscillations in the input space. In
addition, the experiments have been conducted within the
BBOB framework [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] by introducing 9 new benchmark
functions. Since the CMA-ES is not able to run in 1D,
we excluded the 1D functions from our experiments.
Altogether, the following set of functions was employed
(dimensionalities are shown in brackets): Himmelblau (2D),
Six-Hump Camel Back (2D), Shubert (2D, 3D), Vincent
(2D, 3D), Modified Rastrigin (2D), Composition Function
1 (2D), Composition Function 2 (2D), Composition
Function 3 (2D, 3D, 5D, 10D), Composition Function 4 (3D,
5D, 10D, 20D).
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Experimental setup</title>
        <p>
          The experimental testing has been prepared according
to the BBOB framework specifications [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. First, for
every benchmark function, the global minimum is
denoted as f_opt. The target function value f_target was set to
f_opt + 10^{-8}; reaching it serves as the stopping
criterion for the tested algorithms. The performance of the tested
methods is measured as the difference between the best objective
value found so far, t_best, and the global minimum of the
function, Δf = t_best − f_opt, for the respective numbers of original
fitness evaluations. The resulting Δf values are calculated
from N_trials = 15 algorithm runs for every benchmark
function and every dimension.
        </p>
        <p>
          The tested algorithms had the following settings:
• CMA-ES: the Matlab code, version 3.62, of the
IPOP-CMA-ES has been used, with the number of restarts
set to 4, IncPopSize = 2, σ_start = 8/3, λ = 4 + ⌊3 log n⌋ and
the remaining parameters set to their defaults [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
Note that since the tested surrogate methods perform
within the CMA-ES environment, they also used these
settings.
• POI MAES: the only algorithm that does not use
the original authors' implementation; we
implemented it in Matlab according to its description in
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The λ_pre was set to 3λ and the number of points
used to train the model to 2λ. We were not able to
confirm or deny whether our implementation replicates the results
of the original paper.
• Robust CMA: n_krig is set to 2n and the m parameter
to 10.
• S-CMA-ES: the DTS-CMA-ES extension was used;
the Mahalanobis distance limit r was set to 8, the
covariance function k_Matérn5/2 was used with the starting
values (σ²_n, ℓ, σ²_f) = log(0.01, 2, 0.5), n_orig was set to
⌈0.1λ⌉ and f_var was used as the merit function; see
[
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] for details.
        </p>
        <p>
          • s∗ACM-ES: uses the settings from [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Results and their assessment</title>
        <p>The results of the testing are now analyzed in two stages.
In the first stage, the algorithms' performance is analyzed
for every benchmark function and every dimension
separately. The second stage concentrates on the analysis of
results aggregated over individual functions and dimensions,
and finally over the entire benchmark set.</p>
        <p>The results of testing on the individual CEC functions are
shown in Figure 1. The performance of every method has
been calculated over the N_trials runs and is represented by
the empirical median Δf^med (solid lines) and the quartiles
(translucent areas) of Δf.</p>
        <p>It can be seen that the S-CMA-ES and s∗ACM-ES
methods have the best performance on the majority of
benchmark functions. However, for the Shubert function in 2D
and 3D and for Composition Function 4 in 5D, both
methods seem to miss the global optimum and do not
noticeably speed up the original CMA-ES. An example
visualization of the experiment on the 2D Shubert function is
shown in Figure 2. In addition, the s∗ACM-ES fails to find
the global optimum for the Vincent function in 3D.</p>
        <p>The other methods lead to a deceleration of the
CMA-ES convergence speed in most cases. The global optimum
remained undiscovered, except by POI MAES on the Vincent,
Himmelblau and Six-Hump Camel Back functions. POI
MAES was the best method on the Shubert function. Also,
this method outperformed the original CMA-ES on the
20-dimensional Composition Function 4, which may indicate its
ability to speed up the CMA-ES in higher dimensions.</p>
        <p>The Robust CMA method achieved the worst results
in our experiments; comparable performance was shown
only on the Shubert function. However, this may be due to the
fact that the considered benchmark functions were
noiseless, whereas this method has been developed primarily for
noisy objective functions.</p>
        <p>Next, Figure 3 depicts results aggregated over the individual
functions, while Figure 4 shows results aggregated over
the entire benchmark set. To enable aggregation, we use
scaled logarithms of the medians. First of all, the evaluation
budgets are normalized by the dimensionality; for
benchmarking, the budgets were limited to 250 function
evaluations per dimension (FE/D). Then, the scaled logarithm
Δf^log of the median Δf^med is calculated as follows:</p>
        <p>Δf^log = ((log Δf^med − Δf^MIN) / (Δf^MAX − Δf^MIN)) · log_10(1/10^{-8}) + log_10 10^{-8},
where Δf^MIN and Δf^MAX are the lowest and the highest
log Δf^med values obtained by any of the compared
algorithms for the particular benchmark function f and
dimension D within the evaluation budget of 250 FE/D.</p>
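        <p>Assuming that log denotes log_10 here, the scaling maps each
median linearly onto the interval [−8, 0], as the following sketch makes
explicit.</p>
        <preformat>
import numpy as np

def scaled_log(delta_med, delta_min, delta_max):
    # delta_min / delta_max: lowest and highest log10 median over all
    # algorithms for the given function and dimension (within 250 FE/D)
    return ((np.log10(delta_med) - delta_min) / (delta_max - delta_min)
            * np.log10(1.0 / 1e-8) + np.log10(1e-8))
        </preformat>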
        <p>Figures 3 and 4 show that S-CMA-ES and s∗ACM-ES
have the fastest convergence rates. S-CMA-ES converges
fastest at the beginning of the optimization (up to approximately
75 FE/D) as well as in the later phases (125-250 FE/D).
However, it slows down between 75 and 125 FE/D, being
outperformed by s∗ACM-ES, especially due to the results on
Shubert, CF3 (in 5D and 10D) and CF4 (in 5D and 20D).</p>
        <p>
          It can be seen that, for low-dimensional problems, POI MAES
finally converges (at 250 FE/D) close to the other algorithms,
outperforming CMA-ES and s∗ACM-ES in 3D.
An interesting case, however, is that POI MAES
outperforms CMA-ES on the 20-dimensional f12. Such behavior
may be associated with the fact that the POI-based approach
tends to explore areas with high model uncertainty, which
is useful for multi-modal problems [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], especially in the case
of high dimensionality, where the classic CMA-ES is not
capable of learning the landscape sufficiently. The slower
convergence rate of POI MAES may be explained by the
fact that the model requires more time to move from the
exploration phase to exploitation. However, one can consider
the method insufficiently flexible: the
model selects the 2λ most recently evaluated points for
training, which is not the best decision if the algorithm moves
far from the global optimum. Hence, a more sophisticated
selection strategy may be required to obtain better results.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>
        Experimental testing has shown that the best performing
among all compared methods for smaller evaluation
budgets (up to approx. 75 FE/D) is S-CMA-ES.
However, it is outperformed by s∗ACM-ES for
budgets of 75-125 FE/D. For even larger evaluation
budgets, the former method performed best again. The POI
MAES has shown slightly worse performance than the
CMA-ES, with rare improvements. However, we believe
that the method can be further enhanced by introducing a
new selection strategy. The method from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has shown
inadequate performance in our experiments. However, that
method was designed to perform in a noisy environment,
which was not the case for the employed CEC benchmark.
      </p>
      <p>
        Today, many experiments in the field of black-box
optimization are conducted on the functions from the BBOB
framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This paper can thereby be considered
an extension of the BBOB benchmarking. The paper
concentrates on only a relatively small set of test functions,
which prevents us from drawing definite conclusions. However,
we believe that the performance of the methods may be
clarified by combining this paper's results with
other comparisons.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6 Acknowledgments</title>
      <p>Access to computing and storage facilities owned by
parties and projects contributing to the National Grid
Infrastructure MetaCentrum, provided under the programme
"Projects of Large Research, Development, and
Innovations Infrastructures" (CESNET LM2015042), is greatly
appreciated.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Anne</given-names>
            <surname>Auger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nikolaus</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>A restart CMA evolution strategy with increasing population size</article-title>
          .
          <source>In Evolutionary Computation</source>
          ,
          <year>2005</year>
          .
          <article-title>The 2005 IEEE Congress on</article-title>
          , volume
          <volume>2</volume>
          , pages
          <fpage>1769</fpage>
          -
          <lpage>1776</lpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Lukáš</given-names>
            <surname>Bajer</surname>
          </string-name>
          , Zbyněk Pitra, and
          <article-title>Martin Holeňa. Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed</article-title>
          .
          <source>In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation</source>
          , pages
          <fpage>1143</fpage>
          -
          <lpage>1150</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Dirk</given-names>
            <surname>Büche</surname>
          </string-name>
          ,
          <string-name>
            <surname>Nicol N Schraudolph</surname>
            , and
            <given-names>Petros</given-names>
          </string-name>
          <string-name>
            <surname>Koumoutsakos</surname>
          </string-name>
          .
          <article-title>Accelerating evolutionary algorithms with Gaussian process fitness function models</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          , Part C:
          <article-title>Applications</article-title>
          and Reviews,
          <volume>35</volume>
          (
          <issue>2</issue>
          ):
          <fpage>183</fpage>
          -
          <lpage>194</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>John</surname>
            <given-names>C Chaput</given-names>
          </string-name>
          and Jack W Szostak.
          <article-title>Evolutionary optimization of a nonbiological ATP binding protein for improved folding stability</article-title>
          .
          <source>Chemistry &amp; biology</source>
          ,
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <fpage>865</fpage>
          -
          <lpage>874</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Nikolaus</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>The CMA evolution strategy: a comparing review</article-title>
          .
          <source>In Towards a new evolutionary computation</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>102</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Nikolaus</given-names>
            <surname>Hansen</surname>
          </string-name>
          , Anne Auger,
          <string-name>
            <surname>Steffen Finck</surname>
            , and
            <given-names>Raymond</given-names>
          </string-name>
          <string-name>
            <surname>Ros</surname>
          </string-name>
          .
          <article-title>Real-parameter black-box optimization benchmarking 2010: Experimental setup</article-title>
          .
          <source>INRIA</source>
          ,
          <year>2010</year>
          . &lt;inria00462481&gt;.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Nikolaus</given-names>
            <surname>Hansen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Ostermeier</surname>
          </string-name>
          .
          <article-title>Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation</article-title>
          .
          <source>In Evolutionary Computation</source>
          ,
          <year>1996</year>
          ., Proceedings of IEEE International Conference on, pages
          <fpage>312</fpage>
          -
          <lpage>317</lpage>
          . IEEE,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Yaochu</given-names>
            <surname>Jin</surname>
          </string-name>
          .
          <article-title>A comprehensive survey of fitness approximation in evolutionary computation</article-title>
          .
          <source>Soft computing</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Johannes</surname>
            <given-names>W Kruisselbrink</given-names>
          </string-name>
          , Michael Emmerich,
          <string-name>
            <surname>André H Deutz</surname>
            , and
            <given-names>Thomas</given-names>
          </string-name>
          <string-name>
            <surname>Bäck</surname>
          </string-name>
          .
          <article-title>A robust optimization approach using kriging metamodels for robustness approximation in the CMA-ES</article-title>
          .
          <source>In Evolutionary Computation (CEC)</source>
          ,
          <source>2010 IEEE Congress on</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Xiaodong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andries</given-names>
            <surname>Engelbrecht</surname>
          </string-name>
          , and Michael G Epitropakis.
          <article-title>Benchmark functions for CEC'2013 special session and competition on niching methods for multimodal function optimization</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Ilya</surname>
            <given-names>Loshchilov</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michèle</given-names>
            <surname>Sebag</surname>
          </string-name>
          .
          <article-title>Intensive surrogate model exploitation in self-adaptive surrogate-assisted CMA-ES (saACM-ES)</article-title>
          .
          <source>In Proceedings of the 15th annual conference on Genetic and evolutionary computation</source>
          , pages
          <fpage>439</fpage>
          -
          <lpage>446</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Zbyněk</given-names>
            <surname>Pitra</surname>
          </string-name>
          , Lukáš Bajer, and
          <article-title>Martin Holeňa. Doubly trained evolution control for the surrogate CMA-ES</article-title>
          .
          <article-title>Accepted for publication at PPSN 2016</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Holger</surname>
            <given-names>Ulmer</given-names>
          </string-name>
          , Felix Streichert, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Zell</surname>
          </string-name>
          .
          <article-title>Evolution strategies assisted by Gaussian processes with improved preselection criterion</article-title>
          .
          <source>In Evolutionary Computation</source>
          ,
          <year>2003</year>
          . CEC'
          <volume>03</volume>
          .
          <article-title>The 2003 Congress on</article-title>
          , volume
          <volume>1</volume>
          , pages
          <fpage>692</fpage>
          -
          <lpage>699</lpage>
          . IEEE,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>