<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adaptive Doubly Trained Evolution Control for the Covariance Matrix Adaptation Evolution Strategy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Zbyněk Pitra</string-name>
          <email>z.pitra@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukáš Bajer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Repický</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Faculty of Mathematics and Physics, Charles University in Prague</institution>
          <addr-line>Malostranské nám. 25, 118 00 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague</institution>
          ,
          <addr-line>Břehová 7, 115 19 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
          <addr-line>Pod Vodárenskou věží 2, 182 07 Prague 8</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
<institution>National Institute of Mental Health</institution>
          ,
          <addr-line>Topolová 748, 250 67 Klecany</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1885</volume>
      <fpage>120</fpage>
      <lpage>128</lpage>
      <abstract>
<p>An area of increasingly frequent applications of evolutionary optimization to real-world problems is continuous black-box optimization. However, evaluating real-world black-box fitness functions is sometimes very time-consuming or expensive, which interferes with the need of evolutionary algorithms for many fitness evaluations. Therefore, surrogate regression models replacing the original expensive fitness in some of the evaluated points have been in use since the early 2000s. The Doubly Trained Surrogate Covariance Matrix Adaptation Evolution Strategy (DTS-CMA-ES) is a surrogate-assisted version of the CMA-ES, the state-of-the-art algorithm for continuous black-box optimization. The DTS-CMA-ES saves expensive function evaluations by using a surrogate model. However, the model's inaccuracy on some functions can slow down the convergence of the algorithm. This paper investigates an extension of the DTS-CMA-ES which controls the usage of the model according to the model's error. Results of testing the adaptive and the original versions of the DTS-CMA-ES on the set of noiseless benchmarks are reported.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
<p>Evolutionary algorithms have become very successful in continuous black-box optimization, that is, in optimization where no mathematical expression of the optimized function, neither explicit nor implicit, is available, and the fitness function has to be evaluated empirically through series of measurements or simulations.</p>
<p>In real-world applications, the evaluation of a black-box function can be very time-consuming or expensive. Therefore, the optimization method should evaluate as few points as possible while still reaching the target distance from the function's optimal value.</p>
      <p>
        The Covariance Matrix Adaptation Evolution Strategy
(CMA-ES) [
        <xref ref-type="bibr" rid="ref4">4</xref>
] is considered the state-of-the-art algorithm for continuous black-box optimization. On the other hand, the CMA-ES can consume many evaluations to find the optimum of an expensive fitness function. This property has resulted in the development of several surrogate-assisted versions of the CMA-ES (an overview can be found in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), where a part of the evaluations is performed by a regression surrogate model instead of the original fitness function.
      </p>
      <p>
        The local meta-model CMA-ES (lmm-CMA-ES),
proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and later improved in [
        <xref ref-type="bibr" rid="ref1">1</xref>
], builds a quadratic regression model for each point using a set of points already evaluated by the fitness function. The convergence of the algorithm is sped up by controlling the changes in the population ranking after a fraction of the offspring is evaluated by the original fitness.
      </p>
      <p>
        A different surrogate-assisted approach, utilizing an
ordinal SVM to estimate the ranking of the fitness function
values, called s∗ACM-ES, has been introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
later improved in BIPOP-s∗ACM-ES-k [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to be more
robust against premature convergence to local optima. The
parameters of the SVM surrogate model are themselves
optimized using the CMA-ES algorithm.
      </p>
      <p>
        In 2016, the Doubly Trained Surrogate CMA-ES
(DTSCMA-ES) algorithm, using the ability of Gaussian
processes to provide the distribution of predicted points, was
introduced in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The algorithm employs uncertainty
criteria to choose the most promising points to be evaluated
by the original fitness.
      </p>
      <p>
        Results obtained with the three above-mentioned
surrogate-assisted algorithms on noiseless functions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
suggest that on some fitness functions (e.g., the attractive sector function) the surrogate model suffers from a loss of accuracy. Whereas the first of these algorithms controls the number of points evaluated by the original fitness function to prevent the model from misleading the search, the DTS-CMA-ES keeps the number of evaluated points fixed. Therefore, some control of the number of points evaluated by the original fitness in each generation could speed up the DTS-CMA-ES convergence.
      </p>
<p>This paper extends the original DTS-CMA-ES with an online adaptation of the number of points evaluated by the original fitness. This extended version of the DTS-CMA-ES is compared with the original CMA-ES, the lmm-CMA-ES, the s∗ACM-ES, and the original DTS-CMA-ES on the noiseless part of the Comparing Continuous Optimisers (COCO) platform [
      <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
      ] in the expensive scenario. Section 2 describes the original DTS-CMA-ES in more detail. Section 3 defines the adaptivity employed to improve the original DTS-CMA-ES. Section 4 contains the experimental part. Section 5 summarizes the results and concludes the paper.</p>
      <sec id="sec-1-1">
        <title>Algorithm 1 DTS-CMA-ES [10]</title>
        <p>
Input: λ (population size), ytarget (target value), f (original fitness function), α (ratio of original-evaluated points), C (uncertainty criterion)
1: σ, m, C ← CMA-ES initialize
2: A ← ∅ {archive initialization}
3: while minimal yk from A &gt; ytarget do
4: {xk}, k = 1, …, λ ∼ N(m, σ²C) {CMA-ES sampling}
5: fM1 ← trainModel(A, σ, m, C) {model training}
6: (ŷ, s²) ← fM1([x1, …, xλ]) {model evaluation}
7: Xorig ← select ⌈αλ⌉ best points according to C(ŷ, s²)
8: yorig ← f(Xorig) {original fitness evaluation}
9: A ← A ∪ {(Xorig, yorig)} {archive update}
10: fM2 ← trainModel(A, σ, m, C) {model retraining}
11: y ← fM2([x1, …, xλ]) {second model prediction}
12: (y)k ← yorig,i for all original-evaluated yorig,i ∈ yorig
13: σ, m, C ← CMA-ES update
14: end while
15: xres ← xk from A where yk is minimal
Output: xres (point with minimal y)
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
<title>2 Doubly Trained Surrogate CMA-ES</title>
<p>
        The DTS-CMA-ES, introduced in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], is outlined in Algorithm 1. The algorithm utilizes the ability of Gaussian processes (GPs) to estimate the whole probability distribution of the fitness in order to select individuals out of the current population using some uncertainty criterion. The selected individuals are subsequently reevaluated with the original fitness and incorporated into the set of points utilized for retraining the GP model. The CMA-ES strategy parameters (σ, m, C, etc.) are calculated using the original CMA-ES algorithm.
      </p>
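<p>For illustration, one generation of the doubly trained step (lines 5–12 of Algorithm 1) can be sketched in Python as follows. This is a minimal sketch, not the authors' Matlab implementation: trainModel is abstracted as a generic train_model callable returning a model whose predict method yields the GP mean and predictive variance, and the uncertainty criterion is taken to be the prediction variance (the overall best setting reported in Section 4.1).</p>

```python
import numpy as np

def dts_generation(population, archive, fitness, train_model, alpha=0.05):
    """One doubly trained generation (illustrative sketch of Algorithm 1, lines 5-12)."""
    lam = len(population)
    model1 = train_model(archive)                  # first training on the archive
    y_hat, s2 = model1.predict(population)         # GP mean and predictive variance
    n_orig = int(np.ceil(alpha * lam))             # ceil(alpha * lambda) points
    chosen = np.argsort(-np.asarray(s2))[:n_orig]  # criterion here: highest variance
    y_orig = np.array([fitness(population[i]) for i in chosen])
    archive.extend((population[i], y) for i, y in zip(chosen, y_orig))
    model2 = train_model(archive)                  # retraining with the new points
    y, _ = model2.predict(population)              # second model prediction
    y = np.asarray(y, dtype=float).copy()
    y[chosen] = y_orig                             # keep original values where available
    return y, archive                              # fitness vector for the CMA-ES update
```

The returned vector y plays the role of the fitness values passed to the CMA-ES update on line 13.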
    </sec>
    <sec id="sec-3">
<title>3 Adaptivity for the DTS-CMA-ES</title>
<p>In this section, we propose a simple adaptation mechanism for the DTS-CMA-ES. In the DTS-CMA-ES, the number of points evaluated by the original fitness function in one generation is controlled by the ratio α. The higher the value of α, the more points are evaluated by the original fitness. As a consequence, more training points are available for the surrogate model around the current CMA-ES mean m, and the CMA-ES is less misled by a smaller amount of points evaluated by the model. On the other hand, lower values of α imply fewer evaluations by the original fitness and possibly a faster convergence of the algorithm. The ratio α can be controlled according to the surrogate model precision. Taking into account that the CMA-ES depends only on the ordering of the μ best individuals from the current population, we suggest using the Ranking Difference Error described in the following paragraph.</p>
      <sec id="sec-3-1">
        <title>Algorithm 2 Adaptive DTS-CMA-ES</title>
        <p>
Input: λ (population size), ytarget (target value), f (original fitness function), β (update rate), α0, αmin, αmax (initial, minimal, and maximal ratio of original-evaluated points), C (uncertainty criterion), E(0), Emin, Emax (initial, minimal, and maximal error)
1: σ, m, C, g ← CMA-ES initialize
2: A ← ∅ {archive initialization}
3: while minimal yk from A &gt; ytarget do
4: {xk}, k = 1, …, λ ∼ N(m, σ²C) {CMA-ES sampling}
5: fM1 ← trainModel(A, σ, m, C) {model training}
6: (ŷ, s²) ← fM1([x1, …, xλ]) {model evaluation}
7: Xorig ← select ⌈αλ⌉ best points according to C(ŷ, s²)
8: yorig ← f(Xorig) {fitness evaluation}
9: A ← A ∪ {(Xorig, yorig)} {archive update}
10: fM2 ← trainModel(A, σ, m, C) {model retraining}
11: y ← fM2([x1, …, xλ]) {second model prediction}
12: (y)k ← yorig,i for all original-evaluated yorig,i ∈ yorig
13: ERDE ← RDEμ(ŷ, y) {model error estimation}
14: E(g) ← (1 − β) E(g−1) + β ERDE {exponential smoothing}
15: α ← update using the linear transfer function in Eq. (2)
16: σ, m, C, g ← CMA-ES update
17: end while
18: xres ← xk from A where yk is minimal
Output: xres (point with minimal y)
        </p>
<p>
          The Ranking Difference Error (RDEμ) is a normalized version of the error measure used by Kern in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It is the sum of the differences of the rankings of the μ best points in the population of size λ, normalized by the maximal possible such error for the respective μ, λ (ρ(i) and ρ̂(i) are the ranks of the i-th element in the vectors y and ŷ respectively, where the ranking of y is expected to be more precise):

RDEμ(ŷ, y) = ( ∑i: ρ(i)≤μ |ρ̂(i) − ρ(i)| ) / ( maxπ ∈ Permutations of (1,…,λ) ∑i: π(i)≤μ |i − π(i)| ).   (1)
        </p>
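<p>Definition (1) can be implemented directly. The following Python sketch assumes no ties in the fitness vectors and finds the normalizing denominator by brute force over all permutations, which is practical only for small λ and serves purely as an illustration.</p>

```python
import numpy as np
from itertools import permutations

def rde_mu(y_hat, y, mu):
    """Ranking Difference Error RDE_mu of y_hat w.r.t. y, equation (1); assumes no ties."""
    lam = len(y)
    rho = np.empty(lam, dtype=int)
    rho[np.argsort(y)] = np.arange(1, lam + 1)          # ranks according to y
    rho_hat = np.empty(lam, dtype=int)
    rho_hat[np.argsort(y_hat)] = np.arange(1, lam + 1)  # ranks according to y_hat
    # numerator: rank differences over the mu best points according to y
    num = sum(abs(int(rho_hat[i]) - int(rho[i]))
              for i in range(lam) if mu >= rho[i])
    # denominator: maximal possible error, brute-forced over all permutations
    den = max(sum(abs(i - p[i - 1]) for i in range(1, lam + 1) if mu >= p[i - 1])
              for p in permutations(range(1, lam + 1)))
    return num / den
```

Identical rankings give an error of 0, and a fully reversed ranking of the μ best points gives 1.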
<p>The adaptive DTS-CMA-ES (aDTS-CMA-ES), depicted in Algorithm 2, differs from the original DTS-CMA-ES in several additional steps (lines 13–15) processed after the surrogate model fM2 is retrained using the new original-evaluated points from the current generation (line 10). First, the quality of the model is estimated using RDEμ(ŷ, y), measured between the first model's prediction ŷ and the vector y, which is composed of the available original fitness values (from yorig) and the retrained model's predictions for the points which are not original-evaluated. Due to the noisy observation of the model's error ERDE, we have employed exponential smoothing of the measured error using the update rate β (line 14). As the next step (line 15), α is calculated via a linear transfer function of E(g):

α = αmin if E(g) ≤ Emin,
α = αmin + ((E(g) − Emin) / (Emax − Emin)) ⋅ (αmax − αmin) if E(g) ∈ (Emin, Emax),
α = αmax if E(g) ≥ Emax,   (2)

where Emin and Emax are the lower and upper bounds for saturation to the values αmin and αmax, respectively.</p>
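<p>Lines 14–15 of Algorithm 2, the exponential smoothing and the transfer function (2), amount to a few lines of code. The sketch below uses the bounds αmin = 0.04 and αmax = 1.0 from Section 4.1 as defaults.</p>

```python
def smooth_error(e_prev, e_rde, beta=0.3):
    """Exponential smoothing of the measured RDE error (Algorithm 2, line 14)."""
    return (1.0 - beta) * e_prev + beta * e_rde

def transfer_alpha(e_g, e_min, e_max, a_min=0.04, a_max=1.0):
    """Piecewise-linear transfer function (2): smoothed error E(g) -> ratio alpha."""
    if e_min >= e_g:                       # saturate at alpha_min for small errors
        return a_min
    if e_g >= e_max:                       # saturate at alpha_max for large errors
        return a_max
    return a_min + (e_g - e_min) / (e_max - e_min) * (a_max - a_min)
```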
<p>Having analyzed the RDEμ error measures on the COCO/BBOB testbed, we observed that the measured RDE error E(g) depends on the ratio α and the dimension D:

Emin = f^E_min(α, D),  Emax = f^E_max(α, D).   (3)

Especially the dependence on α is not surprising: it follows from the definition of RDEμ that the more reevaluated points, the higher the number of summands in the numerator of (1), and hence the higher the RDEμ value. Due to the mutual dependence of the parameters E and α, the calculation of α in each generation is performed in a cycle until α converges:
(1) calculate the error thresholds Emin, Emax using the last used ratio α, either from the previous iteration or from the previous generation (see equation (3));
(2) calculate the new ratio α using the newly calculated Emin, Emax (see equation (2)).
In our implementation, the functions f^E_min and f^E_max result from multiple linear regression; see Section 4.1 for the details of these linear models. The remaining parts of the algorithm are the same as in the original DTS-CMA-ES.</p>
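<p>The two-step cycle above can be sketched as a fixed-point iteration. In the sketch below, e_min_model and e_max_model stand for the fitted threshold functions of equation (3) and transfer for the transfer function of equation (2), all passed in as callables.</p>

```python
def adapt_alpha(alpha, e_g, e_min_model, e_max_model, transfer, dim,
                tol=1e-6, max_iter=100):
    """Resolve the mutual dependence of alpha and the error thresholds
    by iterating steps (1)-(2) until alpha converges."""
    for _ in range(max_iter):
        e_min = e_min_model(alpha, dim)          # step (1): thresholds from last alpha
        e_max = e_max_model(alpha, dim)
        new_alpha = transfer(e_g, e_min, e_max)  # step (2): new alpha via equation (2)
        if tol > abs(new_alpha - alpha):
            return new_alpha
        alpha = new_alpha
    return alpha
```

With constant threshold models the iteration converges after a single update; with the fitted α-dependent models it typically needs a few iterations.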
      </sec>
    </sec>
    <sec id="sec-4">
<title>4 Experimental Evaluation</title>
      <p>
        In this section, we compared the performances of the
aDTS-CMA-ES to the original DTS-CMA-ES [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the
CMA-ES [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and two other surrogate-assisted versions of
the CMA-ES, the lmm-CMA-ES [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] and the
s∗ACMES [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], on the noiseless part of the COCO/BBOB
framework [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>Experimental Setup</title>
<p>
          The considered algorithms were evaluated on the 24 noiseless COCO/BBOB single-objective benchmarks [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ] in dimensions D = 2, 3, 5, and 10, on 15 different instances per function. The functions were divided into three groups according to the difficulty of modeling them with a GP model, where two groups were used for tuning the aDTS-CMA-ES parameters and the remaining group was used to test the results of that tuning. The method of dividing the functions into those groups is described below in connection with the aDTS-CMA-ES settings. The experiment stopping criteria were reaching either the maximum budget of 250 function evaluations per dimension (FE/D), or the target distance from the function optimum Δf_T = 10^−8. The following paragraphs summarize the parameters of the compared algorithms.
        </p>
<p>The original CMA-ES was tested in its IPOP-CMA-ES version (Matlab code v. 3.61) with the following settings: the number of restarts = 4, IncPopSize = 2, σstart = 8/3, λ = 4 + ⌊3 log D⌋. The remaining settings were left at their defaults.</p>
        <p>
          The lmm-CMA-ES was employed in its improved
version published in [
          <xref ref-type="bibr" rid="ref1">1</xref>
]. The results were downloaded from the COCO/BBOB results data archive¹ in its GECCO 2013 settings.
        </p>
        <p>
We have used the bi-population version of the s∗ACM-ES, the BIPOP-s∗ACM-ES-k [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Similarly to the lmm-CMA-ES, the algorithm results were also downloaded from the COCO/BBOB results data archive².
        </p>
        <p>
          The original DTS-CMA-ES was tested using the overall
best settings from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]: the prediction variance of
Gaussian process model as the uncertainty criterion, the
population size λ = 8 + ⌊6 log D⌋, and the ratio of points
evaluated by the original fitness α = 0.05. The results of the
DTS-CMA-ES are slightly different from previously
published results [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
] due to the correction of a bug in the original version which affected the selection of points to be evaluated by the original fitness using an uncertainty criterion.
        </p>
<p>The aDTS-CMA-ES was tested with multiple parameter settings. First, the linear regression models of the lower and upper bounds for the error measure, Emin and Emax, were identified by measuring RDEμ on datasets from DTS-CMA-ES runs on the COCO/BBOB benchmarks.</p>
<p>As a first step, we identified the six BBOB functions which are the easiest (E) and the six which are the hardest (H) to regress by our Gaussian process model, based on the RDEμ measured on 1250 independent test sets per function in each dimension: 10 sets of λ points in each of 25 equidistantly selected generations from the DTS-CMA-ES runs on the first 5 instances; see Table 1 for these sets of functions and their respective errors. The functions which were not identified as E or H form the test function set.</p>
<p>Using the same 25 DTS-CMA-ES “snapshots” on each of 5 instances, we calculated the medians (Q2) and the third quartiles (Q3) of the measured RDEμ on populations from both groups of functions (E) and (H), using five different proportions of original-evaluated points α ∈ {0.04, 0.25, 0.5, 0.75, 1.00} available for the retrained models and thus also for measuring the models’ errors E(g). These quartiles were regressed by multiple linear regression models using stepwise regression from a full quadratic model of the ratio α and the dimension D or its logarithm log(D) (the decision whether to use log(D) or D was made according to the RMSE of the final stepwise models); the stepwise regression removed terms with the highest p-value &gt; 0.05. The coefficients b1 and b2 of the lower thresholds E^Q2_min and E^Q3_min were estimated on the data from (E), and the coefficients b3 and b4 of the higher thresholds E^Q2_max and E^Q3_max on the data from (H), which resulted in the following models:</p>
        <p>
E^Q2_min(α, D) = (1, log(D), α, α log(D), α²) ⋅ b1,
E^Q3_min(α, D) = (1, D, α, αD, α²) ⋅ b2,
E^Q2_max(α, D) = (1, D, α, αD, α²) ⋅ b3,
E^Q3_max(α, D) = (1, log(D), α, α log(D), α²) ⋅ b4,

where b1, …, b4 are the fitted coefficient vectors.</p>
        <p>¹ http://coco.gforge.inria.fr/data-archive/2013/lmm-CMA-ES_auger_noiseless.tgz</p>
        <p>² http://coco.gforge.inria.fr/data-archive/2013/BIPOP-saACM-k_loshchilov_noiseless.tgz</p>
<p>For the remaining investigations, three different values of the exponential smoothing update rate were compared: β ∈ {0.3, 0.4, 0.5}. The minimal and maximal values of α were set to αmin = 0.04 and αmax = 1.0, because α values lower than 0.04 would yield fewer than one original-evaluated point per generation, and the aDTS-CMA-ES has to be able to spend the whole population on original evaluations in order to work well on functions where the GP model is poor (e.g., on f6 Attractive sector). The initial error and ratio values were set to E(0) = 0.05 and α0 = 0.05. The remaining aDTS-CMA-ES parameters were left the same as in the original DTS-CMA-ES settings.</p>
      </sec>
      <sec id="sec-4-2">
<title>4.2 Results</title>
<p>The results in Figures 1, 2, and 3 and in Table 3 show the effect of the adaptivity implemented in the DTS-CMA-ES. The graphs in Figures 1, 2, and 3 depict the scaled logarithm Δf^log of the median Δf^med of the minimal distances from the function optimum over runs on 15 independent instances, as a function of FE/D. The scaled logarithm of Δf^med is calculated as

Δf^log = ((log Δf^med − Δf^MIN) / (Δf^MAX − Δf^MIN)) ⋅ log10(1/10^−8) + log10 10^−8,

where Δf^MIN (Δf^MAX) is the minimum (maximum) of log Δf^med found among all the compared algorithms for the particular function f and dimension D between 0 and 250 FE/D. Such scaling enables the aggregation of Δf^log graphs across an arbitrary number of functions and dimensions (see Figure 3). The values are scaled to the [−8, 0] interval, where −8 corresponds to the minimal and 0 to the maximal distance. This visualization distinguishes the differences in the convergence of the tested algorithms better than the default visualization used by the COCO/BBOB platform, which is why it is used in this article.</p>
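<p>The scaling above can be computed directly. A minimal sketch, assuming the minimum and maximum of log Δf^med over all compared algorithms are supplied:</p>

```python
def scaled_log_distance(log_d_med, log_d_min, log_d_max, target_exp=-8.0):
    """Scale a log median distance into the [target_exp, 0] interval.

    log_d_med : log of the median distance Delta_f^med for one algorithm
    log_d_min, log_d_max : min and max of log Delta_f^med over all algorithms
    """
    frac = (log_d_med - log_d_min) / (log_d_max - log_d_min)  # in [0, 1]
    return frac * (-target_exp) + target_exp                  # min -> -8, max -> 0
```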
<p>
          We have tested the statistical significance of the differences in the algorithms' performance on 12 COCO/BBOB test functions in 10D, separately for two evaluation budgets, using the Iman and Davenport improvement of the Friedman test [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Let #FE_T be the smallest number of function evaluations at which at least one algorithm reached the target, i.e., satisfied Δf^med ≤ Δf_T, or #FE_T = 250D if no algorithm reached the target within 250D evaluations. The algorithms are ranked on each COCO/BBOB test function with respect to Δf^med at a given budget of function evaluations. The null hypothesis of equal performance of all algorithms is rejected at the higher function evaluation budget #FEs = #FE_T (p &lt; 10^−3), as well as at the lower budget #FEs = #FE_T/3 (p &lt; 10^−3).
        </p>
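<p>The Iman-Davenport statistic can be computed from the rank matrix as follows; a sketch following the formulas in [2], where ranks holds the per-function ranks of the k compared algorithms over N test functions.</p>

```python
import numpy as np

def iman_davenport(ranks):
    """Iman-Davenport F statistic from an (N, k) matrix of ranks (1 = best).

    The statistic is F-distributed with (k - 1) and (k - 1)(N - 1)
    degrees of freedom under the null hypothesis of equal performance.
    """
    ranks = np.asarray(ranks, dtype=float)
    n, k = ranks.shape
    r_mean = ranks.mean(axis=0)  # average rank of each algorithm
    # Friedman chi-squared statistic
    chi2_f = 12.0 * n / (k * (k + 1)) * (np.sum(r_mean ** 2) - k * (k + 1) ** 2 / 4.0)
    # Iman-Davenport correction
    return (n - 1) * chi2_f / (n * (k - 1) - chi2_f)
```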
<p>
          We tested the pairwise differences in performance using the post-hoc Friedman test [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with the Bergmann-Hommel correction controlling the family-wise error, in cases when the null hypothesis of equal algorithm performance was rejected. To illustrate the differences between the algorithms, the numbers of test functions at which one algorithm achieved a higher rank than the other are reported in Table 3. The table also contains the pairwise statistical significances.
        </p>
<p>We have compared the performances of the aDTS-CMA-ES using twelve settings differing in Emin, Emax, and β. Table 2 shows the counts of first ranks of the compared settings according to the lowest achieved Δf^med for 25, 50, 100, and 200 FE/D, respectively. The counts are summed across the testing sets of benchmark functions in each individual dimension.</p>
<p>Although the algorithm is rather robust to the exact setting of the smoothing update rate, we have found that lower values of β usually lead to better performance (see Table 2); the following experiments therefore use the rate β = 0.3.</p>
<p>When comparing the convergence rates, the performance of the aDTS-CMA-ES with E^Q2_min is noticeably lower especially on the Rosenbrock functions (f8, f9) and on the Different Powers function f14, where the RDEμ error often exceeds the lower error threshold even when a lower number of original-evaluated points would be sufficient for a higher speedup of the CMA-ES. The adaptive control, on the other hand, helps especially on the Attractive Sector function f6, which has its optimum at a point without continuous derivatives and is therefore hard to regress by GPs, or on the Schaffer functions f17, f18, where the aDTS-CMA-ES is probably able to adapt to the multimodal neighbourhood of the function's optimum and performs best of all the compared algorithms. Within the budget of 250 FE/D, the aDTS-CMA-ES (especially with E^Q2_min) is also able to find one of the best fitness values on the regularly multimodal Rastrigin functions f3, f4, and f15, where the GP model apparently does not prevent the original CMA-ES from exploiting the global structure of the function.</p>
        <p>[Figures 1 and 2: the scaled logarithm Δf^log of the median distance from the optimum as a function of FE/D on the COCO/BBOB functions in 5D and 10D, with panels grouped into functions that are easy and hard to regress; Figure 3: aggregated graphs as a function of #FEs/#FE_T.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>5 Conclusion</title>
<p>In this paper, we have presented work in progress on an adaptive version of the surrogate-assisted optimization algorithm DTS-CMA-ES. The online adjustment of the ratio between the original- and model-evaluated points according to the error of the surrogate model was investigated. The new adaptive version of the algorithm employs the RDEμ between the fitness of the current population predicted by the first-trained model and by the retrained model.</p>
<p>Results of parameter tuning show that lower values of the exponential smoothing rate β provide better results. On the other hand, different combinations of slower and more rapid update behaviours bring a better CMA-ES speedup for different kinds of functions, and the choice of this parameter could depend on the experimenter's domain knowledge. We found that the adaptive approach speeds up the CMA-ES more than the three other surrogate CMA-ES algorithms, namely the DTS-CMA-ES, the s∗ACM-ES, and the lmm-CMA-ES, on several functions after roughly 150 FE/D.</p>
<p>The adaptivity of the DTS-CMA-ES is still, to a certain extent, work in progress. A future perspective for improving the aDTS-CMA-ES is to investigate further types and properties of adaptive control of the number of points evaluated by the original fitness in each generation. Another conceivable direction of future research is online switching between different types of surrogate models suitable for the aDTS-CMA-ES.</p>
      <p>The reported research was supported by the Czech Science Foundation grant No. 17-01251, by the Grant Agency of the Czech Technical University in Prague with its grant No. SGS17/193/OHK4/3T/14, and by project Nr. LO1611 with financial support from the MEYS under the NPU I program. Further, access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” (LM2010005), is greatly appreciated.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brockhoff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>Benchmarking the local metamodel CMA-ES on the noiseless BBOB'2013 test bed</article-title>
          .
          <source>In Proceedings of the GECCO '13 Companion</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Demšar</surname>
          </string-name>
          .
          <article-title>Statistical comparisons of classifiers over multiple data sets</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>7</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          :
          <fpage>2677</fpage>
          -
          <lpage>2694</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>The CMA Evolution Strategy: A Comparing Review</article-title>
          .
          <source>In Towards a New Evolutionary Computation, number 192</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>102</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Finck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ros</surname>
          </string-name>
          .
          <article-title>Real-parameter black-box optimization benchmarking 2012: Experimental setup</article-title>
          .
          <source>Technical report, INRIA</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Finck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          .
          <article-title>Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions</article-title>
          .
          <source>Technical Report RR-6829</source>
          , INRIA,
          <year>2009</year>
          .
          Updated February
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Koumoutsakos</surname>
          </string-name>
          .
          <article-title>Local Metamodels for Optimization Using Evolution Strategies</article-title>
          .
          <source>In PPSN IX Proceedings</source>
          , volume
          <volume>4193</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>939</fpage>
          -
          <lpage>948</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          .
          <article-title>Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy</article-title>
          .
          <source>In Proceedings of the GECCO '12</source>
          , pages
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          .
          <article-title>BI-population CMA-ES Algorithms with Surrogate Models and Line Searches</article-title>
          .
          <source>In Proceedings of the GECCO '13 Companion</source>
          , pages
          <fpage>1177</fpage>
          -
          <lpage>1184</lpage>
          . ACM Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Doubly trained evolution control for the Surrogate CMA-ES</article-title>
          .
          <source>In PPSN XIV Proceedings</source>
          , pages
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Repický</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Overview of surrogate-model versions of covariance matrix adaptation evolution strategy</article-title>
          .
          <source>In Proceedings of the GECCO '17 Companion. ACM</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>