<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adaptive Doubly Trained Evolution Control for the Covariance Matrix Adaptation Evolution Strategy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Zbyněk Pitra</string-name>
          <email>z.pitra@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukáš Bajer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Repický</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Faculty of Mathematics and Physics, Charles University in Prague</institution>
          <addr-line>Malostranské nám. 25, 118 00 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague</institution>
          ,
          <addr-line>Břehová 7, 115 19 Prague 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
          <addr-line>Pod Vodárenskou věží 2, 182 07 Prague 8</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
<institution>National Institute of Mental Health</institution>
          ,
          <addr-line>Topolová 748, 250 67 Klecany</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1885</volume>
      <fpage>120</fpage>
      <lpage>128</lpage>
      <abstract>
<p>An area of increasingly frequent applications of evolutionary optimization to real-world problems is continuous black-box optimization. However, evaluating real-world black-box fitness functions is sometimes very time-consuming or expensive, which interferes with the need of evolutionary algorithms for many fitness evaluations. Therefore, surrogate regression models replacing the original expensive fitness in some of the evaluated points have been in use since the early 2000s. The Doubly Trained Surrogate Covariance Matrix Adaptation Evolution Strategy (DTS-CMA-ES) is a surrogate-assisted version of the CMA-ES, the state-of-the-art algorithm for continuous black-box optimization. The DTS-CMA-ES saves expensive function evaluations by using a surrogate model. However, the model's inaccuracy on some functions can slow down the convergence of the algorithm. This paper investigates an extension of the DTS-CMA-ES which controls the usage of the model according to the model's error. Results of testing the adaptive and the original versions of the DTS-CMA-ES on the set of noiseless benchmarks are reported.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
<p>Evolutionary algorithms have become very successful in continuous black-box optimization, that is, in optimization where no mathematical expression of the optimized function, neither explicit nor implicit, is available, and the fitness function has to be evaluated empirically through series of measurements or simulations.</p>
<p>In real-world applications, the evaluation of a black-box function can be very time-consuming or expensive. Therefore, the optimization method should evaluate as few points as possible while still reaching the target distance from the function's optimal value.</p>
      <p>
        The Covariance Matrix Adaptation Evolution Strategy
(CMA-ES) [
        <xref ref-type="bibr" rid="ref4">4</xref>
] is considered the state-of-the-art algorithm for continuous black-box optimization. On the other hand, the CMA-ES can consume many evaluations to find the optimum of an expensive fitness function. This property has resulted in the development of several surrogate-assisted versions of the CMA-ES (an overview can be found in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), where a part of the evaluations is performed by a regression surrogate model instead of the original fitness function.
      </p>
      <p>
        The local meta-model CMA-ES (lmm-CMA-ES),
proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and later improved in [
        <xref ref-type="bibr" rid="ref1">1</xref>
], builds a quadratic regression model for each point using a set of points already evaluated by the fitness function. The convergence of the algorithm is sped up by controlling the changes in the population ranking after a fraction of the offspring is evaluated by the original fitness.
      </p>
      <p>
        A different surrogate-assisted approach, utilizing an
ordinal SVM to estimate the ranking of the fitness function
values, called s∗ACM-ES, has been introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
later improved in BIPOP-s∗ACM-ES-k [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to be more
robust against premature convergence to local optima. The
parameters of the SVM surrogate model are themselves
optimized using the CMA-ES algorithm.
      </p>
      <p>
        In 2016, the Doubly Trained Surrogate CMA-ES
(DTSCMA-ES) algorithm, using the ability of Gaussian
processes to provide the distribution of predicted points, was
introduced in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The algorithm employs uncertainty
criteria to choose the most promising points to be evaluated
by the original fitness.
      </p>
      <p>
        Results obtained with the three above-mentioned
surrogate-assisted algorithms on noiseless functions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
suggest that on some fitness functions (e.g., the attractive sector function) the surrogate model suffers from a loss of accuracy. Whereas the first of these algorithms controls the number of points evaluated by the original fitness function to prevent the model from misleading the search, the DTS-CMA-ES keeps the number of evaluated points fixed. Therefore, some control of the number of points evaluated by the original fitness in each generation could speed up the DTS-CMA-ES convergence.
      </p>
<p>This paper extends the original DTS-CMA-ES with an online adaptation of the number of points evaluated by the original fitness. This extended version of the DTS-CMA-ES is compared with the original CMA-ES, the lmm-CMA-ES, the s∗ACM-ES, and the original DTS-CMA-ES on the noiseless part of the Comparing Continuous Optimisers (COCO) platform [
      <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
      ] in the expensive scenario. Section 2 describes the original DTS-CMA-ES in more detail. Section 3 defines the adaptivity employed to improve the original DTS-CMA-ES. Section 4 contains the experimental part. Section 5 summarizes the results and concludes the paper.</p>
      <sec id="sec-1-1">
        <title>Algorithm 1 DTS-CMA-ES [10]</title>
        <p>
Input: λ (population size), ytarget (target value), f (original fitness function), α (ratio of original-evaluated points), C (uncertainty criterion)
1: σ, m, C ← CMA-ES initialize
2: A ← ∅ {archive initialization}
3: while minimal yk from A &gt; ytarget do
4: {xk}, k = 1, …, λ ∼ N(m, σ²C) {CMA-ES sampling}
5: fM1 ← trainModel(A, σ, m, C) {model training}
6: (ŷ, s²) ← fM1([x1, …, xλ]) {model evaluation}
7: Xorig ← select ⌈αλ⌉ best points according to C(ŷ, s²)
8: yorig ← f(Xorig) {original fitness evaluation}
9: A ← A ∪ {(Xorig, yorig)} {archive update}
10: fM2 ← trainModel(A, σ, m, C) {model retraining}
11: y ← fM2([x1, …, xλ]) {second model prediction}
12: (y)k ← yorig,i for all original-evaluated yorig,i ∈ yorig
13: σ, m, C ← CMA-ES update
14: end while
15: xres ← xk from A where yk is minimal
Output: xres (point with minimal y)
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
<title>2 Doubly Trained Surrogate CMA-ES</title>
<p>
        The DTS-CMA-ES, introduced in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], is outlined in Algorithm 1. The algorithm utilizes the ability of Gaussian processes (GPs) to estimate the whole probability distribution of the fitness in order to select individuals out of the current population using some uncertainty criterion. The selected individuals are subsequently reevaluated with the original fitness and incorporated into the set of points utilized for retraining the GP model. The CMA-ES strategy parameters (σ, m, C, etc.) are calculated using the original CMA-ES algorithm.
      </p>
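<p>For illustration, one generation of the doubly trained step (lines 5–12 of Algorithm 1) can be sketched in Python as follows. This is a minimal sketch, not the authors' Matlab implementation: trainModel is abstracted as a generic train_model callable returning a model whose predict method yields the GP mean and predictive variance, and the uncertainty criterion is taken to be the prediction variance (the overall best setting reported in Section 4.1).</p>

```python
import numpy as np

def dts_generation(population, archive, fitness, train_model, alpha=0.05):
    """One doubly trained generation (illustrative sketch of Algorithm 1, lines 5-12)."""
    lam = len(population)
    model1 = train_model(archive)                  # first training on the archive
    y_hat, s2 = model1.predict(population)         # GP mean and predictive variance
    n_orig = int(np.ceil(alpha * lam))             # ceil(alpha * lambda) points
    chosen = np.argsort(-np.asarray(s2))[:n_orig]  # criterion here: highest variance
    y_orig = np.array([fitness(population[i]) for i in chosen])
    archive.extend((population[i], y) for i, y in zip(chosen, y_orig))
    model2 = train_model(archive)                  # retraining with the new points
    y, _ = model2.predict(population)              # second model prediction
    y = np.asarray(y, dtype=float).copy()
    y[chosen] = y_orig                             # keep original values where available
    return y, archive                              # fitness vector for the CMA-ES update
```

The returned vector y plays the role of the fitness values passed to the CMA-ES update on line 13.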
    </sec>
    <sec id="sec-3">
<title>3 Adaptivity for the DTS-CMA-ES</title>
<p>In this section, we propose a simple adaptation mechanism for the DTS-CMA-ES. In the DTS-CMA-ES, the number of points evaluated by the original fitness function in one generation is controlled by the ratio α. The higher the value of α, the more points are evaluated by the original fitness. As a consequence, more training points are available for the surrogate model around the current CMA-ES mean m, and the CMA-ES is less misled by a smaller amount of points evaluated by the model. On the other hand, lower values of α imply fewer evaluations by the original fitness and possibly a faster convergence of the algorithm. The ratio α can be controlled according to the surrogate model precision. Taking into account that the CMA-ES depends only on the ordering of the μ best individuals from the current population, we suggest using the Ranking Difference Error described in the following paragraph.</p>
      <sec id="sec-3-1">
        <title>Algorithm 2 Adaptive DTS-CMA-ES</title>
        <p>
Input: λ (population size), ytarget (target value), f (original fitness function), β (update rate), α0, αmin, αmax (initial, minimal, and maximal ratio of original-evaluated points), C (uncertainty criterion), E(0), Emin, Emax (initial, minimal, and maximal error)
1: σ, m, C, g ← CMA-ES initialize
2: A ← ∅ {archive initialization}
3: while minimal yk from A &gt; ytarget do
4: {xk}, k = 1, …, λ ∼ N(m, σ²C) {CMA-ES sampling}
5: fM1 ← trainModel(A, σ, m, C) {model training}
6: (ŷ, s²) ← fM1([x1, …, xλ]) {model evaluation}
7: Xorig ← select ⌈αλ⌉ best points according to C(ŷ, s²)
8: yorig ← f(Xorig) {fitness evaluation}
9: A ← A ∪ {(Xorig, yorig)} {archive update}
10: fM2 ← trainModel(A, σ, m, C) {model retraining}
11: y ← fM2([x1, …, xλ]) {second model prediction}
12: (y)k ← yorig,i for all original-evaluated yorig,i ∈ yorig
13: ERDE ← RDEμ(ŷ, y) {model error estimation}
14: E(g) ← (1 − β) E(g−1) + β ERDE {exponential smoothing}
15: α ← update using the linear transfer function in Eq. (2)
16: σ, m, C, g ← CMA-ES update
17: end while
18: xres ← xk from A where yk is minimal
Output: xres (point with minimal y)
        </p>
<p>
          The Ranking Difference Error (RDEμ) is a normalized version of the error measure used by Kern in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It is the sum of the differences of the rankings of the μ best points in the population of size λ, normalized by the maximal possible such error for the respective μ, λ (ρ(i) and ρ̂(i) are the ranks of the i-th element in the vectors y and ŷ respectively, where the ranking of y is expected to be more precise):

RDEμ(ŷ, y) = ( ∑i: ρ(i)≤μ |ρ̂(i) − ρ(i)| ) / ( maxπ ∈ Permutations of (1,…,λ) ∑i: π(i)≤μ |i − π(i)| ).   (1)
        </p>
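<p>Definition (1) can be implemented directly. The following Python sketch assumes no ties in the fitness vectors and finds the normalizing denominator by brute force over all permutations, which is practical only for small λ and serves purely as an illustration.</p>

```python
import numpy as np
from itertools import permutations

def rde_mu(y_hat, y, mu):
    """Ranking Difference Error RDE_mu of y_hat w.r.t. y, equation (1); assumes no ties."""
    lam = len(y)
    rho = np.empty(lam, dtype=int)
    rho[np.argsort(y)] = np.arange(1, lam + 1)          # ranks according to y
    rho_hat = np.empty(lam, dtype=int)
    rho_hat[np.argsort(y_hat)] = np.arange(1, lam + 1)  # ranks according to y_hat
    # numerator: rank differences over the mu best points according to y
    num = sum(abs(int(rho_hat[i]) - int(rho[i]))
              for i in range(lam) if mu >= rho[i])
    # denominator: maximal possible error, brute-forced over all permutations
    den = max(sum(abs(i - p[i - 1]) for i in range(1, lam + 1) if mu >= p[i - 1])
              for p in permutations(range(1, lam + 1)))
    return num / den
```

Identical rankings give an error of 0, and a fully reversed ranking of the μ best points gives 1.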
<p>The adaptive DTS-CMA-ES (aDTS-CMA-ES), depicted in Algorithm 2, differs from the original DTS-CMA-ES in several additional steps (lines 13–15) processed after the surrogate model fM2 is retrained using the new original-evaluated points from the current generation (line 10). First, the quality of the model is estimated using RDEμ(ŷ, y), measured between the first model's prediction ŷ and the vector y, which is composed of the available original fitness values (from yorig) and the retrained model's predictions for the points which are not original-evaluated. Due to the noisy observation of the model's error ERDE, we have employed exponential smoothing of the measured error using the update rate β (line 14). As the next step (line 15), α is calculated via a linear transfer function of E(g):

α = αmin if E(g) ≤ Emin,
α = αmin + ((E(g) − Emin) / (Emax − Emin)) ⋅ (αmax − αmin) if E(g) ∈ (Emin, Emax),
α = αmax if E(g) ≥ Emax,   (2)

where Emin and Emax are the lower and upper bounds for saturation to the values αmin and αmax, respectively.</p>
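<p>Lines 14–15 of Algorithm 2, the exponential smoothing and the transfer function (2), amount to a few lines of code. The sketch below uses the bounds αmin = 0.04 and αmax = 1.0 from Section 4.1 as defaults.</p>

```python
def smooth_error(e_prev, e_rde, beta=0.3):
    """Exponential smoothing of the measured RDE error (Algorithm 2, line 14)."""
    return (1.0 - beta) * e_prev + beta * e_rde

def transfer_alpha(e_g, e_min, e_max, a_min=0.04, a_max=1.0):
    """Piecewise-linear transfer function (2): smoothed error E(g) -> ratio alpha."""
    if e_min >= e_g:                       # saturate at alpha_min for small errors
        return a_min
    if e_g >= e_max:                       # saturate at alpha_max for large errors
        return a_max
    return a_min + (e_g - e_min) / (e_max - e_min) * (a_max - a_min)
```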
<p>Having analyzed the RDEμ error measures on the COCO/BBOB testbed, we observed that the measured RDE error E(g) depends on the ratio α and the dimension D:

Emin = f^E_min(α, D),  Emax = f^E_max(α, D).   (3)

Especially the dependence on α is not surprising: it follows from the definition of RDEμ that the more reevaluated points, the higher the number of summands in the numerator of (1), and hence the higher the RDEμ value. Due to the mutual dependence of the parameters E and α, the calculation of α in each generation is performed in a cycle until α converges:
(1) calculate the error thresholds Emin, Emax using the last used ratio α, either from the previous iteration or from the previous generation (see equation (3));
(2) calculate the new ratio α using the newly calculated Emin, Emax (see equation (2)).
In our implementation, the functions f^E_min and f^E_max result from multiple linear regression; see Section 4.1 for the details of these linear models. The remaining parts of the algorithm are the same as in the original DTS-CMA-ES.</p>
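<p>The two-step cycle above can be sketched as a fixed-point iteration. In the sketch below, e_min_model and e_max_model stand for the fitted threshold functions of equation (3) and transfer for the transfer function of equation (2), all passed in as callables.</p>

```python
def adapt_alpha(alpha, e_g, e_min_model, e_max_model, transfer, dim,
                tol=1e-6, max_iter=100):
    """Resolve the mutual dependence of alpha and the error thresholds
    by iterating steps (1)-(2) until alpha converges."""
    for _ in range(max_iter):
        e_min = e_min_model(alpha, dim)          # step (1): thresholds from last alpha
        e_max = e_max_model(alpha, dim)
        new_alpha = transfer(e_g, e_min, e_max)  # step (2): new alpha via equation (2)
        if tol > abs(new_alpha - alpha):
            return new_alpha
        alpha = new_alpha
    return alpha
```

With constant threshold models the iteration converges after a single update; with the fitted α-dependent models it typically needs a few iterations.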
      </sec>
    </sec>
    <sec id="sec-4">
<title>4 Experimental Evaluation</title>
      <p>
        In this section, we compared the performances of the
aDTS-CMA-ES to the original DTS-CMA-ES [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the
CMA-ES [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and two other surrogate-assisted versions of
the CMA-ES, the lmm-CMA-ES [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] and the
s∗ACMES [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], on the noiseless part of the COCO/BBOB
framework [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>Experimental Setup</title>
<p>
          The considered algorithms were evaluated on the 24 noiseless COCO/BBOB single-objective benchmarks [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ] in dimensions D = 2, 3, 5, and 10, on 15 different instances per function. The functions were divided into three groups according to the difficulty of modeling them with a GP model, where two groups were used for tuning the aDTS-CMA-ES parameters and the remaining group was used to test the results of that tuning. The method of dividing the functions into those groups is described below in connection with the aDTS-CMA-ES settings. The experiment stopping criteria were reaching either the maximum budget of 250 function evaluations per dimension (FE/D), or the target distance from the function optimum Δf_T = 10^−8. The following paragraphs summarize the parameters of the compared algorithms.
        </p>
<p>The original CMA-ES was tested in its IPOP-CMA-ES version (Matlab code v. 3.61) with the following settings: the number of restarts = 4, IncPopSize = 2, σstart = 8/3, λ = 4 + ⌊3 log D⌋. The remaining settings were left at their defaults.</p>
        <p>
          The lmm-CMA-ES was employed in its improved
version published in [
          <xref ref-type="bibr" rid="ref1">1</xref>
]. The results were downloaded from the COCO/BBOB results data archive¹ in its GECCO 2013 settings.
        </p>
        <p>
We have used the bi-population version of the s∗ACM-ES, the BIPOP-s∗ACM-ES-k [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Similarly to the lmm-CMA-ES, the algorithm results were also downloaded from the COCO/BBOB results data archive².
        </p>
        <p>
          The original DTS-CMA-ES was tested using the overall
best settings from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]: the prediction variance of
Gaussian process model as the uncertainty criterion, the
population size λ = 8 + ⌊6 log D⌋, and the ratio of points
evaluated by the original fitness α = 0.05. The results of the
DTS-CMA-ES are slightly different from previously
published results [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
] due to the correction of a bug in the original version which affected the selection of points to be evaluated by the original fitness using an uncertainty criterion.
        </p>
<p>The aDTS-CMA-ES was tested with multiple parameter settings. First, the linear regression models of the lower and upper bounds for the error measure, Emin and Emax, were identified by measuring RDEμ on datasets from DTS-CMA-ES runs on the COCO/BBOB benchmarks.</p>
<p>As a first step, we identified the six BBOB functions which are the easiest (E) and the six which are the hardest (H) to regress by our Gaussian process model, based on the RDEμ measured on 1250 independent test sets per function in each dimension: 10 sets of λ points in each of 25 equidistantly selected generations from the DTS-CMA-ES runs on the first 5 instances; see Table 1 for these sets of functions and their respective errors. The functions which were not identified as E or H form the test function set.</p>
<p>Using the same 25 DTS-CMA-ES “snapshots” on each of 5 instances, we calculated the medians (Q2) and the third quartiles (Q3) of the measured RDEμ on populations from both groups of functions (E) and (H), using five different proportions of original-evaluated points α ∈ {0.04, 0.25, 0.5, 0.75, 1.00} available for the retrained models and thus also for measuring the models’ errors E(g). These quartiles were regressed by multiple linear regression models using stepwise regression from a full quadratic model of the ratio α and the dimension D or its logarithm log(D) (the decision whether to use log(D) or D was made according to the RMSE of the final stepwise models); the stepwise regression removed terms with the highest p-value &gt; 0.05. The coefficients b1 and b2 of the lower thresholds E^Q2_min and E^Q3_min were estimated on the data from (E), and the coefficients b3 and b4 of the higher thresholds E^Q2_max and E^Q3_max on the data from (H), which resulted in the following models:</p>
        <p>
E^Q2_min(α, D) = (1, log(D), α, α log(D), α²) ⋅ b1,
E^Q3_min(α, D) = (1, D, α, αD, α²) ⋅ b2,
E^Q2_max(α, D) = (1, D, α, αD, α²) ⋅ b3,
E^Q3_max(α, D) = (1, log(D), α, α log(D), α²) ⋅ b4,

where b1, …, b4 are the fitted coefficient vectors.</p>
        <p>¹ http://coco.gforge.inria.fr/data-archive/2013/lmm-CMA-ES_auger_noiseless.tgz</p>
        <p>² http://coco.gforge.inria.fr/data-archive/2013/BIPOP-saACM-k_loshchilov_noiseless.tgz</p>
<p>For the remaining investigations, three different values of the exponential smoothing update rate were compared: β ∈ {0.3, 0.4, 0.5}. The minimal and maximal values of α were set to αmin = 0.04 and αmax = 1.0, because α values lower than 0.04 would yield fewer than one original-evaluated point per generation, and the aDTS-CMA-ES has to be able to spend the whole population on original evaluations in order to work well on functions where the GP model is poor (e.g., on f6 Attractive sector). The initial error and ratio values were set to E(0) = 0.05 and α0 = 0.05. The remaining aDTS-CMA-ES parameters were left the same as in the original DTS-CMA-ES settings.</p>
      </sec>
      <sec id="sec-4-2">
<title>4.2 Results</title>
<p>The results in Figures 1, 2, and 3 and in Table 3 show the effect of the adaptivity implemented in the DTS-CMA-ES. The graphs in Figures 1, 2, and 3 depict the scaled logarithm Δf^log of the median Δf^med of the minimal distances from the function optimum over runs on 15 independent instances, as a function of FE/D. The scaled logarithm of Δf^med is calculated as

Δf^log = ((log Δf^med − Δf^MIN) / (Δf^MAX − Δf^MIN)) ⋅ log10(1/10^−8) + log10 10^−8,

where Δf^MIN (Δf^MAX) is the minimum (maximum) of log Δf^med found among all the compared algorithms for the particular function f and dimension D between 0 and 250 FE/D. Such scaling enables the aggregation of Δf^log graphs across an arbitrary number of functions and dimensions (see Figure 3). The values are scaled to the [−8, 0] interval, where −8 corresponds to the minimal and 0 to the maximal distance. This visualization distinguishes the differences in the convergence of the tested algorithms better than the default visualization used by the COCO/BBOB platform, which is why it is used in this article.</p>
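<p>The scaling above can be computed directly. A minimal sketch, assuming the minimum and maximum of log Δf^med over all compared algorithms are supplied:</p>

```python
def scaled_log_distance(log_d_med, log_d_min, log_d_max, target_exp=-8.0):
    """Scale a log median distance into the [target_exp, 0] interval.

    log_d_med : log of the median distance Delta_f^med for one algorithm
    log_d_min, log_d_max : min and max of log Delta_f^med over all algorithms
    """
    frac = (log_d_med - log_d_min) / (log_d_max - log_d_min)  # in [0, 1]
    return frac * (-target_exp) + target_exp                  # min -> -8, max -> 0
```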
<p>
          We have tested the statistical significance of the differences in the algorithms' performance on 12 COCO/BBOB test functions in 10D, separately for two evaluation budgets, using the Iman and Davenport improvement of the Friedman test [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Let #FE_T be the smallest number of function evaluations at which at least one algorithm reached the target, i.e., satisfied Δf^med ≤ Δf_T, or #FE_T = 250D if no algorithm reached the target within 250D evaluations. The algorithms are ranked on each COCO/BBOB test function with respect to Δf^med at a given budget of function evaluations. The null hypothesis of equal performance of all algorithms is rejected at the higher function evaluation budget #FEs = #FE_T (p &lt; 10^−3), as well as at the lower budget #FEs = #FE_T/3 (p &lt; 10^−3).
        </p>
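<p>The Iman-Davenport statistic can be computed from the rank matrix as follows; a sketch following the formulas in [2], where ranks holds the per-function ranks of the k compared algorithms over N test functions.</p>

```python
import numpy as np

def iman_davenport(ranks):
    """Iman-Davenport F statistic from an (N, k) matrix of ranks (1 = best).

    The statistic is F-distributed with (k - 1) and (k - 1)(N - 1)
    degrees of freedom under the null hypothesis of equal performance.
    """
    ranks = np.asarray(ranks, dtype=float)
    n, k = ranks.shape
    r_mean = ranks.mean(axis=0)  # average rank of each algorithm
    # Friedman chi-squared statistic
    chi2_f = 12.0 * n / (k * (k + 1)) * (np.sum(r_mean ** 2) - k * (k + 1) ** 2 / 4.0)
    # Iman-Davenport correction
    return (n - 1) * chi2_f / (n * (k - 1) - chi2_f)
```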
<p>
          We tested the pairwise differences in performance using the post-hoc Friedman test [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with the Bergmann-Hommel correction controlling the family-wise error, in cases when the null hypothesis of equal algorithm performance was rejected. To illustrate the differences between the algorithms, the numbers of test functions at which one algorithm achieved a higher rank than the other are reported in Table 3. The table also contains the pairwise statistical significances.
        </p>
<p>We have compared the performances of the aDTS-CMA-ES using twelve settings differing in Emin, Emax, and β. Table 2 shows the counts of first ranks of the compared settings according to the lowest achieved Δf^med for 25, 50, 100, and 200 FE/D, respectively. The counts are summed across the testing sets of benchmark functions in each individual dimension.</p>
<p>Although the algorithm is rather robust to the exact setting of the smoothing update rate, we have found that lower values of β usually lead to better performance (see Table 2); the following experiments therefore use the rate β = 0.3.</p>
<p>When comparing the convergence rates, the performance of the aDTS-CMA-ES with E^Q2_min is noticeably lower especially on the Rosenbrock functions (f8, f9) and on the Different Powers function f14, where the RDEμ error often exceeds the lower error threshold even when a lower number of original-evaluated points would be sufficient for a higher speedup of the CMA-ES. The adaptive control, on the other hand, helps especially on the Attractive Sector function f6, which has its optimum at a point without continuous derivatives and is therefore hard to regress by GPs, or on the Schaffer functions f17, f18, where the aDTS-CMA-ES is probably able to adapt to the multimodal neighbourhood of the function's optimum and performs best of all the compared algorithms. Within the budget of 250 FE/D, the aDTS-CMA-ES (especially with E^Q2_min) is also able to find one of the best fitness values on the regularly multimodal Rastrigin functions f3, f4, and f15, where the GP model apparently does not prevent the original CMA-ES from exploiting the global structure of the function.</p>
        <p>[Figures 1 and 2: the scaled logarithm Δf^log of the median distance from the optimum as a function of FE/D on the COCO/BBOB functions in 5D and 10D, with panels grouped into functions that are easy and hard to regress; Figure 3: aggregated graphs as a function of #FEs/#FE_T.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>5 Conclusion</title>
<p>In this paper, we have presented work in progress on an adaptive version of the surrogate-assisted optimization algorithm DTS-CMA-ES. The online adjustment of the ratio between the original- and model-evaluated points according to the error of the surrogate model was investigated. The new adaptive version of the algorithm employs the RDEμ between the fitness of the current population predicted by the first-trained model and by the retrained model.</p>
<p>Results of parameter tuning show that lower values of the exponential smoothing rate β provide better results. On the other hand, different combinations of slower and more rapid update behaviours bring a better CMA-ES speedup for different kinds of functions, and the choice of this parameter could depend on the experimenter's domain knowledge. We found that the adaptive approach speeds up the CMA-ES more than the three other surrogate CMA-ES algorithms, namely the DTS-CMA-ES, the s∗ACM-ES, and the lmm-CMA-ES, on several functions after roughly 150 FE/D.</p>
<p>The adaptivity of the DTS-CMA-ES is still, to a certain extent, work in progress. A future perspective for improving the aDTS-CMA-ES is to investigate further types and properties of adaptive control of the number of points evaluated by the original fitness in each generation. Another conceivable direction of future research is online switching between different types of surrogate models suitable for the aDTS-CMA-ES.</p>
      <p>The reported research was supported by the Czech Science Foundation grant No. 17-01251, by the Grant Agency of the Czech Technical University in Prague with its grant No. SGS17/193/OHK4/3T/14, and by project Nr. LO1611 with financial support from the MEYS under the NPU I program. Further, access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” (LM2010005), is greatly appreciated.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brockhoff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>Benchmarking the local metamodel CMA-ES on the noiseless BBOB'2013 test bed</article-title>
          .
          <source>In Proceedings of the GECCO '13 Companion</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Demšar</surname>
          </string-name>
          .
          <article-title>Statistical comparisons of classifiers over multiple data sets</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>7</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          :
          <fpage>2677</fpage>
          -
          <lpage>2694</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>The CMA Evolution Strategy: A Comparing Review</article-title>
          .
          <source>In Towards a New Evolutionary Computation, number 192</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>102</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Finck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ros</surname>
          </string-name>
          .
          <article-title>Real-parameter black-box optimization benchmarking 2012: Experimental setup</article-title>
          .
          <source>Technical report, INRIA</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Finck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          .
          <article-title>Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions</article-title>
          .
          <source>Technical Report RR-6829</source>
          , INRIA,
          <year>2009</year>
          .
          Updated February
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Koumoutsakos</surname>
          </string-name>
          .
          <article-title>Local Metamodels for Optimization Using Evolution Strategies</article-title>
          .
          <source>In PPSN IX Proceedings</source>
          , volume
          <volume>4193</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>939</fpage>
          -
          <lpage>948</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          .
          <article-title>Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy</article-title>
          .
          <source>In Proceedings of the GECCO '12</source>
          , pages
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          .
          <article-title>BI-population CMA-ES Algorithms with Surrogate Models and Line Searches</article-title>
          .
          <source>In Proceedings of the GECCO '13 Companion</source>
          , pages
          <fpage>1177</fpage>
          -
          <lpage>1184</lpage>
          . ACM Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Doubly trained evolution control for the Surrogate CMA-ES</article-title>
          .
          <source>In PPSN XIV Proceedings</source>
          , pages
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Repický</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Overview of surrogate-model versions of covariance matrix adaptation evolution strategy</article-title>
          .
          <source>In Proceedings of the GECCO '17 Companion. ACM</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>