J. Yaghob (Ed.): ITAT 2015, pp. 159–166, Charles University in Prague, Prague, 2015

Investigation of Gaussian Processes in the Context of Black-Box Evolutionary Optimization

Andrej Kudinov (1), Lukáš Bajer (2), Zbyněk Pitra (3), and Martin Holeňa (4)

(1) Faculty of Information Technology, Czech Technical University, Prague, kudinand@fit.cvut.cz
(2) Faculty of Mathematics and Physics, Charles University, Prague, bajeluk@gmail.com
(3) Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University, Prague, z.pitra@gmail.com
(4) Czech Academy of Sciences, Prague, martin@cs.cas.cz

Abstract: Minimizing the number of function evaluations has become a very challenging problem in the field of black-box optimization, where one evaluation of the objective function may be very expensive or time-consuming. Gaussian processes (GPs) are one of the approaches suggested to this end, already nearly 20 years ago, in the area of general global optimization. So far, however, they have received only little attention in the area of evolutionary black-box optimization.

This work investigates the performance of GPs in the context of black-box continuous optimization on multimodal functions from the CEC 2013 competition. It reports the performance of two GP-based methods, Model Guided Sampling Optimization (MGSO) and GPs used as a surrogate model for CMA-ES, and compares the speed-up of both methods under different settings, measured in the number of function evaluations, with respect to CMA-ES with no surrogate model.

1 Introduction

Evolutionary computation has become very successful during the past few decades in continuous black-box optimization. In such optimization, no mathematical definition of the optimized function is available; thus we can calculate analytically neither that function itself nor its derivatives. In such cases, there is no option but to evaluate the objective function empirically, through measurements, tests or simulations.

In various real-world optimization problems, the evaluation of the objective function is very expensive or time-consuming, e.g., optimization of protein folding stability [6], computer-assisted design [1] and job allocation in a computational grid [13]. In such cases, we need to keep the number of function evaluations as low as possible, without impairing the quality of the expected results.

In this paper, we employ two approaches addressing that task. The first, called Model Guided Sampling Optimization (MGSO) [2], is one of the recent implementations of GPs. The second employed approach is surrogate modeling, recalled in Subsection 2.2, which we use in conjunction with CMA-ES.

This work investigates the performance of both methods on the set of niching functions from the CEC 2013 competition [10], characterized by a high number of local optima, which makes evolutionary search for the global optimum difficult because evolutionary algorithms (EAs) tend to get trapped in one of the local optima.

The following section describes the theoretical fundamentals of GPs and introduces the MGSO method. It also explains surrogate modeling and the use of GPs as a surrogate model for CMA-ES. Sections 3 and 4 present the setup and the results of the experimental evaluation of the considered methods. Section 5 summarizes the results and concludes the paper.

2 Gaussian Processes in Optimization

A GP is a random process such that any finite sequence X_1, ..., X_k of the involved random variables has a multivariate Gaussian distribution. A GP is defined by its mean value and a covariance matrix described by a function with a relatively small number of hyper-parameters, which are usually fitted by the maximum likelihood method.

First, the GP is trained with N data points from the input space X,

    X_N = \{ x_i \mid x_i \in \mathbb{R}^D \}_{i=1}^{N},

with known input-output values (X_N, y_N); then it is used for predicting the (N+1)-st point. The conditional density of the extended vector y_{N+1} = (y_1, ..., y_N, y_{N+1}), conditioned on X_{N+1} = X_N \cup \{x_{N+1}\}, is

    p(\mathbf{y}_{N+1} \mid X_{N+1}) = \frac{\exp\left(-\frac{1}{2}\mathbf{y}_{N+1}^{\top} C_{N+1}^{-1} \mathbf{y}_{N+1}\right)}{\sqrt{(2\pi)^{N+1}\det(C_{N+1})}},

where C_{N+1} is the covariance matrix of an (N+1)-dimensional Gaussian distribution.
The covariance matrix can be expressed as

    C_{N+1} = \begin{pmatrix} C_N & \mathbf{k} \\ \mathbf{k}^{\top} & \kappa \end{pmatrix},

where κ is the variance of the new point itself, k is the vector of covariances between the new point and the training data, and C_N is the covariance matrix of the Gaussian distribution corresponding to the N training data points [5].

Covariance functions provide prior information about the objective function and express the covariance between the function values of each two data points x_i, x_j as cov(f(x_i), f(x_j)) = k(x_i, x_j). Because the matrix of the values k(x_i, x_j) for the points x_1, ..., x_N serves as a covariance matrix, it has to be positive semidefinite.

2.1 Model Guided Sampling Optimization

MGSO uses a regression model for prediction and error estimation in order to obtain the probability of reaching a better solution. It was inspired by two previously proposed methods in the field of black-box optimization. The first, Estimation of Distribution Algorithms [9], creates a new set of solutions for the next generation using a probability distribution estimated from previously selected candidate solutions. The second is surrogate modeling, described in Subsection 2.2.

MGSO was proposed as an alternative to Jones' Efficient Global Optimization (EGO) [8]. Unlike EGO, MGSO produces not a single solution, but a whole population of solutions. The selection of candidate solutions is performed by sampling the probability of improvement (PoI) of the GP model, which serves as a measure of how promising the chosen point is for locating the optimum. PoI is determined by means of a chosen threshold T and the estimation of the shape of the objective function by the current GP model.

The crucial step in the MGSO algorithm is the sampling of the PoI, which is determined by the predicted mean f̂(x) = ŷ and the standard deviation ŝ(x) = s_y of the GP model f̂ at any point x of the input space:

    \mathrm{PoI}_T(\mathbf{x}) = \Phi\left(\frac{T - \hat{f}(\mathbf{x})}{\hat{s}(\mathbf{x})}\right) = P\left(\hat{f}(\mathbf{x}) \le T\right),

which corresponds to the value of the cumulative distribution function of the Gaussian at the threshold T. Although all the variables are sampled from a Gaussian distribution, PoI(x) is not Gaussian-shaped, as it depends on the threshold T and the modelled function f.

2.2 GP as Surrogate Model for CMA-ES

Surrogate modeling is a technique used in optimization in order to decrease the number of expensive function evaluations. A surrogate model, which is a regression model of a suitable kind (in our case a GP), is first constructed by training with known values of the objective function for some inputs, and then it is used by the employed evolutionary optimization algorithm instead of the original objective function (in evolutionary optimization usually called fitness) during the search for the global optimum. Although creating and training a model increases the time complexity of the optimization algorithm, using the model instead of the original fitness function decreases the number of its evaluations, which is a crucial objective in expensive optimization.

Every regression model approximates the original fitness function with some error. To prevent the optimization from being misled by such an erroneous approximation, it is necessary to use the original fitness function for some subset of evaluations. That subset is determined by the evolution control (EC) strategy [5].

An individual-based EC strategy consists in determining the subset of individuals evaluated by the original fitness function in each generation. The following description is illustrated by Figure 1; the numbers in parentheses refer to its steps. Denote λ the size of the population provided by CMA-ES. First, λ″ = αλ points are sampled from N(m, σ²C), where α ∈ [0, 1], m is the mean, σ is the step-size and C stands for the covariance matrix (step 1). These λ″ points are evaluated by the original fitness function and included in training the model. Then, an extended population of size λ′ = β(λ − λ″), where β ∈ [1, ∞), is sampled from the model using the same distribution (step 2). The extended population is required by the model for choosing promising points for re-evaluation by the original fitness function. Subsequently, γ(λ − λ″) points, where γ ∈ [0, 1], are chosen from the extended population according to some criterion, e.g. fitness value, and evaluated by the original fitness function (step 3). The complement to λ points is gathered from the rest of the extended population by dividing it into k = (1 − γ)(λ − λ″) clusters and selecting the best point from each cluster; these points are also evaluated by the original fitness function and added to the final population (step 4) [3].

[Figure 1: Individual-based EC. The schematic shows the pre-sampled points (λ″ = αλ), the extended population (β(λ − λ″)), the best points from the extended population (γ(λ − λ″)), and the best points from each of the k clusters.]

A generation-based EC strategy determines the number of model-evaluated generations between two generations evaluated by the original function. After a generation is evaluated by the original fitness function, the model is trained using the obtained values. The number of consecutive model-evaluated generations can also be determined dynamically, as introduced in the so-called adaptive EC strategy [11], where the deviation between the original and the model fitness is assessed, and it is then decided whether to evaluate using the original fitness or the model.

Determining the most suitable EC parameters, however, is an open problem, which depends on the properties of the fitness function and the current performance of the surrogate model. Moreover, the most suitable parameters change during the optimization process.
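To make the preceding formulas concrete, the following sketch derives the predictive mean f̂(x) and standard deviation ŝ(x) from the block structure of C_{N+1} and then evaluates PoI_T(x). This is a minimal Python/NumPy illustration under our own simplifying assumptions (zero-mean GP, a fixed isotropic squared exponential covariance standing in for the fitted one, helper names of our choosing); it is a sketch, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def cov(a, b, sigma_f2=1.0, ell=1.0):
    # Isotropic squared exponential covariance; stands in for the
    # fitted covariance function k(x_i, x_j).
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return sigma_f2 * np.exp(-0.5 * d.dot(d) / ell**2)

def gp_predict(X, y, x_new, sigma_n2=0.01):
    """Predictive mean and standard deviation of a zero-mean GP at x_new.

    Uses the block structure C_{N+1} = [[C_N, k], [k^T, kappa]]:
    mean = k^T C_N^{-1} y,  variance = kappa - k^T C_N^{-1} k.
    """
    N = len(X)
    C = np.array([[cov(X[i], X[j]) for j in range(N)] for i in range(N)])
    C += sigma_n2 * np.eye(N)                 # noise variance on the diagonal
    k = np.array([cov(x, x_new) for x in X])  # covariances to the new point
    kappa = cov(x_new, x_new)                 # variance of the new point itself
    mean = k.dot(np.linalg.solve(C, np.asarray(y, dtype=float)))
    var = kappa - k.dot(np.linalg.solve(C, k))
    return mean, np.sqrt(max(var, 1e-12))

def poi(X, y, x_new, T):
    # Probability of improvement: PoI_T(x) = Phi((T - f_hat(x)) / s_hat(x)).
    mean, std = gp_predict(X, y, x_new)
    return norm.cdf((T - mean) / std)
```

MGSO then obtains its candidate solutions by sampling points according to this PoI landscape, rather than returning a single maximizer as EGO does.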
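The assembly of one population under the individual-based EC strategy described above can also be sketched in code. Everything here is schematic and under our own assumptions: sample, fitness, train_model and predict are placeholders for the CMA-ES sampler, the expensive objective, GP training and GP prediction; k-means merely stands in for whatever clustering the implementation [3] uses, and minimization is assumed.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def individual_ec_population(sample, fitness, train_model, predict,
                             lam, alpha, beta, gamma):
    # Step 1: pre-sample lambda'' = alpha * lambda points from N(m, sigma^2 C)
    # and evaluate them with the original fitness.
    n_pre = int(round(alpha * lam))
    evaluated = [(x, fitness(x)) for x in sample(n_pre)]
    model = train_model(evaluated)

    # Step 2: sample the extended population of size beta * (lambda - lambda'').
    n_rest = lam - n_pre
    ext = np.asarray(sample(beta * n_rest))
    pred = np.asarray([predict(model, x) for x in ext])

    # Step 3: choose the gamma * (lambda - lambda'') points with the best
    # (lowest) model fitness for re-evaluation by the original fitness.
    n_best = int(round(gamma * n_rest))
    order = np.argsort(pred)
    chosen = list(order[:n_best])

    # Step 4: split the remaining points into k = (1 - gamma)(lambda - lambda'')
    # clusters and take the best point (by model fitness, our assumption) of each.
    rest = order[n_best:]
    k = n_rest - n_best
    _, labels = kmeans2(ext[rest], k, minit='points')
    for c in range(k):
        members = rest[labels == c]
        if members.size:
            chosen.append(members[np.argmin(pred[members])])

    # The final population: lambda points evaluated by the original fitness.
    evaluated += [(ext[i], fitness(ext[i])) for i in chosen]
    return evaluated
```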
Table 1: Model parameter settings for S-CMA-ES and MGSO performance testing. The symbols K_SE^iso, K_SE^ard, K_exp, K_Matérn^{ν=5/2} denote, respectively, the isotropic squared exponential, the squared exponential with automatic relevance determination, the exponential, and the Matérn with parameter ν = 5/2 covariance functions. J_{1,D} denotes the vector of ones of length equal to the dimension D of the input space.

    S-CMA-ES
      covariance functions           cov ∈ {K_Matérn^{ν=5/2}, K_exp, K_SE^iso, K_SE^ard}
      starting values of (σ_f², ℓ)   (0.1, 10) for K_SE^iso; (0.05 × J_{1,D}, 0.1) for K_SE^ard; (0.5, 2) otherwise
      starting value of σ_n²         0.01

    MGSO
      covariance functions           cov ∈ {K_SE^iso, K_SE^ard}
      starting values of (σ_f², ℓ)   (0.1, 10) for K_SE^iso; (0.05 × J_{1,D}, 0.1) for K_SE^ard
      starting value of σ_n²         0.01

3 Experimental Evaluation

Previous investigations compared the performance of MGSO [2] and of CMA-ES with a GP surrogate model [4] (denoted hereafter S-CMA-ES) with CMA-ES without a model, using the standard black-box optimization benchmarks [7]. In this work, we compare those methods using an additional set of 12 multimodal fitness functions from the CEC 2013 competition on niching methods for multimodal function optimization [10]:

• f1: Five-Uneven-Peak Trap (1D),
• f2: Equal Maxima (1D),
• f3: Uneven Decreasing Maxima (1D),
• f4: Himmelblau (2D),
• f5: Six-Hump Camel Back (2D),
• f6: Shubert (2D, 3D),
• f7: Vincent (2D, 3D),
• f8: Modified Rastrigin – All Global Optima (2D),
• f9: Composition Function 1 (2D),
• f10: Composition Function 2 (2D),
• f11: Composition Function 3 (2D, 3D, 5D, 10D),
• f12: Composition Function 4 (3D, 5D, 10D, 20D).

3.1 MGSO Performance

MGSO performance was examined using two covariance functions, the isotropic squared exponential (K_SE^iso) and the squared exponential with automatic relevance determination (K_SE^ard), with the parameter settings shown in Table 1. The results in Tables 2 and 3 show the speed-up of MGSO with respect to CMA-ES. As can be seen, the K_SE^iso covariance function performed better of the two in more than half of the cases.

3.2 S-CMA-ES Performance

The speed-up results are shown in Tables 2 and 3. In the performed evaluations, four covariance functions were used in the GP surrogate model. The first two are types of the squared exponential covariance function: the isotropic version

    K_{SE}^{iso}(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\left(-\frac{1}{2\ell^2}(\mathbf{x}_i - \mathbf{x}_j)^{\top}(\mathbf{x}_i - \mathbf{x}_j)\right),    (1)

and the version using automatic relevance determination

    K_{SE}^{ard}(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\left(-\frac{1}{2}(\mathbf{x}_i - \mathbf{x}_j)^{\top} \lambda^{-2} (\mathbf{x}_i - \mathbf{x}_j)\right),    (2)

where λ stands for the characteristic length scale, which measures the distance at which function values become uncorrelated along x_i. The covariance functions (1) and (2) differ only in that λ is a diagonal matrix instead of a scalar. Two types of the Matérn covariance function were used,

    K_{\mathrm{Matérn}}^{\nu=1/2}(r) = \exp\left(-\frac{r}{\ell}\right),    (3)

which is better known as the exponential covariance function (K_exp), and

    K_{\mathrm{Matérn}}^{\nu=5/2}(r) = \sigma_f^2 \left(1 + \frac{\sqrt{5}r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\left(-\frac{\sqrt{5}r}{\ell}\right),    (4)

where r = |x_i − x_j| is the distance of the two considered data points, ℓ is the characteristic length-scale with which that distance is compared, and σ_f² is the signal variance. The description of the listed covariance functions can be found in [12]. The considered covariance function parameters are shown in Table 1.

In the performed experiments, different variants of the chosen EC strategies described in Section 2.2, generation-based and individual-based, were examined. The results are discussed in the following sections.
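The four covariance functions (1)–(4) translate directly into code. The following is a minimal NumPy sketch with helper names of our own; the hyper-parameters σ_f², ℓ and the ARD length-scale vector, which the GP fits by maximum likelihood, are passed in explicitly here.

```python
import numpy as np

def k_se_iso(xi, xj, sigma_f2, ell):
    # (1) Isotropic squared exponential.
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return sigma_f2 * np.exp(-0.5 * d.dot(d) / ell**2)

def k_se_ard(xi, xj, sigma_f2, lam):
    # (2) Squared exponential with automatic relevance determination;
    # lam is the vector of per-dimension characteristic length scales.
    d = (np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)) / lam
    return sigma_f2 * np.exp(-0.5 * d.dot(d))

def k_exp(xi, xj, ell):
    # (3) Exponential covariance, i.e. Matern with nu = 1/2.
    r = np.linalg.norm(np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float))
    return np.exp(-r / ell)

def k_matern52(xi, xj, sigma_f2, ell):
    # (4) Matern covariance with nu = 5/2.
    r = np.linalg.norm(np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float))
    s = np.sqrt(5.0) * r / ell
    return sigma_f2 * (1.0 + s + 5.0 * r**2 / (3.0 * ell**2)) * np.exp(-s)
```

For instance, with the Table 1 settings, K_SE^iso would start the maximum likelihood fit from σ_f² = 0.1 and ℓ = 10.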
3.3 Generation-Based EC Strategy

Apart from the covariance function selection, the generation-based EC strategy was determined by two other parameters: the number of model-evaluated generations, and the multiplication factor of the CMA-ES step size σ, which is used in the original-evaluated generations in order to provide points for model training from a broader region of the input space. In the implementation, the first parameter was varied among the values 1, 2, 4 and 8 consecutive model-evaluated generations, and the second parameter was varied among the values 1 and 2.

3.4 Individual-Based EC Strategy

Apart from the covariance function selection, three other parameters, described in Section 2.2, were examined in the case of the individual-based EC strategy. The first parameter, α ∈ [0, 1], determines the proportion of the original population to be pre-sampled and evaluated by the original fitness function. The second parameter, β ∈ [1, ∞), is a multiplicator determining the size of the extended population. The third parameter, γ ∈ [0, 1], determines the number of points with the best model fitness chosen from the extended population to be re-evaluated by the original fitness function and become a part of the final population. This parameter also determines the number of clusters, from each of which the best point is chosen and added to the final population. In the performed evaluations, the parameter α was varied among the values 0, 0.0625, 0.125 and 0.25, the parameter β was varied among the values 5 and 10, and γ was varied among the values 0, 0.1 and 0.2.

4 Results and Their Assessment

4.1 Result Tables

Tables 2 and 3 show the speed-up of S-CMA-ES and MGSO compared to CMA-ES without a surrogate model. For the respective targets (distances to the true optimum ∆f_opt), the speed-up of the expected running time (ERT) is shown. ERT is the number of function evaluations needed to reach the target, divided by the ratio of the successful runs which reached the target. The stopping criteria were the distance 10^{-8} to the true optimum and 100D original fitness function evaluations.

The first column in each box corresponds to the overall best settings (described in Section 4.2) of the covariance function and EC settings of S-CMA-ES using the generation-based EC strategy, in terms of the average speed-up. The second column corresponds to the best covariance function and generation-based EC settings for the respective function-dimension combination, if any were better than the overall best observed settings. Analogously, the third and fourth columns show results for S-CMA-ES using the individual-based EC strategy. Similarly, the last two columns in each box show the speed-up of MGSO.

Signs "-" instead of the speed-up values mean that, unlike CMA-ES, no run of the considered method (S-CMA-ES or MGSO) was able to reach that target. Signs "+" mean that, unlike the employed method, no CMA-ES run was able to reach the target. Signs "*" mean that neither the considered method nor CMA-ES was able to reach the target. Speed-ups written in bold mark cases where the median of the ERT of S-CMA-ES or MGSO is significantly lower than the median of CMA-ES according to the one-sided Wilcoxon test at the significance level α = 0.05.

4.2 Observations

The MGSO method brought the highest speed-up in the case of f2, f4, the 3D version of f7, the 2D version of f11 and the 5D version of f12. The worst results were observed in the case of f1, the 2D version of f10 and the 3D and 10D versions of f12. The best results were achieved using the K_SE^iso covariance function; however, in the case of f4 and the 2D version of f7, the K_SE^ard covariance function brought much better results.

In the case of the generation-based EC, the overall best settings with respect to the median values are (K_SE^iso, 8, 1), i.e., 8 consecutive model-evaluated generations with unmodified step size in combination with the K_SE^iso covariance function. The overall best generation-based EC settings turned out to be also the best generation-based EC settings for the respective functions, except for the 3D version of f12, where S-CMA-ES performed better using a larger step size. Using different covariance functions did not bring much better results than the overall best covariance function.

The individual-based EC strategy achieved the best results with the overall settings (K_SE^ard, 0, 5, 0.1), i.e., the squared exponential covariance function with automatic relevance determination, no pre-sampling before training the model, 5 as the multiplicator determining the size of the extended population, and 0.1 as the multiplicator determining the number of best points chosen from the extended population. The best results with the described parameters were achieved in the case of function f3 and the 2D and 3D versions of f6. However, the overall performance of the individual-based EC strategy lags far behind the generation-based EC strategy, MGSO, and even CMA-ES itself.

4.3 Best-Fitness Progress Diagrams

Figure 2 shows examples of the best-fitness progress with the best observed settings (see Table 2 for details). Medians and the first and third quartiles of the best fitness reached are shown; the medians and quartiles were measured on 15 independent runs for MGSO and on 10 independent runs for S-CMA-ES (for both EC strategies).

The optimization progress of the individual-based EC strategy turned out to be the slowest in comparison with the other methods. MGSO outperformed CMA-ES in most cases, and its highest speed-up was achieved in the later phase of the optimization process. The generation-based EC strategy achieved its highest speed-up in the middle phase of the optimization process. Although the generation-based EC strategy generally brought the best results, it was outperformed by MGSO in the case of the 2D version of f8 and the 5D version of f12 in the later phase of the optimization process.
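The ERT and speed-up values in Tables 2 and 3 can be computed from raw run records. The following is a small sketch under an assumed record format of our own choosing (each run reports the number of evaluations it used and the best distance to the true optimum it reached); it is not the authors' evaluation code.

```python
def ert(runs, target):
    """Expected running time for one target.

    runs: list of (evaluations_used, best_delta_f_reached) tuples, one per run.
    ERT = number of evaluations divided by the ratio of successful runs, which
    equals the total number of evaluations over the number of successful runs.
    """
    successes = sum(1 for _, delta in runs if delta <= target)
    if successes == 0:
        return float('inf')      # no run reached the target ("-" or "*")
    total_evals = sum(evals for evals, _ in runs)
    return total_evals / successes

def speedup(runs_cmaes, runs_method, target):
    # Speed-up of S-CMA-ES or MGSO with respect to plain CMA-ES:
    # how many times smaller the method's ERT is at the given target.
    return ert(runs_cmaes, target) / ert(runs_method, target)
```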
f1 (1-D)   settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 8, 1); ind (K_SE^ard, 0, 5, 0.1); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   gen-2   ind-1   MGSO-1  MGSO-2
1e1      0.90    1.00    1.00    0.17    0.14
1e0      0.90    1.00    1.00    0.06    0.14
1e-1     0.90    1.00    1.00    0.06    0.08
1e-2     0.90    1.00    1.00    -       0.05
1e-3     0.90    1.00    1.00    -       0.05
1e-4     0.90    1.00    1.00    -       -
1e-5     0.90    1.00    1.00    -       -
1e-6     0.90    1.00    1.00    -       -
1e-7     0.90    1.00    1.00    -       -
1e-8     0.90    1.00    1.00    -       -

f2 (1-D)   settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_Matérn^{ν=5/2}, 2^-2, 10, 0.2); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e-1     1.00    1.44    0.76    0.13    1.30
1e-2     1.41    2.54    0.30    0.10    3.30
1e-3     1.85    2.54    0.12    0.86    1.08
1e-4     3.11    3.75    -       1.73    1.73
1e-5     4.04    4.49    -       -       2.25
1e-6     4.34    4.83    -       -       2.37
1e-7     12.95   13.57   -       -       8.05
1e-8     12.21   15.23   -       -       16.78

f3 (1-D)   settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_exp, 0, 5, 0.2); MGSO K_SE^iso
∆f_opt   gen-1   ind-1   ind-2   MGSO-1
1e-1     2.44    0.52    0.45    1.83
1e-2     5.71    1.44    1.51    2.13
1e-3     21.36   7.12    17.80   9.49
1e-4     18.41   7.12    -       8.63
1e-5     20.07   7.76    -       9.41
1e-6     20.07   7.76    -       9.41
1e-7     *       *       *       *
1e-8     *       *       *       *

f4 (2-D)   settings: gen (K_SE^iso, 8, 1) | (K_exp, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_exp, 2^-2, 5, 0.1); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1  MGSO-2
1e2      0.54    1.00    0.16    0.11    0.35    0.35
1e1      1.24    1.63    0.17    0.44    1.55    1.55
1e0      2.59    3.52    -       1.26    2.20    4.00
1e-1     2.74    3.47    -       -       1.97    2.62
1e-2     2.99    3.66    -       -       2.44    3.25
1e-3     3.91    5.00    -       -       3.58    4.57
1e-4     4.45    5.22    -       -       4.52    4.68
1e-5     13.44   14.83   -       -       0.67    13.23
1e-6     17.63   20.10   -       -       -       18.12
1e-7     +       +       *       *       *       +
1e-8     +       +       *       *       *       +

f5 (2-D)   settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_Matérn^{ν=5/2}, 0, 10, 0.2); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   ind-1   ind-2   MGSO-1  MGSO-2
1e0      1.00    0.12    0.09    0.65    0.65
1e-1     1.72    0.05    0.08    0.80    0.80
1e-2     3.28    -       0.70    1.52    1.52
1e-3     2.95    -       -       1.51    1.92
1e-4     3.27    -       -       1.96    2.48
1e-5     3.74    -       -       2.55    3.23
1e-6     *       *       *       *       *
1e-7     *       *       *       *       *
1e-8     *       *       *       *       *

f6 (2-D)   settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 4, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^ard, 0, 10, 0); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1  MGSO-2
1e2      1.18    1.18    -       1.74    3.04    2.03
1e1      4.97    4.20    -       -       0.36    2.21
1e0      7.16    8.40    -       -       -       1.80
1e-1     +       +       *       *       *       +
1e-2     +       +       *       *       *       +
1e-3     +       +       *       *       *       +
1e-4     +       +       *       *       *       +
1e-5     *       *       *       *       *       *
1e-6     *       *       *       *       *       *
1e-7     *       *       *       *       *       *
1e-8     *       *       *       *       *       *

f6 (3-D)   settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^ard, 0, 10, 0.1); MGSO K_SE^iso
∆f_opt   gen-1   ind-1   ind-2   MGSO-1
1e4      1.00    1.00    1.00    0.27
1e3      17.55   -       7.42    4.53
1e2      7.72    -       -       4.06
1e1      5.96    -       -       -
1e0      +       *       *       *
1e-1     +       *       *       *
1e-2     +       *       *       *
1e-3     +       *       *       *
1e-4     +       *       *       *
1e-5     +       *       *       *
1e-6     +       *       *       *
1e-7     +       *       *       *

f7 (2-D)   settings: gen (K_SE^iso, 8, 1) | (K_exp, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^ard, 2^-4, 10, 0); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1  MGSO-2
1e-2     3.28    2.93    0.05    0.43    0.14    1.02
1e-3     4.42    4.81    -       -       0.27    0.54
1e-4     5.63    6.09    -       -       -       0.52
1e-5     7.91    7.91    -       -       -       1.45
1e-6     16.24   16.24   -       -       -       3.39
1e-7     65.25   68.62   -       -       -       11.29
1e-8     +       +       *       *       *       +

f7 (3-D)   settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_Matérn^{ν=5/2}, 0, 10, 0); MGSO K_SE^iso
∆f_opt   gen-1   ind-1   ind-2   MGSO-1
1e-1     3.15    0.03    1.28    1.14
1e-2     5.26    -       -       1.56
1e-3     10.36   -       -       3.16
1e-4     13.33   -       -       4.95
1e-5     64.47   -       -       22.83
1e-6     +       *       *       +
1e-7     +       *       *       +
1e-8     +       *       *       +

f8 (2-D)   settings: gen (K_SE^iso, 8, 1) | (K_Matérn^{ν=5/2}, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_exp, 2^-4, 5, 0); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1  MGSO-2
1e0      2.68    2.68    -       0.96    1.68    1.68
1e-1     3.91    3.91    -       -       2.22    2.22
1e-2     4.46    4.79    -       -       3.20    3.20
1e-3     7.15    7.65    -       -       3.29    3.29
1e-4     12.95   13.74   -       -       5.61    6.73
1e-5     22.40   24.85   -       -       11.39   13.53
1e-6     +       +       *       *       +       +
1e-7     +       +       *       *       +       +
1e-8     +       +       *       *       +       +

Table 2: Speed-up of S-CMA-ES using the individual- and generation-based strategies and of MGSO, compared to CMA-ES without a surrogate model, for functions f1–f8 (see Section 4.1 for details). The columns gen-1, ind-1 and MGSO-1 correspond to the overall best observed settings, listed next to each function; a missing gen-2, ind-2 or MGSO-2 column signifies that the best observed settings for the respective function-dimension combination are identical with the overall best observed settings.

f9 (2-D)   settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^ard, 2^-4, 5, 0); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e1      3.40    2.74    -       0.10    1.06
1e0      3.38    3.11    -       0.07    1.15
1e-1     3.82    3.82    -       -       1.53
1e-2     4.35    4.35    -       -       1.52
1e-3     7.80    7.80    -       -       2.68
1e-4     22.40   23.56   -       -       8.54
1e-5     +       +       *       *       +
1e-6     +       +       *       *       +
1e-7     +       +       *       *       +
1e-8     +       +       *       *       *

f10 (2-D)  settings: gen (K_SE^iso, 8, 1) | (K_Matérn^{ν=5/2}, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^ard, 0, 10, 0); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e2      1.98    1.76    0.03    0.26    0.39
1e1      2.29    2.62    -       -       0.46
1e0      3.01    3.21    -       -       0.07
1e-1     3.71    4.00    -       -       -
1e-2     6.15    7.02    -       -       -
1e-3     18.28   21.88   -       -       -
1e-4     +       +       *       *       *
1e-5     +       +       *       *       *
1e-6     +       +       *       *       *
1e-7     +       +       *       *       *
1e-8     +       +       *       *       *

f11 (2-D)  settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^iso, 2^-4, 10, 0); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e2      3.92    3.38    0.04    1.06    1.24
1e1      3.69    3.69    -       -       1.43
1e0      4.63    5.01    -       -       2.32
1e-1     5.19    5.19    -       -       3.18
1e-2     8.70    9.23    -       -       2.51
1e-3     12.06   14.04   -       -       3.72
1e-4     59.40   65.25   -       -       18.46
1e-5     +       +       *       *       +
1e-6     +       +       *       *       *
1e-7     +       +       *       *       *
1e-8     +       +       *       *       *

f11 (3-D)  settings: gen (K_SE^iso, 8, 1) | (K_exp, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^ard, 0, 5, 0); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e3      0.53    1.00    1.00    1.00    0.27
1e2      2.51    3.04    -       -       1.33
1e1      3.04    3.38    -       -       1.41
1e0      3.12    3.47    -       -       1.85
1e-1     3.90    4.33    -       -       1.71
1e-2     5.95    7.21    -       -       2.19
1e-3     27.83   32.15   -       -       11.06
1e-4     +       +       *       *       +
1e-5     +       +       *       *       +
1e-6     +       +       *       *       +
1e-7     +       +       *       *       *
1e-8     +       +       *       *       *

f12 (3-D)  settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 8, 2); ind (K_SE^ard, 0, 5, 0.1) | (K_Matérn^{ν=5/2}, 0, 10, 0.1); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1  MGSO-2
1e2      2.05    1.72    0.37    0.27    0.91    0.46
1e1      1.42    2.27    -       0.67    0.31    0.63
1e0      -       2.28    -       -       0.48    0.65
1e-1     -       2.76    -       -       -       -
1e-2     -       3.78    -       -       -       -
1e-3     -       19.56   -       -       -       -
1e-4     *       +       *       *       *       *
1e-5     *       +       *       *       *       *
1e-6     *       +       *       *       *       *
1e-7     *       +       *       *       *       *
1e-8     *       +       *       *       *       *

f11 (5-D)  settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1); MGSO K_SE^iso
∆f_opt   gen-1   ind-1   MGSO-1
1e2      3.02    -       1.36
1e1      3.24    -       1.51
1e0      4.37    -       1.96
1e-1     9.79    -       3.12
1e-2     27.12   -       7.51
1e-3     47.82   -       13.17
1e-4     +       *       +
1e-5     +       *       *
1e-6     +       *       *
1e-7     +       *       *
1e-8     +       *       *

f11 (10-D) settings: gen (K_SE^iso, 8, 1) | (K_Matérn^{ν=5/2}, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^iso, 2^-4, 5, 0); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e3      0.89    1.14    0.17    0.64    0.27
1e2      3.65    4.11    -       -       1.05
1e1      8.51    9.35    -       -       2.61
1e0      9.78    10.59   -       -       2.10
1e-1     20.02   21.50   -       -       -
1e-2     +       +       *       *       *
1e-3     +       +       *       *       *
1e-4     +       +       *       *       *
1e-5     +       +       *       *       *
1e-6     +       +       *       *       *
1e-7     *       +       *       *       *
1e-8     *       *       *       *       *

f12 (5-D)  settings: gen (K_SE^iso, 8, 1) | (K_SE^ard, 4, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^iso, 2^-2, 5, 0); MGSO K_SE^iso
∆f_opt   gen-1   gen-2   ind-1   ind-2   MGSO-1
1e2      1.64    1.70    -       0.04    0.78
1e1      5.60    6.61    -       -       4.73
1e0      19.24   33.01   -       -       28.86
1e-1     +       +       *       *       +
1e-2     +       +       *       *       +
1e-3     +       +       *       *       +
1e-4     +       +       *       *       *
1e-5     +       +       *       *       *
1e-6     +       +       *       *       *
1e-7     +       *       *       *       *
1e-8     *       *       *       *       *

f12 (10-D) settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1); MGSO K_SE^iso | K_SE^ard
∆f_opt   gen-1   ind-1   MGSO-1  MGSO-2
1e2      8.98    -       1.88    2.02
1e1      6.83    -       -       1.05
1e0      17.61   -       -       -
1e-1     +       *       *       *
1e-2     +       *       *       *
1e-3     +       *       *       *
1e-4     +       *       *       *
1e-5     +       *       *       *
1e-6     *       *       *       *
1e-7     *       *       *       *
1e-8     *       *       *       *

f12 (20-D) settings: gen (K_SE^iso, 8, 1); ind (K_SE^ard, 0, 5, 0.1) | (K_SE^iso, 0, 10, 0); MGSO K_SE^iso
∆f_opt   gen-1   ind-1   ind-2   MGSO-1
1e3      1.39    0.05    0.18    0.42
1e2      2.73    -       -       0.03
1e1      7.24    -       -       -
1e0      59.06   -       -       -
1e-1     +       *       *       *
1e-2     +       *       *       *
1e-3     +       *       *       *
1e-4     +       *       *       *
1e-5     +       *       *       *
1e-6     *       *       *       *
1e-7     *       *       *       *
1e-8     *       *       *       *

Table 3: Speed-up of S-CMA-ES using the individual- and generation-based strategies and of MGSO, compared to CMA-ES without a surrogate model, for functions f9–f12 (see Section 4.1 for details). The column layout is the same as in Table 2.

[Figure 2: Examples of the best-fitness progress with the best observed settings; solid line: median, dashed lines: the first and third quartiles.]

5 Conclusion

In this paper, two optimization approaches based on Gaussian processes were tested on the set of multimodal fitness functions from the CEC 2013 competition [10] and compared to the state-of-the-art evolutionary approach in black-box optimization, CMA-ES. One of them is Model Guided Sampling Optimization [2]; the other, S-CMA-ES [5], consists in using a GP as a surrogate model for CMA-ES. The performance of the methods was compared with respect to the number of function evaluations. In the case of S-CMA-ES, two evolution control strategies were used, the individual- and the generation-based.
Al- black-box optimization benchmarking 2012: experimental though S-CMA-ES using generation-based EC strategy setup. Technical Report, INRIA, 2012 outperformed MGSO, both methods showed the perfor- [8] Jones, D. R., Schonlau, M., Welch, W. J.: Efficient mance improvement in most cases. On the other hand, Global Optimization of expensive black-box functions. J. the individual-based EC strategy brought the worst re- of Global Optimization 13(4) (Dec. 1998), 455–492 sults of all considered methods. We also observed, that S- [9] Larranaga, P., Lozano, J.: Estimation of distribution algo- CMA-ES performs better using generation-based EC set- rithms: a new tool for evolutionary computation. Kluwer ting with more consecutive model-evaluated generations. Academic Pub, 2002 Isotropic squared exponential covariance function showed [10] Li, X., Engelbrecht, A., Epitropakis, M. G.: Benchmark to be the most suitable for the optimization from all tested functions for CEC’2013 special session and competition covariance functions. on niching methods for multimodal function optimization’. This is a work in progress that is a part of a broader 2013. ongoing research. Terefore, it would be premature to draw [11] Loshchilov, I., Schoenauer, M., Sebag, M.: Self-adaptive deeper conclusions at this stage. We hope to be able to surrogate-assisted covariance matrix adaptation evolution draw such conclusions after further investigations will be strategy. In: Proceedings of the Fourteenth International performed in the future. Conference On Genetic And Evolutionary Computation Conference, GECCO’12, 321–328, New York, NY, USA, 2012, ACM Acknowledgement [12] Rasmussen, C., Williams, C.: Gaussian processes for ma- chine learning. Adaptive Computation and Machine Learn- The research reported in this paper has been supported by ing, MIT Press, Cambridge, MA, USA, Jan. 2006 the Czech Science Foundation (GAČR) grant 13-17187S. [13] Tesauro, G., Jong, N. K., Das, R., Bennani, M. N.: On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Computing 10(3) (2007), References 287–299 [1] Nik, M. A., Fayazbakhsh, K., Pasini, D., Lessard, L.: A comparative study of metamodeling methods for the de- sign optimization of variable stiffness composites. Com- posite Structures 107 (2014), 494–501