=Paper=
{{Paper
|id=None
|storemode=property
|title=Evolutionary optimization with active learning of surrogate models and fixed evaluation batch
|pdfUrl=https://ceur-ws.org/Vol-990/paper6.pdf
|volume=Vol-990
|dblpUrl=https://dblp.org/rec/conf/itat/CharyparH12
}}
==Evolutionary optimization with active learning of surrogate models and fixed evaluation batch size==

Viktor Charypar¹ and Martin Holeňa²

¹ Czech Technical University, Faculty of Nuclear Sciences and Physical Engineering, Břehová 7, 115 19 Praha 1, Czech Republic, charyvik@fjfi.cvut.cz

² Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, 182 07 Praha, Czech Republic, martin@cs.cas.cz
Abstract. Evolutionary optimization is often applied to problems where simulations or experiments used as the fitness function are expensive to run. In such cases, surrogate models are used to reduce the number of fitness evaluations. Some of the problems also require a fixed-size batch of solutions to be evaluated at a time. Traditional methods of selecting individuals for true evaluation to improve the surrogate model either require individual points to be evaluated, or couple the batch size with the EA generation size. We propose a queue-based method for individual selection based on active learning of a kriging model. Individuals are selected using the confidence intervals predicted by the model, added to a queue and evaluated once the queue length reaches the batch size. The method was tested on several standard benchmark problems. Results show that the proposed algorithm is able to reach a solution using significantly fewer evaluations of the true fitness function. The effect of the batch size as well as other parameters is discussed.

This work was supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS12/196/OHK3/3T/14, as well as the Czech Science Foundation grant 201/08/0802.

1 Introduction

Evolutionary optimization algorithms are a popular class of optimization techniques suitable for various optimization problems. One of their main advantages is the ability to find optima of black-box functions – functions that are not explicitly defined and whose input/output behavior is only known from previous evaluations of a finite number of points in the input space. This is typical for applications in engineering, chemistry or biology, where the evaluation is performed in the form of a computer simulation or a physical experiment.

The main disadvantage for such applications is the very high number of evaluations of the objective function (called the fitness function in the evolutionary optimization context) needed for an evolutionary algorithm (EA) to reach the optimum. Even if the simulation used as the objective function takes minutes to finish, the traditional approach becomes impractical. When the objective function is evaluated using a physical experiment, in the evolutionary optimization of catalytic materials [1] for example, an evaluation for one generation of the algorithm takes between several days and several weeks and costs thousands of euros.

The typical solution to this problem is performing only a part of all evaluations using the true fitness function and using a response-surface model as its replacement for the rest. This approach is called surrogate modeling. When using a surrogate model, only a small portion of all the points that need to be evaluated is evaluated using the true objective function (simulation or experiment), and for the rest the model prediction is assigned as the fitness value. The model is built using the information from the true fitness evaluations.

Since the fitness function is assumed to be highly non-linear, the modeling methods used are non-linear as well. Some of the commonly used methods include artificial neural networks, radial basis functions, regression trees, support vector machines and Gaussian processes [3].

Furthermore, some experiments require a fixed number of samples to be processed at one time. This presents its own set of challenges for adaptive sampling and is the main concern of this paper. We present an evolutionary optimization method assisted by a variant of a Gaussian-process-based interpolating model called kriging. In order to best use the evaluation budget, our approach uses active learning methods in selecting individuals to evaluate using the true fitness function. A key feature of the approach is support for online and offline batch evaluation with arbitrary batch size, independent of the generation size of the EA.

The rest of the paper is organized as follows: in section 3 we introduce the kriging surrogate model and its properties, in section 2 the methods of
coupling a model to the evolutionary optimization are discussed, section 4 provides details of the proposed method and finally, the results of testing the method are presented and discussed in section 5.

2 Model-assisted evolutionary optimization

Since the surrogate model used as a replacement for the fitness function in the EA is built using the results of the true fitness function evaluations, there are two competing objectives. First, we need to get the most information about the underlying relations in the data, in order to build a precise model of the fitness function. If the model does not capture the features of the fitness function correctly, the optimization can get stuck in a false optimum or generally fail to converge to a global one. Second, we have a limited budget for the true fitness function evaluations. Using many points from the input space to build a perfect model can require more true fitness evaluations than not employing a model at all.

In the general use of surrogate modeling, such as design space exploration, the process of selecting points from the input space to evaluate and build the model upon is called sampling [3]. Traditionally, the points to sample are selected upfront. Upfront sampling schemes are based on the theory of design of experiments (DoE), e.g. a Latin hypercube design. When we don't know anything about the function we are trying to model, it is better to use a small set of points as a base for an initial model, which is then iteratively improved using new samples, selected based on the information from previous function evaluations and the model itself. This approach is called adaptive sampling [3].

When using the surrogate model in an evolutionary optimization algorithm, the adaptive sampling decision changes from selecting which points of the input space to evaluate in order to improve the model to deciding whether to evaluate a given point (selected by the EA) with the true fitness function or not. There are two general approaches to this choice: the generation-based approach and the individual-based approach. We will discuss both, with emphasis on the latter, a variant of which is used in the method we propose in section 4.

2.1 Generation-based approach

In the generation-based approach the decision whether to evaluate an individual point with the true fitness function is made for the whole generation of the evolutionary algorithm. The optimization takes the following steps.

1. An initial Nᵢ generations of the EA are performed, yielding sets G₁, ..., G_{Nᵢ} of individuals (x, fₜ(x)), fₜ being the true fitness function.
2. The model M is trained on the individuals (x, fₜ(x)) ∈ ⋃_{i=1}^{Nᵢ} Gᵢ.
3. The fitness function fₜ is replaced by the model prediction f_M.
4. T generations are performed evaluating f_M as the fitness function.
5. One generation is performed using fₜ, yielding a set Gⱼ of individuals (initially j = Nᵢ + 1).
6. The model is retrained on the individuals (x, fₜ(x)) ∈ ⋃_{i=1}^{j} Gᵢ.
7. Steps 4–6 are repeated until the optimum is reached.

The number of true fitness evaluations in this approach depends on the population size of the EA and the frequency of control generations T, which can be fixed or adaptively changed during the course of the optimization [6]. For problems requiring batched evaluation this approach has the advantage of evaluating the whole generation, the size of which can be set to the size of the evaluation batch. The main disadvantage of the generation-based strategy is that not all individuals in the control generation are necessarily beneficial to the model quality, and the expensive true fitness evaluations are wasted.

2.2 Individual-based approach

As opposed to the generation-based approach, in the individual-based strategy the decision whether to evaluate a given point using the true fitness function or the surrogate model is made for each individual separately.

In model-based optimization in general, there are several possible approaches to individual-based sampling. The most used approach in evolutionary optimization is pre-selection. In each generation of the EA, a number of points which is a multiple of the population size is generated and evaluated using the model prediction. The best of these individuals form the next generation of the algorithm. The optimization is performed as follows.

1. An initial set of points S is chosen and evaluated using the true fitness function fₜ.
2. Model M is trained using the pairs (x, fₜ(x)) ∈ S.
3. A generation of the EA is run with the fitness function replaced by the model prediction f_M, and a population Oᵢ of size qp is generated and evaluated with f_M, where p is the desired population size for the EA and q is the pre-screening ratio. Initially, i = 1.
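The generation-based control loop of section 2.1 (steps 1–7) can be sketched in a few lines. This is a hedged toy illustration, not the implementation used in the paper: the "EA" is reduced to a mutate-and-truncate step on a one-dimensional minimization problem, and `nearest_neighbour_model` is a hypothetical stand-in for a real surrogate model.

```python
import random

def evolve_generation(population, fitness, sigma=0.1):
    """One toy EA generation: mutate every point, keep the better half."""
    offspring = [x + random.gauss(0, sigma) for x in population]
    return sorted(population + offspring, key=fitness)[:len(population)]

def nearest_neighbour_model(archive):
    """Stand-in surrogate: predict the value of the nearest archived point."""
    def predict(x):
        return min(archive, key=lambda p: abs(p[0] - x))[1]
    return predict

def generation_based_optimization(f_true, train_model, n_init=3, t_period=5,
                                  n_control=10, pop_size=8):
    """Generation-based evolution control: every T surrogate generations,
    one control generation uses the true fitness and the model is retrained
    on all truly evaluated individuals."""
    population = [random.uniform(-5, 5) for _ in range(pop_size)]
    archive = []                                    # pairs (x, f_t(x))
    for _ in range(n_init):                         # step 1: initial true generations
        population = evolve_generation(population, f_true)
        archive += [(x, f_true(x)) for x in population]
    model = train_model(archive)                    # step 2: train the model
    for _ in range(n_control):                      # steps 4-7
        for _ in range(t_period):                   # step 4: T model generations
            population = evolve_generation(population, model)
        population = evolve_generation(population, f_true)  # step 5: control generation
        archive += [(x, f_true(x)) for x in population]
        model = train_model(archive)                # step 6: retrain
    return min(archive, key=lambda p: p[1])         # best truly evaluated point
```

Note how the number of true evaluations per control cycle is fixed to the population size – exactly the coupling the queue-based method of section 4 removes.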
4. A subset P ⊂ Oᵢ is selected according to a selection criterion.
5. Individuals from P are evaluated using the true fitness function fₜ.
6. The model M is retrained using S ∪ P, the set S is replaced with S ∪ P, and the EA resumes from step 3.

Another possibility, called the best strategy [5], is to replace S with S ∪ Oᵢ instead of just P in step 6, after re-evaluating the set Oᵢ \ P with f_M (after the model M has been re-trained). This also means using the population size qp in the EA.

The key piece of this approach is the selection criterion (or criteria) used to determine which individuals from the set Oᵢ should be used in the following generation of the algorithm. There are a number of possibilities; let us discuss the most common.

An obvious choice is selecting the best individuals based on the fitness value. This results in the region of the optimum being sampled thoroughly, which helps finding the true optimum. On the other hand, the regions far from the current optimum are neglected and a possibly better optimum can be missed. To sample the areas of the fitness landscape that were not explored yet, space-filling criteria are used, either alone or in combination with the best fitness selection or other criteria.

All the previous criteria have in common the fact that they are concerned with the optimization itself. A different approach is to use the information about the model, most importantly its accuracy, to decide which points of the input space to evaluate with the true fitness function in order to most improve it. This approach is sometimes called active learning.

2.3 Active learning

Active learning is an approach that tries to maximize the amount of insight about the modeled function gained from its evaluation, while minimizing the number of evaluations necessary. The methods are used in the general field of surrogate modeling as an efficient adaptive sampling strategy. The terms adaptive sampling and active learning are often used interchangeably. We will use the term active learning for the methods based on the characteristics of the surrogate model itself, such as accuracy, with the goal of minimizing the model prediction error either globally or, more importantly, in the area of the input space the EA is exploring.

The active learning methods are most often based on the local model prediction error, such as the cross-validation error. Although some methods are independent of the model, for example the LOLA-Voronoi method [2], most of them depend on the model used. The kriging model used in our proposed method offers a good estimate of the local model accuracy by giving an error estimate of its prediction. It is possible to use the estimate itself as a measure of the model's confidence in the prediction, or to base a more complex measure on the variance estimate. The measures that were tested for use in our method will be described in detail in section 4.1.

3 Kriging meta-models

The kriging method is an interpolation method originating in geostatistics [9], based on modeling the function as a realization of a stochastic process [11]. In ordinary kriging, which we use, the function is modeled as a realization of a stochastic process

Y(x) = μ₀ + Z(x),    (1)

where Z(x) is a stochastic process with mean 0 and covariance function σ²ψ given by

cov{Y(x + h), Y(x)} = σ²ψ(h),    (2)

where σ² is the process variance for all x. The correlation function ψ(h) is then assumed to have the form

ψ(h) = exp(−Σ_{l=1}^{d} θ_l |h_l|^{p_l}),    (3)

where θ_l, l = 1, ..., d, with d the number of dimensions, are the correlation parameters. The correlation function depends on the difference of the two points and has the intuitive property of being equal to 1 if h = 0 and tending to 0 as h → ∞. The θ_l parameters determine how fast the correlation tends to zero in each coordinate direction, and the p_l determine the smoothness of the function.

The ordinary kriging predictor based on n sample points {x₁, ..., x_n} with values y = (y₁, ..., y_n)′ is then given by

ŷ(x) = μ̂₀ + ψ(x)′Ψ⁻¹(y − μ̂₀1),    (4)

where ψ(x)′ = (ψ(x − x₁), ..., ψ(x − x_n)), Ψ is an n × n matrix with elements ψ(xᵢ − xⱼ), and

μ̂₀ = (1′Ψ⁻¹y) / (1′Ψ⁻¹1).    (5)

An important feature of the kriging model is that apart from the prediction value it can estimate the prediction error as well. The kriging predictor error in point x is given by

s²(x) = σ̂² [1 − ψ′Ψ⁻¹ψ + (1 − 1′Ψ⁻¹ψ)² / (1′Ψ⁻¹1)],    (6)
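The pre-selection loop of section 2.2 (steps 1–6) can be condensed into a short sketch. This is a toy illustration under assumptions of our own: a one-dimensional problem, Gaussian mutation as the "EA generation", the best-predicted-fitness selection criterion, and `nn_model` as a hypothetical nearest-neighbour stand-in for a real surrogate.

```python
import random

def nn_model(S):
    """Stand-in surrogate trained on pairs (x, f(x)): nearest-neighbour lookup."""
    def predict(x):
        return min(S, key=lambda p: abs(p[0] - x))[1]
    return predict

def preselection(f_true, n_iter=20, pop_size=6, q=4, k=2):
    """Pre-selection: each generation, q*p offspring are screened with the
    model prediction and only a subset P of size k is truly evaluated."""
    S = [(x, f_true(x)) for x in (random.uniform(-5, 5) for _ in range(pop_size))]  # step 1
    for _ in range(n_iter):
        model = nn_model(S)                                  # step 2 / step 6: (re)train
        parents = [x for x, _ in sorted(S, key=lambda p: p[1])[:pop_size]]
        O = [x + random.gauss(0, 0.3) for x in parents for _ in range(q)]  # step 3
        P = sorted(O, key=model)[:k]                         # step 4: selection criterion
        S += [(x, f_true(x)) for x in P]                     # step 5: true evaluation
    return min(S, key=lambda p: p[1])
```

Only k true evaluations are spent per generation, but note that k of them are forced every generation – the issue the evaluation queue of section 4 addresses.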
where the kriging variance is estimated as

σ̂² = (y − μ̂₀1)′Ψ⁻¹(y − μ̂₀1) / n.    (7)

The parameters θ_l and p_l can be estimated by maximizing the likelihood function of the observed data. For the derivation of equations 4–7, as well as the maximum likelihood estimation of the parameters, the reader may consult the standard stochastic-process-based derivation by Sacks et al. in [11] or a different approach given by Jones in [7].

4 Method description

In this section we describe the proposed method for kriging-model-assisted evolutionary optimization with batch fitness evaluation. Our main goal was to decouple the true fitness function sampling from the EA iterations, based on the assumption that requiring a specific number of true fitness evaluations in every generation of the EA forces unnecessary sampling.

In the generation-based approach, some of the points may be unnecessary to evaluate, as they will not bring any new information to the surrogate model. The individual-based approach is better suited for the task, as it chooses those points from each generation which are estimated to be the most valuable for the model. There is still the problem of performing a given number of evaluations in every generation, although there might not be enough valuable points to select from.

The method we propose achieves the desired decoupling by introducing an evaluation queue. The evolutionary algorithm uses the model prediction at all times, and when a point in which the model's confidence in its prediction is low is encountered, it is added to the evaluation queue. Once there are enough points in the queue, all the points in it are evaluated and the model is re-trained using the results. The optimization takes the following course.

1. An initial set S of b samples is selected using a chosen initial design strategy and evaluated using the true fitness function fₜ.
2. An initial kriging model M is trained using the pairs (x, fₜ(x)) ∈ S.
3. The evolutionary algorithm is started, with the model prediction f_M as the fitness function.
4. For every prediction f_M(x) = ŷ_M(x), an estimated improvement measure c(s²_M(x)) is computed from the error estimate s²_M(x). If c(s²_M(x)) > t, an improvement threshold, the point is added to the evaluation queue Q.
5. If the queue size |Q| ≥ b, the batch size, all points x ∈ Q are evaluated, the set S is replaced by S ∪ {(x, fₜ(x))} and the EA is resumed.
6. Steps 4 and 5 are repeated until the goal is reached, or a stall condition is fulfilled.

The b and t parameters, as well as the function c(s²), are chosen before running the optimization. Note that the evaluation in step 5 can be performed either immediately, i.e. online, or offline. In offline evaluation, after filling the evaluation queue, the EA is stopped when the current iteration is finished and control is returned to the user. After obtaining the fitness values for the samples in the sample queue (e.g. by performing an experiment), the user can manually add the samples and resume the EA from the last generation.

While the choice of the parameters will be discussed in section 5, let us introduce the three different measures of estimated improvement in the model prediction c(s²(x)) which we tested – the standard deviation, the probability of improvement and the expected improvement.

4.1 Measures of estimated improvement

To estimate the improvement which the evaluation of a given point will bring, we can use several measures. The three measures introduced here are all based on the prediction error estimate of the kriging model. The goal of these measures is to prefer the points that help improve the model in regions explored by the EA. Each measure's result for a given point is compared with a threshold, and when the estimated improvement is above the threshold, the point is evaluated using the true fitness function.

Standard deviation (STD) is the simplest measure we tested. It is computed directly from the error estimate as its square root

STD(x) = √(ŝ²_M(x)).    (8)

The STD captures only the model's estimate of the error of its own prediction (based on the distance from the known samples). As such, it does not take into account the value of the prediction itself and can be considered a measure of the model accuracy.

Probability of improvement (POI) [7] uses the fact that the kriging prediction is a Gaussian process, and the prediction in a single point is therefore a normally distributed random variable Y(x) with a mean and variance given by the kriging predictor. If we choose a target T (based on the goal of the optimization), we can estimate the probability that a given point will have a value y(x) ≤ T as the probability that Y(x) ≤ T. The probability of improvement is therefore defined as

POI(x) = Φ((T − ŷ_M(x)) / ŝ_M(x)),    (9)
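The queue-based control loop described in steps 1–6 of section 4 can be sketched as follows. This is a minimal toy illustration under assumptions of our own, not the paper's Matlab/SUMO implementation: the EA is a mutate-and-truncate step in one dimension, and `nn_with_distance` is a hypothetical stand-in that mimics a kriging model's two outputs – a prediction and an uncertainty estimate (here just the distance to the nearest sample).

```python
import random

def nn_with_distance(S):
    """Stand-in for a kriging fit: returns (predict, error_estimate)."""
    def predict(x):
        return min(S, key=lambda p: abs(p[0] - x))[1]
    def error(x):
        return min(abs(p[0] - x) for p in S)   # crude uncertainty proxy
    return predict, error

def queue_based_optimization(f_true, fit_model, batch_size=5, threshold=0.3,
                             n_generations=40, pop_size=8):
    """Queue-based evaluation control: the EA always runs on the model
    prediction; uncertain points accumulate in a queue and are evaluated
    in fixed-size batches, after which the model is retrained."""
    S = [(x, f_true(x)) for x in
         (random.uniform(-2, 2) for _ in range(batch_size))]   # step 1: initial design
    predict, error = fit_model(S)                              # step 2: initial model
    population = [x for x, _ in S]
    queue = []                                                 # the evaluation queue Q
    for _ in range(n_generations):                             # step 3: EA on f_M only
        offspring = [x + random.gauss(0, 0.2) for x in population]
        for x in offspring:
            if error(x) > threshold and x not in queue:        # step 4: c(s^2) > t
                queue.append(x)
        population = sorted(population + offspring, key=predict)[:pop_size]
        if len(queue) >= batch_size:                           # step 5: evaluate batch
            S += [(x, f_true(x)) for x in queue]
            queue.clear()
            predict, error = fit_model(S)                      # retrain on S
    return min(S, key=lambda p: p[1])
```

The batch size b and the population size are independent parameters here, which is the decoupling the method is designed to provide; an offline variant would return the queue to the user instead of calling `f_true` directly.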
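Equations 3–7 of section 3 translate directly into code. The sketch below, assuming NumPy and correlation parameters θ and p that are simply given (rather than estimated by maximum likelihood, as the paper does), computes the ordinary kriging prediction and its error estimate for one query point; the tiny diagonal jitter is our own addition for numerical stability.

```python
import numpy as np

def kriging_fit_predict(X, y, x_new, theta, p=2.0):
    """Ordinary kriging predictor (eq. 4) and error estimate (eq. 6).
    X: (n, d) sample points, y: (n,) values, x_new: (d,) query point,
    theta: (d,) correlation parameters, p: smoothness exponent."""
    X, y, x_new, theta = map(np.asarray, (X, y, x_new, theta))
    n = len(y)

    def psi(a, b):  # correlation function, eq. (3)
        return np.exp(-np.sum(theta * np.abs(a - b) ** p))

    Psi = np.array([[psi(X[i], X[j]) for j in range(n)] for i in range(n)])
    Psi_inv = np.linalg.inv(Psi + 1e-10 * np.eye(n))  # jitter for stability
    ones = np.ones(n)
    mu0 = ones @ Psi_inv @ y / (ones @ Psi_inv @ ones)           # eq. (5)
    psi_vec = np.array([psi(x_new, X[i]) for i in range(n)])
    y_hat = mu0 + psi_vec @ Psi_inv @ (y - mu0 * ones)           # eq. (4)
    sigma2 = (y - mu0 * ones) @ Psi_inv @ (y - mu0 * ones) / n   # eq. (7)
    s2 = sigma2 * (1.0 - psi_vec @ Psi_inv @ psi_vec
                   + (1.0 - ones @ Psi_inv @ psi_vec) ** 2
                   / (ones @ Psi_inv @ ones))                    # eq. (6)
    return y_hat, max(s2, 0.0)
```

At a sampled point the predictor interpolates the data and the error estimate vanishes, which is the property the active learning measures of section 4.1 exploit.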
where Φ is the cumulative distribution function of the standard normal distribution. As opposed to the STD, the POI takes into account the prediction mean (value) as well as its variance (error estimate). The area of the current optimum is therefore preferred over the rest of the input space. When the area of the current optimum is sampled enough, the variance becomes very small and the term (T − ŷ_M(x)) / ŝ_M(x) becomes extremely negative, encouraging the sampling of less explored areas.

Expected improvement (EI) [7, 8] is based on estimating, as the name suggests, the improvement we expect to achieve over the current minimum f_min if a given point is evaluated. As before, we assume the model prediction in point x to be a normally distributed random variable Y(x) with a mean and variance given by the kriging predictor. We achieve an improvement I over f_min if Y(x) = f_min − I. As shown in [7], the likelihood of achieving the improvement I is

(1 / √(2πŝ²_M(x))) exp(−(f_min − I − ŷ_M(x))² / (2ŝ²_M(x))).    (10)

Expected improvement is the expected value of the improvement, found by integrating over this density from I = 0 to I = ∞. The resulting measure EI is defined as

EI(x) = E(I) = ŝ_M(x)[uΦ(u) + φ(u)],    (11)

where

u = (f_min − ŷ_M(x)) / ŝ_M(x)    (12)

and Φ and φ are the cumulative distribution function and the probability density function of the standard normal distribution, respectively. The expected improvement has an important advantage over the POI: it does not require a preset target T, which can be detrimental to the POI's successful sample selection when set too high or too low.

All three measures have an important weakness of being based on the model prediction. If the modeled function is deceptive, the model can be very inaccurate while estimating a low variance. A good initial sampling of the fitness function is therefore very important. The success of the whole method is dependent on the model's ability to capture the response surface correctly, and thus on the function itself.

5 Results and discussion

The proposed method was tested using simulations on three standard benchmark functions. We studied the model evolution during the course of the optimization and the effect of the parameters, and also investigated the optimal choice of batch size for problems where an upfront choice is possible. In this section we discuss the tests performed and their results.

For testing, we used the genetic algorithm implementation from the global optimization toolbox for the Matlab environment and the implementation of an ordinary kriging model from the SUMO Toolbox [4]. The parameters of the supporting methods, e.g. the genetic algorithm itself, were kept on their default values provided by the implementation.

Because the EA itself is not deterministic, each test was performed 20 times and the results we present are statistical measures of this sample. As a performance measure we use the number of true fitness evaluations used to reach a set goal in all tests. The main reason to use this measure is that in model-assisted optimization the computational cost of everything except the true fitness evaluation is minimal in comparison. We also track the proportion of the 20 runs that reached the goal before various limits (time, stall, etc.) took effect.

5.1 Benchmark functions

Since evolutionary algorithms and optimization heuristics in general are often used on black-box optimization, where the properties of the objective function are unknown, it is not straightforward to assess their quality on real-world problems. It has therefore become a standard practice to test optimization algorithms and their modifications on specially designed testing problems.

These benchmark functions are explicitly defined and their properties and optima are known. They are often designed to exploit typical weaknesses of optimization algorithms in finding the global optimum. We used three functions found in the literature [10]. Although we performed our tests in two dimensions, we give general multi-dimensional definitions of the functions.

The first of the functions used is De Jong's function. It is one of the simplest benchmarks; it is continuous, convex and unimodal, and is defined as

f(x) = Σ_{i=1}^{n} x_i².    (13)

The domain is restricted to a hypercube −10 ≤ x_i ≤ 10, i = 1, ..., n. The function has one global optimum f(x) = 0 in the point x = 0. De Jong's function was primarily used as a proof-of-concept test.

As a second benchmark, we used Rosenbrock's function, also called Rosenbrock's valley. The global optimum is inside a long parabolic-shaped valley, which is easy to find. Finding the global optimum in
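All three measures of section 4.1 are cheap to compute from the two kriging outputs ŷ_M(x) and s²_M(x). A sketch using only the Python standard library (the function names are ours, and the normal CDF is built from `math.erf` rather than any statistics package):

```python
from math import sqrt, erf, exp, pi

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    """Standard normal probability density function."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def std_measure(s2):
    """STD, eq. (8): square root of the kriging error estimate."""
    return sqrt(s2)

def poi(y_hat, s2, target):
    """Probability of improvement, eq. (9): P(Y(x) <= T)."""
    s = sqrt(s2)
    return norm_cdf((target - y_hat) / s) if s > 0 else 0.0

def expected_improvement(y_hat, s2, f_min):
    """Expected improvement, eqs. (11)-(12)."""
    s = sqrt(s2)
    if s == 0:
        return 0.0                      # no expected gain at a sampled point
    u = (f_min - y_hat) / s
    return s * (u * norm_cdf(u) + norm_pdf(u))
```

Each value would then be compared against the improvement threshold t of step 4 to decide whether the point enters the evaluation queue.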
that valley, however, is difficult [10]. The function has the following definition

f(x) = Σ_{i=1}^{n−1} [100(x_{i+1} − x_i²)² + (1 − x_i)²].    (14)

The domain of the function is restricted to a hypercube −2 ≤ x_i ≤ 2, i = 1, ..., n. It has one global optimum f(x) = 0 in x = 1.

Finally, the third function used as a benchmark is Rastrigin's function. It is based on De Jong's function with the addition of cosine modulation, which produces a high number of regularly distributed local minima and makes the function highly multimodal. The function is defined as

f(x) = 10n + Σ_{i=1}^{n} [x_i² − 10 cos(2πx_i)].    (15)

The domain is restricted to −5 ≤ x_i ≤ 5, i = 1, ..., n. The global optimum f(x) = 0 is in x = 0.

5.2 Model evolution

As a basic illustration of how the model evolves during the course of the EA, let us consider an example test run using Rosenbrock's function. For this experiment we set a batch size of 15, used the STD measure of estimated improvement with a threshold of 0.001, and set the target fitness value to 0.001 as well. The target was reached at the point (0.9909, 0.9824) using 90 true fitness evaluations. A genetic algorithm without a surrogate model needed approximately 3000 evaluations to reach the goal in several test runs.

The model evolution is shown in figure 1. The true fitness function is shown on the left, the initial model is in the middle and the final model on the right. The points where the true fitness function was sampled are denoted with circles and the optimum is marked with a star.

Fig. 1. The original fitness function, the initial model and the final model. [Three contour plots: true fitness, initial model and samples, final model; an accompanying plot shows the optimization progress as best true fitness over generations.]

5.3 Measures of estimated improvement comparison

In order to compare the measures of estimated improvement, we performed simulations on each benchmark using each improvement estimate measure with different values of the threshold. The batch size was set to 40 – generally found to be the ideal batch size – for these experiments. For comparison, we also performed tests with the standard genetic algorithm without a model. Results of these simulations are shown in table 1.

Table 1. GA performance on benchmark functions without a model.

function    | evals (1q) | evals (med) | evals (3q) | goal | reached
De Jong     | 60         | 60          | 120        | 0.01 | 1
Rosenbrock  | 60         | 125         | 310        | 0.1  | 1
Rastrigin   | 260        | 370         | 580        | 0.1  | 0.85

De Jong's function proved to be simple to optimize and the threshold setting had almost no effect. Only when using the standard deviation, setting the threshold too low led to an increase in the number of evaluations, as too many points were evaluated although the model prediction in those points was accurate enough.

The same is true for the STD measure used on Rosenbrock's function, where setting the threshold too low leads to a big increase in the variance of the results. Interestingly, setting the threshold too high leads to a decrease in the number of evaluations, but also in the success rate of reaching the goal. The POI and EI are more stable in terms of true fitness evaluations, but have a worse overall success rate. The results are shown in figure 2.

Fig. 2. STD measure on Rosenbrock's function – true fitness evaluations and proportion of runs reaching the goal. [Two panels over the threshold axis: (a) STD, (b) POI; legend: median value, interquartile range, goal reached.]

Rastrigin's function proved difficult to optimize. This is probably due to the locality of the kriging model and the high number of local minima of the function. Overall, the STD measure is the most successful. POI and EI lead to bad sampling of the model and failure to reach the optimum.

An interesting general result is that the more complex measures of estimated improvement perform worse than the simple standard deviation estimate. This indicates that the goal of active learning selection criteria in evolutionary optimization should be the best possible sampling for overall model accuracy, as opposed to trying to improve the accuracy in the best regions of the input space. Both the POI and EI are designed to select the next best points to reach the optimum. Since in our case this is handled by the EA itself, the measures bring unnecessary noise into the estimate of the model accuracy. The results also show that the best measure selection is dependent on the optimized function.

5.4 Batch size

In order to study the effect of the batch size on the optimization, a number of experiments were performed with different batch sizes. The only option to achieve a given batch size in a standard GA is to set the population size; in our method, however, the two settings are independent, so a population size of 30, which proved efficient, was used in all of the tests.

Fig. 3. Batch size effect on Rosenbrock's function optimization – true fitness evaluations and proportion of runs reaching the goal. [Three panels over the batch size axis: (a) No model, (b) STD, (c) POI; legend: median value, interquartile range, goal reached.]

The results on De Jong's function show that apart from small batch sizes (up to 10), the optimization is successful in all runs. Our method helps stabilize the EA for small batch sizes, and for batch sizes above 15 the algorithm finds the optimum using
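The three benchmarks of section 5.1 (equations 13–15) translate directly into code; a short sketch for arbitrary dimension, with the optima noted in the comments:

```python
from math import cos, pi

def de_jong(x):
    """De Jong's (sphere) function, eq. (13); global optimum f(0, ..., 0) = 0."""
    return sum(xi ** 2 for xi in x)

def rosenbrock(x):
    """Rosenbrock's valley, eq. (14); global optimum f(1, ..., 1) = 0."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    """Rastrigin's function, eq. (15); global optimum f(0, ..., 0) = 0."""
    n = len(x)
    return 10.0 * n + sum(xi ** 2 - 10.0 * cos(2.0 * pi * xi) for xi in x)
```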
a single batch. For a standard GA this strong dependence arises for batch sizes above 40, and the algorithm reaches the goal in the second generation, evaluating twice as many points.

For Rosenbrock's function we get the intuitive result that setting the batch size too low leads to more evaluations or a failure to reach the goal, while large batch sizes do not improve the results and waste true fitness evaluations. For this function the POI proved to be the most efficient measure. The comparison is shown in figure 3. Overall, the method reduces the number of true evaluations from hundreds to tens for Rosenbrock's function, while slightly reducing the success rate of the computation.

Rastrigin's function proved difficult to optimize even without a surrogate model. With the model, the STD achieved the best results, reducing the number of true fitness evaluations approximately three times in the area of the highest success rate, with a batch size of 70. The other two measures were ineffective. We attribute the method's difficulty optimizing Rastrigin's function to the fact that the kriging model is local, and thus it requires a large number of samples to capture the function's complicated behavior in the whole input space. When the initial sampling is misleading, which is more likely for Rastrigin's function, both the model prediction and the estimated improvement are wrong.

The results suggest that the best batch size and the best estimated improvement measure are highly problem-dependent. The proposed method is also very sensitive to good initial sample selection; a bad initial sample is the most usual reason for it to fail to find the optimum. The experimental results support the intuition that batches too small are bad for the initial sampling of the model, and batches too large slow down the model improvement by evaluating points that would not be necessary to evaluate with smaller batches. This suggests using a larger initial sample and a small batch for the rest of the optimization.

6 Conclusions

In this paper we presented a method for model-assisted evolutionary optimization with a fixed batch size requirement. To decouple the sampling from the EA iterations and support an individual-based approach while keeping a fixed evaluation batch size, the method uses an evaluation queue. The candidates for true fitness evaluation are selected by an active learning method using a measure of estimated improvement of the model quality based on the model prediction error estimate.

The results suggest using simple methods for improvement estimation in active learning, which only capture information about the model accuracy improvement expected from sampling a given point. In the experiments with the batch size we found that small batch sizes perform better when the objective function is simple, while causing bad initial sampling of more complex functions, suggesting the use of a larger initial sample. Future development of this work should include experiments using different batch sizes for initial sampling, and a comparison of the method with other ways of employing a surrogate model in the optimization as well as with other model-assisted optimization methods.

The method brings promising results, reducing the number of true fitness evaluations to a large degree for some of the benchmark functions. On the other hand, its success is highly dependent on the optimized function and its initial sampling.

References

1. M. Baerns, M. Holeňa: Combinatorial development of solid catalytic materials: design of high-throughput experiments, data analysis, data mining. Catalytic Science Series. Imperial College Press, 2009.
2. K. Crombecq, L. De Tommasi, D. Gorissen, T. Dhaene: A novel sequential design strategy for global surrogate modeling. In Winter Simulation Conference, WSC '09, Winter Simulation Conference, 2009, 731–742.
3. D. Gorissen: Grid-enabled adaptive surrogate modeling for computer aided engineering. PhD Thesis, Ghent University, University of Antwerp, 2009.
4. D. Gorissen, I. Couckuyt, P. Demeester, T. Dhaene, K. Crombecq: A surrogate modeling and adaptive sampling toolbox for computer based design. The Journal of Machine Learning Research 11, 2010, 2051–2055.
5. L. Gräning, Y. Jin, B. Sendhoff: Efficient evolutionary optimization using individual-based evolution control and neural networks: A comparative study. In ESANN, 2005, 273–278.
6. Y. Jin, M. Olhofer, B. Sendhoff: Managing approximate models in evolutionary aerodynamic design optimization. In Proceedings of the 2001 Congress on Evolutionary Computation, vol. 1, IEEE, 2001, 592–599.
7. D. R. Jones: A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21, 2001, 345–383.
8. D. R. Jones, M. Schonlau, W. J. Welch: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 1998, 455–492.
9. G. Matheron: Principles of geostatistics. Economic Geology 58(8), 1963, 1246–1266.
10. M. Molga, C. Smutnicki: Test functions for optimization needs. 2005.
11. J. Sacks, W. J. Welch, T. J. Mitchell, H. P. Wynn: Design and analysis of computer experiments. Statistical Science 4(4), 1989, 409–423.