<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evolutionary optimization with active learning of surrogate models and xed evaluation batch size?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktor Charypar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holena</string-name>
          <email>martin@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Brehova 7</institution>
          ,
          <addr-line>115 19 Praha 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science Academy of Sciences of the Czech Republic Pod vodarenskou vez 2</institution>
          ,
          <addr-line>182 07 Praha</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>40</lpage>
      <abstract>
        <p>Evolutionary optimization is often applied to lation used as the objective function takes minutes to problems, where simulations or experiments used as the t- nish, the traditional approach becomes impractical. ness function are expensive to run. In such cases, surro- When the objective function is evaluated using a physgate models are used to reduce the number of tness eval- ical experiment, in the evolutionary optimization of uations. Some of the problems also require a xed size catalytic materials [1] for example, an evaluation for batch of solutions to be evaluated at a time. Traditional one generation of the algorithm takes between several pmreotvheodthseofsusrerleocgtaitneg minoddievlideuitahlserforreqturuiree eivnadliuvaidtiuoanl tpooiinmts- days and several weeks and costs thousands of euros. to be evaluated, or couple the batch size with the EA gener- The typical solution to this problem is performation size. We propose a queue based method for individual ing only a part of all evaluations using the true tselection based on active learning of a kriging model. Indi- ness function and using a response-surface model as viduals are selected using the con dence intervals predicted its replacement for the rest. This approach is called by the model, added to a queue and evaluated once the surrogate modeling. When using a surrogate model, queue length reaches the batch size. The method was tested only a small portion of all the points that need to be on several standard benchmark problems. Results show that evaluated is evaluated using the true objective functhe proposed algorithm is able to achieve a solution using tion (simulation or experiment) and for the rest, the signi cantly less evaluations of the true tness function. model prediction is assigned as the tness value. The dTihsceuesseedc.t of the batch size as well as other parameters is model is built using the information from the true tness evaluations. Since the tness function is assumed to be highly 1 Introduction non-linear the modeling methods used are non-linear as well. Some of the commonly used methods include Evolutionary optimization algorithms are a popular arti cial neural networks, radial basis functions, reclass of optimization techniques suitable for various gression trees, support vector machines or Gaussian optimization problems. One of their main advantages processes [3]. is the ability to nd optima of black-box functions { Furthermore, some experiments require a xed functions that are not explicitly de ned and only their number of samples to be processed at one time. This input/output behavior is known from previous evalu- presents its own set of challenges for adaptive sampling ations of a nite number of points in the input space. and is the main concern of this paper. We present an This is typical for applications in engineering, chem- evolutionary optimization method assisted by a variistry or biology, where the evaluation is performed in ant of a Gaussian-process-based interpolating model a form of computer simulation or physical experiment. called kriging. In order to best use the evaluation budThe main disadvantage for such applications is the get, our approach uses active learning methods in severy high number of evaluations of the objective func- lecting individuals to evaluate using the true tness tion (called tness function in the evolutionary op- function. A key feature of the approach is support timization context) needed for an evolutionary algo- for online and o ine batch evaluation with arbitrary rithm (EA) to reach the optimum. Even if the simu- batch size independent of the generation size of ? This work was supported by the Grant Agency of the EA. the Czech Technical University in Prague, grant No. The rest of the paper is organized as follows: in the SGS12/196/OHK3/3T/14 as well as the Czech Science following section we introduce the kriging surrogate Foundation grant 201/08/0802. model and its properties, in section 2 the methods of</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>coupling a model to the evolutionary optimization are
discussed, section 4 provides details of the proposed
method and nally, the results of testing the method
are presented and discussed in section 5.
2</p>
      <p>Model-assisted evolutionary
optimization
1. An initial Ni generations of the EA is performed,
yielding sets G1; : : : ; GNi of individuals (x; ft(x)),
ft being the true tness function.
2. The model M is trained on the individuals</p>
      <p>(x; ft(x)) 2 SiN=i1 Gi.
3. The tness function ft is replaced by a model
pre</p>
      <p>diction fM .
4. T generations are performed evaluating fM as the</p>
      <p>tness function.
5. One generation is performed using ft yielding</p>
      <p>a set Gj of individuals. (initially j = Ni + 1)
6. The model is retrained on the individuals
(x; ft(x)) 2 Sj</p>
      <p>i=1 Gi
7. Steps 4{6 are repeated until the optimum is
reached.</p>
      <p>
        Since the surrogate model used as a replacement for
the tness function in the EA is built using the
results of the true tness function evaluations, there are
two competing objectives. First, we need to get the
most information about the underlying relations in the
data, in order to build a precise model of the tness
function. If the model does not capture the features of
the tness function correctly, the optimization can get The amount of true tness evaluations in this
apstuck in a fake optimum or generally fail to converge to proach is dependent on the population size of the EA
a global one. Second, we have a limited budget for the and the frequency of control generations T , which can
true tness function evaluations. Using many points be xed or adaptively changed during the course of
from the input space to build a perfect model can re- the optimization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For problems requiring batched
quire more true tness evaluations than not employing evaluation this approach has the advantage of
evalua model at all. ating the whole generation, the size of which can be
      </p>
      <p>
        In the general use of surrogate modeling, such as set to the size of the evaluation batch. The main
disdesign space exploration, the process of select- advantage of the generation-based strategy is that not
ing points from the input space to evaluate and build all individuals in the control generation are necessarily
the model upon is called sampling [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Traditionally, bene cial to the model quality and the expensive true
the points to sample are selected upfront. Upfront tness evaluations are wasted.
sampling schemes are based on the theory of design
of experiments (DoE), e.g. a Latin hypercube design.
      </p>
      <p>
        When we don't know anything about the function we 2.2 Individual-based approach
are trying to model, it is better to use a small set of As opposed to the generation-based approach, in the
points as a base for an initial model, which is then it- individual-based strategy, the decision whether to
eratively improved using new samples, selected based evaluate a given point using the true tness function
on the information from previous function evaluations or the surrogate model is made for each individual
and the model itself. This approach is called adaptive separately.
sampling [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In model-based optimization in general, there are
      </p>
      <p>Using the surrogate model in an evolutionary op- several possible approaches to individual-based
samtimization algorithm, the adaptive sampling decisions pling. The most used approach in the evolutionary
change from selecting which points of the input space optimization is pre-selection. In each generation of the
to evaluate in order to improve the model to whether EA, number of points, which is a multiple of the
poputo evaluate a given point (selected by the EA) with lation size, is generated and evaluated using the model
the true tness function or not. There are two gen- prediction. The best of these individuals form the next
eral approaches to this choice: the generation-based generation of the algorithm. The optimization is
perapproach and the individual-based approach. We will formed as follows.
discuss both, with emphasis on the latter, a variant of
which is used in the method we propose in section 4.
2.1</p>
    </sec>
    <sec id="sec-2">
      <title>Generation-based approach</title>
      <p>In the generation-based approach the decision whether
to evaluate an individual point with the true tness
function is made for the whole generation of the
evolutionary algorithm. The optimization takes the
following steps.
1. An initial set of points S is chosen and evaluated</p>
      <p>using the true tness function ft.
2. Model M is trained using the pairs (x; ft(x)) 2 S
3. A generation of the EA is run with the tness
function replaced by the model prediction fM and
a population Oi of size qp is generated and
evaluated with fM , where p is the desired population
size for the EA and q is the pre-screening ratio.</p>
      <p>
        Initially, i = 1.
4. A subset P O is selected according to a selection method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], most of them depend on the model used.
      </p>
      <p>criterion. The kriging model used in our proposed method o ers
5. Individuals from P are evaluated using the true a good estimate of the local model accuracy by
givtness function ft. ing an error estimate of its prediction. It is possible
6. The model M is retrained using S [ P, the set S to use the estimate itself as a measure of the model's
is replaced with S [ P, and the EA resumes from con dence in the prediction, or base a more complex
step 3. measure on the variance estimate. The measures that
were tested for use in our method will be described in
detail in section 4.1.</p>
      <p>
        Another possibility, called the best strategy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], is
to replace S with S [ O instead of just P in step 6
after re-evaluating the set O n P with fM (after the
model M has been re-trained). This also means using 3 Kriging meta-models
the population size qp in the EA. The kriging method is an interpolation method
origi
      </p>
      <p>
        The key piece of this approach is the selection crite- nating in geostatistics [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], based on modeling the
funcrion (or criteria) used to determine which individuals tion as a realization of a stochastic process [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
from set O should be used in the following generation In the ordinary kriging, which we use, the function
of the algorithm. There are a number of possibilities, is modeled as a realization of a stochastic process
let us discuss the most common.
      </p>
      <p>An obvious choice is selecting the best individuals Y (x) = 0 + Z(x) (1)
based on the tness value. This results in the region of
the optimum being sampled thoroughly, which helps where Z(x) is a stochastic process with mean 0 and
nding the true optimum. On the other hand, the re- covariance function 2 given by
agiopnosssfiabrlefrbometttehreocputrimreunmtocpatnimbuemmairsesende.gTleoctseadmapnlde covfY (x + h); Y (x)g = 2 (h); (2)
the areas of the tness landscape that were not ex- where 2 is the process variance for all x. The
correplored yet, space- lling criteria are used, either alone lation function (h) is then assumed to have the form
or in combination with the best tness selection or
otheArllcrtihteeripar.evious criteria have the fact that they (h) = exp " Xd ljhljpl # ; (3)
are concerned with the optimization itself in common. l=1
A di erent approach is to use the information about
the model, most importantly its accuracy, to decide
which points of the input space to evaluate with the
true tness function in order to most improve it. This
approach is sometimes called active learning.
where l; l = 1; : : : ; d, where d is the number of
dimensions, are the correlation parameters. The correlation
function depends on the di erence of the two points
and has the intuitive property of being equal to 1 if
h = 0 and tending to 0 when h ! 1. The l
parameters determine how fast the correlation tends to zero
2.3 Active learning in each coordinate direction and the pl determines the
smoothness of the function.</p>
      <p>Active learning is an approach that tries to maxi- The ordinary kriging predictor based on n sample
mize the amount of insight about the modeled function points fx1; : : : ; xng with values y = (y1; : : : ; yn)0 is
gained from its evaluation while minimizing the num- then given by
ibnerthoef egveanleuraatlionesldneocfessusarrryo.gaTtheemmoedtehliondgs aasreanuseefd- y^(x) = ^0 + (x)0 1(y ^01); (4)
cient adaptive sampling strategy. The terms adap- where (x)0 = ( (x x1); : : : ; (x xn)), is an
tive sampling and active learning are often used inter- n n matrix with elements (xi xj ), and
changeably. We will use the term active learning for
the methods based on the characteristics of the sur- 10 1y
rogate model itself, such as accuracy, with the goal of ^0 = 10 11 : (5)
minimizing the model prediction error either globally An important feature of the kriging model is that
or, more importantly, in the area of the input space apart from the prediction value it can estimate the
the EA is exploring. prediction error as well. The kriging predictor error in</p>
      <p>The active learning methods are most often based point x is given by
on the local model prediction error, such as
crossvpaelniddaetnitonof etrhreorm. oAdeltlh,ofourghexasmompele mtheethLoOdLs Aa-rVeorinodneo-i s2(x) = ^2 1 0 1 + (1 0 1 )2 (6)
where the kriging variance is estimated as
6. Steps 4 and 5 are repeated until the goal is</p>
      <p>reached, or a stall condition is ful lled.</p>
      <p>In this section we will describe the proposed method
for kriging-model-assisted evolutionary optimization
with batch tness evaluation. Our main goal was to
decouple the true tness function sampling from the
EA iterations based on an assumption that requiring
a speci c number of true tness evaluations in every
generations of the EA forces unnecessary sampling.</p>
      <p>In the generation-based approach, some of the 4.1 Measures of estimated improvement
points may be unnecessary to evaluate, as they will To estimate the improvement, which evaluation of
not bring any new information to the surrogate model. a given point will bring, we can use several measures.
The individual-based approach is better suited for the The three measures introduced here are all based on
task, as it chooses those points from each generation, the prediction error estimate of the kriging model. The
which are estimated to be the most valuable for the goal of these measures is to prefer the points that help
model. There is still the problem of performing a given improve the model in regions explored by the EA.
number of evaluations in every generation, although Each of the measure's results for a given point are
there might not be enough valuable points to select compared with a threshold and when the estimated
from. improvement is above the threshold, the point is
eval</p>
      <p>The method we propose achieves the desired de- uated using the true tness function.
coupling by introducing an evaluation queue. The
evolutionary algorithm uses the model prediction at all Standard deviation (STD) is the simplest measure we
times and when a point, in which the model's con- tested. It is computed directly from the error as its
dence in its prediction is low, is encountered, it is square root
points in the queue, all the points in it are evaluated ST D(x) = qs^2M (x): (8)
added to the evaluation queue. Once there are enough
and the model is re-trained using the results. The
optimization takes the following course.
1. Initial set S of b samples is selected using a chosen
initial design strategy and evaluated using the true</p>
      <p>tness function ft.
2. An initial kriging model M is trained using pairs</p>
      <p>(x; ft(x)) 2 S.
3. The evolutionary algorithm is started, with the</p>
      <p>model prediction fM as the tness function.
4. For every prediction fM (x) = y^M (x), an estimated
improvement measure c(s2M (x)) is computed from
the error estimate s2M (x). If c(s2M (x)) &gt; t, an
improvement threshold, the point is added to the
evaluation queue Q.
5. If the queue size jQj b, the batch size, all points
x 2 Q are evaluated, the set S is replaced by S [
f(x; ft(x)g and the EA is resumed.</p>
      <p>The STD captures only the model's estimate of the
error of its own prediction (based on the distance from
the known samples). As such, it does not take into
account the value of the prediction itself and can be
considered a measure of the model accuracy.</p>
      <p>
        Probability of improvement (POI) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses the fact,
that the kriging prediction is a Gaussian process and
the prediction in a single point is therefore a normally
distributed random variable Y (x) with a mean and
variance given by the kriging predictor. If we choose a
target T (based on the goal of the optimization), we
can estimate the probability that a given point will
have a value y(x) T as a probability that Y (x) T .
      </p>
      <p>
        The probability of improvement is therefore de ned as
Expected improvement (EI) [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] is based on
estimating, as the name suggests, the improvement we expect
to achieve over the current minimum fmin, if a given
point is evaluated. As before, we assume the model
prediction in point x to be a normally distributed
random variable Y (x) with a mean and variance given by
the kriging predictor. We achieve an improvement I
over fmin if Y (x) = fmin I. As shown in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] the
expected value of I can be obtained using the likelihood
of achieving the improvement
where is the cumulative distribution function of the e ect of the parameters and also investigated the
the standard normal distribution. As opposed to the optimal choice of batch size for problems where an
STD, the POI takes into account the prediction mean upfront choice is possible. In this section we discuss
(value) as well as its variance (error estimate). The the tests performed and their results.
area of the current optimum is therefore preferred over For testing, we used the genetic algorithm
implethe rest of the input space. When the area of the mentation from the global optimization toolbox for the
current optimum is sampled enough, the variance be- Matlab environment and the implementation of an
orcomes very small and the term T s2My^M(x()x) becomes ex- dinary kriging model from the SUMO Toolbox [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
tremely negative, encouraging the sampling of less ex- parameters of the supporting methods, e.g. the genetic
plored areas. algorithm itself, were kept on their default values
provided by the implementation.
      </p>
      <p>Because the EA itself is not deterministic, each test
was performed 20 times and the results we present are
statistical measures of this sample. As a performance
measure we use the number of true tness evaluations
used to reach a set goal in all tests. The main reason to
use this measure is that in model-assisted optimization
the computational cost of everything except the true</p>
      <p>tness evaluation is minimal in comparison. We also
track the proportion of the 20 runs that reached the
goal before various limits (time, stall, etc.) took e ect.</p>
      <p>The domain is restricted to a hypercube 10 xi
10; i = 1; : : : ; n. The function has one global optimum
f (x) = 0 in point x = 0. The De Jong's function was
5 Results and discussion primarily used as a proof of concept test.
As a second benchmark, we used the Rosenbrock's
The proposed method was tested using simulations on function, also called Rosenbrock's valley. The global
three standard benchmark functions. We studied the optimum is inside a long parabolic shaped valley,
model evolution during the course of the optimization, which is easy to nd. Finding the global optimum in
p
2 s2M (x) I=0
1</p>
      <p>Z I=1
exp
(fmin</p>
      <p>I</p>
      <p>y^M (x)2
2s2M (x)
Expected improvement is the expected value of the
improvement found by integrating over this density.</p>
      <p>The resulting measure EI is de ned as</p>
      <p>EI(x) = E(I) = s2M (x)[u (u) + (u)]; (u)]; (11)
where
u = fmin
and and are the cumulative distribution function
and the probability distribution functions of the
normal distribution respectively. The expected
improvement has an important advantage over the POI: it does
not require a preset target T , which can be
detrimental to the POI's successful sample selection when set
too high or too low.</p>
      <p>All three measures have an important weakness of
being based on the model prediction. If the modeled
function is deceptive, the model can be very
inaccurate while estimating a low variance. A good initial
sampling of the tness function is therefore very
important. The success of the whole method is dependent
on the model's ability to capture the response surface
correctly and thus on the function itself.
Since the evolutionary algorithms and optimization
heuristics in general are often used on black-box
optimization, where the properties of the objective
function are unknown, it is not straightforward to asses
their quality on real world problems. It has therefore
become a standard practice to test optimization
algorithms and their modi cations on specially designed
testing problems.</p>
      <p>These benchmark functions are explicitly de ned
and their properties and optima are known. They are
often designed to exploit typical weaknesses of
optimization algorithms in nding the global optimum.</p>
      <p>
        We used three functions found in literature [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Although we performed our tests in two dimensions we
give general multi-dimensional de nitions of the
functions.
      </p>
      <p>First of the functions used is the De Jong's
function. It is one of the simplest benchmarks, it is
continuous, convex and unimodal and is de ned as
f (x) =
n
X xi2
i=1
(13)</p>
      <p>true fitness
2
1.5
1
0.5
0
−0.5
−1
−1.5
−2
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2</p>
      <p>initial model and samples
Th0e.2 domain of the function is restricted to a
hypercube 2 2 xi 2; i = 14; : : : ; n. It has o6ne globagelneration 8
optimum f (x) = 0 in x = 1.</p>
      <p>Finally, the third function used as a benchmark is 5.3
the Rastrigin's function. It is based on the De Jong's
function with addition of cosine modulation, which
produces a high number of regularly distributed local
minima and makes the function highly multimodal.</p>
      <p>The function is de ned as
In order to compare the measures of estimated
improvement, we performed simulations on each
benchmark using each improvement estimate measure with
di erent values of the threshold. The batch size was
n set to 40 { generally found to be the ideal batch size
f (x) = 10n + X[xi2 10 cos(2 xi)] (15) { for these experiments. For comparison, we also
peri=1 formed tests with the standard genetic algorithm
with1T;h:e: : d;nom.Taihne gislorbeasltroipctteimd utmo f (5x) = 0xiis in x 5=, 1i. = tohuet taambloed1e.l. Results of these simulations are shown in
The De Jong's function proved to be simple to
optimize and the threshold setting did not have almost
5.2 Model evolution any e ect. Only when using the standard deviation,
setting the threshold too low lead to an increase in the
As the basic illustration of how the model evolves dur- number of evaluations, as too many points were
evaling the course of the EA, let us consider an example uated, although the model prediction in those points
test run using the Rosenbrock's function. For this ex- was accurate enough.
periment we set the batch size of 15, used the STD The same is true for the STD measure used on the
measure of estimated improvement with a threshold Rosenbrock's function, where setting the threshold too
of 0:001 and set the target tness value of 0:001 as well. low leads to a big increase in variance of the results.
The target was reached at the point (0:9909; 0:9824) Interestingly, setting the threshold too high leads to
using 90 true tness evaluations. A genetic algorithm a decrease in the number of evaluations, but also in
without a surrogate model needed approximately the success rate of reaching the goal. The POI and EI
3000 evaluations to reach the goal in several test runs. are more stable in terms of true tness evaluations,</p>
      <p>The model evolution is shown in gure 1. The true but have worse overall success rate. The results are
tness function is shown on the left, the initial model shown in gure 2
is in the middle and the nal model on the right. The The Rastrigin's function proved di cult to
optipoints where the true tness function was sampled are mize. This is probably due to the locality of the
krigdenoted with circles an the optimum is marked with ing model and the high number of local minima of the
a star. function. Overall the STD measure is the most
suc400
350
300
cessful. POI and EI lead to bad sampling of the model
and failure to reach the optimum.</p>
      <p>An interesting general result is that the more
complex measures of estimated improvement perform
worse than the simple standard deviation estimate.</p>
      <p>This indicates that the goal of active learning
selection criteria in the evolutionary optimization should
be the best possible sampling for overall model
accuracy, as opposed to trying to improve the accuracy in
the best regions of the input space. Both the POI and
EI are design to select next best points to reach the
optimum. Since in our case, this is handled by the EA
itself, the measures bring an unnecessary noise to the
estimate of the model accuracy. The results also show
that the best measure selection is dependent on the
optimized function.
5.4</p>
    </sec>
    <sec id="sec-3">
      <title>Batch size</title>
      <p>In order to study the batch size e ect on the
optimization, a number of experiments were performed
with di erent batch sizes. The only option to achieve
2000
1800
1600
1400
s
itno1200
a
lau1000
v
e
rue800
t
600
400
200
1
0.9
0.8
0.7
0.6 deh</p>
      <p>c
0.5 rae</p>
      <p>l
0.4 oga
0.3
0.2
0.1
0
05 10 15 20 25 30 40 ba5tc0h size60 70 80 90 100</p>
      <p>(a) No model
160
140
120
itsn100
o
a
lau80
v
e
e
tru60
160
140
120
itsn100
o
a
lau80
v
e
e
tru60
1
0.9
0.8
0.7
0.6 deh</p>
      <p>c
0.5 rae</p>
      <p>l
0.4 gao
0.3
40 0.2
20 0.1</p>
      <p>0
05 10 15 20 25 30 40 ba5tc0h size60 70 80 90 100
(b) STD
1
0.9
0.8
0.7
0.6 deh</p>
      <p>c
0.5 rae</p>
      <p>l
0.4 oga
0.3
40 0.2
20 0.1</p>
      <p>0
05 10 15 20 25 30 40 ba5tc0h size60 70 80 90 100</p>
      <p>(c) POI
median value
interquartile range</p>
      <p>goal reached
Fig. 3. Batch size e ect on Rosenbrock's function
optimization - true tness evaluations and proportion of runs
raching the goal.
a given batch size is to set the population size in a
standard GA, in our method however, the settings are
independent so a population size of 30, which proved
e cient, was used in all of the tests.</p>
      <p>The results on the De Jong's functions show that
apart from small batch sizes (up to 10), the
optimization is successful in all runs. Our method helps
stabilize the EA for small batch sizes and for batch
sizes above 15 the algorithm nds the optimum using
6</p>
      <p>Conclusions
a single batch. For a standard GA this strong depen- ture information about the model accuracy
improvedence arises for batch sizes above 40 and the algorithm ment expected by sampling a given point. In the
exreaches the goal in the second generation, evaluating periments with the batch size we found that small
twice as many points. batch sizes perform better when the objective function</p>
      <p>For the Rosenbrock's function we get the intuitive is simple, while causing bad initial sampling of more
result that setting the batch size too low leads to more complex functions, suggesting using a larger initial
evaluations or a failure to reach the goal, while large sample. The future development of this work should
batch size do not improve the results and waste true include experiments using di erent batch sizes for
initness evaluations. For this function the POI proved tial sampling and comparison of the method with other
to be the most e cient measure. The comparison is ways of employing a surrogate model in the
optimizashown in gure 3. Overall the method reduces the tion as well as other model-assisted optimization
number of true evaluations from hundreds to tens for methods.
the Rosenbrock's function, while slightly reducing the The method brings promising results, reducing the
success rate of the computation. number of true tness evaluations to a large degree</p>
      <p>The Rastrigin's function proved di cult to opti- for some of the benchmark functions. On the other
mize even without a surrogate model. With the model, hand, its success is highly dependent on the optimized
the STD achieved the best results reducing the number function and its initial sampling.
of true tness evaluations approximately three times
in the area of the highest success rate with batch size References
of 70. The other two measures were ine ective. We
attribute the method's di culty optimizing the
Rastrigin's function to the fact that the kriging model is local
and thus it requires a large number of samples to
capture the function's complicated behavior in the whole
input space. When the initial sampling is misleading,
which is more likely for the Rastrigin's function, both
the model prediction and estimated improvement are
wrong.</p>
      <p>The results suggest that best batch size and best
estimated improvement measure are highly
problemdependent. The proposed method is also very
sensitive to good initial sample selection, which is the most
usual reason for it to fail to nd the optimum. The
experimental results support the intuition that batches
too small are bad for the initial sampling of the model
and batches too large slow down the model
improvement by evaluating points that it would not be
necessary to evaluate with smaller batches. This suggests
using a larger initial sample and a small batch for the
rest of the optimization.</p>
      <p>In this paper we presented a method for model-assisted
evolutionary optimization with a xed batch size
requirement. To decouple the sampling from the EA
iterations and support an individual-based approach while
keeping a xed evaluation batch size, the method uses
an evaluation queue. The candidates for true tness
evaluations are selected by an active learning method
using a measure of estimated improvement of the
model quality based on the model prediction error
estimate.</p>
      <p>The results suggest using simple methods for
improvement estimate in active learning, which only
cap</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Baerns</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Holena: Combinatorial development of solid catalytic materials: design of high-throughput experiments, data analysis, data mining</article-title>
          .
          <source>Catalytic Science Series</source>
          . Imperial College Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>K.</given-names>
            <surname>Crombecq</surname>
          </string-name>
          , L. De Tommasi,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gorissen</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          <article-title>Dhaene: A novel sequential design strategy for global surrogate modeling</article-title>
          .
          <source>In Winter Simulation Conference, WSC '09, Winter Simulation Conference</source>
          ,
          <year>2009</year>
          ,
          <volume>731</volume>
          {
          <fpage>742</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Gorissen</surname>
          </string-name>
          :
          <article-title>Grid-enabled adaptive surrogate modeling for computer aided engineering</article-title>
          .
          <source>PhD Thesis</source>
          , Ghent University, University of Antwerp,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Gorissen</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Couckuyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dhaene</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <article-title>Crombecq: A surrogate modeling and adaptive sampling toolbox for computer based design</article-title>
          .
          <source>The Journal of Machine Learning Research 11</source>
          ,
          <year>2010</year>
          ,
          <year>2051</year>
          {
          <year>2055</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. L. Graning,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sendho</surname>
          </string-name>
          :
          <article-title>E cient evolutionary optimization using individual-based evolution control and neural networks: A comparative study</article-title>
          .
          <source>In ESANN</source>
          ,
          <year>2005</year>
          ,
          <volume>273</volume>
          {
          <fpage>278</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sendho</surname>
          </string-name>
          :
          <article-title>Managing approximate models in evolutionary aerodynamic design optimization</article-title>
          .
          <source>In Evolutionary Computation</source>
          ,
          <year>2001</year>
          .
          <article-title>Proceedings of the 2001 Congress on</article-title>
          , vol.
          <volume>1</volume>
          , IEEE,
          <year>2001</year>
          ,
          <volume>592</volume>
          {
          <fpage>599</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.R.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>A taxonomy of global optimization methods based on response surfaces</article-title>
          .
          <source>Journal of Global Optimization</source>
          ,
          <volume>21</volume>
          :
          <fpage>345</fpage>
          {
          <fpage>383</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schonlau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.J.</given-names>
            <surname>Welch</surname>
          </string-name>
          :
          <article-title>E cient global optimization of expensive black-box functions</article-title>
          .
          <source>Journal of Global Optimization</source>
          <volume>13</volume>
          ,
          <year>1998</year>
          ,
          <volume>455</volume>
          {
          <fpage>492</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Matheron: Principles of geostatistics</article-title>
          .
          <source>Economic Geology</source>
          <volume>58</volume>
          (
          <issue>8</issue>
          ),
          <year>1963</year>
          ,
          <volume>1246</volume>
          {
          <fpage>1266</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>M. Molga</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Smutnicki: Test functions for optimization needs</article-title>
          .
          <source>Test Functions for Optimization Needs</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>J. Sacks</surname>
            ,
            <given-names>W. J.</given-names>
          </string-name>
          <string-name>
            <surname>Welch</surname>
            , T. J. Mitchell,
            <given-names>H. P.</given-names>
          </string-name>
          <string-name>
            <surname>Wynn</surname>
          </string-name>
          <article-title>: Design and analysis of computer experiments</article-title>
          .
          <source>Statistical Science</source>
          <volume>4</volume>
          (
          <issue>4</issue>
          ),
          <year>1989</year>
          ,
          <volume>409</volume>
          {
          <fpage>423</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>