=Paper=
{{Paper
|id=None
|storemode=property
|title=Evolutionary optimization with active learning of surrogate models and fixed evaluation batch
|pdfUrl=https://ceur-ws.org/Vol-990/paper6.pdf
|volume=Vol-990
|dblpUrl=https://dblp.org/rec/conf/itat/CharyparH12
}}
==Evolutionary optimization with active learning of surrogate models and fixed evaluation batch size==

Viktor Charypar¹ and Martin Holeňa²

¹ Czech Technical University, Faculty of Nuclear Sciences and Physical Engineering, Břehová 7, 115 19 Praha 1, Czech Republic, charyvik@fjfi.cvut.cz

² Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, 182 07 Praha, Czech Republic, martin@cs.cas.cz
Abstract. Evolutionary optimization is often applied to problems where simulations or experiments used as the fitness function are expensive to run. In such cases, surrogate models are used to reduce the number of fitness evaluations. Some of the problems also require a fixed-size batch of solutions to be evaluated at a time. Traditional methods of selecting individuals for true evaluation to improve the surrogate model either require individual points to be evaluated, or couple the batch size with the EA generation size. We propose a queue-based method for individual selection based on active learning of a kriging model. Individuals are selected using the confidence intervals predicted by the model, added to a queue and evaluated once the queue length reaches the batch size. The method was tested on several standard benchmark problems. Results show that the proposed algorithm is able to reach a solution using significantly fewer evaluations of the true fitness function. The effect of the batch size as well as other parameters is discussed.

This work was supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS12/196/OHK3/3T/14, as well as the Czech Science Foundation grant 201/08/0802.

1 Introduction

Evolutionary optimization algorithms are a popular class of optimization techniques suitable for various optimization problems. One of their main advantages is the ability to find optima of black-box functions – functions that are not explicitly defined and whose input/output behavior is only known from previous evaluations of a finite number of points in the input space. This is typical for applications in engineering, chemistry or biology, where the evaluation is performed in the form of a computer simulation or a physical experiment.

The main disadvantage for such applications is the very high number of evaluations of the objective function (called the fitness function in the evolutionary optimization context) needed for an evolutionary algorithm (EA) to reach the optimum. Even if the simulation used as the objective function takes minutes to finish, the traditional approach becomes impractical. When the objective function is evaluated using a physical experiment, in the evolutionary optimization of catalytic materials [1] for example, an evaluation for one generation of the algorithm takes between several days and several weeks and costs thousands of euros.

The typical solution to this problem is performing only a part of all evaluations using the true fitness function and using a response-surface model as its replacement for the rest. This approach is called surrogate modeling. When using a surrogate model, only a small portion of all the points that need to be evaluated is evaluated using the true objective function (simulation or experiment), and for the rest the model prediction is assigned as the fitness value. The model is built using the information from the true fitness evaluations.

Since the fitness function is assumed to be highly non-linear, the modeling methods used are non-linear as well. Some of the commonly used methods include artificial neural networks, radial basis functions, regression trees, support vector machines and Gaussian processes [3].

Furthermore, some experiments require a fixed number of samples to be processed at one time. This presents its own set of challenges for adaptive sampling and is the main concern of this paper. We present an evolutionary optimization method assisted by a variant of a Gaussian-process-based interpolating model called kriging. In order to best use the evaluation budget, our approach uses active learning methods in selecting individuals to evaluate using the true fitness function. A key feature of the approach is support for online and offline batch evaluation with arbitrary batch size, independent of the generation size of the EA.

The rest of the paper is organized as follows: in section 3 we introduce the kriging surrogate model and its properties, in section 2 the methods of
coupling a model to the evolutionary optimization are discussed, section 4 provides details of the proposed method and finally, the results of testing the method are presented and discussed in section 5.

2 Model-assisted evolutionary optimization

Since the surrogate model used as a replacement for the fitness function in the EA is built using the results of the true fitness function evaluations, there are two competing objectives. First, we need to get the most information about the underlying relations in the data, in order to build a precise model of the fitness function. If the model does not capture the features of the fitness function correctly, the optimization can get stuck in a false optimum or generally fail to converge to a global one. Second, we have a limited budget for the true fitness function evaluations. Using many points from the input space to build a perfect model can require more true fitness evaluations than not employing a model at all.

In the general use of surrogate modeling, such as design space exploration, the process of selecting points from the input space to evaluate and build the model upon is called sampling [3]. Traditionally, the points to sample are selected upfront. Upfront sampling schemes are based on the theory of design of experiments (DoE), e.g. a Latin hypercube design. When we don't know anything about the function we are trying to model, it is better to use a small set of points as a base for an initial model, which is then iteratively improved using new samples, selected based on the information from previous function evaluations and the model itself. This approach is called adaptive sampling [3].

When using the surrogate model in an evolutionary optimization algorithm, the adaptive sampling decision changes from selecting which points of the input space to evaluate in order to improve the model to deciding whether to evaluate a given point (selected by the EA) with the true fitness function or not. There are two general approaches to this choice: the generation-based approach and the individual-based approach. We will discuss both, with emphasis on the latter, a variant of which is used in the method we propose in section 4.

2.1 Generation-based approach

In the generation-based approach the decision whether to evaluate an individual point with the true fitness function is made for the whole generation of the evolutionary algorithm. The optimization takes the following steps.

1. An initial Nᵢ generations of the EA are performed, yielding sets G₁, ..., G_{Nᵢ} of individuals (x, fₜ(x)), fₜ being the true fitness function.
2. The model M is trained on the individuals (x, fₜ(x)) ∈ ⋃_{i=1}^{Nᵢ} Gᵢ.
3. The fitness function fₜ is replaced by the model prediction f_M.
4. T generations are performed evaluating f_M as the fitness function.
5. One generation is performed using fₜ, yielding a set Gⱼ of individuals (initially j = Nᵢ + 1).
6. The model is retrained on the individuals (x, fₜ(x)) ∈ ⋃_{i=1}^{j} Gᵢ.
7. Steps 4–6 are repeated until the optimum is reached.

The number of true fitness evaluations in this approach depends on the population size of the EA and the frequency of control generations T, which can be fixed or adaptively changed during the course of the optimization [6]. For problems requiring batched evaluation this approach has the advantage of evaluating the whole generation, the size of which can be set to the size of the evaluation batch. The main disadvantage of the generation-based strategy is that not all individuals in the control generation are necessarily beneficial to the model quality, and the expensive true fitness evaluations are wasted.

2.2 Individual-based approach

As opposed to the generation-based approach, in the individual-based strategy the decision whether to evaluate a given point using the true fitness function or the surrogate model is made for each individual separately.

In model-based optimization in general, there are several possible approaches to individual-based sampling. The most used approach in evolutionary optimization is pre-selection. In each generation of the EA, a number of points which is a multiple of the population size is generated and evaluated using the model prediction. The best of these individuals form the next generation of the algorithm. The optimization is performed as follows.

1. An initial set of points S is chosen and evaluated using the true fitness function fₜ.
2. Model M is trained using the pairs (x, fₜ(x)) ∈ S.
3. A generation of the EA is run with the fitness function replaced by the model prediction f_M, and a population Oᵢ of size qp is generated and evaluated with f_M, where p is the desired population size for the EA and q is the pre-screening ratio. Initially, i = 1.
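The generation-based control loop of section 2.1 (steps 1–7) can be sketched in a few lines. This is a hedged toy illustration, not the implementation used in the paper: the "EA" is reduced to a mutate-and-truncate step on a one-dimensional minimization problem, and `nearest_neighbour_model` is a hypothetical stand-in for a real surrogate model.

```python
import random

def evolve_generation(population, fitness, sigma=0.1):
    """One toy EA generation: mutate every point, keep the better half."""
    offspring = [x + random.gauss(0, sigma) for x in population]
    return sorted(population + offspring, key=fitness)[:len(population)]

def nearest_neighbour_model(archive):
    """Stand-in surrogate: predict the value of the nearest archived point."""
    def predict(x):
        return min(archive, key=lambda p: abs(p[0] - x))[1]
    return predict

def generation_based_optimization(f_true, train_model, n_init=3, t_period=5,
                                  n_control=10, pop_size=8):
    """Generation-based evolution control: every T surrogate generations,
    one control generation uses the true fitness and the model is retrained
    on all truly evaluated individuals."""
    population = [random.uniform(-5, 5) for _ in range(pop_size)]
    archive = []                                    # pairs (x, f_t(x))
    for _ in range(n_init):                         # step 1: initial true generations
        population = evolve_generation(population, f_true)
        archive += [(x, f_true(x)) for x in population]
    model = train_model(archive)                    # step 2: train the model
    for _ in range(n_control):                      # steps 4-7
        for _ in range(t_period):                   # step 4: T model generations
            population = evolve_generation(population, model)
        population = evolve_generation(population, f_true)  # step 5: control generation
        archive += [(x, f_true(x)) for x in population]
        model = train_model(archive)                # step 6: retrain
    return min(archive, key=lambda p: p[1])         # best truly evaluated point
```

Note how the number of true evaluations per control cycle is fixed to the population size – exactly the coupling the queue-based method of section 4 removes.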
4. A subset P ⊂ Oᵢ is selected according to a selection criterion.
5. Individuals from P are evaluated using the true fitness function fₜ.
6. The model M is retrained using S ∪ P, the set S is replaced with S ∪ P, and the EA resumes from step 3.

Another possibility, called the best strategy [5], is to replace S with S ∪ Oᵢ instead of just P in step 6, after re-evaluating the set Oᵢ \ P with f_M (after the model M has been re-trained). This also means using the population size qp in the EA.

The key piece of this approach is the selection criterion (or criteria) used to determine which individuals from the set Oᵢ should be used in the following generation of the algorithm. There are a number of possibilities; let us discuss the most common.

An obvious choice is selecting the best individuals based on the fitness value. This results in the region of the optimum being sampled thoroughly, which helps finding the true optimum. On the other hand, the regions far from the current optimum are neglected and a possibly better optimum can be missed. To sample the areas of the fitness landscape that were not explored yet, space-filling criteria are used, either alone or in combination with the best fitness selection or other criteria.

All the previous criteria have in common the fact that they are concerned with the optimization itself. A different approach is to use the information about the model, most importantly its accuracy, to decide which points of the input space to evaluate with the true fitness function in order to most improve it. This approach is sometimes called active learning.

2.3 Active learning

Active learning is an approach that tries to maximize the amount of insight about the modeled function gained from its evaluation, while minimizing the number of evaluations necessary. The methods are used in the general field of surrogate modeling as an efficient adaptive sampling strategy. The terms adaptive sampling and active learning are often used interchangeably. We will use the term active learning for the methods based on the characteristics of the surrogate model itself, such as accuracy, with the goal of minimizing the model prediction error either globally or, more importantly, in the area of the input space the EA is exploring.

The active learning methods are most often based on the local model prediction error, such as the cross-validation error. Although some methods are independent of the model, for example the LOLA-Voronoi method [2], most of them depend on the model used. The kriging model used in our proposed method offers a good estimate of the local model accuracy by giving an error estimate of its prediction. It is possible to use the estimate itself as a measure of the model's confidence in the prediction, or to base a more complex measure on the variance estimate. The measures that were tested for use in our method will be described in detail in section 4.1.

3 Kriging meta-models

The kriging method is an interpolation method originating in geostatistics [9], based on modeling the function as a realization of a stochastic process [11]. In ordinary kriging, which we use, the function is modeled as a realization of a stochastic process

Y(x) = μ₀ + Z(x),    (1)

where Z(x) is a stochastic process with mean 0 and covariance function σ²ψ given by

cov{Y(x + h), Y(x)} = σ²ψ(h),    (2)

where σ² is the process variance for all x. The correlation function ψ(h) is then assumed to have the form

ψ(h) = exp(−Σ_{l=1}^{d} θ_l |h_l|^{p_l}),    (3)

where θ_l, l = 1, ..., d, with d the number of dimensions, are the correlation parameters. The correlation function depends on the difference of the two points and has the intuitive property of being equal to 1 if h = 0 and tending to 0 as h → ∞. The θ_l parameters determine how fast the correlation tends to zero in each coordinate direction, and the p_l determine the smoothness of the function.

The ordinary kriging predictor based on n sample points {x₁, ..., x_n} with values y = (y₁, ..., y_n)′ is then given by

ŷ(x) = μ̂₀ + ψ(x)′Ψ⁻¹(y − μ̂₀1),    (4)

where ψ(x)′ = (ψ(x − x₁), ..., ψ(x − x_n)), Ψ is an n × n matrix with elements ψ(xᵢ − xⱼ), and

μ̂₀ = (1′Ψ⁻¹y) / (1′Ψ⁻¹1).    (5)

An important feature of the kriging model is that apart from the prediction value it can estimate the prediction error as well. The kriging predictor error in point x is given by

s²(x) = σ̂² [1 − ψ′Ψ⁻¹ψ + (1 − 1′Ψ⁻¹ψ)² / (1′Ψ⁻¹1)],    (6)
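The pre-selection loop of section 2.2 (steps 1–6) can be condensed into a short sketch. This is a toy illustration under assumptions of our own: a one-dimensional problem, Gaussian mutation as the "EA generation", the best-predicted-fitness selection criterion, and `nn_model` as a hypothetical nearest-neighbour stand-in for a real surrogate.

```python
import random

def nn_model(S):
    """Stand-in surrogate trained on pairs (x, f(x)): nearest-neighbour lookup."""
    def predict(x):
        return min(S, key=lambda p: abs(p[0] - x))[1]
    return predict

def preselection(f_true, n_iter=20, pop_size=6, q=4, k=2):
    """Pre-selection: each generation, q*p offspring are screened with the
    model prediction and only a subset P of size k is truly evaluated."""
    S = [(x, f_true(x)) for x in (random.uniform(-5, 5) for _ in range(pop_size))]  # step 1
    for _ in range(n_iter):
        model = nn_model(S)                                  # step 2 / step 6: (re)train
        parents = [x for x, _ in sorted(S, key=lambda p: p[1])[:pop_size]]
        O = [x + random.gauss(0, 0.3) for x in parents for _ in range(q)]  # step 3
        P = sorted(O, key=model)[:k]                         # step 4: selection criterion
        S += [(x, f_true(x)) for x in P]                     # step 5: true evaluation
    return min(S, key=lambda p: p[1])
```

Only k true evaluations are spent per generation, but note that k of them are forced every generation – the issue the evaluation queue of section 4 addresses.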
where the kriging variance is estimated as

σ̂² = (y − μ̂₀1)′Ψ⁻¹(y − μ̂₀1) / n.    (7)

The parameters θ_l and p_l can be estimated by maximizing the likelihood function of the observed data. For the derivation of equations 4–7, as well as the maximum likelihood estimation of the parameters, the reader may consult the standard stochastic-process-based derivation by Sacks et al. in [11] or a different approach given by Jones in [7].

4 Method description

In this section we describe the proposed method for kriging-model-assisted evolutionary optimization with batch fitness evaluation. Our main goal was to decouple the true fitness function sampling from the EA iterations, based on the assumption that requiring a specific number of true fitness evaluations in every generation of the EA forces unnecessary sampling.

In the generation-based approach, some of the points may be unnecessary to evaluate, as they will not bring any new information to the surrogate model. The individual-based approach is better suited for the task, as it chooses those points from each generation which are estimated to be the most valuable for the model. There is still the problem of performing a given number of evaluations in every generation, although there might not be enough valuable points to select from.

The method we propose achieves the desired decoupling by introducing an evaluation queue. The evolutionary algorithm uses the model prediction at all times, and when a point in which the model's confidence in its prediction is low is encountered, it is added to the evaluation queue. Once there are enough points in the queue, all the points in it are evaluated and the model is re-trained using the results. The optimization takes the following course.

1. An initial set S of b samples is selected using a chosen initial design strategy and evaluated using the true fitness function fₜ.
2. An initial kriging model M is trained using the pairs (x, fₜ(x)) ∈ S.
3. The evolutionary algorithm is started, with the model prediction f_M as the fitness function.
4. For every prediction f_M(x) = ŷ_M(x), an estimated improvement measure c(s²_M(x)) is computed from the error estimate s²_M(x). If c(s²_M(x)) > t, an improvement threshold, the point is added to the evaluation queue Q.
5. If the queue size |Q| ≥ b, the batch size, all points x ∈ Q are evaluated, the set S is replaced by S ∪ {(x, fₜ(x))} and the EA is resumed.
6. Steps 4 and 5 are repeated until the goal is reached, or a stall condition is fulfilled.

The b and t parameters, as well as the function c(s²), are chosen before running the optimization. Note that the evaluation in step 5 can be performed either immediately, i.e. online, or offline. In offline evaluation, after filling the evaluation queue, the EA is stopped when the current iteration is finished and control is returned to the user. After obtaining the fitness values for the samples in the sample queue (e.g. by performing an experiment), the user can manually add the samples and resume the EA from the last generation.

While the choice of the parameters will be discussed in section 5, let us introduce the three different measures of estimated improvement in the model prediction c(s²(x)) which we tested – the standard deviation, the probability of improvement and the expected improvement.

4.1 Measures of estimated improvement

To estimate the improvement which the evaluation of a given point will bring, we can use several measures. The three measures introduced here are all based on the prediction error estimate of the kriging model. The goal of these measures is to prefer the points that help improve the model in regions explored by the EA. Each measure's result for a given point is compared with a threshold, and when the estimated improvement is above the threshold, the point is evaluated using the true fitness function.

Standard deviation (STD) is the simplest measure we tested. It is computed directly from the error estimate as its square root

STD(x) = √(ŝ²_M(x)).    (8)

The STD captures only the model's estimate of the error of its own prediction (based on the distance from the known samples). As such, it does not take into account the value of the prediction itself and can be considered a measure of the model accuracy.

Probability of improvement (POI) [7] uses the fact that the kriging prediction is a Gaussian process, and the prediction in a single point is therefore a normally distributed random variable Y(x) with a mean and variance given by the kriging predictor. If we choose a target T (based on the goal of the optimization), we can estimate the probability that a given point will have a value y(x) ≤ T as the probability that Y(x) ≤ T. The probability of improvement is therefore defined as

POI(x) = Φ((T − ŷ_M(x)) / ŝ_M(x)),    (9)
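The queue-based control loop described in steps 1–6 of section 4 can be sketched as follows. This is a minimal toy illustration under assumptions of our own, not the paper's Matlab/SUMO implementation: the EA is a mutate-and-truncate step in one dimension, and `nn_with_distance` is a hypothetical stand-in that mimics a kriging model's two outputs – a prediction and an uncertainty estimate (here just the distance to the nearest sample).

```python
import random

def nn_with_distance(S):
    """Stand-in for a kriging fit: returns (predict, error_estimate)."""
    def predict(x):
        return min(S, key=lambda p: abs(p[0] - x))[1]
    def error(x):
        return min(abs(p[0] - x) for p in S)   # crude uncertainty proxy
    return predict, error

def queue_based_optimization(f_true, fit_model, batch_size=5, threshold=0.3,
                             n_generations=40, pop_size=8):
    """Queue-based evaluation control: the EA always runs on the model
    prediction; uncertain points accumulate in a queue and are evaluated
    in fixed-size batches, after which the model is retrained."""
    S = [(x, f_true(x)) for x in
         (random.uniform(-2, 2) for _ in range(batch_size))]   # step 1: initial design
    predict, error = fit_model(S)                              # step 2: initial model
    population = [x for x, _ in S]
    queue = []                                                 # the evaluation queue Q
    for _ in range(n_generations):                             # step 3: EA on f_M only
        offspring = [x + random.gauss(0, 0.2) for x in population]
        for x in offspring:
            if error(x) > threshold and x not in queue:        # step 4: c(s^2) > t
                queue.append(x)
        population = sorted(population + offspring, key=predict)[:pop_size]
        if len(queue) >= batch_size:                           # step 5: evaluate batch
            S += [(x, f_true(x)) for x in queue]
            queue.clear()
            predict, error = fit_model(S)                      # retrain on S
    return min(S, key=lambda p: p[1])
```

The batch size b and the population size are independent parameters here, which is the decoupling the method is designed to provide; an offline variant would return the queue to the user instead of calling `f_true` directly.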
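Equations 3–7 of section 3 translate directly into code. The sketch below, assuming NumPy and correlation parameters θ and p that are simply given (rather than estimated by maximum likelihood, as the paper does), computes the ordinary kriging prediction and its error estimate for one query point; the tiny diagonal jitter is our own addition for numerical stability.

```python
import numpy as np

def kriging_fit_predict(X, y, x_new, theta, p=2.0):
    """Ordinary kriging predictor (eq. 4) and error estimate (eq. 6).
    X: (n, d) sample points, y: (n,) values, x_new: (d,) query point,
    theta: (d,) correlation parameters, p: smoothness exponent."""
    X, y, x_new, theta = map(np.asarray, (X, y, x_new, theta))
    n = len(y)

    def psi(a, b):  # correlation function, eq. (3)
        return np.exp(-np.sum(theta * np.abs(a - b) ** p))

    Psi = np.array([[psi(X[i], X[j]) for j in range(n)] for i in range(n)])
    Psi_inv = np.linalg.inv(Psi + 1e-10 * np.eye(n))  # jitter for stability
    ones = np.ones(n)
    mu0 = ones @ Psi_inv @ y / (ones @ Psi_inv @ ones)           # eq. (5)
    psi_vec = np.array([psi(x_new, X[i]) for i in range(n)])
    y_hat = mu0 + psi_vec @ Psi_inv @ (y - mu0 * ones)           # eq. (4)
    sigma2 = (y - mu0 * ones) @ Psi_inv @ (y - mu0 * ones) / n   # eq. (7)
    s2 = sigma2 * (1.0 - psi_vec @ Psi_inv @ psi_vec
                   + (1.0 - ones @ Psi_inv @ psi_vec) ** 2
                   / (ones @ Psi_inv @ ones))                    # eq. (6)
    return y_hat, max(s2, 0.0)
```

At a sampled point the predictor interpolates the data and the error estimate vanishes, which is the property the active learning measures of section 4.1 exploit.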
where Φ is the cumulative distribution function of the standard normal distribution. As opposed to the STD, the POI takes into account the prediction mean (value) as well as its variance (error estimate). The area of the current optimum is therefore preferred over the rest of the input space. When the area of the current optimum is sampled enough, the variance becomes very small and the term (T − ŷ_M(x)) / ŝ_M(x) becomes extremely negative, encouraging the sampling of less explored areas.

Expected improvement (EI) [7, 8] is based on estimating, as the name suggests, the improvement we expect to achieve over the current minimum f_min if a given point is evaluated. As before, we assume the model prediction in point x to be a normally distributed random variable Y(x) with a mean and variance given by the kriging predictor. We achieve an improvement I over f_min if Y(x) = f_min − I. As shown in [7], the likelihood of achieving the improvement I is

(1 / √(2πŝ²_M(x))) exp(−(f_min − I − ŷ_M(x))² / (2ŝ²_M(x))).    (10)

Expected improvement is the expected value of the improvement, found by integrating over this density from I = 0 to I = ∞. The resulting measure EI is defined as

EI(x) = E(I) = ŝ_M(x)[uΦ(u) + φ(u)],    (11)

where

u = (f_min − ŷ_M(x)) / ŝ_M(x)    (12)

and Φ and φ are the cumulative distribution function and the probability density function of the standard normal distribution, respectively. The expected improvement has an important advantage over the POI: it does not require a preset target T, which can be detrimental to the POI's successful sample selection when set too high or too low.

All three measures have an important weakness of being based on the model prediction. If the modeled function is deceptive, the model can be very inaccurate while estimating a low variance. A good initial sampling of the fitness function is therefore very important. The success of the whole method is dependent on the model's ability to capture the response surface correctly, and thus on the function itself.

5 Results and discussion

The proposed method was tested using simulations on three standard benchmark functions. We studied the model evolution during the course of the optimization and the effect of the parameters, and also investigated the optimal choice of batch size for problems where an upfront choice is possible. In this section we discuss the tests performed and their results.

For testing, we used the genetic algorithm implementation from the global optimization toolbox for the Matlab environment and the implementation of an ordinary kriging model from the SUMO Toolbox [4]. The parameters of the supporting methods, e.g. the genetic algorithm itself, were kept on their default values provided by the implementation.

Because the EA itself is not deterministic, each test was performed 20 times and the results we present are statistical measures of this sample. As a performance measure we use the number of true fitness evaluations used to reach a set goal in all tests. The main reason to use this measure is that in model-assisted optimization the computational cost of everything except the true fitness evaluation is minimal in comparison. We also track the proportion of the 20 runs that reached the goal before various limits (time, stall, etc.) took effect.

5.1 Benchmark functions

Since evolutionary algorithms and optimization heuristics in general are often used on black-box optimization, where the properties of the objective function are unknown, it is not straightforward to assess their quality on real-world problems. It has therefore become a standard practice to test optimization algorithms and their modifications on specially designed testing problems.

These benchmark functions are explicitly defined and their properties and optima are known. They are often designed to exploit typical weaknesses of optimization algorithms in finding the global optimum. We used three functions found in the literature [10]. Although we performed our tests in two dimensions, we give general multi-dimensional definitions of the functions.

The first of the functions used is De Jong's function. It is one of the simplest benchmarks; it is continuous, convex and unimodal, and is defined as

f(x) = Σ_{i=1}^{n} x_i².    (13)

The domain is restricted to a hypercube −10 ≤ x_i ≤ 10, i = 1, ..., n. The function has one global optimum f(x) = 0 in the point x = 0. De Jong's function was primarily used as a proof-of-concept test.

As a second benchmark, we used Rosenbrock's function, also called Rosenbrock's valley. The global optimum is inside a long parabolic-shaped valley, which is easy to find. Finding the global optimum in
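All three measures of section 4.1 are cheap to compute from the two kriging outputs ŷ_M(x) and s²_M(x). A sketch using only the Python standard library (the function names are ours, and the normal CDF is built from `math.erf` rather than any statistics package):

```python
from math import sqrt, erf, exp, pi

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    """Standard normal probability density function."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def std_measure(s2):
    """STD, eq. (8): square root of the kriging error estimate."""
    return sqrt(s2)

def poi(y_hat, s2, target):
    """Probability of improvement, eq. (9): P(Y(x) <= T)."""
    s = sqrt(s2)
    return norm_cdf((target - y_hat) / s) if s > 0 else 0.0

def expected_improvement(y_hat, s2, f_min):
    """Expected improvement, eqs. (11)-(12)."""
    s = sqrt(s2)
    if s == 0:
        return 0.0                      # no expected gain at a sampled point
    u = (f_min - y_hat) / s
    return s * (u * norm_cdf(u) + norm_pdf(u))
```

Each value would then be compared against the improvement threshold t of step 4 to decide whether the point enters the evaluation queue.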
that valley, however, is difficult [10]. The function has the following definition

f(x) = Σ_{i=1}^{n−1} [100(x_{i+1} − x_i²)² + (1 − x_i)²].    (14)

The domain of the function is restricted to a hypercube −2 ≤ x_i ≤ 2, i = 1, ..., n. It has one global optimum f(x) = 0 in x = 1.

Finally, the third function used as a benchmark is Rastrigin's function. It is based on De Jong's function with the addition of cosine modulation, which produces a high number of regularly distributed local minima and makes the function highly multimodal. The function is defined as

f(x) = 10n + Σ_{i=1}^{n} [x_i² − 10 cos(2πx_i)].    (15)

The domain is restricted to −5 ≤ x_i ≤ 5, i = 1, ..., n. The global optimum f(x) = 0 is in x = 0.

5.2 Model evolution

As a basic illustration of how the model evolves during the course of the EA, let us consider an example test run using Rosenbrock's function. For this experiment we set a batch size of 15, used the STD measure of estimated improvement with a threshold of 0.001, and set the target fitness value to 0.001 as well. The target was reached at the point (0.9909, 0.9824) using 90 true fitness evaluations. A genetic algorithm without a surrogate model needed approximately 3000 evaluations to reach the goal in several test runs.

The model evolution is shown in figure 1. The true fitness function is shown on the left, the initial model is in the middle and the final model on the right. The points where the true fitness function was sampled are denoted with circles and the optimum is marked with a star.

Fig. 1. The original fitness function, the initial model and the final model. [Three contour plots: true fitness, initial model and samples, final model; an accompanying plot shows the optimization progress as best true fitness over generations.]

5.3 Measures of estimated improvement comparison

In order to compare the measures of estimated improvement, we performed simulations on each benchmark using each improvement estimate measure with different values of the threshold. The batch size was set to 40 – generally found to be the ideal batch size – for these experiments. For comparison, we also performed tests with the standard genetic algorithm without a model. Results of these simulations are shown in table 1.

Table 1. GA performance on benchmark functions without a model.

function    | evals (1q) | evals (med) | evals (3q) | goal | reached
De Jong     | 60         | 60          | 120        | 0.01 | 1
Rosenbrock  | 60         | 125         | 310        | 0.1  | 1
Rastrigin   | 260        | 370         | 580        | 0.1  | 0.85

De Jong's function proved to be simple to optimize and the threshold setting had almost no effect. Only when using the standard deviation, setting the threshold too low led to an increase in the number of evaluations, as too many points were evaluated although the model prediction in those points was accurate enough.

The same is true for the STD measure used on Rosenbrock's function, where setting the threshold too low leads to a big increase in the variance of the results. Interestingly, setting the threshold too high leads to a decrease in the number of evaluations, but also in the success rate of reaching the goal. The POI and EI are more stable in terms of true fitness evaluations, but have a worse overall success rate. The results are shown in figure 2.

Fig. 2. STD measure on Rosenbrock's function – true fitness evaluations and proportion of runs reaching the goal. [Two panels over the threshold axis: (a) STD, (b) POI; legend: median value, interquartile range, goal reached.]

Rastrigin's function proved difficult to optimize. This is probably due to the locality of the kriging model and the high number of local minima of the function. Overall, the STD measure is the most successful. POI and EI lead to bad sampling of the model and failure to reach the optimum.

An interesting general result is that the more complex measures of estimated improvement perform worse than the simple standard deviation estimate. This indicates that the goal of active learning selection criteria in evolutionary optimization should be the best possible sampling for overall model accuracy, as opposed to trying to improve the accuracy in the best regions of the input space. Both the POI and EI are designed to select the next best points to reach the optimum. Since in our case this is handled by the EA itself, the measures bring unnecessary noise into the estimate of the model accuracy. The results also show that the best measure selection is dependent on the optimized function.

5.4 Batch size

In order to study the effect of the batch size on the optimization, a number of experiments were performed with different batch sizes. The only option to achieve a given batch size in a standard GA is to set the population size; in our method, however, the two settings are independent, so a population size of 30, which proved efficient, was used in all of the tests.

Fig. 3. Batch size effect on Rosenbrock's function optimization – true fitness evaluations and proportion of runs reaching the goal. [Three panels over the batch size axis: (a) No model, (b) STD, (c) POI; legend: median value, interquartile range, goal reached.]

The results on De Jong's function show that apart from small batch sizes (up to 10), the optimization is successful in all runs. Our method helps stabilize the EA for small batch sizes, and for batch sizes above 15 the algorithm finds the optimum using
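The three benchmarks of section 5.1 (equations 13–15) translate directly into code; a short sketch for arbitrary dimension, with the optima noted in the comments:

```python
from math import cos, pi

def de_jong(x):
    """De Jong's (sphere) function, eq. (13); global optimum f(0, ..., 0) = 0."""
    return sum(xi ** 2 for xi in x)

def rosenbrock(x):
    """Rosenbrock's valley, eq. (14); global optimum f(1, ..., 1) = 0."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    """Rastrigin's function, eq. (15); global optimum f(0, ..., 0) = 0."""
    n = len(x)
    return 10.0 * n + sum(xi ** 2 - 10.0 * cos(2.0 * pi * xi) for xi in x)
```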
a single batch. For a standard GA this strong dependence arises for batch sizes above 40, and the algorithm reaches the goal in the second generation, evaluating twice as many points.

For Rosenbrock's function we get the intuitive result that setting the batch size too low leads to more evaluations or a failure to reach the goal, while large batch sizes do not improve the results and waste true fitness evaluations. For this function the POI proved to be the most efficient measure. The comparison is shown in figure 3. Overall, the method reduces the number of true evaluations from hundreds to tens for Rosenbrock's function, while slightly reducing the success rate of the computation.

Rastrigin's function proved difficult to optimize even without a surrogate model. With the model, the STD achieved the best results, reducing the number of true fitness evaluations approximately three times in the area of the highest success rate, with a batch size of 70. The other two measures were ineffective. We attribute the method's difficulty optimizing Rastrigin's function to the fact that the kriging model is local, and thus it requires a large number of samples to capture the function's complicated behavior in the whole input space. When the initial sampling is misleading, which is more likely for Rastrigin's function, both the model prediction and the estimated improvement are wrong.

The results suggest that the best batch size and the best estimated improvement measure are highly problem-dependent. The proposed method is also very sensitive to good initial sample selection; a bad initial sample is the most usual reason for it to fail to find the optimum. The experimental results support the intuition that batches too small are bad for the initial sampling of the model, and batches too large slow down the model improvement by evaluating points that would not be necessary to evaluate with smaller batches. This suggests using a larger initial sample and a small batch for the rest of the optimization.

6 Conclusions

In this paper we presented a method for model-assisted evolutionary optimization with a fixed batch size requirement. To decouple the sampling from the EA iterations and support an individual-based approach while keeping a fixed evaluation batch size, the method uses an evaluation queue. The candidates for true fitness evaluation are selected by an active learning method using a measure of estimated improvement of the model quality based on the model prediction error estimate.

The results suggest using simple methods for improvement estimation in active learning, which only capture information about the model accuracy improvement expected from sampling a given point. In the experiments with the batch size we found that small batch sizes perform better when the objective function is simple, while causing bad initial sampling of more complex functions, suggesting the use of a larger initial sample. Future development of this work should include experiments using different batch sizes for initial sampling, and a comparison of the method with other ways of employing a surrogate model in the optimization as well as with other model-assisted optimization methods.

The method brings promising results, reducing the number of true fitness evaluations to a large degree for some of the benchmark functions. On the other hand, its success is highly dependent on the optimized function and its initial sampling.

References

1. M. Baerns, M. Holeňa: Combinatorial development of solid catalytic materials: design of high-throughput experiments, data analysis, data mining. Catalytic Science Series. Imperial College Press, 2009.
2. K. Crombecq, L. De Tommasi, D. Gorissen, T. Dhaene: A novel sequential design strategy for global surrogate modeling. In Winter Simulation Conference, WSC '09, Winter Simulation Conference, 2009, 731–742.
3. D. Gorissen: Grid-enabled adaptive surrogate modeling for computer aided engineering. PhD Thesis, Ghent University, University of Antwerp, 2009.
4. D. Gorissen, I. Couckuyt, P. Demeester, T. Dhaene, K. Crombecq: A surrogate modeling and adaptive sampling toolbox for computer based design. The Journal of Machine Learning Research 11, 2010, 2051–2055.
5. L. Gräning, Y. Jin, B. Sendhoff: Efficient evolutionary optimization using individual-based evolution control and neural networks: A comparative study. In ESANN, 2005, 273–278.
6. Y. Jin, M. Olhofer, B. Sendhoff: Managing approximate models in evolutionary aerodynamic design optimization. In Proceedings of the 2001 Congress on Evolutionary Computation, vol. 1, IEEE, 2001, 592–599.
7. D. R. Jones: A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21, 2001, 345–383.
8. D. R. Jones, M. Schonlau, W. J. Welch: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 1998, 455–492.
9. G. Matheron: Principles of geostatistics. Economic Geology 58(8), 1963, 1246–1266.
10. M. Molga, C. Smutnicki: Test functions for optimization needs. 2005.
11. J. Sacks, W. J. Welch, T. J. Mitchell, H. P. Wynn: Design and analysis of computer experiments. Statistical Science 4(4), 1989, 409–423.