<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Non-Linear Regression Methods on Black-Box Optimization Benchmarks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Vojtěch Kopal</string-name>
          <email>vojtech.kopal@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Martin Holeňa</string-name>
          <email>martin@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague, Faculty of Mathematics and Physics</institution>
          ,
          <addr-line>Malostranské nám. 25, 118 00 Praha 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
<addr-line>Pod Vodárenskou věží 2, 182 07 Praha</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>135</fpage>
      <lpage>142</lpage>
      <abstract>
<p>The paper compares several non-linear regression methods on synthetic data sets generated using standard benchmarks for continuous black-box optimization. For that comparison, we have chosen regression methods that have been used as surrogate models in such optimization: radial basis function networks, Gaussian processes, and random forests. Because the purpose of black-box optimization is frequently some kind of design of experiments, and because a role similar to that of surrogate models is played in the traditional design of experiments by response surface models, we also include a standard response surface model, i.e., polynomial regression. The methods are evaluated based on their mean-squared error and on Kendall's rank correlation coefficient between the ordering of function values according to the model and according to the function used to generate the data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        In this paper, we compare non-linear regression
methods that could be used as surrogate models for
optimization tasks. The methods are compared on synthetic data
sets generated using standard benchmarks for continuous
black-box optimization, for which we used
implementations based on definitions from Real-Parameter Black-Box
Optimization Benchmarking 2009 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Continuous black-box optimization is a task in which we
try to minimize a continuous objective function f : X ⊆
R^n → R for which we do not have an analytical
expression. Such problems arise, for example, if the values of
the objective function are results of experimental
measurements.</p>
      <p>
        For that comparison, we have chosen regression
methods that have been used as surrogate models in such
optimization: radial basis function networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
Gaussian processes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and random forests [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>We measure the accuracy of each method based on the
mean-squared error and Kendall's rank coefficient, and based
on the results we suggest which methods work better as
surrogate models. We are interested in the properties of each
method when used as a surrogate model, though our
experiments do not replace a direct evaluation in optimization or
in evolutionary algorithms. This is the subject of two other
papers included in these proceedings.</p>
      <p>
        Other comparisons of non-linear models have been
presented. A numerical comparison of neural networks and
polynomial regression has been performed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]; in the latter, a classification and regression
tree (CART) model has also been compared. An evaluation
of Gaussian processes against other non-linear methods has
been done in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These studies compared the
accuracy of each model for prediction and did not pay
attention to surrogate models for optimization. Examples
of such works can be found in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where quadratic polynomial regression has been
compared with other methods
based on prediction accuracy and mean-squared error, and
in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where polynomial regression is compared with
radial basis function networks based on accuracy and also on
optimization results. In this paper, we compare the
methods by means of the mean-squared error and also Kendall's
rank coefficient.
      </p>
      <p>We briefly describe the theoretical background of each
of these methods: how the corresponding models are
induced and how they are used to predict new values.
For the synthetic data, we added an overview of what the
functions look like in 3-dimensional space (Figure 1).</p>
      <p>The paper is organised as follows. In Section 2, we
recall the theoretical fundamentals of the employed
regression methods. In Section 3, we describe the setup of our
experiments and summarise the results, before the paper
concludes in Section 4.
</p>
    </sec>
    <sec id="sec-2">
      <title>Regression Methods in Data Mining</title>
      <p>With a continuously increasing amount of gathered data,
data mining techniques allow us to search for patterns in
data sets and to model the underlying reality. Various
models have been introduced in the past, ranging from
linear regression to complex nonlinear methods such as
neural networks or Gaussian processes. These models are
used to approximate a function that describes the
relationship between the target and input values.</p>
      <p>We now introduce the methods compared in this paper.
Each of these methods has its strengths and weaknesses,
which we point out in Table 2, and later we discuss
them in the context of the results of our experiments.</p>
      <p>
        [Table: time complexity of model construction: polynomial regression Θ(M^2 N); Gaussian processes Θ(N^3); random forests Θ(M K Ñ log^2 Ñ) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; radial basis function networks polynomial time [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].]
      </p>
      <p>We assume we have a pair (X, Y), where X is a
p-dimensional data set with n points, i.e. X is a p × n matrix,
or it is a vector X = (x1, x2, ..., xn), where xi is
a column vector of size p, i.e. xi = (xi1, xi2, ..., xip), and
Y = (Y1, Y2, ..., Yn) is a vector of size n of target values
corresponding to the points of X. We use ||x|| for the
Euclidean norm of vector x. In the paper, we use the following
notation:
• X, Y, β are vectors with elements Xi, Yi, βi,
respectively; β_{j,k} is a scalar denoting the parameter in
polynomial regression for the interaction xj xk,
• f is a function and f(x) is the output of the function
corresponding to input x; for a multivariate function f,
we have either the matrix notation Y = f(X) or the vector
notation Yi = f(xi),
• f̄(X) is the average output over f(xi), ∀i ∈ {1, ..., N}.</p>
      <sec id="sec-2-1">
        <title>Polynomial Regression</title>
        <p>The simplest form of polynomial regression (PR) is
linear regression, in which the model is described by p + 1
parameters β0, β1, ..., βp,
f(Xi) = β0 + Σ_{j=1}^{p} xj βj,
which can be computed by [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
β = (β0, . . . , βp) = (X^T X)^{-1} X^T y.   (1)
        </p>
        <p>Polynomial regression is still part of the linear
regression family, because the dependence on the model
parameters is linear. However, we also consider higher powers
of the input variables. For example, in the quadratic case we
add both the squared terms xi^2 for i ∈ {1, ..., p} and the
interactions xi xj for i, j ∈ {1, ..., p}, i ≠ j. Consequently, we have
(p^2 + p)/2 new variables.</p>
        <p>For our experiments, we will restrict attention to
quadratic regression,
f(Xi) = β0 + Σ_{j=1}^{p} xj βj + Σ_{j=1}^{p} xj^2 β_{p+j} + Σ_{j=1}^{p} Σ_{k&lt;j} xj xk β_{j,k}.</p>
        <p>
          This is also the standard restriction in response surface
modeling [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
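        <p>As an illustrative aside, the following minimal Python sketch builds the quadratic design matrix and solves the least-squares problem of equation (1); the data, function names and use of numpy are assumptions made for the example (the paper's experiments were implemented in Matlab).</p>
        <preformat>
# Minimal sketch of quadratic polynomial regression, cf. eq. (1).
# Illustrative only; not the authors' Matlab implementation.
import numpy as np

def quadratic_design(X):
    """Expand an (n, p) matrix into columns [1, x_j, x_j^2, x_j*x_k for k below j]."""
    n, p = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(p)]                              # linear terms
    cols += [X[:, j] ** 2 for j in range(p)]                         # squared terms
    cols += [X[:, j] * X[:, k] for j in range(p) for k in range(j)]  # interactions
    return np.column_stack(cols)

def fit_quadratic(X, y):
    """beta = (Z^T Z)^{-1} Z^T y, computed via the numerically safer lstsq."""
    Z = quadratic_design(X)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

def predict_quadratic(beta, X):
    return quadratic_design(X) @ beta

# Usage on random data with a sphere-like target function
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(200, 5))
y = (X ** 2).sum(axis=1)
beta = fit_quadratic(X, y)
print(np.mean((predict_quadratic(beta, X) - y) ** 2))   # training MSE
        </preformat>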
      </sec>
      <sec id="sec-2-2">
        <title>Random Forests</title>
        <p>
          Random Forests (RF) is a model proposed by Breiman [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ],
and it is based on ensembles of decision trees. Due to
our interest in surrogate models for continuous black-box
optimization, we are interested in ensembles of regression
trees.
        </p>
        <p>A regression tree is a function defined by means of a
binary tree with inner nodes representing predicates, and
edges from a node to its children representing whether
the predicate is or is not fulfilled. The leaf nodes give
the predicted target value. The tree is built recursively
starting with a root node and searching for an optimal
binary predicate over the input variables. Regression trees
can be applied to data sets with both categorical/discrete
variables, and real-valued variables. Since we focus on
surrogate models for continuous black-box optimization,
we only consider real-valued predicates. For a real-valued
variable, the data set is split into two parts through
minimizing the following formula</p>
        <p>Σ_{xi ∈ R1(j,s)} (yi − c1)^2 + Σ_{xi ∈ R2(j,s)} (yi − c2)^2,
where R1, R2 are the two linearly bounded regions with
axes-perpendicular borders into which the data set is split
using a j-th variable x j and its splitting point s, and c1, c2
are the averages of function values of points belonging
to R1, R2, respectively. After finding the optimal splitting
point we recursively apply this process to both regions R1
and R2, and for each of them, only the data points in the
region are considered. This process continues until a
stopping criterion is met. This can be either the minimum
number of data points in leaves or inner nodes, or the depth of
the tree.</p>
        <p>If the regression tree finally splits the input space into
the regions R1, . . . , Rm, we can compute the prediction for
a new data point using the following formula:</p>
        <p>f(x) = Σ_{m=1}^{M} cm I(x ∈ Rm),
where cm is the average target value of the data points in region
Rm.</p>
        <p>An ensemble of regression trees averages the
predictions when presented with a new data point.</p>
        <p>There are several options for inducing a number of
trees over the same data set that will lead to low
correlation. In traditional bagging, independent subsets of the
original data used for the individual trees are obtained by
sampling from the data set uniformly and with replacement. In
addition, random subsets of input variables can be used. In
the Matlab implementation of random forests, the square root of
the number of input variables is selected by default, which is
also the setting we have used for our experiments.</p>
        <p>The model parameters are the number of trees (NT)
added to the ensemble and the minimum number of
data points in leaves (ML).</p>
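        <p>The following minimal Python sketch illustrates the split criterion above for a single real-valued variable; it is a didactic sketch with assumed names and data, not the Matlab random forest implementation used in the experiments.</p>
        <preformat>
# Find the split point s of one variable that minimizes
# sum over R1 of (y_i - c1)^2 + sum over R2 of (y_i - c2)^2.
import numpy as np

def best_split_for_variable(x, y):
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_err, best_s = np.inf, None
    for i in range(1, len(x_sorted)):
        left, right = y_sorted[:i], y_sorted[i:]
        c1, c2 = left.mean(), right.mean()
        err = ((left - c1) ** 2).sum() + ((right - c2) ** 2).sum()
        if err &lt; best_err:
            best_err = err
            best_s = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_s, best_err

# Usage: split a noisy step function; the split point should be near 0.4
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = (x &gt; 0.4).astype(float) + 0.1 * rng.normal(size=300)
print(best_split_for_variable(x, y))
        </preformat>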
      </sec>
      <sec id="sec-2-3">
        <title>Gaussian Processes</title>
        <p>A Gaussian process (GP) is a random process such
that its restriction to any finite number of points has a
Gaussian probability distribution. A Gaussian process
GP(μ(x), κ(x, x′)) is defined by its mean function μ(x)
and its covariance function κ(x, x′).</p>
        <p>f(x) ∼ GP(μ(x), κ(x, x′))   (2)
These functions determine the mean and covariance of the
process because</p>
        <p>E[f(x)] = μ(x),
Cov[f(x), f(x′)] = E[(f(x) − μ(x))(f(x′) − μ(x′))] = κ(x, x′).   (3)</p>
        <p>The important part of modelling functions with
Gaussian processes is choosing the covariance function. An
important feature of covariance functions is that they can
be combined together using addition and multiplication,
i.e. for covariance functions κ, κ′, both κ × κ′ and κ + κ′ are
again covariance functions. Frequently used covariance
functions are: linear, periodic, squared-exponential, and
rational quadratic.
• Linear: κ_lin(x, x′) = x x′
• Periodic: κ_per(r) = exp(−(2/l^2) sin^2(π r / p))
• Squared-exponential: κ_SE(r) = exp(−r^2 / (2 l^2))
• Rational quadratic: κ_RQ(r) = (1 + r^2 / (2 α l^2))^{−α}
where r = |x − x′| and c, l, p, α are parameters of the
covariance function (because the covariance function itself
is a parameter of the Gaussian process, they are called
hyper-parameters of the process). l is a length-scale, p
defines the period, and α changes the smoothness of the rational
quadratic function. An additional parameter of the model
is the noise level (SN), which is an additive Gaussian noise
in the model.</p>
        <p>When working with multivariate data sets, the
covariance functions that have the length-scale as a
parameter can either apply the same length-scale l to all
dimensions, or the i-th dimension can have its own length-scale li.
In the first case, the covariance function has an isotropic distance
measure; the latter case uses automatic relevance
determination (ARD).</p>
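        <p>A minimal Python sketch of Gaussian-process prediction with the isotropic squared-exponential covariance and additive noise is given below; the helper names, length-scale and noise values are illustrative assumptions, and only the posterior mean is computed (the study's own models were fitted in Matlab).</p>
        <preformat>
# GP posterior mean with the squared-exponential covariance
# kappa_SE(r) = exp(-r^2 / (2 l^2)) plus additive Gaussian noise.
import numpy as np

def k_se(A, B, length_scale=1.0):
    """Covariance matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # squared distances
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior_mean(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    K = k_se(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_star = k_se(X_test, X_train, length_scale)
    alpha = np.linalg.solve(K, y_train)   # the O(N^3) step dominating the cost
    return K_star @ alpha

# Usage on a small 1-D example
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0])
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
print(gp_posterior_mean(X, y, X_new, length_scale=1.0, noise=1e-4))
        </preformat>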
      </sec>
      <sec id="sec-2-4">
        <title>Radial Basis Function Networks</title>
        <p>A radial basis function network (RBF) is a feed-forward
neural network with one hidden layer in which the nodes
have a radial transfer function ρ. The output of the network
is given by</p>
        <p>ϕ(x) = Σ_{i=1}^{N} ai ρ(||x − ci||)   (4)
or its normalized version:</p>
        <p>ϕ(x) = Σ_{i=1}^{N} ai ρ(||x − ci||) / Σ_{i=1}^{N} ρ(||x − ci||)   (5)
where ρ(||x − ci||) usually has the form of a Gaussian:
ρ(||x − ci||) = exp(−||x − ci||^2 / (2 σi^2)).
Here ci is the center vector of the respective neuron, ai is the weight
of the neuron, and ||x − ci|| is a norm, typically the
Euclidean norm. The model parameters are the spread
constant σi^2 (SC), the maximum number of neurons (MAX) that can be
added to the network during the iterative learning process, and the
error goal (EG), which is a mean-squared error on the training
set. The maximum number of neurons or the error goal are the stopping
criteria for the network induction.</p>
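        <p>A short Python sketch of the unnormalized RBF output of equation (4) follows; the centers, weights and spread are taken as given purely for illustration, i.e. this is not the iterative network training used in the experiments.</p>
        <preformat>
# phi(x) = sum_i a_i * rho(||x - c_i||) with a Gaussian radial function.
import numpy as np

def rbf_predict(X, centers, weights, sigma):
    # pairwise squared Euclidean distances between inputs and centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    rho = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian radial function
    return rho @ weights

# Usage with arbitrary (illustrative) centers and weights
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(10, 3))
centers = rng.uniform(-1, 1, size=(4, 3))
weights = np.array([0.5, -1.0, 2.0, 0.3])
print(rbf_predict(X, centers, weights, sigma=0.7))
        </preformat>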
      </sec>
      <sec id="sec-2-4">
        <title>Model Selection and Evaluation</title>
        <p>The parameters for the regression models were selected by
10-fold cross-validation based on the mean-squared
error (MSE),
MSE = (1/N) Σ_{i=1}^{N} (Yi − f(xi))^2.</p>
        <p>Cross-validation is suited for limited data samples,
but it is also a justified method for synthetic data.</p>
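        <p>A minimal Python sketch of this 10-fold cross-validation loop is shown below; it assumes a generic model object with fit and predict methods and is meant only to illustrate how the per-fold MSE values used later for the significance tests are obtained.</p>
        <preformat>
# 10-fold cross-validation returning the average and per-fold MSE.
import numpy as np

def cv_mse(model, X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        errors.append(np.mean((y[test] - pred) ** 2))   # fold MSE
    return np.mean(errors), errors
        </preformat>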
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments with Synthetic Data</title>
      <p>
        As we are interested primarily in the suitability of the
considered regression methods for surrogate models in
black-box optimization, we compared them on synthetic
data generated using standard benchmarks for continuous
black-box optimization [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>All performed experiments were implemented in
Matlab. For each function, we have sampled 5000
p-dimensional data points, where p ∈ {5, 10, 20, 40}, and
used them for a 10-fold cross-validation to compare the
considered models. The results of the cross-validation are the MSE on the
training set, the MSE on the testing set, and Kendall's rank
correlation coefficient. The significance of the difference
between the results obtained for two models m, m′ was tested
using the independent-sample t-test
t = (res_m − res_m′) / √((1/k)(σ_m^2 + σ_m′^2)),
which we compare, for a significance level α ∈ (0, 1),
against the (1 − α/2)-quantile of the Student distribution
with 2(k − 1) degrees of freedom, where k is the number of
cross-validation folds, and res_m, σ_m^2 are computed as follows:
res_m = (1/k) Σ_{i=1}^{k} res_{m,i},
σ_m^2 = (1/(k − 1)) Σ_{i=1}^{k} (res_{m,i} − res_m)^2.</p>
      <p>For a comparison of two models, it would have been
better to use a paired t-test, which provides better estimates,
but since we had decided to use the unpaired t-test at the
beginning of our experiments, we did not have the necessary
sub-results to perform it.</p>
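      <p>For illustration, a small Python sketch of this unpaired two-sample test on per-fold results is given below; the use of scipy for the Student quantile and the example numbers are assumptions made only for the sketch.</p>
      <preformat>
# Unpaired t statistic comparing the per-fold results of two models.
import numpy as np
from scipy import stats

def compare_models(res_m, res_m2, alpha=0.05):
    k = len(res_m)
    var_m = np.var(res_m, ddof=1)
    var_m2 = np.var(res_m2, ddof=1)
    t = (np.mean(res_m) - np.mean(res_m2)) / np.sqrt((var_m + var_m2) / k)
    critical = stats.t.ppf(1 - alpha / 2, df=2 * (k - 1))   # Student quantile
    return t, abs(t) &gt; critical   # statistic and significance decision

# Usage on two hypothetical sets of 10 fold-wise MSE values
res_a = np.array([0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.9, 1.1, 1.0, 1.0])
res_b = np.array([1.4, 1.6, 1.5, 1.3, 1.7, 1.5, 1.4, 1.6, 1.5, 1.5])
print(compare_models(res_a, res_b))
      </preformat>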
      <p>
        We have used the MSE together with Kendall's rank
correlation coefficient [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] between the ordering of the function
values according to the model (y1, . . . , yn) and according
to the function used to generate the data (t1, . . . , tn),
τ = ((# of concordant pairs) − (# of discordant pairs)) / ((1/2) n(n − 1)),
where for (tj, yj) and (tk, yk), different pairs of a target value
t and a predicted value y, (tj, yj) and (tk, yk) are concordant
if tj &lt; tk and yj &lt; yk, or tj &gt; tk and yj &gt; yk, and discordant
otherwise.
      </p>
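      <p>A direct Python sketch of this concordant/discordant counting follows; it is quadratic in the number of points and purely illustrative (a library implementation would normally be preferred).</p>
      <preformat>
# Kendall's rank correlation between true values t and model predictions y.
import numpy as np

def kendall_tau(t, y):
    n = len(t)
    concordant = discordant = 0
    for j in range(n):
        for k in range(j + 1, n):
            s = np.sign(t[j] - t[k]) * np.sign(y[j] - y[k])
            if s &gt; 0:
                concordant += 1
            elif s &lt; 0:
                discordant += 1
    return (concordant - discordant) / (0.5 * n * (n - 1))

# Usage: a perfectly concordant ordering gives tau = 1
print(kendall_tau(np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0])))
      </preformat>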
      <sec id="sec-3-1">
        <title>Selection of Model Parameters</title>
        <p>For each data set, we have searched for the optimal model
parameters (in the case of a Gaussian process, these are its
hyper-parameters) minimizing the MSE. With regression
trees, we have considered different settings for the
number of trees and the minimum number of data points in leaves.
With Gaussian processes, we have tried the rational quadratic
and squared exponential covariance functions in their isotropic form,
and also the ARD version of the squared exponential. With radial basis
function networks, we have considered different settings of the
parameters: the spread constant, the MSE goal, and the maximum
number of neurons. As to polynomial
regression, we have used quadratic regression. See Table 3
for an overview of the selected parameters for each model.</p>
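        <p>As an illustration of this parameter search, the sketch below scores a small, assumed grid of random forest settings by cross-validated MSE using scikit-learn; the grid values and the choice of library are assumptions for the example and do not reproduce the Matlab setup of the experiments.</p>
        <preformat>
# Exhaustive search over a small parameter grid scored by 10-fold CV MSE.
from itertools import product
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def select_rf_parameters(X, y, n_trees=(100, 500, 1000), min_leaf=(1, 5, 20)):
    best = None
    for nt, ml in product(n_trees, min_leaf):
        model = RandomForestRegressor(n_estimators=nt, min_samples_leaf=ml)
        mse = -cross_val_score(model, X, y, cv=10,
                               scoring='neg_mean_squared_error').mean()
        if best is None or mse &lt; best[0]:
            best = (mse, {'n_estimators': nt, 'min_samples_leaf': ml})
    return best   # (best CV MSE, best parameter setting)
        </preformat>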
      </sec>
      <sec id="sec-3-2">
        <title>Results</title>
        <p>We will now present the results of our experiments. First,
we have included a detailed Table 3 with the measured values
of the MSE and the Kendall's coefficient for each data set
and each model. We can see how the optimal combinations of
parameter values for each model change with higher
dimensions. Random forests have a lower number of trees
(NT) and a higher minimum number of data points in leaves (ML).
The comparison of the performance of each method across
different dimensions of the data sets follows.</p>
        <p>Table 4 shows the results of our experiments, where we
have compared four different models across 40 different
data sets. For each model, we entered the number of times
the model was better than the other model, and we also
added how many times the result was significantly better
at the significance level 0.05.</p>
        <p>A summary of the results can be seen in Table 2, and
additional comments on the results follow. With 10
dimensions, the radial basis function networks started performing better,
although not significantly. With 20 dimensions, there are
even fewer methods that outperform polynomial regression
significantly according to the MSE, and random forests were
the weakest model of the triple RBF, GP and
RF. With 40 dimensions, there is a surprising result, since the
MSE values are much lower compared to lower
dimensions, whereas we would expect the MSE to grow with
higher dimensions. This may be an artifact of the
function definitions, which suppress higher dimensions and
may thus lower the MSE values.</p>
        <p>In summary, when comparing the MSE over all
dimensions, the Gaussian processes were the best model for our
data, followed by radial basis function networks and random forests,
with polynomial regression in the last position. With
Kendall's coefficient, the results are not that clear. Even
though the Gaussian processes have the most wins, they do
not have the most significant wins. Based on the
significant wins, the best performing model was random forests.</p>
        <p>[Figure: panels (i) f23 Katsuura Function and (j) f24 Lunacek bi-Rastrigin Function; panels (a) Polynomial regression, (b) Random forests, (c) Radial basis function networks, (d) Gaussian processes, MSE versus dimension.]</p>
        <p>With higher dimensions, when comparing the models
based on the MSE, we may notice that the results for
Gaussian processes and random forests are less significant,
which is also the case with Kendall's coefficient, where
polynomial regression gets more wins in higher
dimensions.</p>
        <p>Now we have a look at how long it takes to evaluate
the 10-fold cross-validation for the selected parameter settings
of each model (see Figure 2). With higher dimensions,
each method takes more time to evaluate. All the
computations were performed on a PC (x86-64) with an Intel Core i7 920
(4x 2.66 GHz + HyperThreading) and 6 GB RAM.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusion</title>
      <p>The figures and tables presented in Results compared four
different regression methods over 40 synthetic data sets
(10 functions × 4 different dimensions) generated using
standard benchmark functions for continuous black-box
optimization. We have shown how the performance of
these methods changes with increasing dimensionality and
how the time to cross-validate the models grows. We have
compared the methods based on the MSE and on Kendall’s
coefficient. We will now comment on each of them.</p>
      <p>The Gaussian process is probably the most complex method.
With its time complexity O(N^3), it takes the longest time to
compute; some of the cross-validations, i.e. 10
constructions of the model, took up to 24 hours. This model was
better than the others according to both the MSE and Kendall's
coefficient comparisons.</p>
      <p>Random forests ended up with poorer results for the 40-dimensional
data, and overall they were slightly behind the
Gaussian processes based on the MSE. According to the
Kendall's coefficient results, they were comparable with the
Gaussian processes and, according to the number of
significant wins, they even outperformed GP. With some data
sets (f19-10d, f20-05d), we have learnt 2000 trees out of
4500 samples. In these cases, we could have compared the
results with the nearest-neighbor method.</p>
      <p>The radial basis function network has clearly poorer
results compared to Gaussian processes and random forests
according to both the MSE and the Kendall's coefficient.</p>
      <p>Even though polynomial regression was included
due to its importance as a traditional response surface
model, the method was not always worse than all the other
methods. For dimensions 20 and 40, its MSE was
comparable to that of random forests. Also, with higher
dimensions, its results based on the Kendall's coefficient
are comparable to both GP and RBF, and it even
outperformed RBF.</p>
      <p>In this paper, we have compared a selection of
non-linear methods on synthetic data sets based on their
mean-squared error and on Kendall's rank correlation
coefficient. We have chosen regression methods that have been
used as surrogate models in black-box optimization: radial
basis function networks, Gaussian processes, random forests,
and polynomial regression. A better accuracy of a
model suggests better applicability of that model as a
surrogate model for optimization. From the results, we have
learnt that Gaussian processes had better results in most
cases and thus would be a better surrogate model compared to
the others, although random forests were only slightly
behind.</p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>This research was partially supported by SVV project
number 260 224.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Springer Encyclopedia of Mathematics</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Alessandri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cassettari</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mosca</surname>
          </string-name>
          . R.:
          <article-title>Nonparametric nonlinear regression using polynomial and neural approximators: a numerical comparison</article-title>
          .
          <source>Computational Management Science</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>24</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bajer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
<article-title>Holeňa, M.: Surrogate model for continuous and discrete genetic optimization based on rbf networks</article-title>
          .
          <source>In: Colin Fyfe</source>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Tino</surname>
          </string-name>
          , Darryl Charles, Cesar GarciaOsorio, and Hujun Yin, (eds),
          <source>Intelligent Data Engineering and Automated Learning - IDEAL</source>
          <year>2010</year>
          , volume
          <volume>6283</volume>
          of Lecture Notes in Computer Science,
          <volume>251</volume>
          -
          <fpage>258</fpage>
          , Springer Berlin Heidelberg,
          <year>2010</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bajer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitra</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
<article-title>Holeňa, M.: Benchmarking gaussian processes and random forests surrogate models on the bbob noiseless testbed</article-title>
          .
          <source>In: GECCO</source>
          <year>2015</year>
          ,
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>45</volume>
          (
          <issue>1</issue>
          ) (
          <year>October 2001</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Buche</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schraudolph</surname>
            ,
            <given-names>N. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koumoutsakos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Accelerating evolutionary algorithms with gaussian process fitness function models</article-title>
          .
          <source>Systems, Man, and Cybernetics</source>
          , Part C:
          <article-title>Applications</article-title>
          and Reviews, IEEE Transactions on
          <volume>35</volume>
          (
          <issue>2</issue>
          ) (May
          <year>2005</year>
          ),
          <fpage>183</fpage>
          -
          <lpage>194</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Gano</surname>
            ,
            <given-names>S. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
          </string-name>
          , D. E.:
          <article-title>Comparison of three surrogate modeling techniques: datascape, kriging and second order regression</article-title>
          .
          <source>In: 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference</source>
          ,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Real-parameter black-box optimization benchmarking 2009: noiseless functions definitions</article-title>
          .
          <source>Research Report RR-6829</source>
          ,
          <year>2009</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The elements of statistical learning</article-title>
          . Springer Series in Statistics, Springer New York Inc., New York, NY, USA,
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Hultquist</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A comparison of gaussian process regression, random forests and support vector regression for burn severity assessment in diseased forests</article-title>
          .
          <source>Remote Sensing Letters</source>
          <volume>5</volume>
          (
          <issue>8</issue>
          ) (
          <year>2014</year>
          ),
          <fpage>723</fpage>
          -
          <lpage>732</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Kleijnen</surname>
            ,
            <given-names>J. P. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Beers</surname>
            , W., van Nieuwenhuyse,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Constrained optimization in expensive simulation: novel approach</article-title>
          .
          <source>European Journal of Operational Research</source>
          <volume>202</volume>
          (
          <issue>1</issue>
          ) (
          <year>2010</year>
          ),
          <fpage>164</fpage>
          -
          <lpage>174</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Louppe</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Understanding random forests: from theory to practice</article-title>
          .
          <source>PhD thesis</source>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Comparison of surrogate models with different methods in groundwater remediation process</article-title>
          .
          <source>Journal of Earth System Science</source>
          <volume>123</volume>
          (
          <issue>7</issue>
          ) (
          <year>2014</year>
          ),
          <fpage>1579</fpage>
          -
          <lpage>1589</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>R. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montgomery</surname>
            ,
            <given-names>D. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson-Cook</surname>
            ,
            <given-names>C. M.:</given-names>
          </string-name>
<article-title>Response surface methodology: process and product optimization using designed experiments, 2009</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Rasmussen</surname>
            ,
            <given-names>C. E.</given-names>
          </string-name>
          :
          <article-title>Evaluation of gaussian processes and other methods for non-linear regression</article-title>
          .
          <source>Technical Report</source>
          ,
          <year>1996</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Razi</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Athappilly</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A comparative predictive analysis of neural networks (nns), nonlinear regression and classification and regression tree (cart) models</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>29</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          ),
          <fpage>65</fpage>
          -
          <lpage>74</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Govil</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Miranda</surname>
          </string-name>
          , R.:
          <article-title>A neural-network learning theory and a polynomial time rbf algorithm</article-title>
          .
          <source>Neural Networks, IEEE Transactions on 8(6)</source>
          (Nov.
          <year>1997</year>
          ),
          <fpage>1301</fpage>
          -
          <lpage>1313</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ong</surname>
            ,
            <given-names>Y. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>P. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keane</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lum</surname>
          </string-name>
          , K. Y.:
          <article-title>Combining global and local surrogate models to accelerate evolutionary optimization</article-title>
          .
          <source>Systems, Man, and Cybernetics</source>
          , Part C:
          <article-title>Applications</article-title>
          and Reviews, IEEE Transactions on
          <volume>37</volume>
          (
          <article-title>1) (Jan</article-title>
          .
          <year>2007</year>
          ),
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>