<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adaptive Selection of Gaussian Process Model for Active Learning in Expensive Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jakub Repicky´</string-name>
          <email>repicky@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zbynˇek Pitra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holenˇa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Physics, Charles University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Computer Science, Czech Academy of Sciences</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Novel Approach to GP-Based Active Learning</institution>
        </aff>
      </contrib-group>
      <fpage>80</fpage>
      <lpage>84</lpage>
      <abstract>
        <p>Black-box optimization denotes the optimization of objective functions the values of which are only available through empirical measurements or experiments. Such optimization tasks are most often tackled with evolutionary algorithms and other kinds of metaheuristics methods (e. g., [12]), which need to evaluate the objective function in many points. This is a serious problem in situations when its evaluation is expensive with respect to some kind of resources, e.g., the cost of needed experiments. A standard attempt to circumvent that problem is to evaluate the original objective function only in a small fraction of those points, and to evaluate a surrogate model of the original function [5] in the remaining points. Once a model has been trained, the success of the optimization in the remaining iterations depends on a resource aware selection of points in which the original function will be evaluated, which is a typical active learning task. The surrogate model used in the reported research is a Gaussian process (GP) [11], which treats the values of an unknown function as jointly Gaussian random variables. The advantage of GP compared to other kinds of surrogate models is its capability of quantifying the uncertainty of prediction, by calculating the variance of the posterior distribution of function values. So far, active learning of points in which the original objective function will be evaluated during a GP-based surrogate modelling nearly always used a fixed covariance function of the surrogate model and a fixed active learning algorithm (e.g., [9, 15]). In the reported research, an adaptive selection of the covariance function according to the available data is investigated. To this end, a pool of kernels of 8 various kinds has been established, including non-stationary kernels and composed kernels of depth one (Figure 1). Adaptive selection of GP covariance functions is known from GP regression literature [3, 6, 8]. Differently to our approach, the number of available kinds of simple kernels is smaller; on the other hand, they can be recurrently composed to</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        an arbitrary depth through algebraic operations. An additional difference
concerns the criteria according to which the models are selected: Whereas in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], only
likelihood is used, and in [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ], only the Bayes information criterion (BIC ) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
was used, our research includes the investigation of suitability of different
criteria for the selection of GP covariance functions in surrogate modelling. Since the
maximum likelihood estimate is prone to favour overfitting models, we consider
only criteria that account for model complexity. Apart from the BIC used in
[
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ], we consider also the Akaike information criterion (AIC ) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the deviance
information criterion (DIC ) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and a criterion proposed by Watanabe in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
called by him widely applicable information criterion (WAIC ).
2.1
      </p>
      <sec id="sec-1-1">
        <title>Algorithm of Active Learning</title>
        <p>
          The adaptive selection of covariance function is based on GP-based Doubly
trained surrogate covariance matrix adaptation evolutionary strategy
(DTSCMA-ES) [
          <xref ref-type="bibr" rid="ref10 ref2">2, 10</xref>
          ]. A GP model is built in each generation of the evolution
strategy. When the model is trained, a fraction of the population maximizing the
probability of improvement [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is selected for evaluation with the expensive
objective function. The model is retrained with the newly evaluated points and used
to rank the whole population. The fraction of points for evaluation is adapted
according to recent performance of the surrogate model, more precisely, to its
ranking difference error in the last iteration. A novel contribution of the
reported research is a selection of the covariance function according to one of the
aforementioned criteria, taking place at the training phase.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results and Conclusion</title>
      <p>The best performing model selection in our experiments are those based on AIC
and BIC, with no clear difference between them. Results for DIC and WAIC are
not shown because we have not yet succeed to implement these.</p>
      <p>On average, optimization results do not show an improvement on the
DTSCMA-ES. Nevertheless, a promising result has been obtained on the “Step
ellipsoidal” function, characterized by plateaus lying on a quadratic structure,
especially in multi-dimensional variants (Figure 2). The most frequently selected
kernel for this function under both AIC and BIC has been the sum of the squared
exponential and the quadratic kernel, which provides an intuitive interpretation
of the function.</p>
      <p>This result suggests that composite kernels can capture global features of
the objective function landscape and provide better predictive performance, but
extending the pool of covariances might be needed e. g., by composite kernels
of arbitrary depth. This is challenging due the computational cost of searching
through an open-ended space of kernel expressions. A possible solution might be
found in a co-evolution of the kernel for the surrogate model and the candidate
solution to the objective function.</p>
      <sec id="sec-2-1">
        <title>Acknowledgment</title>
        <p>The research reported in this paper has been supported by the SVV project
number 260 453 and the Czech Science Foundation (GACˇ R) grant 17-01251.
A
A.1</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Appendix</title>
      <sec id="sec-3-1">
        <title>Covariance Functions</title>
        <p>
          4
2
E0
S
-2
-4
4
2
N0
-2
N
-4
4
2
IN0
L-2
-4
4
D2
A0
-2
U
Q
-4
4
R2
E0
P-2
-4
4
DD02
A-2
-4
4
N2
N0
+
E-2
S-4
noiseless testbed. Each function is defined everywhere on RD and has its global
optimum in [
          <xref ref-type="bibr" rid="ref5">−5, 5</xref>
          ]D for all dimensionalities D ≥ 2. For every function and two
dimensionalities, 2D and 10D, 15 independent trials of each algorithm are
conducted on multiple function instances, defined by transformations (translations,
rotations and shifts) of both the search space and f -values. A trial terminates
when the optimum is reached within a small tolerance or when a given budget
of function evaluations, 250D in our case, is exhausted.
        </p>
        <p>A.3</p>
        <p>Experimental Results</p>
        <p>f7 Step Ellipsoidal 10D</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akaike</surname>
          </string-name>
          , H.:
          <article-title>Information Theory and an Extension of the Maximum Likelihood Principle</article-title>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>213</lpage>
          . Springer New York (
          <year>1973</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bajer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Model-based evolutionary optimization methods</article-title>
          .
          <source>Ph.D. thesis, Faculty of Mathematics and Physics</source>
          , Charles University in Prague, Prague (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Duvenaud</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosse</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tenenbaum</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zoubin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Structure discovery in nonparametric regression through compositional kernel search</article-title>
          . In: Dasgupta,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>McAllester</surname>
          </string-name>
          ,
          <string-name>
            <surname>D</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the 30th International Conference on Machine Learning</source>
          . vol.
          <volume>28</volume>
          , pp.
          <fpage>1166</fpage>
          -
          <lpage>1174</lpage>
          . PMLR, Atlanta, Georgia, USA (Jun
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Real-parameter Black-Box Optimization Benchmarking 2012: Experimental setup</article-title>
          .
          <source>Tech. rep.</source>
          ,
          <source>INRIA</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Surrogate-assisted evolutionary computation: Recent advances and future challenges</article-title>
          .
          <source>Swarm and Evolutionary Computation</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <fpage>61</fpage>
          -
          <lpage>70</lpage>
          (
          <year>Jun 2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kronberger</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kommenda</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Evolution of covariance functions for gaussian process regression using genetic programming</article-title>
          . In:
          <string-name>
            <surname>Moreno-D´</surname>
          </string-name>
          ıaz, R.,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quesada-Arencibia</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . (eds.)
          <source>Computer Aided Systems Theory - EUROCAST</source>
          <year>2013</year>
          . pp.
          <fpage>308</fpage>
          -
          <lpage>315</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kushner</surname>
            ,
            <given-names>H.J.:</given-names>
          </string-name>
          <article-title>A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise</article-title>
          .
          <source>J. Basic Eng</source>
          .
          <volume>86</volume>
          (
          <issue>1</issue>
          ),
          <fpage>97</fpage>
          -
          <lpage>106</lpage>
          (
          <year>1964</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duvenaud</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosse</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tenenbaum</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Automatic construction and natural-language description of nonparametric regression models</article-title>
          .
          <source>CoRR abs/1402.4304 (Apr</source>
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>An evolution strategy assisted by an ensemble of local Gaussian process models</article-title>
          .
          <source>In: Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference - GECCO '13</source>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pitra</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bajer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holenˇa</surname>
          </string-name>
          , M.:
          <article-title>Doubly trained evolution control for the surrogate CMA-ES</article-title>
          .
          <article-title>In: Parallel Problem Solving from Nature -</article-title>
          PPSN XIV, pp.
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          . Springer International Publishing (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rasmussen</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C.K.I.</given-names>
          </string-name>
          :
          <article-title>Gaussian Processes for Machine Learning</article-title>
          .
          <source>Adaptative computation and machine learning series</source>
          , MIT Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Schaefer</surname>
          </string-name>
          , R.:
          <source>Foundations of Global Genetic Optimization</source>
          . Springer Publishing Company, Incorporated, 1st edn. (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schwarz</surname>
          </string-name>
          , G.:
          <article-title>Estimating the dimension of a model</article-title>
          .
          <source>The Annals of Statistics</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>461</fpage>
          -
          <lpage>464</lpage>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Spiegelhalter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Best</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carlin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Der Linde</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Bayesian measures of model complexity and fit</article-title>
          .
          <source>Journal of the Royal Statistical Society. Series B: Statistical Methodology</source>
          <volume>64</volume>
          (
          <issue>4</issue>
          ),
          <fpage>583</fpage>
          -
          <lpage>616</lpage>
          (12
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naujoks</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Investigating Uncertainty Propagation in Surrogate-assisted Evolutionary Algorithms</article-title>
          .
          <source>In: Proceedings of the Genetic and Evolutionary Computation Conference</source>
          . pp.
          <fpage>881</fpage>
          -
          <lpage>888</lpage>
          . GECCO '17,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Watanabe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <source>Algebraic Geometry and Statistical Learning Theory. Cambridge Monographs on Applied and Computational Mathematics</source>
          , Cambridge University Press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>