<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Suitability of Modern Neural Networks for Active and Transfer Learning in Surrogate-Assisted Black-Box Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Koza</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Academy of Sciences, Institute of Computer Science</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Czech Technical University, Faculty of Information Technology</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>47</fpage>
      <lpage>67</lpage>
      <abstract>
<p>Active learning plays a crucial role in black-box optimization, especially for objective functions that are expensive to evaluate. Continuous black-box optimization has adopted an approach called surrogate modelling, where the original black-box objective is approximated with a regression model. An active learning task in this context is to decide which points should be evaluated using the original objective to update the surrogate model. Apart from low-order polynomials, the first surrogate models were artificial neural networks of the kinds multilayer perceptron and radial basis function network. In the late 2000s, neural networks were superseded by other kinds of surrogate models, primarily Gaussian processes. However, over the last 15 years, neural networks have seen significant and successful development, suggesting that they once again have the potential to serve as promising surrogate models. This paper reviews possible research directions concerning that potential, and recalls initial results from investigations in some of these directions. Finally, it contributes to those results by investigating the state-of-the-art black-box optimizer CMA-ES surrogate-assisted by two variants of random-activation-function neural network ensembles.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>One area where active learning plays a particularly important role is black-box optimization (BBO), i.e.,
optimization of objective functions for which no analytical description is provided. It employs optimization
methods that need as input only points in the search space paired with the respective values of the objective
function obtained in a non-analytical way, e.g. from sensors, in experiments, or through numerical
simulations. Most frequently used are evolutionary optimization approaches, such as evolution
strategies, genetic algorithms, and differential evolution, or other metaheuristics, such as particle swarm
optimization.</p>
      <p>
        Because BBO methods receive only information about values of the objective function, they typically
need many such values. This is a problem in situations when evaluating the black-box objective
function is time-consuming and/or expensive. That is frequently the case if it is evaluated empirically
in experiments. For example, for the evolutionary optimization tasks described in the book [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the
evaluation of a comparatively small generation of a genetic algorithm can sometimes take more than a
week and cost more than 10 000 €. To deal with expensive evaluations, continuous BBO has in the late
1990s and early 2000s adopted an approach called surrogate modelling or metamodelling [
        <xref ref-type="bibr" rid="ref136 ref2 ref3 ref4 ref5 ref6 ref7 ref8">2, 3, 4, 5, 6, 7, 8</xref>
        ].
In principle, a surrogate model is any regression model that approximates the original black-box
objective function with sufficient fidelity, restricting the necessity of its evaluation to only a small
proportion of points, whereas everywhere else, only the surrogate model is used.
      </p>
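      <p>To make that principle concrete, the following is a minimal sketch (ours, not taken from any cited work) of the surrogate-modelling idea: a regression model is fitted to a small archive of points where the expensive objective has actually been evaluated, and everywhere else the model is queried instead. The toy objective, the model choice, and all names are illustrative assumptions.</p>
      <preformat>
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def expensive_objective(x):
    # stands in for a black-box evaluated via sensors, experiments, or simulations
    return float(np.sum(x**2))

# small archive of points already evaluated with the expensive objective
archive_X = np.random.uniform(-5, 5, size=(15, 4))
archive_y = np.array([expensive_objective(x) for x in archive_X])

# fit a cheap regression model to the archive
surrogate = RandomForestRegressor().fit(archive_X, archive_y)

# everywhere else, only the surrogate is used
candidates = np.random.uniform(-5, 5, size=(100, 4))
approx_values = surrogate.predict(candidates)
      </preformat>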
      <p>Selecting the points in which the original objective function should be evaluated is a step in which
active learning is involved. However, it is not active learning of a regression model, although the
surrogate model itself is a regression model. The reason is that its utility functions are not based
on the model, as are the commonly used utility functions uncertainty decrease, model performance,
diversity, or surprise-novelty. Instead, they are based on the BBO, the most common being minimizing
the objective function for a given evaluation budget, and minimizing the evaluation budget for a given
objective-function threshold. Nevertheless, even active learning in surrogate-assisted BBO follows the
basic principle of active learning: to actively select the next model inputs according to the considered utility
function.</p>
      <p>The earliest kinds of surrogate models in continuous BBO were low-order polynomials and artificial
neural networks (ANNs) of the kind multilayer perceptron (MLP). The former have always remained a
suitable choice in situations when enough evaluations of the original black-box objective function are
affordable for the approximation properties of polynomials to take effect. On the other hand, surrogate
modelling for substantially fewer evaluations of the original objective has during the last two decades
undergone further development. MLPs were soon replaced with another kind of ANN, radial basis
function networks (RBFs), which better fit local peculiarities of an objective function landscape. Those
networks, however, have since the late 2000s been superseded by other kinds of surrogate models,
primarily Gaussian processes (GPs), but also ranking support vector machines (RSVMs) and random
forests (RFs). GPs are currently the most successful kind of surrogate model for BBO with a small
evaluation budget of functions with complicated multimodal landscapes, mainly due to their ability to
assess the uncertainty of the estimate of the original objective function at a given point, more precisely,
to provide the probability distribution of this estimate. That property of GPs allows combining the
original BBO method, e.g. an evolutionary one, with Bayesian optimization.</p>
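      <p>As a small illustration of that property, the sketch below (ours, not from the cited works) fits a GP with a Matérn kernel to a handful of evaluated points and queries both the mean estimate and its uncertainty at new points; the toy objective and all names are assumptions.</p>
      <preformat>
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(X):                          # toy stand-in for the black-box objective
    return np.sum(X**2, axis=1)

X_train = np.random.uniform(-5, 5, size=(20, 2))   # already evaluated points
y_train = f(X_train)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_train, y_train)

X_new = np.random.uniform(-5, 5, size=(5, 2))      # candidate points
mean, std = gp.predict(X_new, return_std=True)     # predictive distribution:
                                                   # mean and standard deviation
      </preformat>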
      <p>Consequently, little attention has been paid to ANN-based surrogate models in continuous BBO
during the last 15 years. This contrasts with the intense and successful development of the ANN area
during that time, which suggests that ANNs again have the potential to serve as promising surrogate
models. This paper attempts to bring a small contribution to research into that potential, presenting in
addition a review of possible directions for such research, connected with different classes of neural
networks. Moreover, it also points out that ANNs can serve as the basis for transfer learning between
surrogate-assisted BBO of different functions.</p>
      <p>The next section surveys important aspects and key methods concerning surrogate-assisted
continuous BBO. The review of possible research directions concerning the usability of modern neural
networks in surrogate-assisted BBO is presented in Section 3. Finally, Section 4 reports an experimental
contribution to one of those research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Surrogate-Assisted Continuous BBO</title>
      <p>
        Surrogate modelling for continuous BBO relies on the combination and interaction of three components:
a regression model serving as a surrogate of the original black-box objective function, a BBO method
seeking the optimum of that objective function, and a strategy determining when to evaluate the original objective
function and when its surrogate model. In the context of evolutionary BBO, that strategy is usually
called evolution control [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref9">9, 10, 11, 12, 13</xref>
        ]. There are two other aspects, namely observing constraints
on the feasible set of the black-box objective function (cf. e.g. [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]) and generalizing surrogate
modelling from a single objective to multiple objectives (cf. e.g. [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]); however, we will restrict our
attention to single-objective unconstrained optimization.
      </p>
      <p>
        As already mentioned in the introduction, the regression models that are the most suitable kind
of surrogate model if sufficiently many evaluations of the original black-box objective function are
affordable are low-order polynomials, typically quadratic functions [
        <xref ref-type="bibr" rid="ref137 ref18 ref19 ref20 ref21 ref22">18, 19, 20, 21, 22</xref>
        ]. For substantially
fewer evaluations, the most traditional kind have been MLPs [
        <xref ref-type="bibr" rid="ref23 ref9">23, 9</xref>
        ], soon replaced with RBFs [
        <xref ref-type="bibr" rid="ref21 ref22 ref24 ref25 ref26">24, 25,
26, 21, 22</xref>
        ], and since the late 2000s with GPs, a.k.a. kriging [
        <xref ref-type="bibr" rid="ref11 ref27 ref28 ref29 ref30">27, 28, 11, 29, 30</xref>
        ]. Occasionally, RBFs were
used as local models in combination with GP-based global models [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Other kinds of surrogate models
employed during the last decade include decision trees [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], RFs [
        <xref ref-type="bibr" rid="ref32 ref33 ref34">33, 34, 32</xref>
        ], and RSVMs [
        <xref ref-type="bibr" rid="ref35 ref36">35, 36</xref>
        ]. The
last kind has the exceptional property of invariance with respect to order-preserving transformations of the
objective function. This is important in situations when the BBO algorithm possesses such invariance, a
frequently encountered property of evolutionary algorithms. On the other hand, the surrogate modelling
methods proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] use GPs to perform preselection based on a partial ordering that
is also invariant with respect to order-preserving transformations. More importantly, the adaptive
function value warping approach recently proposed in [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] aims at providing such invariance to any
surrogate model. As a final remark on different kinds of surrogate models, important works on this
topic typically consider several kinds [
        <xref ref-type="bibr" rid="ref12 ref137 ref20 ref32 ref38 ref39">38, 12, 39, 20, 32</xref>
        ] in order to compare them and select the best among
them, and in [
        <xref ref-type="bibr" rid="ref22 ref39">22, 39</xref>
        ] also to aggregate their results, thus providing a team of surrogate models.
      </p>
      <p>
        As to the BBO methods, not only the two most important kinds of surrogate models, i.e. low-order
polynomials [
        <xref ref-type="bibr" rid="ref137 ref18 ref19 ref20">18, 19, 20</xref>
        ] and GPs [
        <xref ref-type="bibr" rid="ref11 ref26 ref28 ref29 ref30">26, 28, 11, 29, 30</xref>
        ], but also the less common RBFs, RFs, and RSVMs
[
        <xref ref-type="bibr" rid="ref24 ref33 ref34 ref36">24, 36, 33, 34</xref>
        ] are most often combined with the covariance matrix adaptation evolution strategy
(CMA-ES). That is not surprising because CMA-ES has already in the 2000s become a state-of-the-art
approach to single-objective unconstrained continuous BBO. Basically, the CMA-ES evolves a Gaussian
estimate of the position of the minimum of the original objective function. That evolution relies on
simultaneous adaptation of the vector mean of the Gaussian estimate, of the scalar step size, and of the
covariance matrix. For more details of this sophisticated evolution strategy, the reader is referred to
the journal papers [
        <xref ref-type="bibr" rid="ref40 ref41">40, 41</xref>
        ]. GPs were also combined with other evolutionary optimization methods
[
        <xref ref-type="bibr" rid="ref27 ref42">27, 42</xref>
        ], and GPs, polynomials, and RBFs were combined with particle swarm optimization [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and with
memetic optimization [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Moreover, GPs are used in black-box optimization in two different ways. In
connection with evolutionary and similar BBO methods, they serve as a regression model evaluated
instead of the original objective function. In addition, they also play a key role in Bayesian optimization,
which then relies on GP estimates of probability distributions of values of the original objective. Those
probability distributions enable several ways of searching for optima of that objective function, each of
them governed by a specific assessment of uncertainty of the objective function estimate, commonly
called an acquisition function [
        <xref ref-type="bibr" rid="ref43 ref44 ref45">43, 44, 45</xref>
        ]. Occasionally, Bayesian optimization is combined with CMA-ES.
For example, in [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ], optimization switches from the most traditional Bayesian optimization method,
EGO (Efficient Global Optimization) [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ], to CMA-ES.
      </p>
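      <p>For orientation, a minimal ask-and-tell loop of plain CMA-ES with the pycma module (the same module that provides the baseline implementation in Section 4) may look as follows; the toy objective and all variable names are ours.</p>
      <preformat>
import numpy as np
import cma

def bb_objective(x):               # placeholder for the expensive black-box function
    return float(np.sum(np.asarray(x)**2))

# start from an 8-dimensional initial mean with initial step size 0.3
es = cma.CMAEvolutionStrategy(x0=8 * [0.5], sigma0=0.3)
while not es.stop():
    X = es.ask()                                  # sample one generation
    es.tell(X, [bb_objective(x) for x in X])      # adapt mean, step size, covariance
print(es.result.xbest, es.result.fbest)
      </preformat>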
      <p>
        Finally, evolution control has since the first surrogate-assisted BBO methods been performed basically
in two ways: generation-based and individual-based. In the generation-based way, all points are in some
generations evaluated with the true objective function and in the remaining generations with the
model. On the other hand, in every generation of the individual-based evolution control, based on the
evaluation of all points with the model, a preselection of points to be evaluated with the true objective
function is performed [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In most of the surrogate-assisted methods, however, the evolution control is
specifically tailored to the respective method. Notably, the authors of [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] investigated mutually
replacing the evolution control of two important polynomial-assisted methods, lmm-CMA [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ] and
lq-CMA-ES [
        <xref ref-type="bibr" rid="ref137 ref20">20</xref>
        ], and of two variants of the GP-assisted method DTS-CMA-ES [
        <xref ref-type="bibr" rid="ref12 ref47">47, 12</xref>
        ] with the evolution
control of the others. According to their findings, the success of those important methods is definitely
not limited to using the respective specifically tailored evolution control. The surrogate-assisted black-box
optimization methods constructing several surrogate models simultaneously either aggregate them
into a team [
        <xref ref-type="bibr" rid="ref22 ref25">25, 22</xref>
        ] or complement the evolution control with a classifier selecting the most appropriate
among those models. Important examples of classifiers used in this context are ANNs [
        <xref ref-type="bibr" rid="ref48 ref49 ref50">48, 49, 50</xref>
        ]
and classification trees [
        <xref ref-type="bibr" rid="ref51 ref52">51, 52</xref>
        ]. Their learning can be viewed as metalearning because it is based on
metafeatures, i.e. properties empirically characterizing the objective function landscape and the BBO
method [
        <xref ref-type="bibr" rid="ref21 ref32 ref49 ref53">21, 32, 49, 53</xref>
        ]. Apart from classification according to the appropriateness of the surrogate
model for the considered data, metalearning can also be used for regression of model error on the
combination of values of metafeatures [
        <xref ref-type="bibr" rid="ref54">54</xref>
        ].
      </p>
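      <p>The two basic evolution control schemes can be contrasted with the following schematic sketch; it is our illustration of the principle, not code from any cited method, and the generation gap and preselection size are arbitrary placeholders.</p>
      <preformat>
def generation_based(points, generation, surrogate, f_true, gap=3):
    # every gap-th generation is evaluated with the true objective,
    # the remaining generations only with the surrogate model
    if generation % gap == 0:
        return [f_true(x) for x in points]
    return [surrogate(x) for x in points]

def individual_based(points, surrogate, f_true, n_true=2):
    # all points are first evaluated with the surrogate; the n_true most
    # promising ones are then re-evaluated with the true objective
    values = [surrogate(x) for x in points]
    promising = sorted(range(len(points)), key=lambda i: values[i])[:n_true]
    for i in promising:
        values[i] = f_true(points[i])
    return values
      </preformat>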
    </sec>
    <sec id="sec-3">
      <title>3. Usability of Modern Neural Networks in Surrogate-Assisted BBO</title>
      <p>This section primarily reviews eight kinds of modern neural networks that we consider worth researching
for their ability to serve as surrogate models in BBO. A high-level overview of those kinds of ANNs
is given in Table 1, which for each of them mentions whether such research has already started. In
Subsection 3.1, two kinds integrating GPs into ANNs are recalled. Subsection 3.2 recalls three kinds of
ANNs providing the most advantageous property of GPs, their ability to estimate the distribution of
black-box objective function values. Finally, in Subsection 3.3, three well-known kinds of modern neural
networks, namely variational autoencoders, transformers, and generative adversarial networks, are
recalled due to the fact that they have already proven useful in the related area of Bayesian optimization.
In addition, Subsection 3.4 is devoted to knowledge transfer in surrogate-assisted BBO, which relates to
the usability of modern neural networks through their important role in transfer learning.</p>
      <sec id="sec-3-1">
        <title>3.1. Integration of GPs into ANNs</title>
        <p>
          The integration of GPs into ANNs has been proposed on two different levels:
1. At the layer level – a GP serves as the final layer of an MLP [
          <xref ref-type="bibr" rid="ref55 ref56">55, 56</xref>
          ] (a minimal sketch of this integration is given below). Integration on that level is
based on the following two assumptions:
(i) If $n_I$ denotes the number of the ANN input neurons, then the ANN computes a mapping $\mathrm{net}$
of $n_I$-dimensional input values into the set $X$ on which the GP is defined. Consequently,
the number $n_H$ of neurons in the last hidden layer fulfills $X \subset \mathbb{R}^{n_H}$, and the ANN maps an
input $x$ into a point $z = \mathrm{net}(x) \in X$, corresponding to an observation $y(z + \varepsilon)$ governed by
the GP, where $\varepsilon$ is a zero-mean Gaussian noise. From the point of view of the ANN inputs,
the GP is now $(\mu_{\mathrm{GP}}(\mathrm{net}(\cdot)), \kappa(\mathrm{net}(\cdot), \mathrm{net}(\cdot)))$, where $\mu_{\mathrm{GP}}$ is the mean function and
$\kappa$ is the covariance function of the GP [
          <xref ref-type="bibr" rid="ref83">83</xref>
          ].
(ii) The GP mean $\mu_{\mathrm{GP}}$ is assumed to be a known constant, thus not contributing to the GP
hyperparameters, and independent of $\mathrm{net}$.
2. At the level of individual neurons – GPs can replace all hidden and output neurons of an MLP.
        </p>
        <p>
          This kind of neural network is commonly called a deep Gaussian process [
          <xref ref-type="bibr" rid="ref59 ref60 ref61 ref62 ref63 ref84 ref85 ref86 ref87 ref88 ref89 ref90">59, 60, 61, 62, 63, 84, 85,
86, 87, 88, 89, 90</xref>
          ].
        </p>
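        <p>The following two-stage sketch illustrates the layer-level integration under strong simplifications: an MLP is trained first, its last hidden layer then provides the mapping net, and a GP with constant mean is fitted on the mapped inputs. The cited works train both parts jointly; the separation into two stages, the toy data, and all names here are our assumptions.</p>
        <preformat>
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(40, 3))
y = np.sum(X**2, axis=1)

mlp = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000).fit(X, y)

def net(X):
    # forward pass up to the last hidden layer, reusing the trained weights
    a = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        a = np.maximum(a @ W + b, 0.0)     # relu, MLPRegressor's default
    return a

# GP defined on the image of net, i.e. on the last hidden layer's outputs
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(net(X), y)
mean, std = gp.predict(net(X[:5]), return_std=True)
        </preformat>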
        <p>
          Integration on both levels has been developed primarily for Bayesian modelling and optimization.
Nevertheless, GPs integrated as the last layer of MLPs have been used as surrogate models in a
CMA-ES-driven BBO [
          <xref ref-type="bibr" rid="ref57 ref58">57, 58</xref>
          ]. In particular, those surrogate models incorporate GPs with five commonly
employed covariance functions – linear, quadratic, rational quadratic, squared exponential, and Matérn 5/2 –
as well as with one composite covariance function superposing the quadratic and squared exponential.
Those six models were compared in [
          <xref ref-type="bibr" rid="ref57">57</xref>
          ] from the point of view of regression accuracy, evaluated on a
large dataset collected during many previous runs of DTS-CMA-ES on the collection of 24 noiseless
benchmarks from the Comparing Continuous Optimizers platform [
          <xref ref-type="bibr" rid="ref91 ref92">91, 92</xref>
          ] (cf. Section 4) in dimensions
2, 3, 5, 10, and 20. Then in [
          <xref ref-type="bibr" rid="ref58">58</xref>
          ], they were compared on the same benchmarks in the same dimensions
from the point of view of the success of surrogate-assisted optimization with CMA-ES. Unfortunately,
those comparisons included neither more traditional surrogate models nor the CMA-ES without
surrogate assistance. To our knowledge, the only comparison that included both a GP integrated as the
last layer of an MLP, and more traditional surrogate models, was the comparison from the point of view
of regression accuracy in [
          <xref ref-type="bibr" rid="ref93">93</xref>
          ]. However, it included only one such integrated surrogate model, with the
GP using the simplest covariance function, the linear one, in addition to the traditional GP-based
surrogate models with eight different covariance functions, including the five listed above.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. ANNs Estimating the Distribution of Black-Box Objective Function Values</title>
        <p>
          In our opinion, the property of GPs most advantageous from the point of view of surrogate modelling
is that they estimate the whole distribution of a predicted value of the original black-box objective
function. Recall from Section 2 that due to that property also ensembles of regression trees – RFs – are
used as surrogate models [
          <xref ref-type="bibr" rid="ref32 ref33 ref34">33, 34, 32</xref>
          ]. This draws attention to those modern neural networks that also
allow estimation of such a distribution. Basically, there are three classes of them, differing in the way
that estimate can be obtained.
        </p>
        <p>
          1. The multivariate normal distribution underlying GPs is actually the asymptotic distribution for
network width increasing to infinity. Such results have been established for several kinds of ANNs
[
          <xref ref-type="bibr" rid="ref88 ref94 ref95 ref96 ref97">94, 88, 95, 96, 97</xref>
          ]. In addition, closely related is the infinite width limit of the neural tangent
kernel, which governs the kernel gradient of the functional cost used in MLP regression [
          <xref ref-type="bibr" rid="ref64 ref65">64, 65</xref>
          ].
Although those results have great theoretical value, there can be a serious disparity between the
infinite width results and their finite width counterparts [
          <xref ref-type="bibr" rid="ref77">77</xref>
          ]. Therefore, it is unclear whether
they can be applied to surrogate modelling.
2. The distribution of a predicted value, or more precisely the parameters of such a distribution,
can be directly learned by an ANN. The best-known kind of such neural networks are the prior
networks, learning the parameters of a normal-inverse Wishart distribution, which is the conjugate
prior to a multivariate normal distribution [
          <xref ref-type="bibr" rid="ref66 ref67 ref68 ref69 ref70 ref98">66, 67, 68, 69, 70, 98</xref>
          ]. Prior networks belong to a
broader class of evidential neural networks [
          <xref ref-type="bibr" rid="ref100 ref101 ref102 ref103 ref99">99, 100, 101, 102, 103</xref>
          ]. Their name refers to the fact
that they follow the basic principle of the Dempster-Shafer theory of evidence [
          <xref ref-type="bibr" rid="ref104">104</xref>
          ] – to fall back
onto prior belief for unfamiliar data.
3. An estimate of the distribution of a predicted value is produced by an ensemble of neural networks.
        </p>
        <p>
          Important kinds of such ensembles are ensembles obtained through diversification of training data
[
          <xref ref-type="bibr" rid="ref105 ref106">105, 106</xref>
          ], ensembles obtained through diversification of network properties [
          <xref ref-type="bibr" rid="ref107 ref108 ref109">107, 108, 109</xref>
          ], a
specific subgroup of which are ensembles in which the diversification is achieved through diverse
activation functions [
          <xref ref-type="bibr" rid="ref76">76</xref>
          ], ensembles obtained through negative correlation learning [
          <xref ref-type="bibr" rid="ref110 ref111 ref112">110, 111, 112</xref>
          ],
bagging ensembles [
          <xref ref-type="bibr" rid="ref113 ref72">72, 113</xref>
          ], boosting ensembles [
          <xref ref-type="bibr" rid="ref114 ref115">114, 115</xref>
          ], deep ensembles [
          <xref ref-type="bibr" rid="ref116 ref73 ref74">73, 74, 116</xref>
          ] including
deep echo-state network ensembles [
          <xref ref-type="bibr" rid="ref117">117</xref>
          ], and anchored ensembles [
          <xref ref-type="bibr" rid="ref75">75</xref>
          ] with a later modification
random activation function (RAF) ensembles [
          <xref ref-type="bibr" rid="ref76">76</xref>
          ]. RAF ensembles take over the principle of
anchored ensembles that regularization is performed not with respect to zero, but with respect to
the initialization values of the parameters, which are assumed normally distributed. Differently from
an anchored ensemble, however, an RAF ensemble uses varied activation functions from an a priori
specified set of size $n_{AF}$. From that set, the activation function is chosen randomly, apart from the
first $n_{AF}$ members of the ensemble, among which each activation function occurs exactly once.
        </p>
        <p>We consider this last-mentioned kind of ensemble to be the state of the art.</p>
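        <p>A minimal sketch of an RAF ensemble in the spirit of the description above may look as follows: each member is a small Keras MLP whose hidden activation is drawn from the a priori specified set (each function once among the first members, then randomly), and the members' predictions are aggregated into a distribution estimate. The anchored regularization toward the initialization values is omitted for brevity, and the architecture and all names are our assumptions.</p>
        <preformat>
import random
import numpy as np
import tensorflow as tf

ACTIVATIONS = ["gelu", "selu", "softsign", "tanh"]     # a priori specified set

def make_member(input_dim, activation):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(32, activation=activation),
        tf.keras.layers.Dense(1),
    ])

def make_raf_ensemble(input_dim, size=5):
    # the first len(ACTIVATIONS) members get each activation exactly once,
    # the remaining members draw their activation at random
    acts = ACTIVATIONS + [random.choice(ACTIVATIONS)
                          for _ in range(max(0, size - len(ACTIVATIONS)))]
    return [make_member(input_dim, a) for a in acts[:size]]

def predict_distribution(ensemble, X):
    preds = np.stack([m(X, training=False).numpy().ravel() for m in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)   # estimated mean and spread
        </preformat>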
        <p>
          To our knowledge, the only ANNs estimating the distribution of function values that have already
been used as surrogate models in BBO are prior networks. In [
          <xref ref-type="bibr" rid="ref71">71</xref>
          ], the prediction accuracy of four
versions has been evaluated on the above mentioned dataset from previous runs of DTS-CMA-ES.
This direction of research is continued by the present paper: Section 4 reports results for CMA-ES
surrogate-assisted by two variants of RAF ensembles.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. ANNs Found Useful in Bayesian Optimization</title>
        <p>
          Recall from Section 2 that GPs, simultaneously with their importance as surrogate models in BBO with
non-Bayesian methods, such as CMA-ES, also play a crucial role in Bayesian optimization. That is why
this subsection lists three well-known kinds of modern neural networks that have been recently found
useful in Bayesian optimization. In our opinion, this indicates that they are worth investigating to determine
whether they could also be used in surrogate-assisted BBO.
1. Variational autoencoders have been utilized in Bayesian optimization because they allow for
optimization in a lower-dimensional latent space [
          <xref ref-type="bibr" rid="ref77 ref78">77, 78</xref>
          ].
2. The generative adversarial networks (GANs) paradigm has been recently shown to be applicable
to BBO: A generator proposes samples that align with the distribution of low values (or even the
optimal value) of the black-box function, while one or more discriminators classify samples based
on whether they belong to that distribution [
          <xref ref-type="bibr" rid="ref79 ref80">79, 80</xref>
          ].
3. Transformers have proven effective in estimating complex prior distributions for Bayesian
optimization [
          <xref ref-type="bibr" rid="ref81 ref82">81, 82</xref>
          ]. Notably, an OptFormer transformer trained on Google Vizier [
          <xref ref-type="bibr" rid="ref118">118</xref>
          ], the largest
hyperparameter optimization (HPO) database, achieved superior HPO outcomes compared to
GP-based Bayesian optimization [
          <xref ref-type="bibr" rid="ref81">81</xref>
          ]. Furthermore, the recently introduced transformer-based
Prior-data Fitted Networks [
          <xref ref-type="bibr" rid="ref82">82</xref>
          ] can mimic Gaussian Processes (GPs) and Bayesian networks,
while also incorporating additional information into the prior.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. ANN-Based Transfer Learning for Surrogate-Assisted Black-Box Optimization</title>
        <p>
Obtaining accurate surrogate models in the initial stages of BBO is challenging due to the scarcity of
data points with evaluated objective function values. That can be mitigated by leveraging knowledge
transfer, i.e. transfer learning. And the connection of modern kinds of neural networks with transfer learning is even
more obvious than with active learning. Indeed, transfer learning is nowadays one of the areas where
ANNs play the most important role [
          <xref ref-type="bibr" rid="ref119 ref120 ref121">119, 120, 121</xref>
          ]. Diferent types of ANNs have been utilized to this end,
including convolutional [
          <xref ref-type="bibr" rid="ref122 ref123">122, 123</xref>
          ], recurrent [
          <xref ref-type="bibr" rid="ref124">124</xref>
          ], autoencoder [
          <xref ref-type="bibr" rid="ref125 ref126">125, 126</xref>
          ], GAN [
          <xref ref-type="bibr" rid="ref127 ref128 ref129">127, 128, 129</xref>
          ], and
transformer [
          <xref ref-type="bibr" rid="ref81">81</xref>
          ]. In the context of the research direction pursued in this paper, most interesting are
those that also have connections to BBO:
(i) Four ANN-based transfer learning approaches draw inspiration from the GAN paradigm. CoGAN
trains two GANs to generate the source and target data, respectively, achieves a domain-invariant
feature space by tying the parameters of the higher layers of the two GANs, and performs domain
adaptation by training a classifier on the discriminator output [
          <xref ref-type="bibr" rid="ref130">130</xref>
          ]. Adversarial discriminative
domain adaptation first learns a discriminative representation using the labels in the source
domain, and then, using a domain-adversarial loss, a separate encoding that maps the target data
to the same space through an asymmetric mapping [
          <xref ref-type="bibr" rid="ref127">127</xref>
          ]. Minimax-game-based selective transfer
learning employs a selector and a discriminator to identify source domain data resembling the
target domain’s distribution, and distinguish genuine target domain data from selected source
domain data, respectively [
          <xref ref-type="bibr" rid="ref129">129</xref>
          ]. Selective adversarial network addresses negative transfer by
excluding outlier classes from the source domain selection, and maximizing the similarity between
source and target domain data distributions [
          <xref ref-type="bibr" rid="ref128">128</xref>
          ].
(ii) An autoencoder for transfer learning, described in [
          <xref ref-type="bibr" rid="ref125 ref126">125, 126</xref>
          ], incorporates embedding and label
encoding layers. The embedding layer reduces the disparity between instance distributions from
the source and target domains, while the label encoding layer utilizes a softmax regression model
to encode label information from the source domain.
(iii) The transformer OptFormer has demonstrated competitiveness with specific transfer learning
methods, although its usage leans more toward metalearning than traditional transfer learning
[
          <xref ref-type="bibr" rid="ref81">81</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation of RAF Ensembles</title>
      <p>
        This section describes a small experimental contribution to one of the above-surveyed possible research
directions: RAF ensembles are experimentally evaluated as surrogate models for CMA-ES. The
experiments were performed on probably the most commonly used platform for experimenting in continuous
optimization – COCO (Comparing Continuous Optimizers) [
        <xref ref-type="bibr" rid="ref92">92</xref>
        ]. COCO contains several suites of
benchmark functions; our evaluation was performed with the most traditional suite, the bbob
suite [
        <xref ref-type="bibr" rid="ref92">92</xref>
        ]. It consists of 24 dimension-scalable noiseless benchmark functions, the definitions of which
have been given in [
        <xref ref-type="bibr" rid="ref91">91</xref>
        ]. Each function is used in 15 differently rotated and/or translated instances. The
employed benchmarks forming the bbob suite are surveyed in Appendix A.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Considered Variants of RAF Ensembles</title>
        <p>
          As activation functions forming an RAF ensemble, we employed those included in the implementation
[
          <xref ref-type="bibr" rid="ref131">131</xref>
          ], to which the RAF paper refers [
          <xref ref-type="bibr" rid="ref76">76</xref>
          ]. They are listed in Appendix B. We used them in two variants
of RAF ensembles:
1. An RAF ensemble of size 5 trained directly using the above mentioned implementation [
          <xref ref-type="bibr" rid="ref131">131</xref>
          ], and
aggregated by the empirical mean. In the results, it will be denoted simply RAF.
2. An ensemble of size 5, in which the differences of the values of the original black-box objective
function with respect to its median are first transformed to their logarithms before using [
          <xref ref-type="bibr" rid="ref131">131</xref>
          ] in
the logarithmic scale to train the ensemble. This transformation attempts to deal with situations
when the function returns values close to the median in many points. The aggregation function
is again the empirical mean, which in terms of the data before the logarithmic transformation
actually corresponds to the empirical geometric mean. That version will be denoted in the results as
RAF-log.
        </p>
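        <p>Because differences from the median can be negative, a plain logarithm is not directly applicable; the sketch below shows one possible, sign-preserving reading of the RAF-log transformation, in which the empirical mean in the log scale corresponds to a geometric mean of the back-transformed magnitudes. The exact transformation used in the experiments may differ; this is only our hypothetical illustration.</p>
        <preformat>
import numpy as np

def to_log_scale(y):
    # signed logarithm of differences from the median (our assumption:
    # log1p keeps the transform defined at zero difference)
    med = np.median(y)
    d = y - med
    return np.sign(d) * np.log1p(np.abs(d)), med

def from_log_scale(z, med):
    # inverse of the transform above
    return med + np.sign(z) * np.expm1(np.abs(z))
        </preformat>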
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Considered CMA-ES Variants for Comparison</title>
        <p>
          CMA-ES surrogate-assisted by the above mentioned two variants of RAF ensembles was compared
with CMA-ES without surrogate modelling, as well as with two earlier surrogate-assisted variants of
CMA-ES:
3. CMA-ES without surrogate modelling was used in an implementation that is in the COCO data
archive [
          <xref ref-type="bibr" rid="ref132">132</xref>
          ] called default-CMA-ES, and described as "default CMA-ES from the pycma module,
version 3.3.0". Here, it will be denoted in the results simply as default.
4. DTS-CMA-ES [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], using a surrogate GP with the covariance function Matérn 5/2. In the results, it
will be denoted simply DTS.
        </p>
        <p>
          5. lq-CMA-ES [
          <xref ref-type="bibr" rid="ref137 ref20">20</xref>
          ], which will be denoted in the results simply as lq.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evolution Control</title>
        <p>Whereas DTS-CMA-ES and lq-CMA-ES each have their own evolution control, for the two variants of
RAF ensembles it was necessary to propose when to evaluate a given point $x$ by the original black-box
objective function $f_{\mathrm{bb}}$, and when by its surrogate model $f_{\mathrm{sm}}$. We decided to use a modification of the
lq-CMA-ES evolution control. That modification is described below in Algorithm 1 using the notation
$\tau((x_1, \ldots, x_n), (y_1, \ldots, y_n))$ for the Kendall correlation coefficient between the sequences $(x_1, \ldots, x_n)$
and $(y_1, \ldots, y_n)$, and the notation $\varrho$ for the ranking function on $\mathbb{R}^n$, i.e.,
$\varrho : \mathbb{R}^n \to \Pi(n)$ with $\Pi(n)$ denoting the set of permutations of $\{1, \ldots, n\}$</p>
        <p>such that $\forall x \in \mathbb{R}^n : \varrho(x)(i) &lt; \varrho(x)(j) \Rightarrow x_i \leq x_j$. (1)</p>
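        <p>Algorithm 1 itself is not reproduced here; the sketch below only illustrates its main ingredient as described above: the Kendall correlation between surrogate-based and true values on a few preselected points decides whether the surrogate can be trusted for the rest of the population. The threshold and control flow are our guesses, not the actual algorithm.</p>
        <preformat>
import numpy as np
from scipy.stats import kendalltau

def control_step(points, f_bb, f_sm, n_true=5, tau_min=0.8):
    sm_values = np.array([f_sm(x) for x in points])
    idx = np.argsort(sm_values)[:n_true]           # most promising by the model
    bb_values = {i: f_bb(points[i]) for i in idx}  # evaluate those with f_bb
    tau, _ = kendalltau(sm_values[idx], [bb_values[i] for i in idx])
    if not np.isnan(tau) and tau >= tau_min:
        # model ranks consistently with f_bb: keep surrogate values elsewhere
        return [bb_values.get(i, sm_values[i]) for i in range(len(points))]
    # otherwise fall back to the true objective for the whole population
    return [bb_values.get(i, f_bb(points[i])) for i in range(len(points))]
        </preformat>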
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Results</title>
        <p>In Tables 2–3, the two considered variants of RAF ensembles and the three other considered CMA-ES
variants are compared based on the difference between the optimal value of the objective function
and its value achieved for a given evaluation budget. The achieved values were averaged over the
15 instances provided by the COCO benchmark suite in each dimension for each of the 24 noiseless
functions listed in Appendix A. The comparisons were performed separately for each of the five above-described
groups of those functions, and subsequently also for all 24 of them, each time including the
instances in dimensions 2, 3, 5, 10, and 20. For each evaluation budget, hence, six comparisons were
performed. The comparisons in Table 2 were conducted for the evaluation budget 3× dimension, while
the comparisons in Table 3 were conducted for the evaluation budget 50× dimension.</p>
        <p>[Tables 2–3: pairwise comparisons of the five methods, with * and ** marking statistically significant differences. Algorithm 1: Evolution control used for RAF and RAF-log ensembles.]</p>
        <p>
          The results of each of those 12 comparisons were subsequently assessed for statistical significance.
First, the hypothesis that all five considered methods are equivalent was tested by the Friedman test.
With the exception of both comparisons for multi-modal functions with adequate global structure,
the test rejected that hypothesis on the familywise significance level 5%, using the Holm procedure
for multiple-hypothesis correction [
          <xref ref-type="bibr" rid="ref133">133</xref>
          ]. This rejection justified testing the equivalence of any two
among the five methods. We adopted the arguments of [
          <xref ref-type="bibr" rid="ref134">134</xref>
          ] that, in machine learning, the Wilcoxon
signed-rank test is more appropriate for this purpose than the post-hoc tests presented in [
          <xref ref-type="bibr" rid="ref135">135</xref>
          ] and
[
          <xref ref-type="bibr" rid="ref133">133</xref>
          ]. If, for two particular methods, the Wilcoxon signed-rank test rejected the hypothesis that they
are equivalent, then in the respective table, their comparison in the row corresponding to the method
that was more frequently better is shown in bold italics.
        </p>
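        <p>With SciPy, the reported significance workflow can be sketched as follows: a Friedman test over all five methods, followed, if rejected, by pairwise Wilcoxon signed-rank tests with the Holm correction of the p-values. The score matrix below is a placeholder, not the data behind Tables 2–3.</p>
        <preformat>
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
scores = rng.random((24, 5))            # 24 functions x 5 methods (placeholder)

stat, p = friedmanchisquare(*scores.T)  # H0: all five methods are equivalent
if p &lt; 0.05:
    pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    pvals = [wilcoxon(scores[:, i], scores[:, j]).pvalue for i, j in pairs]
    # Holm procedure: compare sorted p-values with alpha / (m - k)
    order = np.argsort(pvals)
    m = len(pvals)
    for k, idx in enumerate(order):
        if pvals[idx] > 0.05 / (m - k):
            break
        print("significant difference:", pairs[idx])
        </preformat>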
        <p>The results in Tables 2–3 primarily confirm the superior performance of the methods lq-CMA-ES
and DTS-CMA-ES. In the two comparisons based on all 120 noiseless benchmark functions, each of
them is, for both considered budgets, significantly better not only than default CMA-ES, but also than
CMA-ES surrogate-assisted by the two variants of RAF ensembles. Moreover, among the 10 comparisons
based on individual groups of functions, lq-CMA-ES is 6 times significantly better than default
CMA-ES, and 7 times and 5 times significantly better than CMA-ES surrogate-assisted by RAF
and by RAF-log, respectively. For DTS-CMA-ES, the results of the 10 comparisons based on individual groups
of functions are less convincing: 3 times significantly better than default CMA-ES, 3 times than CMA-ES
surrogate-assisted by RAF, and only once than CMA-ES assisted by RAF-log. As to a comparison
between the two variants of RAF ensembles, the differences between them were not significant apart from
unimodal functions with high conditioning, for which CMA-ES achieves significantly better results if
assisted by RAF than if assisted by RAF-log.</p>
        <p>The different progress of optimization performed by each of the compared methods is illustrated,
always in three particular dimensions, by means of optimization-progress plots. They show the average
difference Δf between the optimal and achieved value of the objective function over the 15 COCO
instances. For that illustration, we have chosen the functions 9 (Figure 1), 18 (Figure 2), and 20
(Figure 3). We can see that optimization using CMA-ES surrogate-assisted by RAF or RAF-log sometimes
leads to a similarly fast decrease of the objective function as, or even faster than, optimization using
the state-of-the-art methods DTS-CMA-ES or lq-CMA-ES. In Figure 1, this is the case for RAF-log in
dimension 2. In Figure 2, dimension 3, CMA-ES surrogate-assisted by RAF reaches lower values of
the objective function than any other of the compared methods, whereas in dimension 2, CMA-ES
surrogate-assisted by any of RAF or RAF-log leads to a similarly fast decrease of f18 as DTS-CMA-ES
but slower than lq-CMA-ES. Finally, in Figure 3, dimensions 3 and 5, CMA-ES surrogate-assisted by any
of RAF or RAF-log leads to a similarly fast decrease of f20 as lq-CMA-ES, but slower than DTS-CMA-ES.
</p>
        <p>[Figures 1–3: optimization-progress plots of log10(Δf) against the number of evaluations / D (0–250), with panels for dimensions 2-D, 3-D, and 5-D.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The paper was motivated by our opinion that the intense and successful development of artificial
neural networks during the last 15 years suggests that they again have the potential to be important
for active learning in surrogate-assisted BBO. It surveyed possible directions of research into that
potential, including closely connected research into neural-network-based transfer learning for surrogate
modelling. Moreover, it recalled the first published investigations in some of those directions, and added
a new contribution to the emerging mosaic of those investigations.</p>
      <p>The fact that the main purpose of the experimental section of the paper is to contribute to the mosaic
of emerging investigations should be emphasized, especially in the context of the obtained experimental
results. It puts into perspective the finding that there is no significant difference between using CMA-ES surrogate-assisted by
RAF ensembles and using it alone, as well as the fact that results with RAF-ensemble-based surrogate models are
significantly worse than results with the state-of-the-art surrogate-assisted CMA-ES variants
lq-CMA-ES and DTS-CMA-ES. This is an obvious limitation not only of RAF ensembles, but of all the above-surveyed
kinds of neural networks that have so far been investigated as surrogate models for CMA-ES. On the
other hand, as the survey has shown, there are many more possibilities for such investigations
within future research.</p>
      <sec id="sec-5-1">
        <title>Acknowledgements</title>
        <p>The research reported in this paper has been supported by the German Research Foundation (DFG)
funded project 467401796, and by the Czech Technical University grant SGS 23/205/OHK3/3T/18. The
authors are very grateful to Jaroslav Langer for his crucial contribution to the RAF experiments.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. Employed Benchmarks</title>
      <p>The functions in the bbob suite are divided into five groups:
1. Separable functions (Figure 4).</p>
      <p>• 1: sphere;
• 2: ellipsoidal;
• 3: Rastrigin;
• 4: Büche-Rastrigin;
• 5: linear slope.</p>
      <p>2. Functions with low or moderate conditioning (Figure 5).</p>
      <p>• 6: attractive sector;
• 7: step ellipsoidal;
• 8: Rosenbrock;
• 9: Rosenbrock rotated.</p>
      <p>3. Unimodal functions with high conditioning (Figure 6).</p>
      <p>• 10: ellipsoidal;
• 11: discus;
• 12: bent cigar;
• 13: sharp ridge;
• 14: different powers.</p>
      <p>4. Multi-modal functions with adequate global structure (Figure 7).</p>
      <p>• 15: Rastrigin;
• 16: Weierstrass;
• 17: Schaffers F7 function;
• 18: Schaffers F7 function, moderately ill-conditioned;
• 19: composite Griewank-Rosenbrock function F8F2.</p>
      <p>5. Multi-modal functions with weak global structure.</p>
      <p>• 20: Schwefel;
• 21: Gallagher's Gaussian 101-me peaks;
• 22: Gallagher's Gaussian 21-hi peaks;
• 23: Katsuura;
• 24: Lunacek bi-Rastrigin.</p>
    </sec>
    <sec id="sec-7">
      <title>B. Activation Functions Employed to Form an RAF Ensemble</title>
      <p>• Gauss error function
$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,\mathrm{d}t$, (2)
• Gaussian error linear unit
$\mathrm{GELU}(x) = \frac{x}{2}\left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$, (3)
• Scaled exponential linear unit
$\mathrm{SELU}(x) = \lambda x$ if $x \geq 0$, $\lambda\alpha(e^x - 1)$ if $x &lt; 0$, where $\lambda, \alpha &gt; 0$. In the employed Tensorflow implementation, $\lambda = 1.05070098$, $\alpha = 1.67326324$. (4)
• Softsign activation function
$\mathrm{softsign}(x) = \frac{x}{1 + |x|}$, (5)
• Hyperbolic tangent
$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$. (6)</p>
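      <p>The same five functions are available directly in TensorFlow, which the text above refers to; a brief check of Eqs. (2)–(6) may look as follows.</p>
      <preformat>
import tensorflow as tf

x = tf.linspace(-3.0, 3.0, 7)

erf_x      = tf.math.erf(x)                    # Gauss error function, Eq. (2)
gelu_x     = tf.keras.activations.gelu(x)      # Eq. (3)
selu_x     = tf.keras.activations.selu(x)      # Eq. (4), with the quoted lambda and alpha
softsign_x = tf.keras.activations.softsign(x)  # Eq. (5)
tanh_x     = tf.keras.activations.tanh(x)      # Eq. (6)
      </preformat>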
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baerns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Combinatorial Development of Solid Catalytic Materials</article-title>
          .
          <article-title>Design of High-Throughput Experiments, Data Analysis, Data Mining</article-title>
          , Imperial College Press / World Scientific, London,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Booker</surname>
          </string-name>
          , J. Dennis,
          <string-name>
            <given-names>P.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Torczon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trosset</surname>
          </string-name>
          ,
          <article-title>A rigorous framework for optimization by surrogates</article-title>
          ,
          <source>Structural and Multidisciplinary Optimization</source>
          <volume>17</volume>
          (
          <year>1999</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>El-Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <article-title>Metamodeling techniques for evolutionary optimization of computationally expensive problems: Promises and limitations</article-title>
          ,
          <source>in: Proceedings of the Genetic and Evolutionary Computation Conference</source>
          , Morgan Kaufmann Publishers,
          <year>1999</year>
          , pp.
          <fpage>196</fpage>
          -
          <lpage>203</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ratle</surname>
          </string-name>
          ,
          <article-title>Kriging as a surrogate fitness landscape in evolutionary optimization</article-title>
          ,
          <source>Artificial Intelligence for Engineering Design, Analysis and Manufacturing</source>
          <volume>15</volume>
          (
          <year>2001</year>
          )
          <fpage>37</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Emmerich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Özdemir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bäck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Giannakoglou</surname>
          </string-name>
          ,
          <article-title>Metamodel-assisted evolution strategies</article-title>
          , in: PPSN, ACM,
          <year>2002</year>
          , pp.
          <fpage>361</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Leary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhaskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <article-title>A derivative based surrogate model for approximating and optimizing the output of an expensive computer simulation</article-title>
          ,
          <source>Journal of Global Optimization</source>
          <volume>30</volume>
          (
          <year>2004</year>
          )
          <fpage>39</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <article-title>Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems</article-title>
          , in: Y. Jin (Ed.),
          <source>Knowledge Incorporation in Evolutionary Computation</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vattam</surname>
          </string-name>
          ,
          <article-title>Methods for using surrogate models to speed up genetic algorithm optimization: Informed operators and genetic engineering</article-title>
          , in: Y. Jin (Ed.),
          <source>Knowledge Incorporation in Evolutionary Computation</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sendhof</surname>
          </string-name>
          ,
          <article-title>A framework for evolutionary optimization with approximate fitness functions</article-title>
          ,
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>6</volume>
          (
          <year>2002</year>
          )
          <fpage>481</fpage>
          -
          <lpage>494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Büche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schraudolph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koumoutsakos</surname>
          </string-name>
          ,
          <article-title>Accelerating evolutionary algorithms with Gaussian process fitness function models</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews</source>
          <volume>35</volume>
          (
          <year>2005</year>
          )
          <fpage>183</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Radi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Hami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>CMA evolution strategy assisted by kriging model and approximate ranking</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>48</volume>
          (
          <year>2018</year>
          )
          <fpage>4288</fpage>
          -
          <lpage>4204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Repický</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Gaussian process surrogate models for the CMA evolution strategy</article-title>
          ,
          <source>Evolutionary Computation</source>
          <volume>27</volume>
          (
          <year>2019</year>
          )
          <fpage>665</fpage>
          -
          <lpage>697</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hanuš</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Koza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tumpach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Interaction between model and its evolution control in surrogate-assisted CMA evolution strategy</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2021</year>
          , paper no.
          <fpage>358</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dufossé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <article-title>Augmented Lagrangian, penalty techniques and surrogate modeling for constrained optimization with CMA-ES</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>519</fpage>
          -
          <lpage>527</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sakamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Akimoto</surname>
          </string-name>
          ,
          <article-title>Adaptive ranking-based constraint handling for explicitly constrained black-box optimization</article-title>
          ,
          <source>Evolutionary Computation</source>
          <volume>30</volume>
          (
          <year>2022</year>
          )
          <fpage>503</fpage>
          -
          <lpage>529</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          ,
          <article-title>A mono surrogate for multiobjective optimization</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>471</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gibson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Everson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fieldsend</surname>
          </string-name>
          ,
          <article-title>Guiding surrogate-assisted multi-objective optimisation with decision maker preferences</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>786</fpage>
          -
          <lpage>795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koumoutsakos</surname>
          </string-name>
          ,
          <article-title>Local metamodels for optimization using evolution strategies</article-title>
          ,
          <source>in: PPSN</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>939</fpage>
          -
          <lpage>948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brockhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <article-title>Benchmarking the local metamodel CMA-ES on the noiseless BBOB'2013 test bed</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1225</fpage>
          -
          <lpage>1232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <article-title>A global surrogate assisted CMA-ES</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>664</fpage>
          -
          <lpage>672</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <article-title>An adaptive model selection strategy for surrogate-assisted particle swarm optimization algorithm</article-title>
          ,
          <source>in: IEEE SSCI</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Doherty</surname>
          </string-name>
          ,
          <article-title>Committee-based active learning for surrogate-assisted particle swarm optimization of expensive problems</article-title>
          ,
          <source>IEEE Transactions on Cybernetics</source>
          <volume>47</volume>
          (
          <year>2017</year>
          )
          <fpage>2664</fpage>
          -
          <lpage>2677</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Papadrakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lagaros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsompanakis</surname>
          </string-name>
          ,
          <article-title>Structural optimization using evolution strategies and neural networks</article-title>
          ,
          <source>Computer Methods in Applied Mechanics and Engineering</source>
          <volume>156</volume>
          (
          <year>1998</year>
          )
          <fpage>309</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ulmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Streichert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zell</surname>
          </string-name>
          ,
          <article-title>Model-assisted steady state evolution strategies</article-title>
          ,
          <source>in: GECCO</source>
          , Springer,
          <year>2003</year>
          , pp.
          <fpage>610</fpage>
          -
          <lpage>621</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sendhoff</surname>
          </string-name>
          ,
          <article-title>A study on metamodeling techniques, ensembles, and multi-surrogates in evolutionary computation</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>1288</fpage>
          -
          <lpage>1295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Surrogate model for continuous and discrete genetic optimization based on RBF networks</article-title>
          ,
          <source>in: Intelligent Data Engineering and Automated Learning</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>L.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <article-title>Gaussian process assisted coevolutionary estimation of distribution algorithm for computationally expensive problems</article-title>
          ,
          <source>Journal of Central South University of Technology</source>
          <volume>19</volume>
          (
          <year>2012</year>
          )
          <fpage>443</fpage>
          -
          <lpage>452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>V.</given-names>
            <surname>Volz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Naujoks</surname>
          </string-name>
          ,
          <article-title>Investigating uncertainty propagation in surrogate-assisted evolutionary algorithms</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>881</fpage>
          -
          <lpage>888</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Toal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <article-title>Simple surrogate model assisted optimization with covariance matrix adaptation</article-title>
          ,
          <source>in: PPSN</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>184</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Elite-driven surrogate assisted CMA-ES algorithm by improved lower confidence bound method</article-title>
          ,
          <source>Engineering with Computers</source>
          , Springer,
          <year>2022</year>
          , doi: 10.1007/s00366-022-01642-5.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lum</surname>
          </string-name>
          ,
          <article-title>Combining global and local surrogate models to accelerate evolutionary optimization</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews</source>
          <volume>37</volume>
          (
          <year>2007</year>
          )
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>B.</given-names>
            <surname>Saini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>López-Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Miettinen</surname>
          </string-name>
          ,
          <article-title>Automatic surrogate modelling technique selection based on features of optimization problems</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1765</fpage>
          -
          <lpage>1772</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>N.</given-names>
            <surname>Belkhir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dréo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Savéant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          ,
          <article-title>Per instance algorithm configuration of CMA-ES with limited budget</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>681</fpage>
          -
          <lpage>688</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Repický</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Boosted regression forest for the doubly trained surrogate covariance matrix adaptation evolution strategy</article-title>
          ,
          <source>in: ITAT 2018</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Runarsson</surname>
          </string-name>
          ,
          <article-title>Ordinal regression in evolutionary computation</article-title>
          ,
          <source>in: PPSN</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>1048</fpage>
          -
          <lpage>1057</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          ,
          <article-title>Comparison-based optimizers need comparison-based surrogates</article-title>
          ,
          <source>in: PPSN</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>364</fpage>
          -
          <lpage>373</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abbasnejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <article-title>Adaptive function value warping for surrogate model assisted evolutionary optimization</article-title>
          ,
          <source>in: PPSN</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>I.</given-names>
            <surname>Couckuyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gorissen</surname>
          </string-name>
          ,
          <article-title>Automatic surrogate model type selection during the optimization of expensive black-box problems</article-title>
          ,
          <source>in: Winter Simulation Conference</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>4285</fpage>
          -
          <lpage>4293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Multi-surrogate-based global optimization using a score-based infill criterion</article-title>
          ,
          <source>Structural and Multidisciplinary Optimization</source>
          <volume>59</volume>
          (
          <year>2019</year>
          )
          <fpage>485</fpage>
          -
          <lpage>506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ostermeier</surname>
          </string-name>
          ,
          <article-title>Completely derandomized self-adaptation in evolution strategies</article-title>
          ,
          <source>Evolutionary Computation</source>
          <volume>9</volume>
          (
          <year>2001</year>
          )
          <fpage>159</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <article-title>The CMA evolution strategy: A comparing review</article-title>
          ,
          <source>in: Towards a New Evolutionary Computation</source>
          , Springer,
          <year>2006</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gielen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Grout</surname>
          </string-name>
          ,
          <article-title>Network on chip optimization based on surrogate model assisted evolutionary algorithms</article-title>
          ,
          <source>in: IEEE CEC</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>3266</fpage>
          -
          <lpage>3271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schonlau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Welch</surname>
          </string-name>
          ,
          <article-title>Efficient global optimization of expensive black-box functions</article-title>
          ,
          <source>Journal of Global Optimization</source>
          <volume>13</volume>
          (
          <year>1998</year>
          )
          <fpage>455</fpage>
          -
          <lpage>492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>J.</given-names>
            <surname>Knowles</surname>
          </string-name>
          ,
          <article-title>ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems</article-title>
          ,
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>10</volume>
          (
          <year>2006</year>
          )
          <fpage>50</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Diouane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Picheny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Le Riche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Di Perrotolo</surname>
          </string-name>
          ,
          <article-title>TREGO: a trust-region framework for efficient global optimization</article-title>
          ,
          <source>Journal of Global Optimization</source>
          <volume>85</volume>
          (
          <year>2022</year>
          )
          , doi: 10.1007/s10898-022-01245-w.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>H.</given-names>
            <surname>Mohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Le Riche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Touboul</surname>
          </string-name>
          ,
          <article-title>Making EGO and CMA-ES complementary for global optimization</article-title>
          ,
          <source>in: Learning and Intelligent Optimization</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Doubly trained evolution control for the surrogate CMA-ES</article-title>
          ,
          <source>in: PPSN</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuen</surname>
          </string-name>
          ,
          <article-title>Black box algorithm selection by convolutional neural network</article-title>
          ,
          <source>in: LOD</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>264</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pikalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mironovich</surname>
          </string-name>
          ,
          <article-title>Automated parameter choice with exploratory landscape analysis and machine learning</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1982</fpage>
          -
          <lpage>1985</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>R.</given-names>
            <surname>Prager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Trautmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kerschke</surname>
          </string-name>
          ,
          <article-title>Towards feature-free automated algorithm selection for single-objective continuous black-box optimization</article-title>
          ,
          <source>in: IEEE SSCI</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Knowledge-based selection of Gaussian process surrogates</article-title>
          ,
          <source>in: ECML Workshop IAL</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>48</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Repický</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Landscape analysis of Gaussian process surrogates for the covariance matrix adaptation evolution strategy</article-title>
          ,
          <source>in: GECCO</source>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>691</fpage>
          -
          <lpage>699</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Seiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kerschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Trautmann</surname>
          </string-name>
          ,
          <article-title>A collection of deep learning-based feature-free approaches for characterizing single-objective continuous fitness landscapes</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>657</fpage>
          -
          <lpage>665</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jankovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Popovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eftimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doerr</surname>
          </string-name>
          ,
          <article-title>The impact of hyper-parameter tuning for landscape-aware performance regression and algorithm selection</article-title>
          ,
          <source>in: GECCO</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>687</fpage>
          -
          <lpage>696</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>R.</given-names>
            <surname>Calandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deisenroth</surname>
          </string-name>
          ,
          <article-title>Manifold Gaussian processes for regression</article-title>
          ,
          <source>in: IJCNN</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>3338</fpage>
          -
          <lpage>3345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <article-title>Deep kernel learning</article-title>
          ,
          <source>in: AISTATS</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>370</fpage>
          -
          <lpage>378</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>J.</given-names>
            <surname>Koza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tumpach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Combining Gaussian processes and neural networks in surrogate modeling for covariance matrix adaptation evolution strategy</article-title>
          ,
          <source>in: IAL Workshop, ECML PKDD</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>J.</given-names>
            <surname>Růžička</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Koza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tumpach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Combining Gaussian processes with neural networks for active learning in optimization</article-title>
          ,
          <source>in: ECML Workshop IAL</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>H.</given-names>
            <surname>Salimbeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deisenroth</surname>
          </string-name>
          ,
          <article-title>Doubly stochastic variational inference for deep Gaussian processes</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>K.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kaski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinonen</surname>
          </string-name>
          ,
          <article-title>Deep convolutional Gaussian processes</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>582</fpage>
          -
          <lpage>597</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hernández-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Villacampa-Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hernández-Lobato</surname>
          </string-name>
          ,
          <article-title>Deep Gaussian processes using expectation propagation and Monte Carlo methods</article-title>
          ,
          <source>in: ECML PKDD</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>479</fpage>
          -
          <lpage>494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ming</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Williamson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guillas</surname>
          </string-name>
          ,
          <article-title>Deep Gaussian process emulation using stochastic imputation</article-title>
          ,
          <source>Technometrics</source>
          <volume>65</volume>
          (
          <year>2022</year>
          )
          <fpage>150</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gramacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Higdon</surname>
          </string-name>
          ,
          <article-title>Active learning for deep Gaussian process surrogates</article-title>
          ,
          <source>Technometrics</source>
          <volume>65</volume>
          (
          <year>2023</year>
          )
          <fpage>4</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jacot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gabriel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hongler</surname>
          </string-name>
          ,
          <article-title>Neural tangent kernel: Convergence and generalization in neural networks</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>R.</given-names>
            <surname>Novak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alemi</surname>
          </string-name>
          , et al.,
          <article-title>Neural tangents: Fast and easy infinite neural networks in Python</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>A.</given-names>
            <surname>Malinin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gales</surname>
          </string-name>
          ,
          <article-title>Predictive uncertainty estimation via prior networks</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>M.</given-names>
            <surname>Biloš</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Charpentier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Günnemann</surname>
          </string-name>
          ,
          <article-title>Uncertainty on asynchronous time event prediction</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>A.</given-names>
            <surname>Malinin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gales</surname>
          </string-name>
          ,
          <article-title>Reverse KL-divergence training of prior networks: Improved uncertainty and adversarial robustness</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          [69]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nandy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Towards maximizing the representation gap between in-domain and out-of-distribution examples</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          [70]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Uncertainty aware semi-supervised learning on graph data</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          [71]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tumpach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Koza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Neural-network-based estimation of normal distributions in black-box optimization</article-title>
          ,
          <source>in: ESANN</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          [72]
          <string-name>
            <given-names>C.</given-names>
            <surname>Valle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Saravia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Allende</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Monge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <article-title>Parallel approach for ensemble learning with locally coupled neural networks</article-title>
          ,
          <source>Neural Processing Letters</source>
          <volume>32</volume>
          (
          <year>2010</year>
          )
          <fpage>277</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref73">
        <mixed-citation>
          [73]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lakshminarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pritzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Blundell</surname>
          </string-name>
          ,
          <article-title>Simple and scalable predictive uncertainty estimation using deep ensembles</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref74">
        <mixed-citation>
          [74]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>The MBPEP: A deep ensemble pruning algorithm providing high quality uncertainty prediction</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>49</volume>
          (
          <year>2019</year>
          )
          <fpage>2942</fpage>
          -
          <lpage>2955</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref75">
        <mixed-citation>
          [75]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pearce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leibfried</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brintrup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neely</surname>
          </string-name>
          ,
          <article-title>Uncertainty in neural networks: Approximately Bayesian ensembling</article-title>
          ,
          <source>in: AISTATS</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref76">
        <mixed-citation>
          [76]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Stoyanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tavakol</surname>
          </string-name>
          ,
          <article-title>Toward robust uncertainty estimation with random activation functions</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref77">
        <mixed-citation>
          [77]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Loh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Snoek</surname>
          </string-name>
          , et al.,
          <article-title>Deep learning for Bayesian optimization of scientific problems with high-dimensional structure</article-title>
          ,
          <source>Transactions on Machine Learning Research</source>
          <volume>1</volume>
          (
          <year>2022</year>
          )
          , OpenReview tPMQ6Je2rB.
        </mixed-citation>
      </ref>
      <ref id="ref78">
        <mixed-citation>
          [78]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tripp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daxberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hernández-Lobato</surname>
          </string-name>
          ,
          <article-title>Sample-efficient optimization in the latent space of deep generative models via weighted retraining</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref79">
        <mixed-citation>
          [79]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gillhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ramsauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brandstetter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schäfl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <article-title>A GAN based solver of black-box inverse problems</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref80">
        <mixed-citation>
          [80]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>OPT-GAN: A broad-spectrum global optimizer for black-box problems by learning distribution</article-title>
          ,
          <year>2022</year>
          . ArXiv:2102.03888v5.
        </mixed-citation>
      </ref>
      <ref id="ref81">
        <mixed-citation>
          [81]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Towards learning universal hyperparameter optimizers with transformers</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref82">
        <mixed-citation>
          [82]
          <string-name>
            <given-names>S.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hollmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>PFNs4BO: In-context learning for Bayesian optimization</article-title>
          , in: ICML,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref83">
        <mixed-citation>
          [83]
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <source>Gaussian Processes for Machine Learning</source>
          , MIT Press, Cambridge,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref84">
        <mixed-citation>
          [84]
          <string-name>
            <given-names>A.</given-names>
            <surname>Damianou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <article-title>Deep Gaussian processes</article-title>
          , in: AISTATS,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref85">
        <mixed-citation>
          [85]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hernández-Lobato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hernández-Lobato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <article-title>Deep Gaussian processes for regression using approximate expectation propagation</article-title>
          , in: ICML,
          <year>2016</year>
          , pp.
          <fpage>1472</fpage>
          -
          <lpage>1481</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref86">
        <mixed-citation>
          [86]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cutajar</surname>
          </string-name>
          , E. Bonilla,
          <string-name>
            <given-names>P.</given-names>
            <surname>Michiardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Filippone</surname>
          </string-name>
          ,
          <article-title>Random feature expansions for deep Gaussian processes</article-title>
          , in: ICML,
          <year>2017</year>
          , pp.
          <fpage>884</fpage>
          -
          <lpage>893</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref87">
        <mixed-citation>
          [87]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hebbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brevault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balesdent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Talbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Melab</surname>
          </string-name>
          ,
          <article-title>Efficient global optimization using deep Gaussian processes</article-title>
          , in: IEEE CEC,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref88">
        <mixed-citation>
          [88]
          <string-name>
            <given-names>A.</given-names>
            <surname>Matthews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rowland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <article-title>Gaussian process behaviour in wide deep neural networks</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref89">
        <mixed-citation>
          [89]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hebbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brevault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balesdent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Talbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Melab</surname>
          </string-name>
          ,
          <article-title>Bayesian optimization using deep Gaussian processes</article-title>
          ,
          <year>2019</year>
          . ArXiv:1905.03350v1.
        </mixed-citation>
      </ref>
      <ref id="ref90">
        <mixed-citation>
          [90]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jaillet</surname>
          </string-name>
          , D. Liu,
          <article-title>Convolutional normalizing flows for deep Gaussian processes</article-title>
          , in: IJCNN,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref91">
        <mixed-citation>
          [91]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Finck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <article-title>Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions</article-title>
          ,
          <source>Technical Report</source>
          , INRIA, Paris Saclay,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref92">
        <mixed-citation>
          [92]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mersmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tušar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brockhoff</surname>
          </string-name>
          ,
          <article-title>COCO: a platform for comparing continuous optimizers in a black-box setting</article-title>
          ,
          <source>Optimization Methods and Software</source>
          <volume>36</volume>
          (
          <year>2021</year>
          )
          <fpage>114</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref93">
        <mixed-citation>
          [93]
          <string-name>
            <given-names>J.</given-names>
            <surname>Koza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tumpach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <article-title>Using past experience for configuration of Gaussian processes in black-box optimization</article-title>
          ,
          <source>in: LION</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref94">
        <mixed-citation>
          [94]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bahri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Novak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schoenholz</surname>
          </string-name>
          , et al.,
          <article-title>Deep neural networks as Gaussian processes</article-title>
          , in: ICLR,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref95">
        <mixed-citation>
          [95]
          <string-name>
            <given-names>R.</given-names>
            <surname>Novak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bahri</surname>
          </string-name>
          , et al.,
          <article-title>Bayesian deep convolutional networks with many channels are Gaussian processes</article-title>
          , in: ICLR,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref96">
        <mixed-citation>
          [96]
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lakshminarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Teh</surname>
          </string-name>
          ,
          <article-title>Bayesian deep ensembles via the neural tangent kernel</article-title>
          , in: NeurIPS,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref97">
        <mixed-citation>
          [97]
          <string-name>
            <given-names>B.</given-names>
            <surname>Paria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Póczos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ravikumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Suggala</surname>
          </string-name>
          , et al.,
          <article-title>Be greedy - a simple algorithm for blackbox optimization using neural networks</article-title>
          ,
          <source>in: ICML Workshop on Adaptive Experimental Design and Active Learning in the Real World</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref98">
        <mixed-citation>
          [98]
          <string-name>
            <given-names>A.</given-names>
            <surname>Malinin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chervontsev</surname>
          </string-name>
          , I. Provilkov,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gales</surname>
          </string-name>
          ,
          <article-title>Regression prior networks</article-title>
          ,
          <year>2020</year>
          . ArXiv:2006.11590v2.
        </mixed-citation>
      </ref>
      <ref id="ref99">
        <mixed-citation>
          [99]
          <string-name>
            <given-names>A.</given-names>
            <surname>Amini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Schwarting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soleimany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rus</surname>
          </string-name>
          ,
          <article-title>Deep evidential regression</article-title>
          , in: NeurIPS,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref100">
        <mixed-citation>
          [100]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sensoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kandemir</surname>
          </string-name>
          ,
          <article-title>Evidential deep learning to quantify classification uncertainty</article-title>
          , in: NeurIPS,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref101">
        <mixed-citation>
          [101]
          <string-name>
            <given-names>D.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <article-title>Improving evidential deep learning via multi-task learning</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref102">
        <mixed-citation>
          [102]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denoeux</surname>
          </string-name>
          ,
          <article-title>An evidential classifier based on Dempster-Shafer theory and deep learning</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>450</volume>
          (
          <year>2021</year>
          )
          <fpage>275</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref103">
        <mixed-citation>
          [103]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ulmer</surname>
          </string-name>
          ,
          <article-title>A survey on evidential deep learning for single-pass uncertainty estimation</article-title>
          ,
          <year>2021</year>
          . ArXiv:2110.03051v2.
        </mixed-citation>
      </ref>
      <ref id="ref104">
        <mixed-citation>
          [104]
          <string-name>
            <given-names>G.</given-names>
            <surname>Shafer</surname>
          </string-name>
          ,
          <source>A Mathematical Theory of Evidence</source>
          , Princeton University Press,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref105">
        <mixed-citation>
          [105]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Causal discovery based on neural network ensemble method</article-title>
          ,
          <source>Journal of Software</source>
          <volume>15</volume>
          (
          <year>2004</year>
          )
          <fpage>1479</fpage>
          -
          <lpage>1484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref106">
        <mixed-citation>
          [106]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Wrapper approach for learning neural network ensemble by feature selection</article-title>
          ,
          <source>in: Advances in Neural Networks - ISNN 2005</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>526</fpage>
          -
          <lpage>531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref107">
        <mixed-citation>
          [107]
          <string-name>
            <given-names>D.</given-names>
            <surname>Partridge</surname>
          </string-name>
          ,
          <article-title>Network generalization differences quantified</article-title>
          ,
          <source>Neural Networks</source>
          <volume>9</volume>
          (
          <year>1996</year>
          )
          <fpage>263</fpage>
          -
          <lpage>271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref108">
        <mixed-citation>
          [108]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Observational learning algorithm for an ensemble of neural networks</article-title>
          ,
          <source>Pattern Analysis and Applications</source>
          <volume>5</volume>
          (
          <year>2002</year>
          )
          <fpage>154</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref109">
        <mixed-citation>
          [109]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>An active learning approach for neural network ensemble</article-title>
          ,
          <source>Journal of Computer Research and Development</source>
          <volume>42</volume>
          (
          <year>2005</year>
          )
          <fpage>375</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref110">
        <mixed-citation>
          [110]
          <string-name>
            <given-names>M.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murase</surname>
          </string-name>
          ,
          <article-title>A constructive algorithm for training cooperative neural network ensembles</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          <volume>14</volume>
          (
          <year>2003</year>
          )
          <fpage>820</fpage>
          -
          <lpage>834</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref111">
        <mixed-citation>
          [111]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alhamdoosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Fast decorrelated neural network ensembles with random weights</article-title>
          ,
          <source>Information Sciences</source>
          <volume>264</volume>
          (
          <year>2014</year>
          )
          <fpage>104</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref112">
        <mixed-citation>
          [112]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>A novel decorrelated neural network ensemble algorithm for face recognition</article-title>
          ,
          <source>Knowledge Based Systems</source>
          <volume>89</volume>
          (
          <year>2015</year>
          )
          <fpage>541</fpage>
          -
          <lpage>552</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref113">
        <mixed-citation>
          [113]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Feature selection based neural network ensemble method</article-title>
          ,
          <source>Journal of Fudan University (Natural Sciences)</source>
          <volume>43</volume>
          (
          <year>2004</year>
          )
          <fpage>685</fpage>
          -
          <lpage>688</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref114">
        <mixed-citation>
          [114]
          <string-name>
            <given-names>T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Freeway incident detection based on Adaboost RBF neural network</article-title>
          ,
          <source>Computer Engineering and Applications</source>
          <volume>32</volume>
          (
          <year>2008</year>
          )
          <fpage>223</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref115">
        <mixed-citation>
          [115]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , G. Chen, G. Song, T. Han,
          <article-title>AdaBoost based ensemble of neural networks in analog circuit fault diagnosis</article-title>
          ,
          <source>Chinese Journal of Scientific Instrument</source>
          <volume>4</volume>
          (
          <year>2010</year>
          )
          <fpage>851</fpage>
          -
          <lpage>856</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref116">
        <mixed-citation>
          [116]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ashukha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lyzhov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molchanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vetrov</surname>
          </string-name>
          ,
          <article-title>Pitfalls of in-domain uncertainty estimation and ensembling in deep learning</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref117">
        <mixed-citation>
          [117]
          <string-name>
            <given-names>P.</given-names>
            <surname>McDermott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wikle</surname>
          </string-name>
          ,
          <article-title>Deep echo state networks with uncertainty quantification for spatiotemporal forecasting</article-title>
          ,
          <source>Environmetrics</source>
          <volume>30</volume>
          (
          <year>2019</year>
          )
          e2553 (paper no.).
        </mixed-citation>
      </ref>
      <ref id="ref118">
        <mixed-citation>
          [118]
          <string-name>
            <given-names>D.</given-names>
            <surname>Golovin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Solnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moitra</surname>
          </string-name>
          , G. Kochanski,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karro</surname>
          </string-name>
          , et al.,
          <article-title>Google Vizier: A service for black-box optimization</article-title>
          ,
          <source>in: Knowledge Discovery and Data Mining</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1487</fpage>
          -
          <lpage>1496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref119">
        <mixed-citation>
          [119]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yosinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lipson</surname>
          </string-name>
          ,
          <article-title>How transferable are features in deep neural networks?</article-title>
          , in: NeurIPS,
          <year>2014</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref120">
        <mixed-citation>
          [120]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tzeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <article-title>Simultaneous deep transfer across domains and tasks</article-title>
          , in: ICCV,
          <year>2015</year>
          , pp.
          <fpage>4068</fpage>
          -
          <lpage>4076</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref121">
        <mixed-citation>
          [121]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bousmalis</surname>
          </string-name>
          , G. Trigeorgis,
          <string-name>
            <given-names>N.</given-names>
            <surname>Silberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <article-title>Domain separation networks</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref122">
        <mixed-citation>
          [122]
          <string-name>
            <given-names>M.</given-names>
            <surname>Oquab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Laptev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          ,
          <article-title>Learning and transferring mid-level image representations using convolutional neural networks</article-title>
          ,
          <source>in: IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1717</fpage>
          -
          <lpage>1724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref123">
        <mixed-citation>
          [123]
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Deep transfer learning with joint adaptation networks</article-title>
          ,
          <source>in: ICML</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3470</fpage>
          -
          <lpage>3479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref124">
        <mixed-citation>
          [124]
          <string-name>
            <given-names>W.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Transfer learning for sequences via learning to collocate</article-title>
          , in: ICLR,
          <year>2019</year>
          , pp.
          <fpage>1487</fpage>
          -
          <lpage>1496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref125">
        <mixed-citation>
          [125]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , X. Cheng, P. Luo,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>Supervised representation learning: Transfer learning with deep autoencoders</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>4119</fpage>
          -
          <lpage>4125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref126">
        <mixed-citation>
          [126]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , X. Cheng, P. Luo,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Supervised representation learning with double encoding-layer autoencoder for transfer learning</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref127">
        <mixed-citation>
          [127]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tzeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          , T. Darrell,
          <article-title>Adversarial discriminative domain adaptation</article-title>
          ,
          <source>in: CVPR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref128">
        <mixed-citation>
          [128]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Partial transfer learning with selective adversarial networks</article-title>
          ,
          <source>in: IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2724</fpage>
          -
          <lpage>2732</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref129">
        <mixed-citation>
          [129]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          , et al.,
          <article-title>A minimax game for instance based selective transfer learning</article-title>
          ,
          <source>in: KDD</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref130">
        <mixed-citation>
          [130]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tuzel</surname>
          </string-name>
          ,
          <article-title>Coupled generative adversarial networks</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref131">
        <mixed-citation>
          [131]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Stoyanova</surname>
          </string-name>
          , YanasGH/RAFs,
          <year>2023</year>
          . https://github.com/YanasGH/RAFs.
        </mixed-citation>
      </ref>
      <ref id="ref132">
        <mixed-citation>
          [132]
          COCO Data Archive
          ,
          <article-title>Algorithm data sets for the bbob test suite</article-title>
          ,
          <year>2023</year>
          . https://numbbo.github.io/dataarchive/bbob/.
        </mixed-citation>
      </ref>
      <ref id="ref133">
        <mixed-citation>
          [133]
          <string-name>
            <given-names>S.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>An extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all pairwise comparisons</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>2677</fpage>
          -
          <lpage>2694</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref134">
        <mixed-citation>
          [134]
          <string-name>
            <given-names>A.</given-names>
            <surname>Benavoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mangili</surname>
          </string-name>
          ,
          <article-title>Should we really use post-hoc tests based on mean-ranks?</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref135">
        <mixed-citation>
          [135]
          <string-name>
            <given-names>J.</given-names>
            <surname>Demšar</surname>
          </string-name>
          ,
          <article-title>Statistical comparisons of classifiers over multiple data sets</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>7</volume>
          (
          <year>2006</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref136">
        <mixed-citation>
          <article-title>5. Multi-modal functions with weak global structure (Figure 8).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref137">
        <mixed-citation>
          • 20: Schwefel; • 21:
          <string-name>
            <surname>Gallagher's Gaussian</surname>
          </string-name>
          101-me peaks; • 22:
          <string-name>
            <surname>Gallagher's Gaussian</surname>
          </string-name>
          21-hi peaks; • 23: Katsuura; 1),
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>