<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modifications of PI and EI under Gaussian Noise Assumption in Current Optima</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Huabing Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Informatics, University of Edinburgh</institution>
          ,
          <addr-line>Edinburgh</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Bayesian optimisation is a widely used tool for hyper-parameter optimisation of black-box functions. It fits a cheaper surrogate model, such as a Gaussian process (GP), over the search space. Acquisition functions on top of the GP, such as Probability of Improvement (PI) and Expected Improvement (EI), query the distribution of the loss at unevaluated positions in order to select the most promising one. Traditionally, both acquisition functions use the current optimum directly in their computations, even though GPs assume that observations are noise corrupted. In this work, we mathematically derive modified PI and EI under a Gaussian noise assumption. The modified PI and EI are compared with the original versions on benchmark functions. We show that the modified versions converge faster in the same number of iterations and can achieve better performance on complex loss functions with fewer iterations.</p>
      </abstract>
      <kwd-group>
        <kwd>Bayesian Optimization</kwd>
        <kwd>Acquisition Functions</kwd>
        <kwd>Benchmark Functions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Machine learning has achieved remarkable success. However, almost all machine learning models, such as neural networks, topic models, and random forests, require hyper-parameter optimization. In practice, methods for tuning hyper-parameters include Grid Search, Random Search [
        <xref ref-type="bibr" rid="ref2">1</xref>
        ], and Gradient-based Optimization [
        <xref ref-type="bibr" rid="ref3">2</xref>
        ]. These methods are designed to minimize empirical risk with the desired efficiency or convergence speed. Bayesian Optimization [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ] is a probabilistic approach that typically employs a Gaussian process (GP) and exploits its joint prediction and uncertainty estimates to achieve derivative-free optimization. It can be used when the gradient of the function being optimized is not accessible.
      </p>
      <p>
        For Bayesian optimization, J. Snoek et al. summarize its applications in the field of machine learning, and numerical simulations show that Bayesian optimization offers high efficiency and strong convergence [
        <xref ref-type="bibr" rid="ref5">4</xref>
        ]. Martin Pelikan further finds that hierarchy can be used to reduce problem complexity in black-box optimization [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ]. K. Swersky et al. extend multi-task Gaussian processes to the Bayesian optimization framework, aiming to transfer knowledge gained from previous optimizations to new tasks in order to find optimal hyper-parameter settings more efficiently [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ]. J. Snoek et al. further explore the use of neural networks as an alternative to GPs for modelling distributions over functions [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ].
      </p>
      <p>In principle, Bayesian Optimization resembles reinforcement learning: it updates its model of the hyper-parameter space after each evaluation and then computes the location for the next evaluation. Acquisition functions are used to calculate the desirability of each unevaluated location, trading off exploration against exploitation. Typical acquisition functions are the Upper Confidence Bound (UCB), Probability of Improvement (PI), and Expected Improvement (EI). However, these acquisition functions do not fully account for the deviations introduced in the machine learning data collection process, that is, the noise contained in the current optimum. Under a Gaussian noise assumption, we propose modified versions of the PI and EI acquisitions, derive the corresponding explicit equations, and, through extensive numerical simulations and comparisons, demonstrate the feasibility of our proposed acquisition functions.</p>
      <p>Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Algorithms</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Gaussian Process</title>
      <p>
        Gaussian process [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ] regression can be considered a proxy for a black-box function which enables uncertainty quantification. A Gaussian process (GP) is an infinite-dimensional multivariate Gaussian distribution whose covariance matrix is defined by a kernel function k(⋅,⋅). Imagining that x_{1:n} forms a finite discretization of the input space, and assuming the distribution has zero mean, prior draws f can be simulated as f ∼ N(0, K).
      </p>
      <p>
        Statistical assumptions about the GP prior are represented in the kernel function. A commonly adopted kernel is the Matern kernel [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ], where ν controls the smoothness of the Gaussian process. Let r = |x_i − x_j|; the Matern class of covariance functions has the following definition:
      </p>
      <p>k(r) = (2^{1−ν} / Γ(ν)) (√(2ν) r / ℓ)^ν K_ν(√(2ν) r / ℓ)</p>
      <p>with positive parameter ν, length-scale ℓ, the Gamma function Γ, and the modified Bessel function K_ν. Smooth GP kernels assume that if x and x′ are close, then f(x) and f(x′) take similar values.</p>
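      <p>As an illustration, the ν = 5/2 member of the Matern family has a well-known closed form that avoids the Bessel function. A minimal sketch (the function name and default length-scale are our own choices):</p>

```python
import math

def matern52(r, ell=1.0):
    # Matern kernel with nu = 5/2: for this smoothness value the
    # Bessel-function expression reduces to a closed form.
    s = math.sqrt(5.0) * r / ell
    return (1.0 + s + s * s / 3.0) * math.exp(-s)
```

      <p>The kernel equals 1 at r = 0 and decays monotonically with distance, encoding the smoothness assumption above.</p>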
      <p>Acquisition functions take the posterior mean and variance at each unevaluated point as input and compute a value indicating how favourable it is to sample at that point. They trade off between exploitation and exploration:</p>
      <p>• Exploitation: looking for locations that minimize the posterior mean μ(x).</p>
      <p>• Exploration: looking for locations that maximize the posterior variance σ²(x).</p>
      <p>Given the observed data C = [(x_1, y_1), ⋯, (x_t, y_t)], the next point x_{t+1} is chosen by ranking the value returned by the acquisition function at candidate points: x_{t+1} = arg max_x a(x|C). The acquisition function is defined as the expected utility u at the unevaluated location x:</p>
      <p>a(x|C) = 𝔼[u(x, y)|x, C] = ∫ u(x, y) p(y|x, C) dy</p>
      <p>Given noisy observations y_{1:n} at x_{1:n}, where y_i ∼ N(f_i, σ_y²), the joint probability distribution with a new point x_{n+1} is given by</p>
      <p>[f_{1:n}; f_{n+1}] ∼ N(0, [[K + σ_y² I, k(x_{1:n}, x_{n+1})], [k(x_{n+1}, x_{1:n}), k(x_{n+1}, x_{n+1})]])</p>
      <p>where K = k(x_{1:n}, x_{1:n}). After applying the rule for conditional Gaussians, we can gather the posterior over the function value f_{n+1} | y_{1:n}, which follows a univariate Gaussian distribution N(μ(x_{n+1}), σ²(x_{n+1})) with</p>
      <p>μ(x_{n+1}) = k(x_{1:n}, x_{n+1})ᵀ (K + σ_y² I)⁻¹ y_{1:n}</p>
      <p>σ²(x_{n+1}) = k(x_{n+1}, x_{n+1}) − k(x_{1:n}, x_{n+1})ᵀ (K + σ_y² I)⁻¹ k(x_{1:n}, x_{n+1})</p>
      <p>GP regression estimates the probability distribution of the function value at unevaluated points. For each prediction location x_*, the mean μ(x_*) gives the best estimate of the function value, and the variance σ²(x_*) models the uncertainty at that point. Acquisition functions use this computed distribution to guide the search for the optimal function value.</p>
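      <p>The posterior equations above can be sketched in plain numpy. This is a minimal illustration, not the paper's implementation: the squared-exponential kernel and the noise level are stand-in assumptions.</p>

```python
import numpy as np

def sqexp(A, B, ell=1.0):
    # squared-exponential kernel, an illustrative stand-in for the Matern kernel
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell**2)

def gp_posterior(X, y, Xstar, sigma_y=0.1, ell=1.0):
    # mu  = k(X, x*)^T (K + sigma_y^2 I)^-1 y
    # var = k(x*, x*) - k(X, x*)^T (K + sigma_y^2 I)^-1 k(X, x*)
    K = sqexp(X, X, ell) + sigma_y**2 * np.eye(len(X))
    Ks = sqexp(X, Xstar, ell)
    mu = Ks.T @ np.linalg.solve(K, y)
    cov = sqexp(Xstar, Xstar, ell) - Ks.T @ np.linalg.solve(K, Ks)
    return mu, np.diag(cov)
```

      <p>Near observed data the posterior variance shrinks towards the noise floor; far from the data it reverts to the prior variance, which is what the acquisition functions below exploit.</p>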
    </sec>
    <sec id="sec-4">
      <title>2.2. Acquisition Functions</title>
      <p>()
()
()
()
()</p>
      <p>The probability p(y|x, C) here is gathered from the posterior distribution N(μ(x), σ²(x)) computed by GP regression.</p>
      <p>Probability of Improvement (PI), Expected Improvement (EI), and Entropy Search employ different utility functions. Other acquisition functions, such as the Upper Confidence Bound (UCB), directly invoke the mean and variance instead.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2.1. Probability Improvement</title>
      <p>We can understand the utility function as a reward: when f(x) ≤ ỹ, a fixed amount of value is rewarded, here 1:</p>
      <p>u(x) = 1 if f(x) ≤ ỹ, and u(x) = 0 otherwise</p>
      <p>According to this utility function, the expected utility can be written as the normal cumulative density function of (ỹ − μ(x))/σ(x):</p>
      <p>a_PI(x) = 𝔼[u(x)|C] = ∫_{−∞}^{ỹ} N(f(x); μ(x), σ²(x)) df(x) = Φ((ỹ − μ(x))/σ(x))</p>
      <p>PI only cares whether f(x) improves on ỹ; it does not count the amount of improvement. This makes PI very likely to pick points near previously sampled locations. Once the search trajectory reaches a local minimum, it gets stuck there and can hardly jump out. Therefore, PI cares only about exploitation.</p>
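      <p>The closed form above is straightforward to compute. A minimal sketch for minimization, using the standard-normal CDF via the error function (the function names are our own):</p>

```python
import math

def norm_cdf(z):
    # standard normal cumulative density function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pi_acquisition(mu, sigma, y_best):
    # P(f(x) <= y_best) under the posterior N(mu, sigma^2), for minimization
    if sigma == 0.0:
        return 1.0 if mu <= y_best else 0.0
    return norm_cdf((y_best - mu) / sigma)
```
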
    </sec>
    <sec id="sec-6">
      <title>2.2.2. Expected Improvement</title>
      <p>EI balances better between exploration and exploitation: the amount of improvement with respect to the current global optimum, ỹ − f(x), is taken into account. The utility function of EI is defined as</p>
      <p>u(x) = max(0, ỹ − f(x))</p>
      <p>Therefore, the expression for the expected utility can be derived:</p>
      <p>a_EI(x) = 𝔼[u(x)|C] = ∫_{−∞}^{∞} max(0, ỹ − f(x)) N(f(x); μ(x), σ²(x)) df(x) = ∫_{−∞}^{ỹ} (ỹ − f(x)) N(f(x); μ(x), σ²(x)) df(x) = (ỹ − μ(x)) Φ((ỹ − μ(x))/σ(x)) + σ(x) ϕ((ỹ − μ(x))/σ(x))</p>
      <p>where ϕ(⋅) is the standard normal probability density function. To obtain a higher value, the first term favours minimizing μ(x), while the second term favours maximizing σ(x); a basic equation-based trade-off between exploitation and exploration is achieved here.</p>
      <p>
        The trade-off between exploration and exploitation can be adjusted by tuning a parameter ξ in the improvement term (ỹ − μ(x) − ξ). A larger ξ favours exploration in early steps and exploitation later, but experimentally this schedule does not work well [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ].
      </p>
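      <p>The closed-form EI above, including the optional offset ξ, can be sketched as follows (function names and the ξ default are our own):</p>

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ei_acquisition(mu, sigma, y_best, xi=0.0):
    # closed-form EI for minimization; xi is the optional exploration offset
    if sigma == 0.0:
        return max(0.0, y_best - mu - xi)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm_cdf(z) + sigma * norm_pdf(z)
```

      <p>When μ(x) = ỹ the first term vanishes and EI reduces to σ(x)ϕ(0), so uncertain points still receive a positive score, which is exactly the exploration behaviour described above.</p>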
    </sec>
    <sec id="sec-7">
      <title>2.2.3. Modified Probability Improvement</title>
      <p>If evaluations are noise corrupted, y_i | f_i ∼ N(f_i, σ_y²), the current loss optimum ỹ is not a reliable sample. Instead of using the optimum directly, we consider using the posterior distribution N(μ(x̃), σ²(x̃)) at the current optimum location x̃. PI can be modified under this noise-corrupted condition in order to increase the robustness of the sampling process. Let</p>
      <p>• k(x, x) denote the posterior variance σ²(x) of an unevaluated point x computed from the Gaussian process;</p>
      <p>• k(x̃, x̃) denote the posterior variance σ²(x̃) of the loss optimum x̃;</p>
      <p>• k(x, x̃) denote the posterior covariance between the unevaluated point and the loss optimum.</p>
      <p>According to the rule for the variance of the difference of two dependent random variables:</p>
      <p>Var[X − Y] = Var[X] + Var[Y] − 2 × Cov[X, Y]</p>
      <sec id="sec-7-1">
        <title>Distribution of f(x) − f(x̃)</title>
        <p>The distribution of f(x) − f(x̃) can be derived:</p>
        <p>f(x) − f(x̃) ∼ N(μ(x) − μ(x̃), k(x, x) + k(x̃, x̃) − 2k(x, x̃))</p>
        <p>The utility function of Modified Probability Improvement (MPI) is rewritten as u(x) = 1 if f(x) − f(x̃) ≤ 0, and u(x) = 0 otherwise.</p>
        <p>Since the utility function only counts the improvement when f(x) − f(x̃) ≤ 0, MPI can be written as the probability of f(x) − f(x̃) ≤ 0. If X ∼ N(μ, σ²), then ℙ(X ≤ x) = Φ((x − μ)/σ). The cumulative density function for MPI can therefore be derived:</p>
        <p>a_MPI(x) = ℙ(f(x) − f(x̃) ≤ 0) = Φ((0 − (μ(x) − μ(x̃))) / √(k(x, x) + k(x̃, x̃) − 2k(x, x̃))) = Φ((μ(x̃) − μ(x)) / √(k(x, x) + k(x̃, x̃) − 2k(x, x̃)))</p>
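        <p>The MPI expression depends only on the two posterior means and the three kernel terms. A minimal sketch (argument names are our own):</p>

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mpi_acquisition(mu_x, mu_opt, k_xx, k_oo, k_xo):
    # a_MPI = Phi((mu(x~) - mu(x)) / sqrt(k(x,x) + k(x~,x~) - 2 k(x,x~)))
    rho = math.sqrt(k_xx + k_oo - 2.0 * k_xo)
    return norm_cdf((mu_opt - mu_x) / rho)
```
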
        <p>The performance of the modified versions of PI and EI is compared with the traditional PI and EI on 3 selected 2D benchmark functions. Variables including the kernel function, kernel parameters, and positions of the pre-samplings are controlled to be the same within each set of experiments. We visualise the sampling positions and global optima in the search space, as well as the current minimal loss at each iteration. The performance of the 4 acquisition functions on each benchmark function is discussed section by section.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>3.1. Testing on Sphere Function</title>
      <p>
        The sphere function [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ] has a single global minimum. It is bowl-shaped, convex, and unimodal. The sphere function in d dimensions is f(x) = ∑_{i=1}^{d} x_i².
      </p>
      <p>
        A lemma on the expectation of the max function applied to normally distributed random variables [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ] can be directly employed to obtain the expression for MEI. The utility function is u(x) = max(0, f(x̃) − f(x)). If s ∼ N(μ, σ²), then
      </p>
      <p>𝔼[max(0, s)] = ∫_{0}^{∞} s N(s; μ, σ²) ds = μ Φ(μ/σ) + σ ϕ(μ/σ)</p>
      <p>We already know the mean and variance of the normal distribution of f(x) − f(x̃) from the derivation above. The mean of f(x̃) − f(x) is μ(x̃) − μ(x), and the variance remains the same. Let ρ denote √(k(x, x) + k(x̃, x̃) − 2k(x, x̃)); applying the lemma:</p>
    </sec>
    <sec id="sec-9">
      <title>2.2.4. Modified Expected Improvement</title>
      <p>As in MPI, ỹ is replaced by the posterior distribution at x̃ in Modified Expected Improvement (MEI). The resulting acquisition function is</p>
      <p>a_MEI(x) = 𝔼[u(x)|C] = (μ(x̃) − μ(x)) Φ((μ(x̃) − μ(x))/ρ) + ρ ϕ((μ(x̃) − μ(x))/ρ)</p>
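      <p>MEI mirrors the EI closed form with ỹ replaced by μ(x̃) and σ(x) by ρ. A minimal sketch (argument names are our own):</p>

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mei_acquisition(mu_x, mu_opt, k_xx, k_oo, k_xo):
    # a_MEI = d * Phi(d / rho) + rho * phi(d / rho), where d = mu(x~) - mu(x)
    rho = math.sqrt(k_xx + k_oo - 2.0 * k_xo)
    d = mu_opt - mu_x
    return d * norm_cdf(d / rho) + rho * norm_pdf(d / rho)
```
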
      <p>Figure 1 shows the contour of this function. Sampling locations and loss over 45 iterations for the 4 acquisition functions are shown in Table 1, where star points represent the global optima and blue points are the sampling locations. We compare the acquisition functions in pairs. PI performed competitively in the given environment; its sampling trajectory towards the minimum almost follows the gradient direction, and after reaching the global minimum it only samples locations close to it. MPI shows similar performance; the difference is that it takes longer (more iterations) to reach the optimum, and it occasionally jumps out to sample locations far from the current minimum. EI and MEI place more weight on exploration: both search globally before starting to exploit near the loss optimum. Unlike EI, MEI converges faster after it has sampled locations close to the global minimum, and it does not frequently jump out to search locations far from the current optimum.</p>
    </sec>
    <sec id="sec-10">
      <title>3.2. Testing on Six-Hump Camel Function</title>
      <p>
        Six-hump camel function[
        <xref ref-type="bibr" rid="ref12">11</xref>
        ] has 6 local minima, 2 of which are global minima. The six-hump camel function in 2 dimensions is defined as f(x₁, x₂) = (4 − 2.1x₁² + x₁⁴/3)x₁² + x₁x₂ + (−4 + 4x₂²)x₂².
      </p>
    </sec>
    <sec id="sec-11">
      <title>3.3. Testing on Rastrigin Function</title>
      <p>
        Rastrigin function[
        <xref ref-type="bibr" rid="ref12">11</xref>
        ] is a multimodal function whose local minima are distributed on a grid throughout the search space. It has only 1 global minimum, at the centre. The Rastrigin function in d dimensions is defined as f(x) = 10d + ∑_{i=1}^{d} [x_i² − 10 cos(2πx_i)].
      </p>
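      <p>The three benchmark functions are standard test problems; their definitions, taken from the formulas above, can be sketched as follows:</p>

```python
import numpy as np

def sphere(x):
    # f(x) = sum x_i^2, global minimum 0 at the origin
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def six_hump_camel(x1, x2):
    # 2-D function with 6 local minima, 2 of them global
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

def rastrigin(x):
    # f(x) = 10 d + sum [x_i^2 - 10 cos(2 pi x_i)], global minimum 0 at the origin
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
```
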
    </sec>
    <sec id="sec-12">
      <title>3.3.2. Testing on Rastrigin Function in 100 Iterations</title>
      <p>Table 7 shows the sampling locations and loss of the 4 acquisition functions over 100 iterations. PI and MPI can sample locations near the global optimum, but only PI actually exploits at the optimum. EI and MEI exploit at several good local optima close to the global optimum but do not exploit at the global optimum itself. All 4 acquisition functions do explore the search space with a number of sampling locations. PI spends more iterations to obtain a relatively small loss; both MPI and MEI converge faster than PI and EI.</p>
    </sec>
    <sec id="sec-13">
      <title>3.4. Experiment Summary</title>
      <p>On simple loss functions with only a small number of minima, MPI performs better than PI, EI, and MEI, while MEI is the worst, with much larger loss and high standard deviation. On complicated loss functions with insufficient iterations, MEI and MPI are better than EI and PI. With sufficient iterations, EI is better than the other acquisition functions, and MEI is the worst. Under most conditions, the loss of MPI and MEI converges faster than that of PI and EI.</p>
    </sec>
    <sec id="sec-14">
      <title>4. Conclusions</title>
      <p>This paper discusses acquisition functions for Bayesian Optimization in machine learning applications. Traditional acquisition functions do not fully consider the systematic noise between the observed data and the ground truth. When the noise satisfies a Gaussian distribution assumption, we propose modified acquisition functions for EI and PI respectively. In addition, we believe that the following perspectives can serve as future work:</p>
      <p>• When the number of iterations grows beyond a threshold, a more complex hypothesis space could be used to construct predictions at unknown points, such as a Gaussian mixture distribution or a deep neural network with a complex structure.</p>
      <p>• When computing the acquisition function at a point, information from nearby points should be weighed at the same time; this could be realized by an algorithm similar to a random forest, in which nearby points are assigned to the same leaf node.</p>
      <p>• When the data contains non-Gaussian noise, acquisition functions should be constructed correspondingly to better balance exploration and exploitation, so as to improve optimization efficiency.</p>
    </sec>
    <sec id="sec-15">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , “
          <article-title>Random search for hyper-parameter optimization</article-title>
          .,
          <source>” Journal of machine learning research</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>2</issue>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , “
          <article-title>Gradient-based optimization of hyperparameters,” Neural computation</article-title>
          , vol.
          <volume>12</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>1889</fpage>
          -
          <lpage>1900</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pelikan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cantú-Paz</surname>
          </string-name>
          , et al., “Boa:
          <article-title>The bayesian optimization algorithm</article-title>
          ,”
          <source>in Proceedings of the genetic and evolutionary computation conference GECCO-99</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>525</fpage>
          -
          <lpage>532</lpage>
          , Citeseer,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Snoek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Adams</surname>
          </string-name>
          , “
          <article-title>Practical bayesian optimization of machine learning algorithms</article-title>
          ,
          <source>” Advances in neural information processing systems</source>
          , vol.
          <volume>25</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pelikan</surname>
          </string-name>
          , “
          <article-title>Hierarchical bayesian optimization algorithm,” in Hierarchical Bayesian optimization algorithm</article-title>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>129</lpage>
          , Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Swersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Snoek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <article-title>“Multi-task bayesian optimization</article-title>
          ,”
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Snoek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rippel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Satish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sundaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Patwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Prabhat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Adams</surname>
          </string-name>
          , “
          <article-title>Scalable bayesian optimization using deep neural networks</article-title>
          ,
          <source>” in International conference on machine learning</source>
          , pp.
          <fpage>2171</fpage>
          -
          <lpage>2180</lpage>
          , PMLR,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. K. I.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Gaussian process for Machine Learning</article-title>
          . The MIT Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brochu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Cora</surname>
          </string-name>
          , and N. De Freitas, “
          <article-title>A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning</article-title>
          ,
          <source>” arXiv preprint arXiv:1012.2599</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nadarajah</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotz</surname>
          </string-name>
          , “
          <article-title>Exact distribution of the max/min of two gaussian random variables,” IEEE Transactions on very large scale integration (VLSI) systems</article-title>
          , vol.
          <volume>16</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>210</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Molga</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Smutnicki</surname>
          </string-name>
          , “
          <article-title>Test functions for optimization needs</article-title>
          .” https://robertmarks.org/Classes/ENGR5358/Papers/functions.pdf,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>