<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Methods in Pool-based Realtime Algorithm Configuration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jasmin Brandt</string-name>
          <email>jasmin.brandt@upb.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elias Schede</string-name>
          <email>elias.schede@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shivam Sharma</string-name>
          <email>sshivam95@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktor Bengs</string-name>
          <email>viktor.bengs@ifi.lmu.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eyke Hüllermeier</string-name>
          <email>eyke@ifi.lmu.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Tierney</string-name>
          <email>kevin.tierney@uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Decision and Operation Technologies Group, Bielefeld University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Paderborn University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Informatics, LMU Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LWDA'23: Lernen</institution>
          ,
          <addr-line>Wissen, Daten, Analysen</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Munich Center for Machine Learning</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Realtime algorithm configuration is concerned with the task of designing a dynamic algorithm configurator that observes sequentially arriving problem instances of an algorithmic problem class and selects suitable algorithm configurations (e.g., with minimal runtime) of a specific target algorithm for them. The Contextual Preselection under the Plackett-Luce (CPPL) algorithm maintains a pool of configurations from which a set of algorithm configurations is selected that are run in parallel on the current problem instance. It uses the well-known UCB selection strategy from the bandit literature, while the pool of configurations is updated over time via a racing mechanism. In this paper, we investigate whether the performance of CPPL can be further improved by using different bandit-based selection strategies as well as a ranking-based strategy to update the candidate pool. Our experimental results show that replacing these components can indeed improve performance significantly.</p>
      </abstract>
      <kwd-group>
        <kwd>Algorithm Configuration</kwd>
        <kwd>Pool-based Realtime Algorithm Configuration</kwd>
        <kwd>multi-armed bandits</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Algorithm Configuration (AC) is the task of automated search for a high-quality configuration
of a given parameterized target algorithm. Such parameterized algorithms are often used to
solve computationally hard problems like Boolean satisfiability problems (SAT), constraint
satisfaction problems or vehicle routing problems. Finding good parameter configurations has
a significant influence on the performance of these algorithms regarding their runtimes and/or
their solution quality. However, searching for good configurations by hand is a complex or even
infeasible task that motivates the development of automated AC methods.</p>
      <p>The field of AC can be distinguished into two problem variants regarding the learning setting, namely offline algorithm configuration and realtime algorithm configuration (RAC). The former corresponds to the batch learning scenario in machine learning: One has access to (a batch of) problem instances in advance, on which the performance of candidate configurations can be evaluated; performance can refer to runtime, solution quality, storage complexity, etc. The aim is thus to find a parameter configuration that generalizes across the whole problem instance distribution.</p>
      <p>In contrast, the RAC problem is an online learning problem: There is generally no access to a data set in advance, and problem instances arrive sequentially, for which parameters must be chosen in real time. For example, imagine a logistics company that is constantly faced with solving vehicle routing problems (VRPs), each of which requires a high-quality solution as quickly as possible. The goal is thus to choose the best parameter configuration for the current problem instance in each time step. This is a challenging task, as the nature of the incoming problem instances, i.e., the distribution over the problem class, might change over time (drastically or gradually). In the VRP example, the choice could depend, for instance, on the current weather and traffic situation, both of which change over time.</p>
      <p>
        Thanks to modern multi-core computer architectures, the computation of a suitable solution can be done in parallel, i.e., one can apply several different parameter configurations to the current problem instance simultaneously. In light of this, it makes sense to maintain a pool of (temporarily) good parameter configurations that is updated from time to time to react to possible changes in the problem distribution. Current algorithm configurators for the RAC problem all use such an approach, but differ in terms of key operations such as updating the pool or choosing configurations from the pool (see Chapter 7 in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
      <p>
        The most recent approach for the RAC problem in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] uses methodologies from the multi-armed bandit literature for these key operations. More specifically, their approach is based
on the idea that once a parameter configuration has found a solution, the others running in
parallel can be stopped. This results in a preference observation in form of a top-1 ranking or
winner feedback. Accordingly, the question of how to select a good subset from the pool can
be viewed as a specific bandit problem, namely the preselection bandit [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this learning
scenario, the learner sequentially (pre-)selects a subset of objects (choice alternatives) for a
user, from which the user selects a single object that has a certain probability of being the most
preferred. The learner tries to find out the user’s preferences from the provided feedback, and its
main task corresponds to preselecting subsets that eventually lead to highly preferred choices
of the user. Since context information, e.g., about the user or the objects, is often available, the
preselection bandit setting has been extended in this respect in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Considering the parameter
configurations in the pool as the objects to be selected and the problem instances as the users,
the CPPL method in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] leverages the UCB strategy suggested in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to select configurations
from the pool, while discarding configurations from the pool based on the racing principle [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Even though the UCB strategy is a well-established family of selection strategies in the bandit
literature and the CPPL algorithm based on it also yields first promising results for the RAC
problem, the question is whether other selection strategies can improve performance. For
example, it has been empirically observed in various works that Bayesian selection strategies
or stochastic selection strategies in general can improve the performance in the respective
application scenarios, such as news article recommendation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], planning problems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or
online algorithm selection [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This motivates us to try one strategy at a time from the class of
Bayesian or stochastic selection strategies recently proposed in the bandit literature, once in
their context-dependent as well as context-free variants. In addition, we also investigate a new
variant of performing the pool update within CPPL, i.e., removing configurations from the pool
over time, which is based on the used ranking of objects in the respective selection strategies.
In our experimental study, we find that both strategies combined with the new variant of the
pool update can indeed improve the performance of CPPL.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Problem Formulation</title>
      <p>
        In the following, we define the algorithm configuration (AC) problem and adopt the notation in
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Let 𝒜 be a parameterized algorithm, called the target algorithm in the following, and ℐ the space of problem instances that 𝒜 can solve. We denote the different internal parameters of 𝒜 as θ_1, …, θ_k, each with domain Θ_j for j ∈ {1, …, k}, such that Θ = Θ_1 × ⋯ × Θ_k is the set of all feasible parameter configurations for 𝒜, called the configuration or search space. Running 𝒜 on a specific problem instance i ∈ ℐ with a specific parameter configuration θ ∈ Θ results in a cost c(i, θ) that is specified by an unknown and possibly stochastic cost function c : ℐ × Θ → ℝ. The costs can, for example, correspond to the runtime of using 𝒜 with configuration θ on instance i, but also to other metrics like the solution quality or the memory usage.
      </p>
      <p>In the realtime algorithm configuration (RAC) scenario, the problem instances arrive sequentially, and the goal is to design a realtime algorithm configurator that, at each time step, chooses a configuration that performs best in terms of low cost. Unlike the classical variant of algorithm configuration, where a fixed but unknown distribution over the problem instances is assumed, in the RAC setting the distribution might change over time.</p>
      <p>In the special case of pool-based RAC, the realtime algorithm configurator maintains a pool of configurations 𝒫 = {θ_1, …, θ_n} from which it then selects a subset of configurations that can be run in parallel on the current problem instance. The size of the subset is, for instance, determined by the number of cores of the available computing resources. Typically, the parallel computing process is stopped as soon as an algorithm configuration of the selected subset terminates. In this case, we get two types of information: explicit numerical information, namely the cost of the first-terminated configuration, and implicit information, namely that the remaining configurations would have incurred higher costs. In the following, we shall focus on runtime as the cost, since this is the natural choice when using such a racing strategy with early stopping.</p>
    </sec>
    <sec id="sec-4">
      <title>3. The CPPL Framework</title>
      <p>In pool-based RAC, the goal is to find an appropriate trade-off between (i) gathering more information about the quality of the configurations that are contained in the pool and (ii) simultaneously selecting configurations from the pool that performed well in the past to ensure low costs on the currently seen problem instance. This motivates the use of multi-armed bandit (MAB) algorithms that are designed to solve exactly this exploration-exploitation trade-off.</p>
      <p>
        However, the observed runtimes of the configurations are right-censored: we usually use a timeout in the AC setting and, in addition, we stop the run once the first configuration in the selected subset finishes, and thus cannot observe the exact runtimes of all other configurations. These censored observations can introduce a bias in the estimated mean runtimes; see also [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To circumvent this problem, we can simply interpret the observations as winner feedback and use preference-based bandit methods to deal with it.
      </p>
      <sec id="sec-4-1">
        <title>3.1. Contextual Preselection Bandits</title>
        <p>The above problem can be reduced to a contextual preselection problem, a sequential online decision-making problem. We are provided with context information X_t = (x_{t,1}, …, x_{t,n}) in each time step t ∈ ℕ, where each x_{t,j} contains the contextual information for one of n different alternatives, called arms, that we denote in the following by their indices [n] := {1, …, n}. Using this context, the learner has to select a subset S_t ⊂ [n] of size |S_t| = k &lt; n in each time step t ∈ ℕ. After the choice of the subset, the winner information among the selected arms is observed.</p>
        <p>We consider the contextual preselection problem under the Plackett-Luce (PL) model. In this parameterized probability distribution on the set of all rankings over the arms, each arm j ∈ [n] has an underlying utility or strength, represented by a parameter γ_j ∈ ℝ. The probability under a PL model for a ranking σ : [n] → [n] is</p>
        <p>ℙ(σ | γ) = ∏_{i=1}^{n} γ_{σ⁻¹(i)} / (Σ_{j=i}^{n} γ_{σ⁻¹(j)}).</p>
        <p>To take the context information x_j ∈ ℝ^d for each arm j ∈ [n] into account, we replace the utility for each arm j ∈ [n] with the following log-linear function, γ_j = γ_j(X) = exp(w⊤x_j), where w ∈ ℝ^d is a fixed but unknown weight vector. For the observation that arm î ∈ S ⊂ [n] wins with context information X, we can easily derive the log-likelihood function for a weight vector w by</p>
        <p>ℒ(w | S, î, X) = w⊤x_î − log(Σ_{j∈S} exp(w⊤x_j)).</p>
        <p>
          Note that the probability to observe a winner under a PL model is the same as selecting an
object out of a choice set in the well-known multinomial logit (MNL) model in discrete choice
models [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
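        <p>The winner probability under this model is exactly a softmax over the linear scores of the selected arms. A minimal sketch of the log-likelihood above (variable names are illustrative, not from the original implementation):</p>

```python
import numpy as np

def winner_loglik(w, X_S, i_hat):
    """Log-likelihood of arm i_hat winning the selected subset under the
    contextual PL/MNL model with utilities exp(w^T x_j)."""
    scores = X_S @ w      # linear scores w^T x_j of the selected arms
    return scores[i_hat] - np.log(np.sum(np.exp(scores)))

# Illustrative weight vector and context matrix (one row per selected arm).
w = np.array([0.5, -0.2])
X_S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ll = winner_loglik(w, X_S, i_hat=2)
# The winner probabilities over the subset form a softmax distribution.
probs = np.exp([winner_loglik(w, X_S, j) for j in range(3)])
```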
        <p>The goal in our setting is to select, in each time step t ∈ ℕ, the subset S_t ⊂ 𝒫 of the current pool of parameter configurations that contains the best possible configuration for the currently observed context X_t. In addition, we regularly update the pool 𝒫 by discarding the worst parameter configurations and filling the newly available spots with new ones. In this way, we aim to fill the pool 𝒫 with increasingly good configurations from which our preselection strategy can select in each time step. The overall goal is that the selected subset always contains a configuration whose cost is as low as possible.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. The Algorithmic Framework</title>
        <p>
          A state-of-the-art algorithm to solve the pool-based RAC problem is the Contextual
Preselection with Plackett-Luce (CPPL) algorithm [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. It was the first step in the use of multi-armed bandit methods for RAC by successfully connecting RAC to the online contextual preselection bandit setting. Experimental results have shown that CPPL can effectively compete with the performance of other RAC methods like ReACTR [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The CPPL algorithm is shown in detail in Algorithm 1 and follows the racing principle.
        </p>
        <p>Algorithm 1 CPPL(Θ, n, k, ℐ, Φ)
1: Initialize n random parameterizations 𝒫 = {θ_1, …, θ_n} ⊂ Θ and initialize ŵ_1 randomly
2: w_{1,j} = ŵ_1 for each θ_j ∈ 𝒫
3: for t = 1, 2, … do
4: Observe problem instance i_t ∈ ℐ
5: Estimate utilities v̂_{t,j} = exp(x_{t,j}⊤ ŵ_t), where x_{t,j} = Φ(f(i_t), f(θ_j)) for all j = 1, …, n
6: S_t ← ARM_SELECTION
7: Run parameterizations of S_t to solve i_t, observe parameterization θ_î ∈ S_t terminating first
8: w_{t+1,j} ← UPDATE_WEIGHTS() for each θ_j ∈ 𝒫
9: D_t ← DISCARD_SET
10: Generate |D_t| new parameterizations using the genetic approach as described
11: end for</p>
        <p>First, a problem instance i_t ∈ ℐ is observed in line 4, and the contextual information is built with a joint feature map of the instance features f(i_t) and the configuration features f(θ_j) for each θ_j ∈ 𝒫. In our setting, the joint feature map is modeled with a second-degree polynomial consisting of all possible polynomial combinations of the features with degree less than or equal to 2, i.e., for x ∈ ℝ^m and y ∈ ℝ^l we have Φ(x, y) = (1, x_1, …, x_m, y_1, …, y_l, x_1², …, x_m², y_1², …, y_l², x_1y_1, x_1y_2, …, x_m y_l).</p>
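        <p>This joint feature map can be transcribed directly; the following sketch assumes plain NumPy vectors for the instance and configuration features.</p>

```python
import numpy as np

def phi(x, y):
    """Degree-2 joint feature map: a constant, all features, their squares,
    and all pairwise products x_i * y_j, matching the definition of Phi."""
    z = np.concatenate([x, y])
    cross = np.outer(x, y).ravel()     # the x_i * y_j terms
    return np.concatenate([[1.0], z, z**2, cross])

x = np.array([2.0, 3.0])   # instance features f(i_t)
y = np.array([4.0])        # configuration features f(theta_j)
feat = phi(x, y)
```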
        <p>By means of the contextual preselection method, a subset from the pool is chosen in each time
step in line 6. Each configuration in this subset is then executed in parallel according to the
racing principle and the winner feedback is observed in line 7. After updating the estimation
of the weight vector w in line 8, the pool of configurations  is also updated in line 10 using
genetic engineering. Strictly speaking, a uniform crossover of the two configurations that are
ranked best by the model is executed. For diversification, single genes (parameters) are mutated
and some configurations are generated at random with low probability. All newly generated
configurations are ranked by the recently learned model and only the best individuals are kept
in the pool.</p>
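        <p>The genetic pool update described above can be sketched as follows; the parameter encoding, mutation rate, and random-restart probability are illustrative assumptions, not the settings used in CPPL.</p>

```python
import random

def genetic_update(parent_a, parent_b, search_space, mut_prob=0.1, rnd_prob=0.05):
    """Uniform crossover of the two best-ranked configurations, with
    per-gene mutation and, with low probability, a fully random individual
    (a sketch; the rates here are illustrative)."""
    if rnd_prob > random.random():
        # occasionally generate a configuration at random for diversification
        return {name: random.choice(domain) for name, domain in search_space.items()}
    child = {}
    for name, domain in search_space.items():
        # uniform crossover: each gene comes from either parent with equal probability
        child[name] = random.choice([parent_a[name], parent_b[name]])
        if mut_prob > random.random():
            child[name] = random.choice(domain)   # mutate single genes
    return child

random.seed(1)
space = {"heuristic": ["dfs", "bfs"], "restarts": [10, 50, 100]}
child = genetic_update({"heuristic": "dfs", "restarts": 10},
                       {"heuristic": "bfs", "restarts": 100}, space)
```

        <p>In the framework above, the newly generated individuals would then be ranked by the learned model before entering the pool.</p>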
        <p>
          The original version of CPPL uses an upper confidence bound (UCB) method for the contextual preselection of the configurations. UCB implements the principle of optimism in the face of uncertainty and solves the exploration-exploitation trade-off by constructing (context-dependent) confidence intervals around the estimated (context-dependent) utilities and choosing the most optimistic objects according to the intervals. More specifically, the ARM_SELECTION strategy in line 6 to choose the subset S_t is
S_t ← argmax_{S ⊂ 𝒫, |S| = k} Σ_{j∈S} (v̂_{t,j} + c_{t,j}),
where v̂_{t,j} are the estimated utilities from line 5 for each configuration in 𝒫 and
c_{t,j} = ω ⋅ I(t, x_{t,j})
are confidence bounds. Here, ω &gt; 0 is a suitable constant and I is a function depending on the gradient and the Hessian matrix of the log-likelihood function with respect to the observation at time step t (see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for details).
        </p>
        <p>The estimator of the unknown weight vector is updated in UPDATE_WEIGHTS using the Polyak-Ruppert averaged stochastic gradient descent method as follows:
w̄_{t+1,j} ← (t/(t+1)) w̄_{t,j} + (1/(t+1)) ŵ_{t+1,j}, with ŵ_{t+1,j} = ŵ_{t,j} + γ(t+1)^{−α} ∇ℒ(ŵ_t | S_t, î_t, X_t) for each θ_j ∈ 𝒫.
Moreover, DISCARD_SET in line 9 also depends on the confidence bounds. To be more precise, all configurations that have a smaller upper confidence bound than the lower confidence bound of another configuration in the pool are discarded from the pool 𝒫. Formally,
D_t ← {θ_j ∈ 𝒫 | ∃ j′ ≠ j s.t. v̂_{t,j′} − c_{t,j′} ≥ v̂_{t,j} + c_{t,j}}.</p>
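        <p>A minimal sketch of such an averaged stochastic-gradient step on the winner log-likelihood; the step-size schedule and its constants are illustrative assumptions, not the paper's settings.</p>

```python
import numpy as np

def grad_winner_loglik(w, X_S, i_hat):
    """Gradient of the winner log-likelihood w^T x_ihat - log sum_j exp(w^T x_j)."""
    scores = X_S @ w
    p = np.exp(scores - scores.max())
    p /= p.sum()                      # softmax choice probabilities
    return X_S[i_hat] - p @ X_S

def averaged_sgd_step(w_hat, w_bar, t, X_S, i_hat, gamma=1.0, alpha=0.6):
    """One Polyak-Ruppert step: gradient ascent with a decaying step size,
    followed by a running average of the iterates (schedule is illustrative)."""
    w_hat = w_hat + gamma * (t + 1) ** (-alpha) * grad_winner_loglik(w_hat, X_S, i_hat)
    w_bar = (t * w_bar + w_hat) / (t + 1)
    return w_hat, w_bar

w_hat = np.zeros(2)
w_bar = np.zeros(2)
X_S = np.array([[1.0, 0.0], [0.0, 1.0]])
for t in range(200):                  # arm 0 always wins, so its utility grows
    w_hat, w_bar = averaged_sgd_step(w_hat, w_bar, t, X_S, i_hat=0)
```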
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Modifying CPPL Components</title>
      <p>The above CPPL algorithm already works well in practice, and the question is whether one can adjust its two crucial components, the selection strategy and the pool update, to achieve improvements. To this end, for the former, we consider several bandit-based selection strategies that have been proposed recently. For the pool update, we use a ranking-based strategy that updates the candidate pool based on the internal rankings of the configurations that all selection strategies maintain.</p>
      <sec id="sec-5-1">
        <title>4.1. Preselection Strategies</title>
        <p>In the following, we take a look at some recent multi-armed bandit algorithms as an alternative
to the UCB approach for the subset selection in CPPL.</p>
        <p>
          CoLSTIM A recently proposed method for regret minimization in a dueling bandit setting
with context information is the CoLSTIM algorithm [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. It assumes an underlying linear stochastic transitivity model (LSTM) with contextualized utilities for the winning probabilities, and even experimentally outperforms Bayesian methods. By means of a thresholded noise term ε_{t,j}, sampled from an underlying perturbation distribution 𝒟 at time step t for each configuration θ_j ∈ 𝒫, the skill of each configuration is computed as the sum of the estimated utility value and a perturbed confidence width, generated by multiplying the estimated confidence bounds with the sampled perturbation variable. Introducing such a perturbation variable allows us to account for the accuracy of the fitted contextualized LSTM. Note that LSTMs are a general class of probability models for a winner feedback scenario that also contain MNL as a special case for the problem of choosing one object out of a choice set with the Gumbel distribution as the underlying distribution for the perturbation variable.
        </p>
        <p>The detailed ARM_SELECTION strategy is shown in Algorithm 2. Note that we need a bounding constant c_thresh to threshold the generated perturbation variable for technical reasons. In our experiments, we choose a large value as a default for c_thresh. The estimator of the underlying weight vector for each configuration is computed similarly as in the UCB approach, so we can use an update rule analogous to UPDATE_WEIGHTS.</p>
        <p>w̄_{t+1,j} ← (t/(t+1)) w̄_{t,j} + (1/(t+1)) ŵ_{t+1,j}, with ŵ_{t+1,j} = ŵ_{t,j} + γ(t+1)^{−α} Σ_{j′∈S_t} ∇ℓ(ŵ_t | t, {î_t, j′}, X_t) for all θ_j ∈ 𝒫, and M_{t+1} ← M_t + Σ_{j∈S_t} (x_{t,j} − x_{t,î_t})(x_{t,j} − x_{t,î_t})⊤, where we initialize M_1 = diag(1, …, 1).</p>
        <p>Algorithm 2 CoLSTIM ARM_SELECTION
1: Sample ε̃_{t,j} ∼ 𝒟 and set ε_{t,j} ← min{c_thresh, max{−c_thresh, ε̃_{t,j}}} for all θ_j ∈ 𝒫
2: Choose S_t ← argmax_{S ⊂ 𝒫, |S| = k} Σ_{j∈S} (v̂_{t,j} + ε_{t,j} ⋅ √(x_{t,j}⊤ M_t⁻¹ x_{t,j}))</p>
        <p>Algorithm 3 TS-MNL UPDATE_WEIGHTS
1: M_{t+1} = M_t + Σ_{j∈S_t} x_{t,j} x_{t,j}⊤
2: for θ_j ∈ 𝒫 do
3: w̄_{t+1,j} ← (t/(t+1)) w̄_{t,j} + (1/(t+1)) ŵ_{t+1,j}
4: Sample w_{t+1,j} ∼ 𝒩(w̄_{t+1,j}, β² M_{t+1}⁻¹)
5: end for</p>
        <p>Note that we use another loss function ℓ here to incorporate the underlying contextual linear stochastic transitivity model. More specifically, we assume that the probability that configuration j is preferred over configuration j′ for a given comparison function F is defined as ℙ(1[j ≻ j′] = 1 | X_t) = F(v_j(X_t) − v_{j′}(X_t)). This leads to the following log-likelihood function for a weight vector w and an observation of the form (t, {j, j′}, X_t):</p>
        <p>ℓ(w | t, {j, j′}, X_t) = 1[j ≻ j′] log(F(w⊤(x_{t,j} − x_{t,j′}))) + (1 − 1[j ≻ j′]) log(F(w⊤(x_{t,j′} − x_{t,j}))).</p>
        <p>The features of both the problem instance and the configuration can also be ignored entirely
by considering a suitable dummy encoding for the context vectors. We will call this selection
strategy simply CoLSTIM-mo-f.</p>
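        <p>The perturbed selection rule of CoLSTIM can be sketched as follows; the Gumbel perturbation distribution and the threshold value are assumptions for illustration.</p>

```python
import numpy as np

def colstim_select(v_hat, widths, k, c_thresh=10.0, rng=None):
    """CoLSTIM-style ARM_SELECTION sketch: perturb each estimated utility by a
    clipped noise sample times its confidence width, then take the k largest.
    (The Gumbel perturbation distribution and threshold are assumptions.)"""
    rng = rng or np.random.default_rng(0)
    eps = rng.gumbel(size=len(v_hat))
    eps = np.clip(eps, -c_thresh, c_thresh)   # threshold the perturbation
    scores = v_hat + eps * widths
    return np.argsort(-scores)[:k]            # indices of the k best scores

v_hat = np.array([3.0, 2.5, 1.0, 0.5])       # estimated utilities
widths = np.array([0.1, 0.1, 0.1, 0.1])      # confidence widths
subset = colstim_select(v_hat, widths, k=2)
```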
        <p>TS-MNL The multinomial logit (MNL) bandit problem is a class of bandit problems related to the preselection bandits, in which each object (arm) is equipped with a revenue and, accordingly, the goal is to maximize the cumulative expected revenue. Nevertheless, the algorithmic approaches for tackling MNL bandits are quite similar to those for preselection bandits. The Thompson Sampling (TS) MAB algorithm takes a Bayesian perspective by assuming a prior loss distribution for every object (configuration) and, in each time step, selecting an object according to its probability of being optimal. Thus, the exploration-exploitation trade-off is tackled through randomization coming from the posterior distribution. With this in mind, we modify the TS-MNL algorithm with optimistic sampling proposed in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] so that it can be applied to a preselection bandit problem. We define an ARM_SELECTION strategy in which we do not need confidence bounds anymore, as
S_t ← argmax_{S ⊂ 𝒫, |S| = k} Σ_{j∈S} v̂_{t,j}.
In the procedure UPDATE_WEIGHTS, the covariance matrix M_t and the estimated mean of the weight vector need to be updated to learn the underlying multivariate normal distribution of the weight vector. After that, we can simply sample the weight vector w_t that is used to compute the utilities v̂_t from the updated distribution. The procedure is given in detail in Algorithm 3.
        </p>
        <p>In Algorithm 3, β &gt; 0 is a hyperparameter that needs to be chosen, for which smaller values (i.e., β ≤ 2) are preferred. This is due to its similarity to the hyperparameter of the Thompson sampling approach in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], where it has been empirically observed that smaller values of β tend to lead to better results.
        </p>
        <p>ISS Independent Self Sparring (ISS) is another algorithm in the spirit of Thompson Sampling that was proposed in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] to solve the regret minimization task in a multi-dueling bandit scenario. However, it is a context-independent algorithm that simply keeps a belief about the pairwise winning probabilities of the underlying objects (configurations), i.e., the probability that one object will win against another. For this purpose, a Beta prior distribution is used, as it is a conjugate prior and leads to a convenient update of the posterior distribution. In the UPDATE_WEIGHTS procedure, the Beta distribution for each configuration in the pool is updated as shown in Algorithm 4.
        </p>
        <p>Algorithm 4 ISS UPDATE_WEIGHTS
1: α_{t+1,î} ← α_{t,î} + 1, β_{t+1,î} ← β_{t,î} for winning configuration î
2: α_{t+1,j} ← α_{t,j}, β_{t+1,j} ← β_{t,j} + 1 for all θ_j ∈ S_t ⧵ {î}
3: Sample p_{t+1,j} ∼ Beta(α_{t+1,j} + 1, β_{t+1,j} + 1) for each θ_j ∈ 𝒫</p>
        <p>In the ARM_SELECTION strategy, we do not consider the contextual information anymore and simply use the samples from the Beta distribution as underlying utilities. More specifically, we have S_t ← argmax_{S ⊂ 𝒫, |S| = k} Σ_{j∈S} p_{t,j}.</p>
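        <p>The ISS posterior update and sampling step can be sketched as follows; the list-based layout of the success and failure counts is an illustrative assumption.</p>

```python
import random

def iss_update(alpha, beta, subset, winner):
    """ISS UPDATE_WEIGHTS sketch: increment the winner's success count and
    the failure count of every other selected configuration."""
    alpha[winner] += 1
    for j in subset:
        if j != winner:
            beta[j] += 1

def iss_sample(alpha, beta):
    """Sample a utility for each configuration from Beta(alpha+1, beta+1);
    ARM_SELECTION then takes the k configurations with the largest samples."""
    return [random.betavariate(a + 1, b + 1) for a, b in zip(alpha, beta)]

alpha = [0, 0, 0]
beta = [0, 0, 0]
iss_update(alpha, beta, subset=[0, 1, 2], winner=1)
utils = iss_sample(alpha, beta)
```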
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Pool Update</title>
        <p>Since we also compute confidence bounds in CoLSTIM, in principle we could use the same condition to determine the DISCARD_SET as in the original framework of CPPL. Formally, this would look like</p>
        <p>D_t ← {θ_j ∈ 𝒫 | ∃ j′ ≠ j s.t. v̂_{t,j′} − √(x_{t,j′}⊤ M_t⁻¹ x_{t,j′}) ≥ v̂_{t,j} + √(x_{t,j}⊤ M_t⁻¹ x_{t,j})}.</p>
        <p>However, such a racing strategy is not possible for TS-MNL and ISS. As an alternative, we can use a discard method based on rankings. In the ranking-based discard approach, a certain percentage p ∈ [0, 1] of the configurations with the lowest rankings according to the estimated utilities, as determined by the model, are discarded after each iteration. More specifically, the DISCARD_SET in TS-MNL is selected as
D_t ← argmin_{D̃ ⊂ 𝒫, |D̃| = ⌊p⋅|𝒫|⌋} Σ_{j∈D̃} v̂_{t,j}, and for ISS as D_t ← argmin_{D̃ ⊂ 𝒫, |D̃| = ⌊p⋅|𝒫|⌋} Σ_{j∈D̃} p_{t,j}.</p>
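        <p>The ranking-based discard rule amounts to dropping a fixed fraction of the pool with the lowest estimated (or sampled) utilities; a minimal sketch:</p>

```python
import math

def ranking_discard(utilities, p=0.25):
    """Ranking-based DISCARD_SET sketch: drop the floor(p * pool size)
    configurations with the lowest estimated or sampled utilities."""
    m = math.floor(p * len(utilities))
    order = sorted(range(len(utilities)), key=utilities.__getitem__)
    return set(order[:m])       # indices to remove from the pool

discarded = ranking_discard([0.9, 0.1, 0.5, 0.3], p=0.5)
```

        <p>The freed spots would then be refilled via the genetic pool update described earlier.</p>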
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Related Work</title>
      <p>
        The literature on offline AC methods can be partitioned into model-based, non-model-based, and hybrid approaches. In non-model-based methods, racing is often used, in which a set of configurations race against each other on the same problem instances and poorly performing ones are iteratively eliminated. Methods using this include F-Race [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], its successor Iterated
F-Race [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and GGA [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. For a detailed overview of AC methods, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The field of RAC developed to address the case of AC where the set of problem instances
arrives sequentially. The ReACT algorithm [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and its successor ReACTR [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] run a set
of configurations in parallel on the instances to be solved. Both discard poorly performing
configurations and replace them by new ones, but ReACTR incorporates in addition a ranking
technique and a crossover mechanism to generate new configurations.
      </p>
      <p>
        In the (basic) MAB problem, we consider a set of n different alternatives (called arms) from which one must be chosen in each time step. Next, a reward is observed for the pulled arm, generated from a fixed but unknown reward distribution for each arm. The goal is to maximize the reward or, equivalently, to minimize the regret, which is defined as the difference between the sum of the rewards observed when always pulling the best arm and the sum of the rewards actually observed. Since MAB methods have already been successfully applied for algorithm selection [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and hyperparameter optimization [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose to also use them for RAC. To
deal with the recently observed problem instances in the RAC problem, they consider a special
case of MABs, called Contextual Bandits [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Moreover, they select in each time step a whole subset of arms, i.e., configurations, which is covered by the preselection bandit scenario
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A combination of both the contextual bandits and the preselection bandits is considered in
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>6. Experimental Comparison</title>
      <p>We test the outlined bandit approaches on several synthetic datasets and target algorithms. We address the following research questions: Q1: How sensitive are CoLSTIM and TS-MNL to variations in their hyperparameters? Q2: How should configurations be discarded? Q3: Which bandit method provides the smallest overall cumulative runtime?</p>
      <sec id="sec-7-1">
        <title>6.1. Experimental Setup</title>
        <p>
          Datasets and Solvers We synthetically create three datasets, one for the CVRP using the
CPLEX [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] solver, and two for SAT, using the Glucose [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] and Cadical [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] solvers. We seek
instances that are neither trivial nor impossible to solve. To accomplish this, we execute each
generated instance using the solvers’ default configuration and retain only those instances that
can be solved within a time range of 5 to 600 seconds. Each dataset contains a total of 1000
instances.
        </p>
        <p>
          We introduce structural drift into all of our datasets. For SAT, we use the SHA-1 generator [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ],
which encodes attacks on the cryptographic hash function SHA-1 as a SAT problem. In the case
of Glucose, we set the message-bits to 32 and the number of rounds to 22. To introduce drift, the
hash-bits are increased by 8, starting from 64, every 78 instances. For Cadical, the message-bits
are set to 0, and the number of rounds is set to 21. The hash-bits are increased by 5, starting
from 115, every 100 instances. Glucose version 4.2.1 has 16 continuous, 8 categorical, and 7
binary parameters. Cadical version 1.5.2 contains 30 continuous and 15 binary parameters. We
use the SAT features listed in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
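<p>For illustration, the two drift schedules just described can be written as simple index functions (a sketch; the function names are our own, not from the generator):</p>

```python
def glucose_hash_bits(i):
    # Glucose dataset: hash-bits start at 64 and rise by 8 every 78 instances.
    return 64 + 8 * (i // 78)

def cadical_hash_bits(i):
    # Cadical dataset: hash-bits start at 115 and rise by 5 every 100 instances.
    return 115 + 5 * (i // 100)
```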
        <p>[Table: mean results on the Cadical, Glucose, and CPLEX datasets for mechanisms
using different values for  and  .]</p>
        <p>
          We create CVRP instances as in [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. Customer locations are uniformly sampled at random
within the unit square, and the demand at each customer is sampled uniformly between 1 and
10. We find that instances with ≥15 customers are hard for the default configuration of CPLEX
to solve. To incorporate drift, we increment the vehicle capacity by one every 100 instances,
starting from 12. CPLEX version 22.1.1.0 is employed, which has 35 continuous, 6 binary, and
54 categorical parameters. Instance features are implemented as in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
        </p>
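<p>The CVRP generation and capacity drift described above can be sketched as follows (a minimal illustration; the function name and defaults are our own):</p>

```python
import random

def sample_cvrp_instance(index, n_customers=15, base_capacity=12, seed=None):
    # Customer locations are uniform in the unit square, demands uniform in
    # {1, ..., 10}, and the vehicle capacity drifts up by one every 100 instances.
    rng = random.Random(seed)
    locations = [(rng.random(), rng.random()) for _ in range(n_customers)]
    demands = [rng.randint(1, 10) for _ in range(n_customers)]
    capacity = base_capacity + index // 100
    return locations, demands, capacity
```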
        <p>
          Initialization We compare CoLSTIM [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], its feature-free variant (CoLSTIM-wo-f), CPPL [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
ISS [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], TS-MNL, ReACTR [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and a random strategy against each other. For all methods, the
pool size  is set to 32 and the subset size  to 16. To ensure a fair and consistent evaluation, all
methods begin with an identical pool of configurations. To set the PCA dimensions used by
methods to reduce the features, we follow [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and use a value of 3 for the PCA dimension of
instance features and 5 for the PCA dimension of algorithm features. To calibrate the PCA we
use the features of the first 100 instances. For CoLSTIM we set  to be the standard Gumbel
distribution. For ISS, we adopt the recommended value of  = 3.5 . Similarly, for CPPL, we
use the values  = 0.2 ,  = 1 and  = 0.001 , as specified in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. As a baseline comparison, we
include a random strategy where  = 16 configurations are randomly selected from the pool in
each iteration. Finally, to control the execution time, a timeout of 300 seconds is set.
        </p>
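<p>The random baseline can be sketched in a few lines (an illustrative sketch, not the authors' implementation):</p>

```python
import random

POOL_SIZE, SUBSET_SIZE = 32, 16  # pool and subset sizes used for all methods

def random_strategy(pool, rng):
    # Baseline: select 16 of the 32 pooled configurations uniformly at random,
    # without replacement, on each iteration.
    return rng.sample(pool, SUBSET_SIZE)
```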
      </sec>
      <sec id="sec-7-2">
        <title>6.2. Q1: How sensitive are CoLSTIM and TS-MNL to variations in their hyperparameters?</title>
        <p>The goal of this experiment is to identify appropriate hyperparameter values for CoLSTIM and
TS-MNL. To this end, we run experiments with different values for  and  for CoLSTIM and , 
and  for TS-MNL, where we use the first 200 instances of each dataset. The mean runtime is
shown in Table 1a and Table 1b for CoLSTIM. Due to limited space, the results for TS-MNL are
omitted.1 The mean runtime encompasses both the time taken to solve an instance
and the time needed for the model update after each solved instance. For CoLSTIM we achieve
the best average runtime across our datasets by setting  = 1 and  = 0.1 , although we note that
these are determined independently of each other. Consequently, these values are selected
for subsequent experiments for CoLSTIM. For TS-MNL we set  = 10 ,  = 0.001 and  = 0.0001 .
1Additional results and code are available under https://github.com/DOTBielefeld/colstim.</p>
        <p>Table 2 (excerpt, discard value 0): CoLSTIM 57.89, CoLSTIM-wo-f 59.02, ISS 57.19,
TS-MNL 56.15, CPPL 55.50, ReACTR 59.79, Random 77.52.</p>
      </sec>
      <sec id="sec-7-3">
        <title>6.3. Q2: How should configurations be discarded?</title>
        <p>To determine a high-quality approach for discarding configurations from the pool, we conducted
similar experiments as before. Specifically, we compare the racing strategies of CPPL and
CoLSTIM with the discard method based on rankings described in section 4.2. Note that a
discard value of 0 maintains a fixed pool of configurations across all instances, while a value of
1 completely replenishes the pool after each instance. We further note that ReACTR, ISS and
TS-MNL use a ranking-based discard as a default strategy.</p>
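<p>A ranking-based discard of the kind described here can be sketched as follows (an illustration; the function and argument names are our own):</p>

```python
def ranking_discard(ranked_pool, discard_frac, make_config):
    # Drop the worst-ranked fraction of the pool and refill it with freshly
    # generated configurations. A discard value of 0 keeps the pool fixed,
    # while 1 replenishes it completely, matching the two extremes above.
    n_drop = int(len(ranked_pool) * discard_frac)
    kept = list(ranked_pool[:len(ranked_pool) - n_drop])
    return kept + [make_config() for _ in range(n_drop)]
```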
        <p>Table 2 presents the results for different discard strategies obtained on CVRP15 with CPLEX.
Due to limited space, we are unable to display the results for the other two datasets here.2
However, the results obtained from those datasets align with what is shown. Based on these
findings, we set CoLSTIM to discard 20 percent of the pool after each instance; for ISS and
ReACTR we set this value to 40 percent; and for the feature-free versions of CoLSTIM and TS-MNL
we use a value of 10 percent. For CPPL, we use the confidence-bound-based discard strategy.</p>
      </sec>
      <sec id="sec-7-4">
        <title>6.4. Q3: Which bandit method provides the smallest cumulative runtime?</title>
        <p>To determine which method performs best overall, we test the methods in question using all
available instances of the respective datasets. Our experiments are performed on a 32 core
AMD EPYC Milan 7763 2.45 GHz processor. We note that due to computational constraints
we only perform one run per dataset and method. Throughout the figures, we present the
accumulated runtime, which includes both the time taken to solve each instance and the time
required for model updates. This approach reflects real-world scenarios where instances arrive
rapidly, requiring prompt model updates. In addition, we display the average runtime across all
instances and the number of timeouts.</p>
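<p>The reported metric can be sketched as follows (an illustration assuming per-instance solve and update times are available; the names are our own):</p>

```python
def cumulative_runtime(solve_times, update_times, timeout=300):
    # Accumulated runtime as evaluated here: per-instance solve time, capped
    # at the 300-second timeout, plus the model-update time after each
    # instance; also counts how often the timeout was hit.
    total, n_timeouts = 0.0, 0
    for solve, update in zip(solve_times, update_times):
        if solve >= timeout:
            n_timeouts += 1
            solve = timeout
        total += solve + update
    return total, n_timeouts
```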
        <p>Figure 1 showcases our results across the three benchmarks. While ISS does not take instance
nor configuration features into account, it is able to outperform other methods that leverage
this information on two of the three test cases. The observation that instance and configuration
information may not boost performance in our set-up is strengthened when comparing CoLSTIM
to CoLSTIM-wo-f. CoLSTIM is only clearly better on the Glucose dataset, which happens to
be the most challenging test case among the three. This effect may very well be due to high
information loss induced by the PCA.
2We refer to the online appendix for additional results.</p>
        <p>[Figure 1: Cumulative runtime in thousands of seconds over the instance sequence for
CoLSTIM, CoLSTIM-wo-f, ReACTR, ISS, CPPL, TS-MNL, and the random strategy on the three
benchmarks.]</p>
        <p>Among the tested methods, TS-MNL exhibits the weakest
performance. It is consistently outperformed by all other bandit-based methods and, for the
Cadical and Glucose test cases, even performs worse than the random strategy. Notably, all
methods except for TS-MNL effectively handle the dataset drift, as evidenced by the absence
of sudden spikes in the accumulated runtime at the drift points. The time taken to update the
model after each instance is similar over all discussed bandit methods and datasets.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion</title>
      <p>In this work, we established and compared different approaches for the contextualized
preselection in the CPPL framework to automatically search for good parameter configurations
in an online manner. Our experiments have shown that recent approaches like CoLSTIM or
in particular ISS outperform the preselection method based on upper confidence bounds. In
addition, our experiments show that features do not seem to be necessary for good performance.
However, these results could also be explained by an inappropriate choice of the feature map or
the PCA method used for dimensionality reduction of the feature vectors.</p>
      <p>In future work, we can further improve the joint feature map Φ and the dimensionality reduction
of the features which was done with a simple PCA. Moreover, we can elaborate on how to create
new configurations for the pool. Instead of genetic approaches, one could assume a form of
continuity in the space of configurations and perform a guided search to find the optima.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the research training group “Dataninja” (Trustworthy AI
for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded
by the German federal state of North Rhine-Westphalia. The authors gratefully acknowledge
the funding of this project by computing time provided by the Paderborn Center for Parallel
Computing (PC2).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Schede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tornede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tierney</surname>
          </string-name>
          ,
          <article-title>A Survey of Methods for Automated Algorithm Configuration</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>75</volume>
          (
          <year>2022</year>
          )
          <fpage>425</fpage>
          -
          <lpage>487</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>El Mesaoudi-Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weiß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tierney</surname>
          </string-name>
          ,
          <article-title>Pool-Based Realtime Algorithm Configuration: A Preselection Bandit Approach</article-title>
          , Springer International Publishing,
          <year>2020</year>
          , p.
          <fpage>216</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          , E. Hüllermeier, Preselection Bandits,
          <source>in: Proceedings of the 37th International Conference on Machine Learning</source>
          , volume
          <volume>119</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>778</fpage>
          -
          <lpage>787</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Mesaoudi-Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          , E. Hüllermeier,
          <article-title>Online Preselection with Context Information under the Plackett-Luce Model</article-title>
          , CoRR abs/2002.04275 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2002.04275.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Maron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Moore</surname>
          </string-name>
          , Hoeffding Races:
          <article-title>Accelerating Model Selection Search for Classification and Function Approximation</article-title>
          ,
          <source>in: Proceedings of Advances in Neural Information Processing Systems (NIPS)</source>
          ,
          <year>1994</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Chapelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>An Empirical Evaluation of Thompson Sampling</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>24</volume>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Posterior sampling for Monte Carlo planning under uncertainty</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>48</volume>
          (
          <year>2018</year>
          )
          <fpage>4998</fpage>
          -
          <lpage>5018</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tornede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <article-title>Machine Learning for Online Algorithm Selection under Censored Feedback</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>36</volume>
          (
          <year>2022</year>
          )
          <fpage>10370</fpage>
          -
          <lpage>10380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haddenhorst</surname>
          </string-name>
          , E. Hüllermeier,
          <article-title>Finding Optimal Arms in Nonstochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>35</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>20621</fpage>
          -
          <lpage>20634</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haddenhorst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tierney</surname>
          </string-name>
          ,
          <article-title>AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>37</volume>
          (
          <year>2023</year>
          )
          <fpage>12355</fpage>
          -
          <lpage>12363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Train</surname>
          </string-name>
          ,
          <source>Discrete Choice Methods with Simulation</source>
          , 2 ed., Cambridge University Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fitzgerald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Malitsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <article-title>ReACTR: Realtime Algorithm Configuration through Tournament Rankings</article-title>
          ,
          <source>in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          , AAAI Press,
          <year>2015</year>
          , pp.
          <fpage>304</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bengs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saha</surname>
          </string-name>
          , E. Hüllermeier,
          <article-title>Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          , volume
          <volume>162</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1764</fpage>
          -
          <lpage>1786</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.-h.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Iyengar</surname>
          </string-name>
          ,
          <article-title>Thompson Sampling for Multinomial Logit Contextual Bandits</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <article-title>Multi-dueling Bandits with Dependent Arms</article-title>
          ,
          <source>in: Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence</source>
          , AUAI Press,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Birattari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Stützle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Paquete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Varrentrapp</surname>
          </string-name>
          ,
          <article-title>A Racing Algorithm for Configuring Metaheuristics</article-title>
          ,
          <year>2002</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Birattari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Balaprakash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Stützle</surname>
          </string-name>
          ,
          <article-title>F-Race and Iterated F-Race: An Overview</article-title>
          , Springer Berlin Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ansótegui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tierney</surname>
          </string-name>
          ,
          <article-title>A gender-based genetic algorithm for the automatic configuration of algorithms</article-title>
          , in: I. P. Gent (Ed.),
          <source>Principles and Practice of Constraint Programming - CP 2009</source>
          , Springer Berlin Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>142</fpage>
          -
          <lpage>157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fitzgerald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Malitsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tierney</surname>
          </string-name>
          , React:
          <article-title>Real-Time Algorithm Configuration through Tournaments</article-title>
          ,
          <source>in: Proceedings of the Seventh Annual Symposium on Combinatorial Search</source>
          , AAAI Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tropper</surname>
          </string-name>
          ,
          <article-title>Optimizing time warp simulation with reinforcement learning techniques</article-title>
          ,
          <source>in: Proceedings Winter Simulation Conference</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>577</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jamieson</surname>
          </string-name>
          , G. DeSalvo,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rostamizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Talwalkar</surname>
          </string-name>
          ,
          <article-title>Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Dani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Kakade</surname>
          </string-name>
          ,
          <article-title>Stochastic Linear Optimization under Bandit Feedback</article-title>
          , in: Annual Conference Computational Learning Theory,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>IBM</surname>
          </string-name>
          ,
          <source>ILOG CPLEX Optimization Studio 22.1.1: User's Manual</source>
          ,
          <year>2022</year>
          . URL: https://www.ibm.com/docs/en/icos/22.1.1?topic=cplex-optimizers.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>G.</given-names>
            <surname>Audemard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <article-title>On the glucose sat solver</article-title>
          ,
          <source>International Journal on Artificial Intelligence Tools</source>
          <volume>27</volume>
          (
          <year>2018</year>
          )
          <fpage>1840001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fazekas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fleury</surname>
          </string-name>
          , M. Heisinger, CaDiCaL, Kissat, Paracooba,
          <source>Plingeling and Treengeling entering the SAT Competition</source>
          <year>2020</year>
          , in: T. Balyo,
          <string-name>
            <given-names>N.</given-names>
            <surname>Froleyks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heule</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Järvisalo</surname>
          </string-name>
          , M. Suda (Eds.),
          <source>Proc. of SAT Competition 2020 - Solver and Benchmark Descriptions</source>
          , volume B-2020-1 of Department of Computer Science Report Series B, University of Helsinki,
          <year>2020</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Nossum</surname>
          </string-name>
          ,
          <article-title>Instance Generator for Encoding Preimage, Second-Preimage, and Collision Attacks on SHA-1</article-title>
          ,
          <source>Proceedings of the SAT Competition</source>
          (
          <year>2013</year>
          )
          <fpage>119</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Leyton-Brown</surname>
          </string-name>
          ,
          <article-title>Features for SAT</article-title>
          , University of British Columbia, Tech. Rep.
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>van Hoof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Attention, Learn to Solve Routing Problems!</article-title>
          , in:
          <source>International Conference on Learning Representations</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rasku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kärkkäinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Musliu</surname>
          </string-name>
          ,
          <article-title>Feature extractors for describing vehicle routing problem instances</article-title>
          , in:
          <source>OASIcs</source>
          , Dagstuhl Publishing,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>