<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CoCoMaMa: Contextual Combinatorial Multi-Armed Bandit Router for Multi-Agent Systems with Volatile Arms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonathan Rau</string-name>
          <email>j.rau.1@tu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Bader</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Wiesner</string-name>
          <email>wiesner@tu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Odej Kao</string-name>
          <email>odej.kao@tu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Berlin</institution>
          ,
          <addr-line>Straße des 17. Juni 135, 10623 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Agentic Large Language Models (LLMs) are designed for specialized objectives using fine-tuning, prompting techniques, and tool calling to outperform general-purpose models in their expert domains. Standardization efforts like the Agent2Agent Protocol could drastically increase the number and heterogeneity of experts available via the Web. A router is required to find the best agent for any given task. However, existing LLM routing methods use a fixed-size pool of models and often rely on offline training data such as benchmarks. We propose CoCoMaMa and Neural-CoCoMaMa, a combinatorial contextual volatile multi-armed bandit approach that leverages similarities between tasks and agents by learning from online feedback. It can handle volatile arms by incorporating agent cards as defined by the Agent2Agent Protocol, without requiring changes to internal structures or retraining. Our experimental evaluation shows that CoCoMaMa and Neural-CoCoMaMa achieve better results than the respective state-of-the-art algorithms on the LLM routing dataset SPROUT and on a novel extended version of SPROUT with synthetic specialized agents.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>Multi-Armed Bandit</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Agent routing</kwd>
        <kwd>Online learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        options are usually referred to as sleeping [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or volatile bandits [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Applying such MAB algorithms
to the routing problem in hMAS has, to the best of our knowledge, not been examined.
      </p>
      <p>In this paper, we propose using Contextual Combinatorial MAB with volatile arms to route tasks to
agents based on their agent card, theoretically enabling an infinite number of volatile agents to enter
and leave the pool without retraining a router.</p>
      <p>
        Contributions. This paper makes the following contributions:
• We present CoCoMaMa, a novel MAB approach that learns from online feedback and efficiently
explores and exploits similarities between tasks and agents in high-dimensional context spaces
by adaptively discretizing the context space following statistically informed decisions.
• We propose Neural-CoCoMaMa, which improves the CoCoMaMa method by leveraging the
benefits of neural networks while maintaining exploration behavior.
• We evaluate our approaches using the LLM routing dataset SPROUT [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and a novel setup with
synthetic specialized agents and compare them to three state-of-the-art contextual combinatorial
volatile MAB algorithms.
      </p>
      <p>• We provide an open-source implementation of our CoCoMaMa methods 1.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        We outline related work focusing on LLM ensemble methods. Chen et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] provide a classification
with three groups: route to an expert before the inference step, combine multiple models during
inference within the model architecture, and combine the results of different models after inference.
Treating Web Agents as black boxes rules out the possibility of ensemble methods applied during
inference. Thus, that path is neglected in the remainder of this work.
      </p>
      <p>
        Before inference: Shnitzer et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] propose repurposing benchmark datasets to learn router models
for LLM selection by training a classifier for each candidate LLM. Many similar approaches are proposed
to route to an expert from a fixed set of candidate models [
        <xref ref-type="bibr" rid="ref22 ref23 ref24 ref25 ref26 ref27 ref28">22, 23, 24, 25, 26, 27, 28</xref>
        ], while some of
them also aim to balance cost and performance. Online algorithms could also be used to train the
router. Sikeridis et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] propose using reinforcement learning to train a router based on online
user or AI feedback [
        <xref ref-type="bibr" rid="ref30 ref31">30, 31</xref>
        ]. There is also recent work looking into various bandits for online LLM
routing [
        <xref ref-type="bibr" rid="ref19 ref25 ref32 ref33 ref34">32, 19, 25, 33, 34</xref>
        ]. Many of them create a task requirement vector using an embedding
model like [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] and formulate the routing problem as a contextual bandit [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        After inference: Cascading [
        <xref ref-type="bibr" rid="ref36 ref37 ref38">36, 37, 38</xref>
        ] can be used to escalate a task to a model with higher costs and
higher expected quality, in case the answer of the initial model does not meet quality requirements.
This requires feedback on the quality, which could be obtained by asking users, though using Large
Reasoning Models as a Judge is also viable [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] with limitations [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Majority voting to select the best
answer is presented in Agent Forest [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. Regenerating an answer after querying and ranking multiple
agents was shown by Lv et al. [
        <xref ref-type="bibr" rid="ref40">40</xref>
          ].
      </p>
      <p>Contrary to the solutions above, our solution considers metadata from agent cards and is built with
large and volatile numbers of agents as routing targets in mind.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Formulation</title>
      <p>Consider a sequence of tasks indexed by time steps t ∈ {1, 2, . . . , T}. A task is a natural-language
user query or intent that requires at least one agent response, but more responses do not hurt, e.g.,
weather retrieval or booking assistance. For each task t, there is a set of available agents A_t. Each agent
a ∈ A_t has distinct capabilities described by its agent card c_a. Agents may appear in multiple rounds,
but can only be selected once per round. To ensure distinguishability, all agents in a round must have
unique agent cards. If an agent’s capabilities or metadata change (e.g., through an update), it receives a</p>
      <sec id="sec-3-1">
        <title>1 https://github.com/dos-group/CoCoMaMa</title>
        <p>new agent card and is treated as a distinct agent. However, similar capabilities yield similar embeddings,
allowing the router to transfer prior knowledge through semantic similarity in the context space.</p>
        <p>
          Both the task t and an agent card c_a can be mapped into a multi-dimensional context space. By
combining the context of task t with that of c_a, we obtain the context x_{t,a} for the arm (t, a). The true
expected performance of agent a on task t, denoted r_{t,a}, is initially unknown. After the agent provides
an answer, a “judge” infers r_{t,a} by assigning a continuous score in [0, 1]. This feedback can come from
users or other evaluation methods [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
        <p>Because invoking and scoring agents typically incurs cost, we impose a fixed budget B that limits
how many agents may be selected at each round. Following MAB terminology, we call the subset of
chosen agents a super arm, denoted S_t ⊆ A_t, with |S_t| = B. The reward on task t is given by
R(S_t) = max_{a ∈ S_t} r_{t,a},
reflecting the requester’s interest in only the best individual performance among the selected agents. The
regret at task t is then the difference between the maximum achievable reward, i.e., max_{a ∈ A_t} r_{t,a},
and the actual reward R(S_t). The router’s objective is to select each S_t in order to minimize the
cumulative regret over all T rounds, i.e.,
min Σ_{t=1}^{T} [ max_{a ∈ A_t} r_{t,a} − R(S_t) ].</p>
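        <p>The reward and regret definitions above can be simulated in a few lines; the following sketch uses random true scores and a random stand-in router (all numbers are illustrative, not from the paper):</p>
        <p>
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: true expected scores r[t, a] for T tasks and 5 agents.
T, n_agents, budget = 100, 5, 2
r = rng.uniform(0.0, 1.0, size=(T, n_agents))

def super_arm_reward(scores, selected):
    # Reward of a super arm: best individual score among the selected agents.
    return max(scores[a] for a in selected)

cumulative_regret = 0.0
for t in range(T):
    selected = rng.choice(n_agents, size=budget, replace=False)  # stand-in router
    optimal = super_arm_reward(r[t], range(n_agents))  # oracle: best single agent
    cumulative_regret += optimal - super_arm_reward(r[t], selected)
```
        </p>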
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>The core idea of contextual, combinatorial, volatile MAB algorithms applied on hypermedia Multi-Agent
Systems (hMAS) is to continuously learn and refine the understanding of the conceptual requirements
per task and capabilities of each agent based on feedback. E.g., we might have observed that weather
agent A provided a good result for the task "What is the weather going to be like in Bologna tomorrow?".
Then, the weather agent A might also perform well on the task "What is the weather going to be like
in Rome tomorrow?", because the tasks are very similar to each other. Later, we observe that weather
agent A performs badly on the task "What is the weather going to be like in Berlin?", but we also
tried weather agent B, which provides a good result. Consequently, a router might learn that requests
for weather information in Italy should be routed to weather agent A, and requests for locations in
Germany should be routed to weather agent B. Therefore, we need to extract features describing each
task and agent that allow us to exploit semantic similarities. This is described in Section 4.1. Next,
algorithms that are capable of exploring the capabilities of agents and exploiting good task-agent
assignments are covered in Section 4.2.</p>
      <sec id="sec-4-1">
        <title>4.1. Constructing the Context Space</title>
        <p>
To apply contextual bandit algorithms effectively, both the task and the agent must be mapped and
combined into suitable feature vectors that semantically describe how a specific task is assigned to a
specific agent. Using pre-trained Sentence Transformers [
          <xref ref-type="bibr" rid="ref35 ref41">35, 41</xref>
          ] to produce compact embeddings out
of a task is an established practice in LLM-routing (e.g. [
          <xref ref-type="bibr" rid="ref22 ref24">24, 22</xref>
          ]).
        </p>
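        <p>A minimal sketch of this construction (deterministic pseudo-embeddings stand in for a real sentence-transformer model, and the dimensionality is arbitrary):</p>
        <p>
```python
import numpy as np

def embed(text, dim=8):
    # Stand-in for a sentence-transformer: a deterministic pseudo-embedding
    # seeded by the text hash (illustrative only).
    seed = abs(hash(text)) % (2 ** 32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def arm_context(task, agent_card):
    # Concatenating task and agent-card embeddings preserves all information,
    # unlike adding, multiplying, or reducing them to a single similarity score.
    return np.concatenate([embed(task), embed(agent_card)])
```
        </p>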
        <p>We propose creating feature vectors for the agent cards using the same method. This yields a pair of
vectors for each task-agent combination, where semantically similar tasks and agent cards have similar
embeddings, e.g., a high cosine similarity. The two embedding vectors are concatenated to form the
unified context , for the task-arm pair. This preserves all the available information, contrary to
adding or multiplying the vectors or applying similarity metrics such as the Euclidean distance.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The Contextual, Combinatorial, Volatile Multi-Armed Bandits</title>
        <p>Three state-of-the-art algorithms that support the contextual, combinatorial, and volatile setting were
identified and are presented briefly. This is followed by the introduction of our CoCoMaMa and
Neural-CoCoMaMa algorithms.</p>
        <p>
          4.2.1. CC-MAB
Chen et al. [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ] propose splitting the context space into evenly sized, non-overlapping regions and
balancing exploration of unknown regions with exploitation of known regions of high expected
reward in their CC-MAB algorithm. Whenever an arm is played, the statistics for the respective context
region are updated.
        </p>
        <p>
          4.2.2. ACC-UCB
Nika et al. [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] introduce the Adaptive Contextual Combinatorial Upper Confidence Bound
(ACCUCB) algorithm, which uses a tree-based approach to iteratively partition the context space into
non-overlapping regions of varying sizes using sets of hypercubes to define a region. A set containing a
single hypercube is split by creating non-overlapping sets of hypercubes with half the side length. E.g., a
context region containing a 2x2 chessboard could be split into sets based on rows, columns, or black and
white tiles. Using sets of hypercubes to define regions consumes many resources in high-dimensional
context spaces. E.g., using "all-MiniLM-L6-v2" [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] as an embedding model yields a 768-dimensional
context space, which would be split into 2^768 hypercubes. Initializing that many objects is not feasible
on standard hardware (assuming 64GB memory as of 2025).
        </p>
        <p>
          We change the implementation of ACC-UCB by using hyperrectangles, defined by a center vector
and a length vector, to mark context regions. Nodes are split at the center along the dimension with the
highest length (random selection to break ties). We term that variant High-Dimensional-ACC-UCB
(HD-ACC-UCB) for the remainder of this work. It effectively just adds a small constraint to the core
concept of Nika et al. [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ]: splitting a region into "black and white tiles" is prohibited.
        </p>
        <p>
          4.2.3. Neural-MAB
Lin et al. [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ] follow a greedy selection strategy in their Neural-MAB algorithm, using two neural
networks with one hidden layer each to predict the reward of individual arms and of the super arm.
        </p>
        <p>
          4.2.4. CoCoMaMa
We hypothesize that making statistically informed decisions on the split condition and the split location
can yield better results on high-dimensional context spaces. Especially in the field of hMAS, we expect
many heterogeneous tasks and versatile agents, which require many dimensions to capture their
nuances. Thus, we propose the CoCoMaMa algorithm (Algorithm 1) as an improvement over
HD-ACC-UCB. We maintain for each leaf node h ∈ P_t the following metrics:
• x̄(h) ∈ R^D: running mean of the arm contexts (i.e., the average context vector).
• r̄(h) ∈ R: running mean of the reward.
• Cov(h) ∈ R^D: running covariances between each context dimension and the reward.
• Var(r(h)) ∈ R: running variance of the reward.
• N(h) ∈ N: number of times the node has been played.
        </p>
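        <p>The node statistics above can be maintained online in Welford style; a sketch (class and method names are illustrative):</p>
        <p>
```python
import numpy as np

class NodeStats:
    """Running statistics of a leaf node: mean context, mean reward,
    reward variance, and per-dimension context-reward covariance."""
    def __init__(self, dim):
        self.n = 0
        self.mean_x = np.zeros(dim)  # running mean of arm contexts
        self.mean_r = 0.0            # running mean of rewards
        self.m2_r = 0.0              # sum of squared reward deviations
        self.c_xr = np.zeros(dim)    # co-moment of context and reward

    def update(self, x, reward):
        # Welford-style single-pass update.
        self.n += 1
        dx = x - self.mean_x
        dr = reward - self.mean_r
        self.mean_x += dx / self.n
        self.mean_r += dr / self.n
        # dr uses the old reward mean, (reward - self.mean_r) the updated one.
        self.m2_r += dr * (reward - self.mean_r)
        self.c_xr += dx * (reward - self.mean_r)

    def reward_variance(self):
        return self.m2_r / self.n if self.n > 1 else 0.0

    def covariance(self):
        return self.c_xr / self.n if self.n > 1 else np.zeros_like(self.c_xr)

    def split_dimension(self):
        # Dimension with the highest absolute context-reward covariance.
        return int(np.argmax(np.abs(self.covariance())))
```
        </p>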
        <p>
          • p(h): the parent node, with all the associated metrics above at the state when it was split.
For each newly observed data point (x_{t,a}, r_{t,a}), the statistics for the node can be updated using
Welford’s Algorithm [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ]. We introduce the combined confidence of a node and its parent node, defined
as c(h, p(h)) := √((2 + log N(p(h))) / N(h)). The node index is defined as:
I(h) := max { r̄(h) + c(h, p(h)), r̄(p(h)) + c(h, p(h)) } (1)
A high variance of rewards for a node could indicate that splitting the node could yield a good and a
bad performing region. Therefore, it seems desirable to split the nodes with the highest potential based
on the variance. A node is split if the variance of rewards of a node is κ times bigger than the weighted
average variance of rewards of all leaf nodes:
Var(r(h)) &gt; κ · ( Σ_{h′ ∈ P_t} N(h′) · Var(r(h′)) ) / ( Σ_{h′ ∈ P_t} N(h′) ) (2)
with κ being defined as a hyperparameter. Frequent splitting could yield branches with relatively
high confidence values c(h, p(h)) for each individual node in a branch. Adding the following dampening
condition circumvents overcommitment of the algorithm to explore and split such branches:
N(h) &gt; N(p(h)) (3)
We propose selecting the dimension d* with the highest running absolute covariance between context
and reward:
d* = argmax_{d ∈ [D]} | Cov_d(h) | (4)
and splitting the region at the running mean of the arms x̄(h) in the respective dimension d*.
        </p>
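        <p>The index-based super-arm selection can be sketched as follows (the confidence term here is a generic UCB-style bonus standing in for the combined confidence; names are illustrative):</p>
        <p>
```python
import math

def node_index(mean_r, parent_mean_r, n, parent_n):
    # Best of node and parent mean reward, plus a confidence bonus that
    # shrinks as the node is played more often (generic UCB-style form).
    conf = math.sqrt((2.0 + math.log(parent_n + 1)) / max(n, 1))
    return max(mean_r + conf, parent_mean_r + conf)

def select_super_arm(arms, budget):
    """Pick the `budget` arms with the highest indices.

    `arms` maps an arm id to (mean_r, parent_mean_r, n, parent_n)."""
    ranked = sorted(arms, key=lambda a: node_index(*arms[a]), reverse=True)
    return ranked[:budget]
```
        </p>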
      </sec>
      <sec id="sec-4-3">
        <title>Algorithm 1 CoCoMaMa-Algorithm</title>
        <p>Require: budget B, split parameters κ, v1, ρ
1: Initialize: r̄_0, Var_0, N_0 = 0, P_1 = {h_{0,1}} with root node h_{0,1}
2: for t = 1, 2, . . . , T do
3: Observe available agents A_t and the task
4: Construct arm contexts x_{t,a} for each agent in A_t
5: Compute indices according to (1) for each arm
6: Select arm set S_t based on indices and budget B
7: Play arm set S_t and observe rewards r_{t,a}
8: Identify set of selected nodes
9: for each selected node h ∈ P_t do
10: Update metrics for node h
11: if ((2) and (3)) or c(h, p(h)) ≤ v1 ρ^ℓ(h) then
12: P_{t+1} ← split at x̄(h) on dimension d* (4)
13: end if
14: end for
15: end for</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.2.5. Neural-CoCoMaMa</title>
        <p>
          Instead of using r̄(h) in the calculation of the index in Equation 1, we propose predicting the
expected reward of an arm r̂(x_{t,a}) using a neural net with a single hidden layer as in Neural-MAB [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], which learns each time an outcome is observed. The index is then defined per arm as
I(x_{t,a}) := r̂(x_{t,a}) + c(h, p(h)), where h corresponds to the node the arm is in.
        </p>
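        <p>A sketch of this neural index (a small numpy network stands in for the single-hidden-layer reward predictor; sizes and learning rate are illustrative):</p>
        <p>
```python
import numpy as np

class TinyRewardNet:
    """One-hidden-layer regressor trained online with SGD."""
    def __init__(self, dim, hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        self.h = np.tanh(x @ self.w1 + self.b1)
        return float(self.h @ self.w2 + self.b2)

    def update(self, x, reward):
        # One SGD step on squared error; learns after every observed outcome.
        err = self.predict(x) - reward
        grad_h = err * self.w2 * (1.0 - self.h ** 2)
        self.w2 -= self.lr * err * self.h
        self.b2 -= self.lr * err
        self.w1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h

def neural_index(net, x, confidence):
    # Index of an arm: predicted reward plus the confidence of its node.
    return net.predict(x) + confidence
```
        </p>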
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>
        We evaluate the outlined algorithms on two datasets. The first dataset has been introduced by Somerstep
et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and is enriched by agent cards in this work. It has a fixed size of general-purpose LLMs as
agents and highlights the ability of the algorithms to identify and exploit the better-performing agents
from a set of options with similar descriptions. For the second dataset, we add synthetic specialized
agents and derive the performance based on a mathematical definition of a task-agent fit and the
base-agent score from the first dataset [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The second experiment outlines the capabilities of the
algorithms to identify and exploit specialized agents.
      </p>
      <p>[Figure 1: Cumulative regret over arriving tasks for budgets 1–4, comparing HD-ACC-UCB, CoCoMaMa (ours), CC-MAB, Neural-CoCoMaMa (ours), Neural-MAB, and a random router.]</p>
      <sec id="sec-5-1">
        <title>5.1. Routing on SPROUT</title>
        <p>The SPROUT [19] dataset provides quality scores for answers from 13 different LLMs on over 40,000
queries from 6 benchmarks. Agent cards for each model were created based on public announcements
from the respective providers and are shown in Annex A. A random router and an oracle router, which
always greedily selects the agents with the highest true mean in each round, are used as naive baselines.
The HD-ACC-UCB, CC-MAB and Neural-MAB algorithms serve as the state-of-the-art baselines. The
experiments are conducted 10 times each with the same sequential ordering of tasks for the budgets
1,2,3,4. The decisions made by the algorithms are compared with the optimal solution made by the
oracle router. Making sub-optimal decisions yields regret and should be minimized.</p>
        <p>The plots in Figure 1 show that CC-MAB yields the same regret as the random router. CoCoMaMa
yields less cumulative regret than HD-ACC-UCB for all tested budgets. This supports our prior
hypothesis that making statistically informed splitting decisions can increase performance. Neural-CoCoMaMa
achieves even better results for all budgets, which could be attributed to a faster learning rate, as weights
on all input dimensions of the neural net can be updated after each observation, while splitting of
nodes only takes place under certain conditions. Furthermore, nodes are split on just one dimension.
Neural-CoCoMaMa matches the performance of Neural-MAB for a budget of 2 and 3, and only shows a
slightly higher regret for the other budgets.</p>
        <p>The agent selection rates in Figure 2 show that Neural-MAB does not spend significant effort
exploring all agents and instead exploits the same 3 agents. This is indicated by the selection rates
close to 1.0, where all selection rates should sum up to the budget 3.</p>
        <p>[Figure 2: Agent selection rates of HD-ACC-UCB, CoCoMaMa (ours), Neural-CoCoMaMa (ours), and Neural-MAB for the agents claude-3-5-sonnet-v1, titan-text-premier-v1, gpt-4o, gpt-4o-mini, granite-3-2b, granite-3-8b, llama-3-1-70b, llama-3-1-8b, llama-3-2-1b, llama-3-2-3b, llama-3-3-70b, llama-3-405b, and mixtral-8x7b-v01.]</p>
        <p>Always picking the same 3 agents is not a bad strategy in this case, as each of them is among the best performing agents
in over 70% of the tasks. In 1% of the tasks, Mixtral is the unique best-performing model, but it is
never selected by Neural-MAB. All other algorithms follow a design where they are actively trying to
explore cases where other models might perform better than their known best educated guesses. Our
CoCoMaMa methods spend more effort on exploration than the greedy Neural-MAB, and provide a
sharper distinction between good and bad performing agents compared to HD-ACC-UCB.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Routing on SPROUT with Specialized Experts</title>
        <p>The SPROUT dataset does not contain many queries for which a unique best-performing agent can be
identified, and the average response quality of many models is high. This will most likely not be the
case for highly specialized WebAgents. Therefore, we add synthetic specialized agents to the
SPROUT dataset.</p>
        <p>Let s denote the index of the task at which we begin adding new specialized agents. If t ≥ s, a new
agent is added every m rounds to all following sets of available agents A_t. If t &gt; g, the new agent is a
strong expert, and a weak expert otherwise. A new expert is always based on a random base agent by
copying its agent card embedding. The value at e random dimensions is set to 1 for strong experts,
and to 0.9 for weak experts, to signal their specialization in certain areas. The possible expert dimensions
are limited to 50% of the used dimensions from the embeddings. The true reward of an agent performing
a task depends 80% on the task-agent fit and 20% on the base-agent score. The task-agent fit f_{t,a} is
computed by matching the value u of the task embedding at the dimension with the highest value
with the respective value v at the same dimension of the agent card embedding, using the following
equation:
f_{t,a} = { σ(5 · u · v), if a is a specialized expert; 0, otherwise }</p>
        <p>where σ denotes the logistic function. Using 0.9 for weak experts and 1.0 for strong experts should mimic the
behavior that innovative agents based on new technologies promising full integration are introduced
and advertised, but they have flaws due to being early adopters. The second generation overcomes those
issues. The base agent score is taken from the SPROUT dataset. It is multiplied by 0.1 for specialized
agents in case the task-agent fit is below 0.6. Reducing the base agent score for specialists should
mimic behavior, where an agent is tasked to answer "I don’t know" on questions outside their domain.
Duplicated agents in  are not permitted and the generation of a specialized agent is skipped for the
respective round.</p>
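        <p>The fit and reward computation can be sketched as follows (constants follow the text above; function names are illustrative):</p>
        <p>
```python
import math
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def task_agent_fit(task_emb, agent_emb):
    # Match the task's strongest dimension against the agent card embedding
    # at the same dimension.
    d = int(np.argmax(task_emb))
    u, v = task_emb[d], agent_emb[d]
    return logistic(5.0 * u * v)

def true_reward(task_emb, agent_emb, base_score, specialized):
    # 80% task-agent fit, 20% base-agent score; specialists are penalized
    # outside their domain (fit below 0.6), mimicking "I don't know" answers.
    fit = task_agent_fit(task_emb, agent_emb)
    penalized = specialized and not fit >= 0.6
    if penalized:
        base_score = 0.1 * base_score
    return 0.8 * fit + 0.2 * base_score
```
        </p>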
        <p>The experimental results using s = 2000, m = 200, g = 6000, e = 5 for expert generation are shown
in Figure 3, displaying the average reward. For both budgets B = 1 and B = 4, the optimal average
reward achieved by an oracle router increases continuously from below 0.2 to 0.45 after the strong
experts are introduced at t = 6000. Many algorithms yield decreasing average rewards after the weak
experts are introduced at t = 2000 and do not select strong experts for tasks in their respective domains
often enough to show an increase in average reward for a budget of 1. However, as the chance of
randomly picking a good task-agent match rises with a higher budget, all algorithms except Neural-MAB show
increasing average performance after t = 6000 for the budget 4. The curves for Neural-CoCoMaMa
show that it is resilient to the introduction of weak experts and is the best algorithm at exploring and
exploiting the strong experts.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The CoCoMaMa approach shows that statistically informed splitting of nodes can yield better results
than the respective baseline HD-ACC-UCB. However, the learning rate is limited as we only split
at one dimension each time. Neural-MAB is much faster at converging on a well-performing agent.
However, it might become trapped in a local optimum and never escape it due to a lack of exploration.
Neural-CoCoMaMa combines efficient exploration with a fast learning rate. It outperforms all other
methods on SPROUT with synthetic agents. It matches the performance of the best-performing
state-of-the-art algorithm, Neural-MAB, on the basic SPROUT dataset, while offering more explainability
regarding the routing decision. Before letting an agent execute a task, the expected performance of
the agent (in all neural-based approaches) and historical insights (in CoCoMaMa-based approaches
and also HD-ACC-UCB and CC-MAB to a limited extent) can be communicated to the client. E.g., the
CoCoMaMa approaches can provide historical variance, average performance and confidence scores
for the context region of the task-agent pair. This allows operators to escalate tasks to humans before
wasting resources on an agent when expected performance is low and highly variable.</p>
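      <p>Such a pre-routing escalation check could look like this (thresholds are illustrative, not from the paper):</p>
      <p>
```python
def should_escalate(expected, variance, perf_floor=0.4, var_ceiling=0.05):
    # Escalate to a human when the expected performance for the task-agent
    # pair is low and its historical variance is high (thresholds illustrative).
    return (not expected >= perf_floor) and variance > var_ceiling
```
      </p>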
      <p>A current limitation is that playing a good task-agent match at least once is required to start learning.
Drastically decreasing the chance of finding a fit by increasing the number of agents and making the
required specializations more granular (more embedding dimensions, lower e) would require a lot of data
to train an efficient router. Providing better data in the agent cards may mitigate the problem. The agent
cards already contain an example request. This is useful for humans, but it only resembles very small
data points for a router. In the future, the agent developers could define entire context regions using
hyperrectangles and provide respective average performance, confidence, and variance. Furthermore,
different clients could share their aggregated reviews over the web. New federated learning approaches
may then be used at the router to overcome cold starts and data scarcity.</p>
      <p>
        Berners-Lee et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] stress the importance of ontologies and linked data for knowledge
representations in the Semantic Web. Ciortea et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] hint at designing structured query capabilities to search
for the matching agent in a web-scale hMAS containing billions of agents. The CoCoMaMa approach
does not make using structured queries obsolete. Instead, those approaches can go hand in hand, as
filtering using a query could drastically narrow down the action space per task, and CoCoMaMa can
then step in to decide on the best agent based on historical feedback. Future work could combine both
approaches and further improve the results in experiments similar to Section 5.2. Learning based on the
feedback could also uncover routing policies, which are not possible to identify using structured queries,
as the data in the agent card might not be that informative. This can be observed in the experiments in Section 5.1.
      </p>
      <p>
        Next, the A2A Protocol [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] does not specify how incurred costs should be communicated. Doing so
would allow the router to balance costs against performance. Using monetary budget constraints
per task, instead of budgets for the number of agents to play, would open an interesting research area
for routing in hMAS.
      </p>
      <p>
        Lastly, the algorithms may be hardened against edge cases. E.g., an agent with underlying randomness
might yield rewards with a high variance, causing the router to split often without gaining any
information. Variance-aware UCB algorithms [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] are an active research domain and may also
be incorporated to further improve the CoCoMaMa methods. Adversarial providers of agents might
attempt to craft their agent cards to benefit from high expected rewards in already well-established
domains, to generate traffic for their mediocre services, or to badmouth the competition. A good router should
be robust enough to recover from such attacks. Transferring insights from trust scores and digital
signatures to agent routing in hMAS is also open future work.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>
        In this work, we introduced CoCoMaMa and Neural-CoCoMaMa, two contextual combinatorial volatile
multi-armed bandit approaches tailored for the dynamic and heterogeneous landscape of agentic LLMs.
By leveraging task-agent similarities and online feedback, our methods address key limitations in
existing routing strategies, particularly their reliance on static model pools and offline training data.
Our approach is compatible with the A2A Protocol [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and accommodates agent volatility through
standardized agent cards, making it a promising fit for scalable, decentralized hMAS.
      </p>
      <p>
        Experimental results on the SPROUT [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] dataset demonstrated performance equal to the best-performing
state-of-the-art method, while improving the explainability of the routing decisions.
Neural-CoCoMaMa is the only method capable of exploring and exploiting strong niche experts without
suffering as much from the introduction of weak experts to the pool of available agents in our second
experimental dataset.
      </p>
      <p>Importantly, our approach complements rather than replaces structured search methods.
Combining CoCoMaMa with query-based filters could significantly reduce the action space, enabling
efficient feedback-driven selection within semantically scoped agent sets. Additionally, integrating
cost-awareness, variance-sensitive strategies, and trust mechanisms will be essential for deploying
robust routing in open, adversarial, or resource-constrained environments.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID
414984028 – SFB 1404 FONDA.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly in order to: Grammar
and spelling check, paraphrase and reword. Further, the authors used perplexity.ai for agent cards in
Annex A in order to: Drafting content. After using these tools/services, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Agent Cards</title>
      <p>
        Three agent cards are shown below. They were built to resemble the example agent card given in the
A2A specification [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. They were not optimized for automated
tool calling [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and may lack the information required to make technically correct calls. All agent cards are
included as JSON files in the uploaded supplementary material and will be made publicly accessible
after acceptance.
      </p>
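<p>To hint at how such a card could feed a router's context, the helper below flattens a card into one string suitable for a text embedder. It is an illustrative sketch under the assumption that only name, description, and skill fields matter; it is not part of the paper's implementation.

```python
def card_to_text(card: dict) -> str:
    """Flatten an A2A-style agent card into a single string for embedding.

    Field names (name, description, skills, tags, examples) follow the
    agent card schema used in the listings; this helper is a sketch only.
    """
    parts = [card.get("name", ""), card.get("description", "")]
    for skill in card.get("skills", []):
        parts.append(skill.get("name", ""))
        parts.append(skill.get("description", ""))
        parts.extend(skill.get("tags", []))
        parts.extend(skill.get("examples", []))
    # Drop empty fields and join with spaces.
    return " ".join(p for p in parts if p)
```
</p>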
      <p>Listing 1: Claude 3.5 Sonnet Agent Card (excerpt)
{
  ...
  "version": "3.5",
  "documentationUrl": "https://docs.aws.amazon.com/bedrock/latest/userguide/modelparameters-claude.html",
  "capabilities": {
    ...
  },
  ...
}</p>
      <p>Listing 2: GPT-4o Agent Card (excerpt)
{
  "name": "GPT-4o",
  "description": "OpenAI's versatile, high-intelligence flagship model that accepts both text and image inputs with a 128K context window",
  "url": "https://platform.openai.com/docs/models/gpt-4o",
  "provider": {
    "organization": "OpenAI",
    "url": "https://openai.com"
  },
  "version": "2024-11-20",
  "documentationUrl": "https://platform.openai.com/docs/models/gpt-4o",
  "capabilities": {
    "streaming": true,
    "pushNotifications": false,
    "stateTransitionHistory": false
  },
  "defaultInputModes": ["text/plain", "image/png", "image/jpeg"],
  "defaultOutputModes": ["text/plain", "application/json"],
  "skills": [
    {
      ...
    },
    {
      "id": "image-understanding",
      "name": "Image Understanding",
      "description": "Analyze and interpret images to provide relevant information and insights",
      "tags": ["vision", "image-analysis", "multimodal"],
      "examples": [
        "What's in this image?",
        "Describe what you see in this chart",
        "Help me understand what this diagram is showing"
      ],
      "outputModes": ["application/json", "text/plain"]
    },
    {
      "id": "reasoning",
      "name": "Complex Reasoning",
      "description": "Handle complex problem-solving tasks requiring multi-step reasoning",
      "tags": ["reasoning", "problem-solving", "analysis"],
      "examples": [
        "Solve this multi-step math problem",
        "Help me debug this programming issue",
        "Analyze the logical fallacies in this argument"
      ]
    }
  ]
}</p>
      <p>Listing 3: GPT-4o mini Agent Card (excerpt)
{
  "name": "GPT-4o mini",
  ...
  "authentication": {
    "schemes": ["Bearer"]
  },
  "defaultInputModes": ["text/plain", "image/png", "image/jpeg"],
  "defaultOutputModes": ["text/plain", "application/json"],
  "skills": [
    {
      "id": "text-generation",
      "name": "Text Generation",
      "description": "Generate coherent and contextually relevant text based on input prompts",
      "tags": ["text-generation", "conversation", "content-creation"],
      "examples": [
        "Write a short story about time travel",
        "Draft an email to a colleague about project updates",
        "Create a product description for an e-commerce site"
      ]
    },
    {
      "id": "image-understanding",
      "name": "Image Understanding",
      "description": "Analyze and interpret images to provide relevant information",
      "tags": ["vision", "image-analysis", "multimodal"],
      "examples": [
        "What objects are in this image?",
        "Describe what you see in this photo",
        "What text is shown in this screenshot?"
      ]
    }
  ]
}</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shafran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>ReAct: Synergizing reasoning and acting in language models</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K. S.</given-names>
            <surname>Yau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <article-title>Metagpt: Meta programming for multi-agent collaborative framework</article-title>
          ,
          <source>arXiv preprint arXiv:2308.00352</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. La</given-names>
            <surname>Malfa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shadbolt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <article-title>A scalable communication protocol for networks of large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2410.11905</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Treude</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <article-title>Llm-based multi-agent systems for software engineering: Literature review, vision and the road ahead</article-title>
          ,
          <source>ACM Transactions on Software Engineering and Methodology</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-M.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xie</surname>
          </string-name>
          , et al.,
          <article-title>Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents</article-title>
          ,
          <source>arXiv preprint arXiv:2308.10848</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foerster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <article-title>The ai scientist: Towards fully automated open-ended scientific discovery</article-title>
          ,
          <source>arXiv preprint arXiv:2408.06292</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <article-title>The semantic web</article-title>
          ,
          <source>Scientific American</source>
          <volume>284</volume>
          (
          <year>2001</year>
          )
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mirhoseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Maziarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Outrageously large neural networks: The sparsely-gated mixture-of-experts layer</article-title>
          ,
          <source>arXiv preprint arXiv:1701.06538</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chang</surname>
          </string-name>
          , et al.,
          <article-title>A survey of ai agent protocols</article-title>
          ,
          <source>arXiv preprint arXiv:2504.16736</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Google</surname>
          </string-name>
          , A2A: Agent2Agent Protocol, https://github.com/google/A2A,
          <year>2025</year>
          . Accessed: 2025-04-21.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shiwei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Qing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>Tptu-v2: Boosting task planning and tool usage of large language model-based agents in real-world industry systems</article-title>
          ,
          <source>in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>385</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          , Gorilla:
          <article-title>Large language model connected with massive apis</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>37</volume>
          (
          <year>2024</year>
          )
          <fpage>126544</fpage>
          -
          <lpage>126565</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ciortea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gandon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Boissier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>A decade in hindsight: the missing bridge between multi-agent systems and the world wide web</article-title>
          ,
          <source>in: AAMAS 2019-18th International Conference on Autonomous Agents and Multiagent Systems</source>
          ,
          <year>2019</year>
          , p.
          <fpage>5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Robbins</surname>
          </string-name>
          ,
          <article-title>Asymptotically efficient adaptive allocation rules</article-title>
          ,
          <source>Advances in applied mathematics 6</source>
          (
          <year>1985</year>
          )
          <fpage>4</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pál</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pál</surname>
          </string-name>
          ,
          <article-title>Contextual multi-armed bandits</article-title>
          ,
          <source>in: Proceedings of the Thirteenth international conference on Artificial Intelligence and Statistics</source>
          , JMLR Workshop and Conference Proceedings,
          <year>2010</year>
          , pp.
          <fpage>485</fpage>
          -
          <lpage>492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Combinatorial multi-armed bandit: General framework and applications</article-title>
          , in: International conference on machine learning,
          <source>PMLR</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>151</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niculescu-Mizil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Regret bounds for sleeping experts and bandits</article-title>
          ,
          <source>Machine learning 80</source>
          (
          <year>2010</year>
          )
          <fpage>245</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bnaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Puzis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felner</surname>
          </string-name>
          ,
          <article-title>Volatile multi-armed bandits for guaranteed targeted social crawling</article-title>
          ,
          <source>AAAI (Late-Breaking Developments)</source>
          <volume>2</volume>
          (
          <year>2013</year>
          )
          <fpage>16</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Somerstep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Polo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F. M.</given-names>
            <surname>de Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mangal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bhardwaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yurochkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maity</surname>
          </string-name>
          ,
          <article-title>Carrot: A cost aware rate optimal router</article-title>
          ,
          <source>arXiv preprint arXiv:2502.03261</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Harnessing multiple large language models: A survey on llm ensemble</article-title>
          ,
          <source>arXiv preprint arXiv:2502.18036</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shnitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Soule</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Solomon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yurochkin</surname>
          </string-name>
          ,
          <article-title>Llm routing with benchmark datasets</article-title>
          ,
          <source>in: NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Tung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Kofman</surname>
          </string-name>
          , RoRF - Open Source LLM Router, https://www.notdiamond.ai/blog/rorf,
          <year>2024</year>
          . Accessed: 2025-03-25.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kwok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Routerdc:
          <article-title>Query-based router by dual contrastive learning for assembling large language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>37</volume>
          (
          <year>2024</year>
          )
          <fpage>66305</fpage>
          -
          <lpage>66328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Q. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bieker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Keigwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ranganath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Keutzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <article-title>Routerbench: A benchmark for multi-llm routing system</article-title>
          ,
          <source>arXiv preprint arXiv:2403.12031</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Llm bandit: Cost-efficient llm generation via preference-conditioned dynamic routing</article-title>
          ,
          <source>arXiv preprint arXiv:2502.02743</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Mixllm: Dynamic routing in mixed large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2502.18482</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Almahairi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Kadous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Routellm: Learning to route llms from preference data</article-title>
          ,
          <source>in: The Thirteenth International Conference on Learning Representations</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Routing to the expert: Efficient reward-guided ensemble of large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2311.08692</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sikeridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramdass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pareek</surname>
          </string-name>
          ,
          <article-title>Pickllm: Context-aware rl-assisted large language model routing</article-title>
          ,
          <source>arXiv preprint arXiv:2412.12170</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Xing</surname>
          </string-name>
          , et al.,
          <article-title>Judging llm-as-a-judge with mt-bench and chatbot arena</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>46595</fpage>
          -
          <lpage>46623</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ziems</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Eicher-Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Metoyer</surname>
          </string-name>
          ,
          <article-title>Limitations of the llm-as-a-judge approach for evaluating llm outputs in expert knowledge tasks</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>952</fpage>
          -
          <lpage>966</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Kabb: Knowledge-aware bayesian bandits for dynamic expert coordination in multi-agent systems</article-title>
          ,
          <source>arXiv preprint arXiv:2502.07350</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Convergence-aware online model selection with time-increasing bandits</article-title>
          ,
          <source>in: The Web Conference 2024</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoveyda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Oosterhuis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hasibi</surname>
          </string-name>
          ,
          <article-title>Aqa: Adaptive question answering in a society of llms via contextual multi-armed bandit</article-title>
          ,
          <source>arXiv preprint arXiv:2409.13447</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>N.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Jitkrittum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Rawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Menon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Language model cascades: Token-level uncertainty and beyond</article-title>
          ,
          <source>arXiv preprint arXiv:2404.10136</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <article-title>Large language model cascades with mixture of thoughts representations for cost-efficient reasoning</article-title>
          ,
          <source>arXiv preprint arXiv:2310.03094</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Less is more: Using multiple llms for applications with lower costs</article-title>
          ,
          <source>in: Workshop on Efficient Systems for Foundation Models @ ICML 2023</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <article-title>More agents is all you need</article-title>
          ,
          <source>arXiv preprint arXiv:2402.05120</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Urg: A unified ranking and generation method for ensembling language models</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: ACL 2024</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>4421</fpage>
          -
          <lpage>4434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          [41]
          OpenAI,
          <article-title>text-embedding-3 models</article-title>
          , https://platform.openai.com/docs/guides/embeddings,
          <year>2024</year>
          . Accessed: 2025-05-20.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Contextual combinatorial multi-armed bandits with volatile arms and submodular reward</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>31</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tekin</surname>
          </string-name>
          ,
          <article-title>Contextual combinatorial volatile multi-armed bandit with adaptive discretization</article-title>
          ,
          <source>in: International Conference on Artificial Intelligence and Statistics</source>
          , PMLR,
          <year>2020</year>
          , pp.
          <fpage>1486</fpage>
          -
          <lpage>1496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Noh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Joe-Wong</surname>
          </string-name>
          ,
          <article-title>A neural-based bandit approach to mobile crowdsourcing</article-title>
          ,
          <source>in: Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Ivliev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Shagraev</surname>
          </string-name>
          ,
          <article-title>Welford's algorithm for weighted statistics</article-title>
          ,
          <source>in: 2021 3rd International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>On the precise asymptotics and refined regret of the variance-aware ucb algorithm</article-title>
          ,
          <source>arXiv preprint arXiv:2412.08843</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>