<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nitin Vetcha</string-name>
          <email>nitinvetcha@iisc.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dianbo Liu</string-name>
          <email>dianbo@nus.edu.sg</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computational and Data Sciences, Indian Institute of Science</institution>
          ,
          <addr-line>Bangalore, Karnataka</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Despite the remarkable success of large language models (LLMs), they still face bottlenecks when deployed in dynamic, real-world settings, the primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without causing catastrophic forgetting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. It initiates the process by consolidating a strong prior over common-sense knowledge, making it effective for transfer learning. Using a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains. Crucially, SOLAR maintains an evolving knowledge base of validated modification strategies, implicitly acting as an episodic memory buffer to balance plasticity (adaptation to new tasks) and stability (retention of meta-knowledge). Experiments demonstrate that SOLAR outperforms strong baselines on commonsense, mathematical, medical, coding, social and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Continual Adaptation</kwd>
        <kwd>Lifelong Learning</kwd>
        <kwd>Self-Evolution</kwd>
        <kwd>Test-Time Adaptation</kwd>
        <kwd>Transfer-Learning</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Large Language Models (LLMs) possess remarkable emergent abilities due to massive pretraining.
However, deploying them in streaming environments reveals a critical weakness: the inability
to adapt to non-stationary data distributions (concept drift) without expensive retraining or human
intervention. While Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA (Hu et al. 2022)
reduce the parameter update volume, they remain static solutions that do not inherently address the
stability-plasticity dilemma central to Continual Learning (CL). Existing adaptation strategies often rely
on generic, hand-crafted heuristics that fail to generalize across the shifting temporal dependencies of
real-world streams. This disconnect necessitates a system that can not only adapt parameters on the
fly but also learn how to adapt based on accumulating experience. We propose that the high-dimensional
weight space of an LLM contains rich meta-knowledge that, if navigated autonomously, can yield
bespoke adaptation strategies for novel tasks. This motivates our primary research question:
RQ: Can LLMs learn to modify their internal representation space autonomously to handle
concept drift, analogous to how humans assimilate and restructure knowledge in lifelong learning
scenarios?
To answer this, we look to the cognitive science of lifelong learning. As humans, we do not
merely memorize new data; we restructure our internal schemata to accommodate new information
while simultaneously retaining prior heuristics. This process is what has enabled humans to navigate
non-stationary environments. For instance, a student adapts their study strategy based on the nature
of a new subject (plasticity) without unlearning how to study in general (stability). Current LLM
adaptation, by contrast, is often rigid: models consume task data “as-is”, failing to develop bespoke
internal transformation strategies. To replicate this cognitive flexibility, we introduce SOLAR
(Self-Optimizing Lifelong Autonomous Reasoner). It functions as a meta-learning agent that decouples
rapid task adaptation (streaming machine learning) from long-term strategy retention (continual
learning). By discovering and validating parameter-level modifications, SOLAR enables efficient
adaptation to unseen tasks while populating a persistent knowledge base to mitigate catastrophic
forgetting. This work thus bridges the gap between static parameter generation and dynamic, lifelong
self-evolution. Furthermore, by grounding the search space in neural network weights, we target
generalized principles of model capability rather than task-specific memorization. Just as scaling laws
(Kaplan et al. 2020) predict performance based on size, we posit that predictable weight-modification
patterns exist that allow for rapid, data-efficient adaptation to concept drift, minimizing the lag between
detecting a distributional shift and deploying an updated model. The remainder of this paper is
organized as follows. Section 2 details the motivation for our approach. Section 3 presents the
literature survey, and Section 4 the methodology, with implementation specifics in Section 5.
Experimental results are provided in Section 6, and Section 7 presents our concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>Our primary motivation stems from human psychology and pedagogy. Consider, for example, a
student preparing for the end-semester examination of a machine learning course. Quite often, students
rely on their own previously prepared notes, derived from lecture content, textbooks or information
available on the internet. Thus, instead of relying on the raw content, students assimilate and rewrite
the information in the form of notes according to their own intrinsic reasoning skill and aptitude. This
improves their ability to comprehend the content and therefore to respond well to exam questions.
This phenomenon of reinterpreting and augmenting external knowledge into an easier-to-understand
form, while developing the necessary skill-sets, is not limited to taking exams but appears to be
universally true of human learning across tasks. Furthermore, depending on their interests, humans
assimilate information in different ways: some condense it into a visual diagram, some into text, and
some rely more on concrete mathematical descriptions. Such restructuring and development of internal
knowledge, together with the assimilation and rewriting of external information as part of the learning
process, contrasts with how LLMs currently undergo training and adaptation. Given a new task, current
LLMs consume and learn from the task data “as-is” via fine-tuning or in-context learning. The issue,
just as in the human setting, is that such data may not be in an optimal format (or volume) for learning,
or the relevant skill-set needed to learn it may not yet be developed, and current approaches do not
enable models to develop bespoke strategies for how best to transform themselves internally or even
how to learn from their training data. In this work, we therefore investigate whether it is possible for
LLMs, analogous to humans, to propose strategies by themselves which enable them to perform better
on a given task.</p>
      <p>
        A secondary motivation for grounding our strategy search space in the neural network weights is
that, unlike task-specific knowledge, weight-level meta-knowledge represents generalized principles
about how neural network parameters relate to model capabilities, thereby providing crucial insights
for self-evolving agents. Several prior works have shown a positive correlation between types of neural
network weight patterns and downstream model performance characteristics. For example, scaling laws research [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has demonstrated
predictable relationships between model size and performance. Similarly, structured sparsity learning
indicates how particular weight patterns can be useful for developing more efficient representations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        Test-Time Training (TTT) is an emerging class of approaches that updates model weights at
inference time, using techniques such as input perplexity or cross-entropy minimization on unlabeled
test data alone, enabling self-supervised enhancement of LLM performance [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], via reinforcement
learning that exploits the priors in pre-trained models [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], via reflection and verifier-driven
sample selection [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], via a task-specific curriculum [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], or via mixture-of-experts based
model merging [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. An alternative approach is to scale inference compute at test time, for example
with ensemble approaches such as majority voting. While test-time approaches are promising, such
computational overhead is not always necessary, and they often fail when data is scarce or the quality
of the unlabeled data is poor.
      </p>
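As a minimal sketch (not any specific system's implementation), the core idea of TTT via input perplexity minimization can be illustrated on a toy unigram softmax "model": one self-supervised gradient step on the unlabeled test input lowers the model's perplexity on that input, with no labels involved.

```python
import math

# Toy "language model": a softmax over a tiny vocabulary, parameterized by
# logits. Test-time training minimizes perplexity on the *unlabeled* test
# input by gradient descent on the logits -- no labels are required.
VOCAB = ["the", "cat", "sat", "mat"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def perplexity(logits, token_ids):
    probs = softmax(logits)
    nll = -sum(math.log(probs[t]) for t in token_ids) / len(token_ids)
    return math.exp(nll)

def ttt_step(logits, token_ids, lr=0.5):
    # Gradient of mean NLL w.r.t. logits for this unigram softmax model:
    # d(NLL)/d(logit_k) = softmax_k - count_k / N
    probs = softmax(logits)
    n = len(token_ids)
    counts = [token_ids.count(k) for k in range(len(logits))]
    return [l - lr * (p - c / n) for l, p, c in zip(logits, probs, counts)]

test_input = [0, 1, 2, 0, 3]        # "the cat sat the mat", unlabeled
logits = [0.0, 0.0, 0.0, 0.0]
before = perplexity(logits, test_input)
logits = ttt_step(logits, test_input)
after = perplexity(logits, test_input)
```

In a real LLM the same loop runs over transformer parameters (or a LoRA adapter) with autograd; the toy model only exposes the self-supervised objective.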
      <p>
        Adversarial Fine-Tuning is another emerging class of techniques wherein two LLM instances are
made to debate a topic, or one instance serves as a challenger or teacher while the other serves as a
solver or student, generating synthetic data either from unlabeled prompts or even from scratch and
using approaches like majority voting to create pseudo-labels, which can then be used to update the
model's knowledge accordingly [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. This can also be done through additional fine-tuning on
information available in the LLM's context [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], similar to
knowledge distillation. Recent works include SQLM [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], R-Zero [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], TT-SI [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and SIRLC [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. While
this approach is efficient in data-scarce domains where TTT fails, it is not always sufficient: certain
challenging domains require mastering novel reasoning skills, and it is well known that scaling data
alone does not suffice in such regimes, for example mathematics [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Reinforcement Learning (RL) is a well-established approach for pushing the capabilities of LLMs, and
recent works such as SEAL [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], RLAIF [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], SRLM [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and Memento [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], which uses a memory-based
online RL policy, have shown promising potential for the low-cost continual adaptation of LLMs.
Meta-learning has also been used in RL to train agents that must learn novel tasks quickly [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. SOLAR can thus be seen as following meta-learning principles, since it learns an
adaptation strategy, i.e., how to generate effective self-weight updates, using a meta-optimization loop.
Closely related are self-referential systems, which learn to update their own parameters as in
[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and self-evolving agents, which enable LLMs to improve by autonomously acquiring, refining
and learning from experiences generated by the model itself [
        <xref ref-type="bibr" rid="ref25 ref26">25, 26</xref>
        ]. While RL-based approaches perform
well, it is often challenging to achieve convergence and to design optimal policies that are efficient in
both compute and time.
      </p>
      <p>
        Parameter Generation is another research direction with several pioneering works such as
RPG [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], DnD [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], T2L [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], ORAL [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] and COND P-DIFF [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. DnD generates task-specific parameters
from unlabeled prompts without per-task training via a prompt-conditioned hyper-convolutional
decoder, while T2L does the same but uses a hyper-network conditioned on a task description instead.
ORAL leverages architectural and textual conditioning for flexible, scalable LoRA parameter adaptation.
RPG introduces a recurrent diffusion architecture for scalable unconditional LoRA parameter generation.
COND P-DIFF applies conditional latent diffusion for controllable LoRA parameter synthesis with
strong cross-domain generalization. An associated direction is model merging, which facilitates
generalization to unseen tasks via multi-task learning [
        <xref ref-type="bibr" rid="ref32 ref33">32, 33</xref>
        ]. While these works have been effective,
their limitation is that the generated parameters are static: once produced, they undergo no further
modification, a feature that is crucial for domains requiring implicit meta-knowledge.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        In this section, we describe the framework of our proposed approach (see Figure 1). SOLAR begins by
treating the LLM's own weights as environment variables to explore, upon which it systematically
proposes scientific hypotheses to modify the internal representation space so as to adapt the LLM
to the unseen task. A major design challenge is therefore the high dimensionality and non-convexity
of the LLM weight space itself, which makes initialization and subsequent exploration extremely
complex. To overcome this, we work only with low-rank parameters [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], which
constitute a much smaller fraction (∼1%) of the original model's weights. In addition, to avoid the
limitations of selecting a single starting point, which might not be an optimal neighborhood to explore,
we prefer to sample from a plausible weight distribution. This step is essential to reduce the
risk of non-convergence. To obtain this initial weight distribution, i.e., for self-weight sampling, we
draw on prior work in large-scale LLM parameter generation and use a convolution-based decoder
architecture as the backbone for SOLAR's exploration-point initializer.
      </p>
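To make the "∼1% of weights" claim concrete, a quick back-of-the-envelope calculation shows why low-rank parameters shrink the search space so dramatically. The hidden size 896 below matches the 8 × 896 LoRA matrices of Section 5.1; treating a single square projection in isolation is a simplifying assumption for illustration.

```python
def lora_param_count(d_in, d_out, rank):
    # A LoRA adapter replaces a dense (d_in x d_out) weight update with
    # two low-rank factors A (d_in x rank) and B (rank x d_out).
    return rank * (d_in + d_out)

# Hypothetical square projection of hidden size 896 (rank 8, as in
# Section 5.1); exact per-layer shapes are assumptions for illustration.
hidden = 896
full_params = hidden * hidden                      # dense update: 802,816
lora_params = lora_param_count(hidden, hidden, 8)  # low-rank update: 14,336
fraction = lora_params / full_params               # roughly 1.8%
```

The same ratio holds layer by layer, so exploring only the LoRA factors keeps the hypothesis space on the order of 1-2% of the full weight space.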
      <p>
        Once the weights have been initialized<sup>1</sup> for exploration, SOLAR uses a foundation-model-based
agent, for now simply an LLM trained using reinforcement learning (RL), to propose probable
hypotheses at inference time for weight-space exploration using test-time scaling and compute.
To facilitate the training process, however, it is necessary to first hand-curate a seed knowledge base,
consisting of either proven or plausible weight-modification strategies, which then serves as the
action space for the LLM's initial stages of exploration during RL training. Training follows a multi-stage
recipe with three distinct, progressively harder levels. Level I trains the LLM to produce only single,
valid and efficient self-edits (a self-edit, as the name suggests, is a modification strategy proposed
by an LLM to update its own weights depending on the task) from among those present in the initial
knowledge base. Level II trains the model to output chains of self-edits, since coupling strategies
sequentially is also helpful (moreover, viewed abstractly, a chain can be considered in effect a single
complex edit decomposed into simpler instances). Level III is significantly more challenging, both for
the LLM and from an implementation perspective: it lets the LLM explore the hypothesis space in
its entirety, going beyond human-crafted approaches. Strong performance at Level III would
be a significant leap, as it could open new frontiers in training and fine-tuning paradigms, as
has happened in other areas such as neural architecture search [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] and optimization
[
        <xref ref-type="bibr" rid="ref37">37</xref>
        ].
      </p>
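The three-level action space above can be sketched as a simple sampler. The strategy names and the knowledge-base representation are hypothetical placeholders; in SOLAR, Level III is open-ended generation by the LLM rather than a stub.

```python
import random

# Sketch of the three-level curriculum over the action space:
# Level 1: one validated self-edit drawn from the seed knowledge base.
# Level 2: a chain (sequence) of self-edits applied in order.
# Level 3: free-form hypotheses beyond the knowledge base (stubbed here).
SEED_KB = ["ttt_perplexity", "lora_subspace_mix", "tts_majority_vote"]

def sample_action(level, rng, kb=SEED_KB, max_chain=3):
    if level == 1:
        return [rng.choice(kb)]
    if level == 2:
        k = rng.randint(2, max_chain)          # chain of 2..max_chain edits
        return [rng.choice(kb) for _ in range(k)]
    if level == 3:
        # Placeholder: in SOLAR this is open-ended LLM generation.
        return ["novel_hypothesis"]
    raise ValueError("level must be 1, 2 or 3")
```

A Level II action is thus literally a list of Level I actions, matching the observation that a chain is a single complex edit decomposed into simpler instances.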
      <p>
        After plausible hypotheses have been generated by the foundation-model-based agent and implemented,
it is necessary to test them. For this purpose, we create a separate evaluation split where available.
However, since SOLAR is designed to adapt LLMs efficiently to unseen tasks as well, the evaluation
dataset may itself be generated on the fly using adversarial approaches involving multiple instances
of an LLM, one proposing and one solving questions on a particular topic, as in SQLM [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or R-Zero [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ].
Once a hypothesis has been tested and found valid (i.e., it improves performance on some
predetermined metric such as accuracy on the eval set), it is added back into the knowledge
base, thereby enriching the LLM's action space for future iterations. To prevent catastrophic
forgetting, SOLAR also implements a meta-level weight regularization technique. Therefore, by
automating the process of self-improvement using principled methodologies and meta-knowledge in a
scientific manner (i.e., propose, validate and accept hypotheses), SOLAR provides a holistic framework
toward the next generation of AI-generating-AI agents: as soon as web-scale data corpora are
exhausted, progress will hinge on a model's capacity to generate its own high-utility training signal.
<sup>1</sup>These weights can optionally be encoded into a structured representation correlated with network performance, as in world
models such as JEPA [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
      </p>
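The propose-validate-accept loop described above reduces to a small amount of control logic. The evaluator and scores below are hypothetical stand-ins; in SOLAR the evaluation runs the adapted model on the (possibly self-generated) eval split.

```python
def validate_and_accept(strategy, evaluate, knowledge_base, baseline):
    # Scientific loop: a proposed strategy is accepted into the knowledge
    # base only if it improves the predetermined metric (e.g. accuracy on
    # the eval split) over the current baseline.
    score = evaluate(strategy)
    if score > baseline:
        knowledge_base.append(strategy)
        return True, score
    return False, score

# Hypothetical evaluator for illustration: a lookup of eval-set accuracies.
scores = {"ttt_perplexity": 0.61, "random_noise": 0.48}
kb = []
accepted, _ = validate_and_accept("ttt_perplexity", scores.get, kb, baseline=0.55)
rejected, _ = validate_and_accept("random_noise", scores.get, kb, baseline=0.55)
```

Only validated strategies enrich the action space, so the knowledge base grows monotonically with evidence rather than with every proposal.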
    </sec>
    <sec id="sec-5">
      <title>5. Implementation</title>
      <sec id="sec-5-1">
        <title>5.1. Architecture</title>
        <p>
          The primary architectural detail in SOLAR's framework is the design of the weight-space exploration
initializer. As mentioned in Section 4, we use a convolution-based decoder model for this purpose. We
assume access to either the unseen task's description or at least a handful of unlabeled
examples representative of its requirements. We send these to an open-sourced text encoder for
embedding extraction. This extraction process can be formally represented as e = Encoder(p; φ),
where Encoder(·; φ) denotes the embedding extraction function parameterized by φ, and e represents
the extracted embedding corresponding to prompt p. We use an encoder-based language model
for this purpose, namely Sentence-BERT (specifically all-MiniLM-L6-v2) [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ]<sup>2</sup>.
Next, following [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], is the parameter tokenization process (see Figure 2), designed to
preserve both the layer-wise distribution and the cross-layer correlations. Specifically, (i) weights are
split according to their layer indices, (ii) layer-wise normalization is applied to mitigate distribution
shifts, (iii) parameters are sliced into non-overlapping tokens of uniform size, and (iv) a lightweight
permutation state (encoded as a one-hot vector) is used to alleviate symmetry issues [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ] when collecting
multiple checkpoints. Additionally, 2D position embeddings (the first dimension encodes the layer index, while
the second captures the token's in-layer position) [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] are employed to ensure the network
retains positional awareness of each token within the entire set. In our case, each LoRA matrix has
shape 8 × 896, which is split into 7 smaller chunks, each of shape 8 × 128, and
finally padded to a uniform size of 10 × 130.
        </p>
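The slice-and-pad step at the end of the paragraph (8 × 896 → seven 8 × 128 tokens → uniform 10 × 130) can be sketched directly. Zero-padding is an assumption; the paper does not state the pad value.

```python
def tokenize_lora_matrix(mat, chunk_cols=128, pad_rows=10, pad_cols=130):
    # Slice an (8 x 896) LoRA matrix into non-overlapping (8 x 128) tokens,
    # then pad each token to a uniform (10 x 130) shape, as in Section 5.1.
    # Padding with zeros is an assumption for illustration.
    rows, cols = len(mat), len(mat[0])
    assert cols % chunk_cols == 0, "columns must split evenly into tokens"
    tokens = []
    for start in range(0, cols, chunk_cols):
        chunk = [row[start:start + chunk_cols] for row in mat]
        # Pad columns to pad_cols, then pad rows to pad_rows.
        padded = [r + [0.0] * (pad_cols - chunk_cols) for r in chunk]
        padded += [[0.0] * pad_cols for _ in range(pad_rows - rows)]
        tokens.append(padded)
    return tokens

lora = [[0.01] * 896 for _ in range(8)]   # one 8 x 896 LoRA matrix
tokens = tokenize_lora_matrix(lora)       # seven 10 x 130 tokens
```

Uniform token shapes are what let the decoder treat every parameter slice identically, with the 2D position embedding restoring layer and in-layer location.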
        <p>
          Let the dimension of the prompt embeddings be [B, N, L, D], where B, N, L and D denote batch size,
length of the prompt batch (i.e., number of prompts), sequence length, and hidden dimension, respectively.
The decoder (see Figure 3) consists of multiple sequential layers, each performing five 2D convolutions.
These convolutions fall into three categories: i) width convolutions that operate on the (L, D)
dimensions, ii) height convolutions that operate on the (N, L) dimensions, and iii) layer-wise convolutions
that operate on the (B, N) dimensions, denoted Conv_w, Conv_h, and Conv_l. Each layer consists of two
Conv_w, two Conv_h and one Conv_l. Given this, the forward operation of the decoder block is
h_i^1 = Conv_h^1(Conv_w^1(h_{i−1}))
h_i^2 = Conv_h^2(Conv_w^2(h_{i−1}))
h_i = Conv_l((h_i^1 + h_i^2 + b_i)/3)
where h_i is the hidden state output by the i-th layer, h_0 is the prompt embedding encoded by the condition
extractor, and b_i is a learnable bias. Through this process, the input is transformed from dimension [B, N, L, D]
to [B, N′, L′, D′], which is then compatible with conversion into a flattened LoRA adapter for the LLM<sup>3</sup>.
In this work, the base LLM used is Qwen2.5-0.5B-Instruct [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], and LoRA is applied to the linear
projection layers within both the self-attention mechanism and the MLP blocks of the transformer
architecture. Specifically, this includes the query, key, value and output projections in attention blocks,
as well as the gate, up and down projections in MLP blocks.
<sup>2</sup>Note that BERT's supported sequence length is only 512. In our use case, the maximum sequence length is only 384, so this poses no issue.
        </p>
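The decoder-block equations can be sketched with naive NumPy convolutions. Everything here is a shape-level illustration, not the authors' implementation: the axis assignment for the three convolution types is an assumption, all five convolutions share one smoothing kernel, and the real decoder also changes the channel dimensions between layers.

```python
import numpy as np

def conv2d_same(x, kernel):
    # Naive 'same'-padded 2D cross-correlation over the LAST two axes of x.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, [(0, 0)] * (x.ndim - 2) + [(ph, ph), (pw, pw)])
    out = np.zeros_like(x)
    for i in range(x.shape[-2]):
        for j in range(x.shape[-1]):
            out[..., i, j] = (xp[..., i:i + kh, j:j + kw] * kernel).sum(axis=(-2, -1))
    return out

def conv_on_axes(x, kernel, axes):
    # Apply the 2D convolution over an arbitrary pair of axes of x.
    moved = np.moveaxis(x, axes, (-2, -1))
    return np.moveaxis(conv2d_same(moved, kernel), (-2, -1), axes)

def decoder_block(h_prev, k_w1, k_h1, k_w2, k_h2, k_l, bias):
    # Structure of one decoder layer (Section 5.1):
    #   h1 = Conv_h^1(Conv_w^1(h_prev)), h2 = Conv_h^2(Conv_w^2(h_prev)),
    #   h  = Conv_l((h1 + h2 + b) / 3)
    # Axis assignment (an assumption): width conv on (L, D), height conv
    # on (N, L), layer-wise conv on (B, N), for input of shape [B, N, L, D].
    h1 = conv_on_axes(conv_on_axes(h_prev, k_w1, (2, 3)), k_h1, (1, 2))
    h2 = conv_on_axes(conv_on_axes(h_prev, k_w2, (2, 3)), k_h2, (1, 2))
    return conv_on_axes((h1 + h2 + bias) / 3.0, k_l, (0, 1))

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 5))    # toy [B, N, L, D]
k = np.ones((3, 3)) / 9.0            # one smoothing kernel for all convs
b = np.zeros_like(x)
y = decoder_block(x, k, k, k, k, k, b)
```

The two parallel Conv_h(Conv_w(·)) branches plus the learnable bias, averaged and passed through Conv_l, give each layer a cheap mixing of within-prompt, cross-prompt, and cross-batch information.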
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Training</title>
        <p>
          In this work, we focus on the domain of common-sense reasoning and select four datasets for evaluation,
namely HellaSwag [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ], BoolQ [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ], and the challenge and easy sets of the AI2 Reasoning Challenge
(ARC) [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ]. The ARC dataset contains grade-school-level, multiple-choice science questions. HellaSwag
instructs models to select, from among the ground truth and an adversarial set of machine-generated
wrong answers, the choice that best finishes a sentence. BoolQ is a question-answering dataset of
yes/no questions covering various factual problems. We use existing checkpoints for these datasets<sup>4</sup>
(batch size 32, 5000 samples), collected by first pretraining on the target dataset for 75 steps with a
learning rate of 1e-4 and then fine-tuning on the target dataset for 50 additional steps with a learning
rate of 1e-5, saving a checkpoint at each step.
<sup>3</sup>In our present implementation, the entire flow is (128,384,384) → (128,200,300) → (128,100,256) → (256,50,200) → (512,50,200)
→ (1024,25,200) → (1024,10,200) → (2048,10,200) → (4296,8,128).
<sup>4</sup>For training, Open-Book Question Answering (OBQA) [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ], Physical Interaction: Question Answering (PIQA) [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ] and WinoGrande [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ] have been used as well. OBQA aims to promote research in advanced question-answering
with salient facts summarized as an open book. PIQA focuses on everyday situations with a preference for atypical solutions.
WinoGrande features a fill-in-the-blank task with binary options for commonsense reasoning questions.
        </p>
        <p>
Subsequently, prompt-checkpoint pairing is done as follows. A given dataset D is first divided
into non-overlapping prompt batches [p_1, · · · , p_i, · · · , p_N]. Denote the trained LLM checkpoints of
this dataset as W = [w_1, · · · , w_j, · · · , w_M]. A batch of prompts and a corresponding
checkpoint are then picked at random to create a pair {p_i, w_j}, which serves as an input-output data point for
training the decoder. The objective function for training is the mean squared error (MSE) loss between
the output of the decoder's last block for a particular prompt batch and the training checkpoint
paired with it.</p>
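The pairing and objective above amount to very little code. The flattened-vector checkpoints and toy values below are illustrative stand-ins for the real prompt batches and LoRA checkpoints.

```python
import random

def make_training_pair(prompt_batches, checkpoints, rng):
    # Randomly pair a prompt batch p_i with a checkpoint w_j: one
    # input-output data point {p_i, w_j} for training the decoder.
    return rng.choice(prompt_batches), rng.choice(checkpoints)

def mse(pred, target):
    # Objective: mean squared error between the decoder's last-block
    # output for a prompt batch and the checkpoint paired with it.
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)

rng = random.Random(0)
batches = [["q1", "q2"], ["q3", "q4"]]        # toy prompt batches
ckpts = [[0.1, 0.2], [0.3, 0.4]]              # toy flattened checkpoints
p, w = make_training_pair(batches, ckpts, rng)
loss = mse([0.1, 0.2], [0.3, 0.4])            # decoder output vs. target
```

Because pairing is random rather than aligned, the decoder learns a distribution over plausible checkpoints for a dataset instead of memorizing one trajectory.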
        <p>
          The next crucial step is the hand-crafting of the seed knowledge base. To this end, we identify five primary
families of strategies<sup>5</sup>, each containing its own sub-strategies, namely
• Test-Time Training (TTT) using input perplexity minimization [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] or reinforcement
learning [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], for example via self-reflection and verification loops like GEPA [
          <xref ref-type="bibr" rid="ref49">49</xref>
          ], ReflectEvo
[
          <xref ref-type="bibr" rid="ref50">50</xref>
          ], REVISE [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] or Instruct-of-Reflection [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. It can also involve prompt optimization using
frameworks like TextGrad [
          <xref ref-type="bibr" rid="ref52">52</xref>
          ] or CAST [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ].
• Post-training, data-free LoRA modifications such as mixing LoRA subspaces obtained by weight
decomposition of the constituent matrices [
          <xref ref-type="bibr" rid="ref54">54</xref>
          ], bounding the norm of selected parameters [
          <xref ref-type="bibr" rid="ref55">55</xref>
          ], or
even merging multiple task-specific LoRA adapters [56].
• Reinforcement-learning-based frameworks like SQLM [57], R-Zero [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] or SEAL [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which
enable LLMs to self-adapt by generating their own fine-tuning data and update directives (another
example is TT-SI [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]).
• Test-Time Scaling (TTS) using either a router or an ensemble approach: we generate and
perform inference with multiple adapters obtained from different representative prompt batches
and, to obtain the final prediction, select either the most confident prediction (max_confidence), a
majority vote, or sum_logprobs (i.e., sum log-probabilities across adapters per prediction and
pick the one with the highest total log-probability).
• Latent Space (LS) approaches, which aim at working on or modifying the internal layers [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] or hidden
activations [58] of the LLM directly. This may also involve decoding algorithms which modify the
sampling procedure itself [59, 60]. We consider these part of the latent-space family because they
tamper with the internal probability distribution over next tokens, unlike the other families, which
modify the parameters explicitly.
<sup>5</sup>Unfortunately, there is no prior research on optimizing the performance of LoRAs obtained via
parameter generation, which posed a major challenge in identifying plausible strategies; these had to
be cherry-picked via trial and error.
        </p>
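One plausible in-memory representation of this seed knowledge base is a mapping from family to sub-strategies. The schema and the strategy identifiers are assumptions for illustration; the paper does not prescribe a storage format.

```python
# Hypothetical representation of the hand-crafted seed knowledge base:
# one entry per strategy family (following the five families listed above),
# each holding its sub-strategies. Identifiers are illustrative.
SEED_KNOWLEDGE_BASE = {
    "TTT":  ["perplexity_minimization", "rl_self_reflection",
             "prompt_optimization"],
    "LoRA": ["subspace_mixing", "norm_bounding", "adapter_merging"],
    "RL":   ["self_generated_finetuning_data"],
    "TTS":  ["router", "ensemble_majority_vote",
             "ensemble_max_confidence", "sum_logprobs"],
    "LS":   ["hidden_activation_editing", "decoding_modification"],
}

def strategies_in(family):
    # Action-space lookup used during the LLM's early RL exploration.
    return SEED_KNOWLEDGE_BASE.get(family, [])
```

Validated strategies discovered later (Section 4) would simply be appended under the relevant family, growing the action space over time.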
        <p>
          We first formulate the objective for the outer-loop RL training which generates adaptation strategies (AS),
as in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Let θ denote the parameters of the language model LM_θ. In order to adapt to an unseen
dataset (task) D, SOLAR requires, as specified in Section 4, a context C containing information
relevant to the task and an evaluation strategy and metric E used to assess the model’s
downstream adaptation. Based on C, SOLAR generates an AS and updates its parameters accordingly,
θ′ ← Update(θ, AS). We thus have an RL setup, i.e., the model takes an action (generating an AS), receives
a reward r based on LM_θ′’s performance under E, and updates its policy to maximize the expected reward,
        </p>
        <p>ℒ_RL(θ) := −E_{(C,E)∼D} [ E_{AS∼LM_θ(·|C)} [ r(AS, θ, E) ] ]</p>
        <p>It is to be noted that the reward assigned to a given action depends on the model parameters θ at the
time the action is taken (since θ is updated to θ′, which is then evaluated). An implication of this is
that, while modeling the RL state, one must therefore include θ along with C, even though the
policy’s observation is limited to C (because it is infeasible to place θ directly in the LLM’s context
window). Therefore, the (state, action, reward) triples collected using older model weights θ_old will
not be aligned with the current model θ_current. Hence, an on-policy approach should be adopted,
whereby adaptation strategies are sampled from and, even more importantly, the rewards themselves
are calculated using the current model.
In particular, the specific on-policy approach used is ReST EM [61], where samples are first generated 6
from the current model and filtered using binary feedback [r(AS, θ, E) is 1 if, under E, AS improves
LM_θ’s performance and is 0 otherwise]. The model is then fine-tuned on these samples and this
continues in an iterative manner (see Algorithm 1).</p>
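The binary-filtered, on-policy loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_strategy`, `reward` and `finetune` are stand-in stubs, while the sample and iteration counts match the ones reported in the footnotes (15 samples, 2 iterations).

```python
import random

random.seed(0)

def generate_strategy(model):
    # Stand-in for sampling an adaptation strategy from the current policy.
    return {"family": random.choice(["TTT", "LoRA", "TTS", "LS"])}

def reward(model, strategy):
    # Binary feedback: 1 if applying the strategy improves performance
    # under the evaluation metric, else 0 (stand-in).
    return int(random.random() > 0.5)

def finetune(model, samples):
    # Stand-in parameter update on the filtered samples.
    return model + len(samples)

def rest_em(model, n_samples=15, n_iters=2):
    for _ in range(n_iters):
        # On-policy: strategies are sampled from, and scored with,
        # the CURRENT model.
        samples = [generate_strategy(model) for _ in range(n_samples)]
        kept = [s for s in samples if reward(model, s) == 1]  # binary filter
        model = finetune(model, kept)
    return model

model = rest_em(model=0)
```

Because rewards are recomputed with the current model each iteration, stale (state, action, reward) triples from older weights are never reused.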
        <p>
          A subtle detail not yet covered is the exact nature of the adaptation strategy itself.
It depends on the particular strategy family being used; however, the format is consistent across
all families: a JSON object specifying the particular configuration to be used 7. It contains
a field, family, which takes the values TTT, LoRA, TTS or LS. Currently, the following choices have been
experimented with:
• For TTT, we use [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and the corresponding JSON object has fields ttl_steps (number of training
steps in the TTL loop), learning_rate, batch_size and shuffle_data (a boolean).
• For LoRA modifications, we use the two-subspace (TS) mixing version from [
          <xref ref-type="bibr" rid="ref54">54</xref>
          ] and the corresponding
JSON object has only a single field, lambda, a hyperparameter determining the
ratio in which the two resulting subspaces are mixed.
• For TTS, we use either an ensemble or a router approach. In the router approach (see Figure 4),
we sample multiple prompt batches and choose the batch whose average of similarity
scores 8 of individual prompts (M1), or whose averaged prompt embedding (M2), is closest to that of the
question at test time. The corresponding JSON object has fields num_prompt_batches
(the number of prompt batches to be sampled from the test split of the unseen dataset) and method,
which can take one of five values: avg_sim_score, avg_prompt_embed, max_confidence,
majority_vote or sum_logprobs (summing log probabilities). The former two belong to the router
approach; the latter three constitute the ensemble approach.
6Currently, a fixed number of samples is generated, 15 to be precise. This could however be extended
to be dynamic in a future version of this work, wherein samples would continue to be generated until a confidence
threshold, determined by the model itself, is reached. The same is true for the number of iterations, which is
just 2 for now.
7Since the model being used is Qwen2.5-0.5B-Instruct, it had difficulty following the instructions given in the prompt
for generating structured outputs even after temperature alteration. In such cases, verification and formatting were done
using Qwen2.5-7B-Instruct instead.
8Cosine similarity and Euclidean distance were tested and the latter was found to perform better empirically. Thus,
avg_sim_score and avg_prompt_embed use Euclidean distance by default. Alternatively, the measure of similarity could
be exposed as a new field, but this has not been explored in the current work.
        </p>
        <p>
          • For LS, we use [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and the corresponding JSON object has fields times and learning_rate.
        </p>
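The JSON format above can be captured by a small validator. This is an illustrative sketch using only the field names listed in the bullets; the validator itself is not part of the paper's pipeline.

```python
import json

# Allowed fields per strategy family, as described in the bullets above.
SCHEMAS = {
    "TTT": {"ttl_steps", "learning_rate", "batch_size", "shuffle_data"},
    "LoRA": {"lambda"},
    "TTS": {"num_prompt_batches", "method"},
    "LS": {"times", "learning_rate"},
}
TTS_METHODS = {"avg_sim_score", "avg_prompt_embed", "max_confidence",
               "majority_vote", "sum_logprobs"}

def validate_as(raw: str) -> dict:
    """Parse an adaptation-strategy JSON object and check its fields."""
    obj = json.loads(raw)
    family = obj.get("family")
    if family not in SCHEMAS:
        raise ValueError(f"unknown family: {family!r}")
    extra = set(obj) - SCHEMAS[family] - {"family"}
    if extra:
        raise ValueError(f"unexpected fields for {family}: {extra}")
    if family == "TTS" and obj.get("method") not in TTS_METHODS:
        raise ValueError(f"unknown TTS method: {obj.get('method')!r}")
    return obj

as_obj = validate_as('{"family": "TTS", "num_prompt_batches": 20, '
                     '"method": "max_confidence"}')
```

Such a check is one place where the Qwen2.5-7B-Instruct verification step mentioned in footnote 7 could hook in.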
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <p>
        6.1. Setup
As described in Section 5, the base LLM used is Qwen2.5-0.5B-Instruct, the domain is common-sense
reasoning and the evaluation datasets are ARC-c, BoolQ, HellaSwag and ARC-e. Baselines include
quite recent works such as DnD [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], Test-Time Learning (TTL) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Decoupled and Orthogonal Merging
(DOM)9 [62] and the average of the task-specific training LoRAs [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. On one extreme, TTL uses the entire
unlabeled corpus of the training LoRAs in addition to the same 128 unlabeled examples from the target
dataset as seen by SOLAR. On the other extreme, instead of using the unlabeled corpus, DOM merges
all 7 training LoRAs, inclusive of the target set.
      </p>
      <sec id="sec-6-1">
        <title>6.2. Hardware</title>
        <p>All experiments were conducted on a high-performance computing node running Ubuntu 22.04.1. The
processor was an AMD EPYC 8434P with 48 physical cores (96 logical threads), 256 GB of system
RAM and a maximum clock speed of 2.5 GHz. Four NVIDIA RTX A6000 GPUs, each with 48 GB of
dedicated VRAM, were utilized. The Python version used was 3.12.11 and GPU-accelerated tasks were
managed using CUDA version 12.4.
9DOM is a data-free framework for LoRA merging. It separates parameters into magnitude and direction components and
merges them independently, thereby reducing the impact of magnitude differences on the directional alignment of the
merged models, thus helping preserve task information. It also uses a data-free, layer-wise gradient descent method
with orthogonal constraints to mitigate interference during the merging of direction components. For evaluation on a target
dataset, the LoRAs of the remaining datasets are merged and used.</p>
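The core decoupling idea behind DOM can be illustrated with a minimal sketch: merge magnitudes and directions separately instead of averaging raw weights. This is a simplification under stated assumptions; DOM additionally optimizes the direction merge layer-wise under orthogonality constraints, which is omitted here.

```python
import math

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def decoupled_merge(deltas):
    """Merge flattened weight deltas by averaging magnitude and direction
    separately (illustrative only; not DOM's constrained optimization)."""
    mags = [_norm(d) for d in deltas]                          # per-task magnitudes
    dirs = [[x / m for x in d] for d, m in zip(deltas, mags)]  # unit directions
    avg_dir = [sum(col) / len(dirs) for col in zip(*dirs)]
    n = _norm(avg_dir)
    merged_dir = [x / n for x in avg_dir]                      # renormalize
    merged_mag = sum(mags) / len(mags)
    return [merged_mag * x for x in merged_dir]

# Two toy deltas pointing the same way with different magnitudes:
merged = decoupled_merge([[3.0, 4.0], [6.0, 8.0]])
# magnitude (5 + 10) / 2 = 7.5, direction (0.6, 0.8) -> approximately [4.5, 6.0]
```

Because direction and magnitude are merged independently, a large-magnitude task cannot dominate the merged direction, which is the interference DOM targets.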
        <p>Algorithm 1 Sequential Multi-Level RL Loop for Adaptation Strategy (AS) Generation of SOLAR
1: Input: Base LM θ, dataset context C, evaluation metric E, initial knowledge base K
2: Init: Low-rank adapter generator G, sampled adapters A ← Sampler(G, C)
3: Level I (Single-edit self-training):
4: for iteration t = 1, . . . , T1 do
5: Propose single-edit AS from K
6: Apply AS and obtain weights θ′
7: Evaluate LM_θ′
8: Compute reward r
9: if r &gt; threshold1 then
10: θ ← RL_Update(θ, r, AS)
11: end if
12: end for
13: Level II (Chained/compositional strategies):
14: for iteration t = 1, . . . , T2 do
15: Propose chain of edits AS
16: Sequentially apply chain
17: Evaluate final weights
18: Compute reward r
19: if r &gt; threshold2 then
20: Add chain to KB
21: θ ← RL_Update(θ, r, AS)
22: end if
23: end for
24: Level III (Open-ended exploration):
25: for iteration t = 1, . . . , T3 do
26: Generate unconstrained AS
27: Validate (syntax/safety); if invalid continue
28: Apply AS conservatively (strong meta-reg)
29: Evaluate and compute reward r
30: if r &gt; threshold3 then
31: K ← K ∪ {AS}; θ ← RL_Update(θ, r, AS)
32: else
33: Penalize harmful proposals in policy update
34: end if
35: end for
36: Return: Refined parameters θ*, enriched KB K*</p>
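The control flow of Algorithm 1 can be sketched as a plain Python skeleton. All helpers (`propose_as`, `apply_as`, `evaluate`) are stand-in stubs, not the paper's implementation; the point is the three-level structure with per-level thresholds and knowledge-base growth.

```python
import random

random.seed(0)

def propose_as(kb, level):
    # Stand-in: Level I proposes single edits, II chains, III unconstrained.
    return {"family": random.choice(["TTT", "LoRA", "TTS", "LS"]),
            "level": level}

def apply_as(theta, strategy):
    return theta + 1            # stand-in for a parameter update

def evaluate(theta):
    return random.random()      # stand-in for accuracy under the metric E

def solar_loop(theta, kb, iters=(5, 5, 5), thresholds=(0.5, 0.6, 0.7)):
    for level, (n_iter, thresh) in enumerate(zip(iters, thresholds), start=1):
        for _ in range(n_iter):
            strategy = propose_as(kb, level)
            theta_new = apply_as(theta, strategy)
            reward = evaluate(theta_new)
            if reward > thresh:
                if level >= 2:      # Levels II/III enrich the knowledge base
                    kb.append(strategy)
                theta = theta_new   # stand-in for RL_Update(theta, r, AS)
    return theta, kb

theta, kb = solar_loop(theta=0, kb=[])
```

Rejected proposals simply leave θ unchanged here; in Algorithm 1, Level III additionally penalizes harmful proposals in the policy update.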
      </sec>
      <sec id="sec-6-2">
        <title>6.3. Results</title>
        <p>The per-level sampling and update rules underlying Algorithm 1 are as follows.</p>
        <p>Level I: AS ∼ LM_θ(C, K); θ′ ← ApplyStrategy(θ, AS, A); Ans ∼ LM_θ′(· | x); r ← E(Ans, y)</p>
        <p>Level II: θ_0 ← θ; AS = [a_1, . . . , a_n], a_i ∈ K; θ_i ← ApplyStrategy(θ_{i−1}, a_i, A); Ans ∼ LM_θ_n(· | x); r ← E(Ans, y); K ← K ∪ {AS}</p>
        <p>Level III: AS ∼ LM_θ(· | C) (novel structure); θ′ ← ApplyStrategy(θ, AS, A); Ans ∼ LM_θ′(· | x); r ← E(Ans, y)</p>
        <p>
The major results of this work are presented in Table 1, wherein we conduct experiments on 5 benchmarks
in the domain of common-sense reasoning and also on 5 out-of-domain benchmarks, namely
GSM-MC and MATH-MC 10 to evaluate mathematical reasoning, DivLogicEval [66] for logical reasoning,
SocialIQA [67] for reasoning about social interactions and CodeMMLU [68] for reasoning about
code-related tasks. It can be seen that SOLAR, even in its initial version, outperforms the task-specific training
LoRAs, TTL, DOM and even DnD by a significant margin, showcasing the promising potential it is
capable of once further levels of RL training11 are completed as well.</p>
        <p>The following were the adaptation strategies identified, which enabled SOLAR to reach the accuracy levels
presented:
• … 1e-5, "batch_size”: 4, "shuffle_data”: True}
• For ARC-c and SocialIQA, it was the LS family with configuration {“times”: 5, "learning_rate”:
0.1}
• For BoolQ, GSM-MC and MATH-MC, it was the LoRA family with the TS-mixing strategy and the
configuration {“lambda”: 0.5}
• For HellaSwag, DivLogicEval and CodeMMLU, it was the TTS family. E.g., for HellaSwag, the
corresponding configuration was {“num_prompt_batches”: 20, "method”: max_confidence},
indicative of the ensemble approach.
10GSM-MC and MATH-MC are multiple-choice versions of the standard GSM-8K [63] and MATH [64] datasets. They were
selected for two reasons: ease of evaluation and correlation with performance on their subjective counterparts [65].
11This might be quite time-intensive, however, with the current version itself taking around 4 days using 2 A6000 GPUs. The
reason for using only 2 of the 4 available GPUs is that the Qwen family has 14 attention heads and the vLLM server used for
improved efficiency in inference requires this number to be divisible by the number of GPUs, which is only possible if either
2 or 7 are available.</p>
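The TTS router approach can be sketched as follows: embed each sampled prompt batch, average the embeddings, and pick the batch closest (Euclidean distance, the default noted in footnote 8) to the test question. `embed` here is a toy stand-in for a real encoder, so this illustrates only the routing logic.

```python
import math
import random

def embed(text):
    # Toy embedding: deterministic per-text pseudo-random vector.
    # A real system would use an actual sentence encoder here.
    rng = random.Random(hash(text) % (2 ** 32))
    return [rng.random() for _ in range(8)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vec(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def route(question, prompt_batches):
    """Return the index of the batch whose averaged prompt embedding is
    closest to the test question's embedding (method avg_prompt_embed)."""
    q = embed(question)
    dists = [euclidean(mean_vec([embed(p) for p in batch]), q)
             for batch in prompt_batches]
    return min(range(len(prompt_batches)), key=dists.__getitem__)

batches = [["prompt a1", "prompt a2"], ["prompt b1", "prompt b2"]]
best = route("test question", batches)
```

The `avg_sim_score` variant instead averages per-prompt distances to the question before taking the minimum; the ensemble methods bypass routing entirely.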
      </sec>
      <sec id="sec-6-3">
        <title>6.4. Ablation Study</title>
        <p>
          A primary effect we would like to isolate and study is that of the initial prompt batch provided to start the
LLM adaptation process using SOLAR. Ideally, SOLAR should achieve similar performance even
when a highly representative, diverse and influential prompt batch is used. For this purpose, inspired by
[
          <xref ref-type="bibr" rid="ref53">53</xref>
          ], we use the following strategy for prompt filtering and selection (see Figure 5).
We first model inter-prompt relations as a directed graph G = (V, E, P), wherein each prompt is
encoded as a vector using Sentence-BERT. Each vertex v ∈ V denotes a prompt (sample), a directed
edge (v, u) ∈ E connects v to its neighbor u, and the weight P(v, u) ∈ P is the cosine similarity of their
embeddings. For each node v, a degree k_v is computed as shown below, so that nodes with higher average
similarity make more connections:
        </p>
        <p>s̄_v = (1 / (|V| − 1)) ∑_{u ≠ v} P(v, u),   k_v = ⌈α · s̄_v · (|V| − 1)⌉</p>
        <p>Samples are then scored by (1) influence and (2) diversity. The influence score I(v) is obtained by
a diffusion simulation 12. For this, first initialize an active set A = {v}, then iteratively sample
an active node u and attempt to activate each neighbor w ∈ N_1(u) with probability P(u, w). Newly
activated nodes join A. This process is repeated until no active nodes remain, and I(v) is the total
number of visited nodes. The diversity penalty D(v) measures overlap with already selected nodes:</p>
        <p>D(v) = − ∑_{r=1}^{R} |V_r(v) ∩ S|,   F(v) = I(v) + D(v)</p>
        <p>where V_r(v) denotes the set of nodes visited in run r of the simulation and S the set of already selected nodes.
12The simulation is run R = 20 times and the result is then averaged to obtain the final value.
Finally, a greedy graph search is performed to select the final prompt subset S. Starting with S = ∅,
at each round pick</p>
        <p>v* = arg max_{v ∈ V∖S} F(v),</p>
        <p>v* is then added to S and the diversity penalties of only the neighbors of v* are updated13. This process
continues until |S| reaches the target size, which in our case is 128.</p>
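The influence-plus-diversity selection can be sketched with a small stand-in graph. Assumptions: the diversity penalty here is simplified to the overlap between a node's direct neighbors and the selected set (rather than overlaps of simulated visited sets), and edge weights are toy probabilities instead of Sentence-BERT cosine similarities.

```python
import random

random.seed(0)

def influence(v, adj, runs=20):
    """Average number of nodes reached by a probabilistic diffusion from v,
    averaged over `runs` simulations (as in footnote 12)."""
    total = 0
    for _ in range(runs):
        visited, frontier = {v}, [v]
        while frontier:
            u = frontier.pop()
            for w, p in adj[u]:
                if w not in visited and random.random() < p:
                    visited.add(w)
                    frontier.append(w)
        total += len(visited)
    return total / runs

def greedy_select(adj, k):
    """Greedy search: repeatedly add the node maximizing influence minus a
    diversity (overlap) penalty against the already-selected set."""
    selected = set()
    infl = {v: influence(v, adj) for v in adj}   # influence is precomputed
    while len(selected) < k:
        def score(v):
            nbrs = {w for w, _ in adj[v]}
            return infl[v] - len(nbrs & selected)   # simplified penalty
        v_star = max((v for v in adj if v not in selected), key=score)
        selected.add(v_star)
    return selected

# Toy 4-node graph: adj[v] = [(neighbor, activation probability), ...]
adj = {0: [(1, 0.9), (2, 0.9)], 1: [(0, 0.9)], 2: [(0, 0.9)], 3: [(1, 0.2)]}
sel = greedy_select(adj, 2)
```

In the paper's setting, `k` would be 128 and the graph would be built over the unlabeled prompt pool.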
        <p>Fortunately, the influence of the initial prompt batch was marginal (with just a 0.3% improvement
in accuracy when averaged across all evaluation datasets), indicating that SOLAR can eficiently adapt
LLMs to unseen datasets without the requirement of high-quality or manually curated dataset. Only a
handful of unlabeled prompt instances which are merely indicative of the task sufice.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we introduce SOLAR, a novel paradigm for Streaming and Continual Learning
that empowers LLMs to autonomously discover and retain parameter-level adaptation strategies. By
bridging the gap between rapid test-time adaptation (plasticity) and long-term meta-knowledge retention
(stability), SOLAR addresses the core challenges of deploying agents in non-stationary environments.
While currently reliant on a seed knowledge base, the framework lays the groundwork for fully
autonomous, self-evolving systems capable of navigating the open-ended drifts of the real world.
Another key tradeoff is that of real-time adaptation versus computation. While SOLAR’s training phase
is compute-intensive, the inference-time application of learned strategies is rapid. By pre-compiling
complex adaptation routines into the knowledge base, SOLAR shifts the computational burden from the
streaming phase to the offline meta-learning phase. This allows the agent to react to concept drift in
near real-time by simply retrieving and applying a cached strategy, rather than performing expensive
gradient descent from scratch every time.
13Note that the influence scores are precomputed.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Acknowledgments</title>
      <p>The authors would like to thank Professor Sashikumaar Ganesan from the Department of Computational
and Data Sciences at the Indian Institute of Science, Bangalore, for feedback and for the additional compute
resources required to execute this project.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Large Language Models (GPT-5.2, Claude Opus
4.5 and Gemini-3) as a writing assistant tool for drafting content, generating the literature review,
drafting the abstract, paraphrasing and rewording, improving writing style, checking grammar and
spelling, and generating the images used in the paper. The process was interactive. After writing the
core content, the authors used LLMs with specific prompts to refine the text. These prompts included
requests to “check for grammatical errors,” “rephrase this sentence for clarity,” “make this paragraph
more concise,” or “suggest alternative phrasing to improve flow.” The LLMs were not used to generate
any scientific ideas, experimental results, data analysis or other core intellectual contributions of the
paper. After using these tools/services, the authors reviewed and edited the content as needed and
take full responsibility for the publication’s content.</p>
      <p>//arxiv.org/abs/2501.19050. arXiv:2501.19050.
[56] Z. Zhao, T. Shen, D. Zhu, Z. Li, J. Su, X. Wang, K. Kuang, F. Wu, Merging LoRAs like playing
Lego: Pushing the modularity of LoRA to extremes through rank-wise clustering, 2024. URL: https:
//arxiv.org/abs/2409.16167. arXiv:2409.16167.
[57] L. Chen, M. Prabhudesai, K. Fragkiadaki, H. Liu, D. Pathak, Self-questioning language models,
2025. URL: https://arxiv.org/abs/2508.03682. arXiv:2508.03682.
[58] G. Zhang, F. Meng, G. Wan, Z. Li, K. Wang, Z. Yin, L. Bai, S. Yan, LatentEvolve: Self-evolving
test-time scaling in latent space, 2025. URL: https://arxiv.org/abs/2509.24771. arXiv:2509.24771.
[59] A. Karan, Y. Du, Reasoning with sampling: Your base model is smarter than you think, 2025. URL:
https://arxiv.org/abs/2510.14901. arXiv:2510.14901.
[60] Z. Wang, D. Ma, X. Huang, D. Cai, T. Lan, J. Xu, H. Mi, X. Tang, Y. Wang, The end of manual
decoding: Towards truly end-to-end language models, 2025. URL: https://arxiv.org/abs/2510.26697.
arXiv:2510.26697.
[61] A. Singh, J. D. Co-Reyes, R. Agarwal, A. Anand, P. Patil, X. Garcia, P. J. Liu, J. Harrison, J. Lee,
K. Xu, A. Parisi, A. Kumar, A. Alemi, A. Rizkowsky, A. Nova, B. Adlam, B. Bohnet, G. Elsayed,
H. Sedghi, I. Mordatch, I. Simpson, I. Gur, J. Snoek, J. Pennington, J. Hron, K. Kenealy, K. Swersky,
K. Mahajan, L. Culp, L. Xiao, M. L. Bileschi, N. Constant, R. Novak, R. Liu, T. Warkentin, Y. Qian,
Y. Bansal, E. Dyer, B. Neyshabur, J. Sohl-Dickstein, N. Fiedel, Beyond human data: Scaling
self-training for problem-solving with language models, 2024. URL: https://arxiv.org/abs/2312.06585.
arXiv:2312.06585.
[62] S. Zheng, H. Wang, C. Huang, X. Wang, T. Chen, J. Fan, S. Hu, P. Ye, Decouple and
orthogonalize: A data-free framework for LoRA merging, 2025. URL: https://arxiv.org/abs/2505.15875.
arXiv:2505.15875.
[63] K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton,
R. Nakano, et al., Training verifiers to solve math word problems, arXiv preprint arXiv:2110.14168
(2021).
[64] D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, J. Steinhardt, Measuring
mathematical problem solving with the MATH dataset, arXiv preprint arXiv:2103.03874 (2021).
[65] Z. Zhang, Z. Jiang, L. Xu, H. Hao, R. Wang, Multiple-choice questions are efficient and robust llm
evaluators, arXiv preprint arXiv:2405.11966 (2024).
[66] T. T. Chung, L. Liu, M. Yu, D.-Y. Yeung, DivLogicEval: A framework for benchmarking logical
reasoning evaluation in large language models, arXiv preprint arXiv:2509.15587 (2025).
[67] M. Sap, H. Rashkin, D. Chen, R. LeBras, Y. Choi, SocialIQA: Commonsense reasoning about social
interactions, arXiv preprint arXiv:1904.09728 (2019).
[68] D. N. Manh, T. P. Chau, N. Le Hai, T. T. Doan, N. V. Nguyen, Q. Pham, N. D. Bui, CodeMMLU: A
multi-task benchmark for assessing code understanding capabilities of CodeLLMs, CoRR (2024).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Chess</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Scaling laws for neural language models</article-title>
          , arXiv preprint arXiv:2001.08361
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Learning structured sparsity in deep neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>29</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Chen,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shuai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Test-time learning for large language models</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.20633. arXiv:2505.20633.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Qi, Slot:
          <article-title>Sample-specific language model optimization at test-time</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.12392. arXiv:2505.12392.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Sheng,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ma</surname>
          </string-name>
          , L. Yuan,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Ttrl:
          <article-title>Test-time reinforcement learning</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2504.16084. arXiv:2504.16084.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>M. M. Moradi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Amer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Mudur</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , Y. Liu, W. Ahmed,
          <article-title>Continuous self-improvement of large language models by test-time training with verifier-driven sample selection</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.19475. arXiv:2505.19475.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tack</surname>
          </string-name>
          ,
          <article-title>Revise: Learning to refine at test-time via intrinsic selfverification</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2502.14565. arXiv:2502.14565.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hübotter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Diaz-Bone</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Hakimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <article-title>Learning on the job: Test-time curricula for targeted reinforcement learning</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2510.04786. arXiv:2510.04786.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bertolissi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hübotter</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Hakimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <article-title>Local mixtures of experts: Essentially free test-time training via model merging</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.14136. arXiv:2505.14136.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Band</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Candès</surname>
          </string-name>
          , T. Hashimoto, Synthetic continued pretraining,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2409.07431. arXiv:2409.07431.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O</given-names>
            <surname>'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. McAuley</surname>
          </string-name>
          ,
          <article-title>Self-updatable large language models by integrating context into model parameters</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2410.00487. arXiv:2410.00487.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , T. Ji, Loki:
          <article-title>Low-damage knowledge implanting of large language models</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.22120. arXiv:2505.22120.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <article-title>New News: System-2 fine-tuning for robust integration of new knowledge</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.01812. arXiv:2505.01812.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Prabhudesai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fragkiadaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <article-title>Self-questioning language models</article-title>
          ,
          <source>arXiv preprint arXiv:2508.03682</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>R-Zero: Self-evolving reasoning LLM from zero data</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2508.05004. arXiv:2508.05004.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Acikgoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hakkani-Tür</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tur</surname>
          </string-name>
          ,
          <article-title>Self-improving LLM agents at test-time</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2510.07841. arXiv:2510.07841.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Language model self-improvement by reinforcement learning contemplation</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.14483. arXiv:2305.14483.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kadavath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinhardt</surname>
          </string-name>
          ,
          <article-title>Measuring mathematical problem solving with the MATH dataset</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2103.03874. arXiv:2103.03874.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zweiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Akyürek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Self-adapting language models</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2506.10943. arXiv:2506.10943.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wermter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Curriculum-RLAIF: Curriculum alignment with reinforcement learning from AI feedback</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.20075. arXiv:2505.20075.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Y.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sukhbaatar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Self-rewarding language models</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2401.10020. arXiv:2401.10020.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Memento: Fine-tuning LLM agents without fine-tuning LLMs</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2508.16153. arXiv:2508.16153.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mendonca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <article-title>Meta-reinforcement learning of structured exploration strategies</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1802.07245. arXiv:1802.07245.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>Irie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Schlag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Csordás</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>A modern self-referential weight matrix that learns to modify itself</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2202.05780. arXiv:2202.05780.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-E.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>A survey on self-evolution of large language models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2404.14387. arXiv:2404.14387.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H.-a.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Juan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey of self-evolving agents: On path to artificial super intelligence</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2507.21046. arXiv:2507.21046.
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schürholt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <article-title>Recurrent diffusion for large-scale parameter generation</article-title>
          ,
          <source>arXiv preprint arXiv:2501.11587</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schürholt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          , et al.,
          <article-title>Drag-and-drop LLMs: Zero-shot prompt-to-weights</article-title>
          ,
          <source>arXiv preprint arXiv:2506.16406</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Charakorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cetin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <article-title>Text-to-LoRA: Instant transformer adaption</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2506.06105. arXiv:2506.06105.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>R. M. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>ORAL: Prompting your large-scale LoRAs via conditional recurrent diffusion</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2503.24354. arXiv:2503.24354.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <article-title>Conditional LoRA parameter generation</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2408.01415. arXiv:2408.01415.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>ICM-Fusion: In-context meta-optimized LoRA fusion for multi-task adaptation</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2508.04153. arXiv:2508.04153.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>In-context meta LoRA generation</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2501.17635. arXiv:2501.17635.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2022</year>
          , p.
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <article-title>A path towards autonomous machine intelligence version 0.9.2, 2022-06-27</article-title>
          ,
          <source>Open Review</source>
          <volume>62</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>AlphaGo moment for model architecture discovery</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2507.18074. arXiv:2507.18074
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Holt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fanconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foerster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van der Schaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <article-title>Discovering preference optimization algorithms with and for large language models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2406.08414. arXiv:2406.08414
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>R-Zero: Self-evolving reasoning LLM from zero data</article-title>
          ,
          <source>arXiv preprint arXiv:2508.05004</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>arXiv preprint arXiv:1908.10084</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kunin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sagastuy-Brena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ganguli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Yamins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <article-title>Neural mechanics: Symmetry and broken conservation laws in deep learning dynamics</article-title>
          ,
          <source>arXiv preprint arXiv:2012.04728</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minderer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Heigold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          , et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>arXiv preprint arXiv:2010.11929</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          Qwen:
          <string-name><given-names>A.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Hui</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Zheng</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Huang</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Tu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Zhou</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Dang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lu</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Bao</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Xue</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Zhu</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Men</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Tang</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Xia</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Ren</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Ren</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Fan</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Su</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Wan</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Cui</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Qiu</surname></string-name>
          ,
          <article-title>Qwen2.5 technical report</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2412.15115. arXiv:2412.15115
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zellers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>HellaSwag: Can a machine really finish your sentence?</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1905.07830. arXiv:1905.07830.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kwiatkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BoolQ: Exploring the surprising difficulty of natural yes/no questions</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1905.10044. arXiv:1905.10044.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Cowhey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Khot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sabharwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schoenick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tafjord</surname>
          </string-name>
          ,
          <article-title>Think you have solved question answering? Try ARC, the AI2 reasoning challenge</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1803.05457. arXiv:1803.05457.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mihaylov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Khot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sabharwal</surname>
          </string-name>
          ,
          <article-title>Can a suit of armor conduct electricity? a new dataset for open book question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1809.02789</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zellers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>PIQA: Reasoning about physical commonsense in natural language</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1911.11641. arXiv:1911.11641.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sakaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bhagavatula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>WinoGrande: An adversarial Winograd schema challenge at scale</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1907.10641. arXiv:1907.10641.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Soylu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ziems</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Opsahl-Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shandilya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Ryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Dimakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <article-title>GEPA: Reflective prompt evolution can outperform reinforcement learning</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2507.19457. arXiv:2507.19457
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>ReflectEvo: Improving meta introspection of small LLMs by learning self-reflection</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.16475. arXiv:2505.16475
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <article-title>Instruct-of-Reflection: Enhancing large language models iterative reflection capabilities via dynamic-meta instruction</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2503.00902. arXiv:2503.00902
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yuksekgonul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>TextGrad: Automatic "differentiation" via text</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2406.07496. arXiv:2406.07496
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Enhancing cross-task transfer of large language models via activation steering</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2507.13236. arXiv:2507.13236
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <article-title>Mixture-of-subspaces in low-rank adaptation</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2406.11909. arXiv:2406.11909
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dvijotham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. R.</given-names>
            <surname>Manchester</surname>
          </string-name>
          ,
          <article-title>Norm-bounded low-rank adaptation</article-title>
          ,
          <year>2025</year>
          . URL: https:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>