<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Models: Search and Refinement Guided by Uncertainty</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Murilo da Luz</string-name>
          <email>muriloluz@ufg.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruno Brandão</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luana Martins</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gustavo Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bryan de Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luckeciano Melo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Telma Soares</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Knowledge Center for Immersive Technologies</institution>
          ,
          <addr-line>AKCIT</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Federal University of Goiás</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>OATML, University of Oxford</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Uncertainty</institution>
          ,
          <addr-line>Entropy, Latent-space search, Soft Reasoning, LLM reasoning</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>The use of Large Language Models (LLMs) for reasoning and planning tasks has drawn increasing attention in Artificial Intelligence research. Despite their remarkable progress, these models still exhibit limitations in multi-step inference scenarios, particularly in mathematical and logical reasoning. We introduce PREGU (Partial Reasoning Guided by Uncertainty). PREGU monitors the entropy of the output distribution during autoregressive generation and halts the process whenever entropy exceeds a defined threshold, signaling uncertainty. From that point, a localized search is performed in the latent space to refine the partial reasoning and select the most coherent answer, using the Soft Reasoning method. Experiments conducted with LLaMA-3-8B, Mistral-7B, and Qwen2-7B across four reasoning benchmarks (GSM8K, GSM-Hard, SVAMP, and StrategyQA) showed performance greater than or similar to Soft Reasoning, indicating that entropy can serve as an efective signal to trigger selective refinement during reasoning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large Language Models (LLMs) have demonstrated notable reasoning capabilities, often enhanced by
techniques like Chain-of-Thought (CoT) prompting [
        <xref ref-type="bibr" rid="ref1 ref20 ref21">1</xref>
        ]. However, their performance often falters in
tasks demanding complex, multi-step structured reasoning, such as advanced mathematics or contextual
planning. This limitation stems from the challenge LLMs face in simulating long-term outcomes and
exploring alternative reasoning paths, a process humans manage efectively [
        <xref ref-type="bibr" rid="ref2 ref22">2</xref>
        ].
      </p>
      <p>Traditional reasoning strategies often operate in the vast space of tokens, which can be ineficient
due to the immense number of possible token combinations. An alternative is performing search in
the latent space, where representations are more abstract, compressed, and lower in dimensionality,
avoiding successive natural language encoding and decoding.</p>
      <p>
        The Soft Reasoning (SR) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] method recently introduced exploration in the latent space by iteratively
adjusting the input embedding (specifically, the first token) to influence the subsequent deterministic
text generation. While efective, SR’s eficacy is limited by its strong reliance on the starting
point—optimizing the latent space only from the initial prompt. Uncertainty in LLMs is dynamic, typically
manifesting in intermediate steps when the model must integrate facts or choose between plausible
hypotheses. Restricting optimization to the beginning of the sequence risks focusing computational
efort on regions where the model is already confident.
      </p>
      <p>This research proposes PREGU (Figure 1), an adaptive extension of Soft Reasoning. PREGU
dynamically identifies points of uncertainty during generation and triggers a focused search in the latent space
only at those specific critical junctures.</p>
      <p>LaCATODA 2026: The 10th Linguistic and Cognitive Approaches to Dialog Agents Workshop at the 40th AAAI conference, January</p>
      <p>CEUR
Workshop
Proceedings</p>
      <p>ceur-ws.org</p>
      <p>ISSN1613-0073
) generates multiple partial
reasoning paths in the token space. When the entropy of the output distribution (  ) exceeds the
threshold ( ), the generation halts, marking a region of uncertainty. Each partial reasoning sequence
is then refined in the latent space using the Soft Reasoning method, producing candidate responses
( 1,  2, … ,   ), from which the highest-reward answer is selected.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Latent Space and Soft Reasoning</title>
        <p>The latent space (or embedding space in the context of Large Language Models) is an abstract,
multidimensional mathematical representation of input data (such as words or phrases) that encodes semantic,
syntactic, and contextual information. Operating within this space allows LLM agents to function more
eficiently by using compressed and abstract representations of environmental dynamics.</p>
        <p>
          Soft Reasoning performs a search by applying controlled Gaussian perturbations (    ) to the
embedding of the first token,  . This exploration is guided by Bayesian Optimization using the Expected
Improvement (EI) acquisition function [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The quality of the generated sequence ( ), derived from a
perturbed embedding (), is evaluated through a reward function defined as:
        </p>
        <p>
          () =  verifier ( ) +  coherence( ),
where  verifier determines correctness — typically by employing the LLM itself as a black-box verifier
through the Multi-Generate approach [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] — and  coherence assesses the semantic and syntactic fluency
of the generated text. SR thus enables the model to refine its reasoning eficiently by exploring smooth
variations in generation trajectories within the latent space.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Entropy as a Measure of Uncertainty</title>
        <p>A standard method for quantifying an LLM’s internal uncertainty is by analyzing the probability
distribution of its output tokens. The metric employed here is the Shannon Entropy (  ), defined as:</p>
        <p>
          A higher entropy value signifies a flatter or more uniform probability distribution, indicating greater
ambiguity or indecision in selecting the next token. By setting an entropy threshold ( ), the model’s
autoregressive process can be interrupted when high uncertainty is detected, thereby activating an
auxiliary reasoning mechanism. This approach mimics human dual cognition models (System 1:
fast/inencounters an impasse or high doubt [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>=1
  = − ∑</p>
        <p>() log2  () .</p>
        <p>Inspired by dual-process theories of cognition, PREGU mirrors the interaction between intuitive and
analytical reasoning systems: a fast, heuristic mode during confident segments and a slower, deliberative
intervention when uncertainty peaks. In this view, the entropy threshold  acts as a computational
analogue of metacognitive control, triggering a shift from automatic generation to reflective reasoning.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The PREGU Method</title>
      <p>
        Unlike prompt-based diversification strategies such as Chain-of-Thought [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Self-Consistency [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
ReAct [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], or Tree of Thoughts [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which primarily vary surface-level text to elicit reasoning, PREGU
reallocates computation toward points of uncertainty detected via Shannon entropy. This direction
follows latent-space reasoning explored by Soft Reasoning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], emphasizing optimization over internal
representations rather than prompt reformulation.
      </p>
      <p>PREGU (Partial Reasoning Guided by Uncertainty) is designed to overcome the limitation of Soft
Reasoning (SR), which optimizes only the initial token embedding. PREGU integrates entropy-based
detection with latent space refinement in a two-stage process (Figure 1).</p>
      <p>Stage 1: Partial Reasoning Generation (Token Space)
We denote by   a pretrained language model with parameters  . The language model   performs
autoregressive generation to produce  candidate reasoning paths. At each decoding step  , we monitor
the Shannon entropy   of the next-token distribution. For each path, generation is interrupted at most
once, exactly at the first token whose entropy crosses the threshold  (i.e., the first  such that   ≥  ),
yielding a partial reasoning prefix.</p>
      <p>Entropy is estimated over the top- tokens (with  = 50 ), balancing computational cost and
representativeness; very low-probability tokens contribute minimally to the total uncertainty.
Minimum Prefix ( min): the model must generate at least  min tokens before uncertainty detection is
allowed, ensuring suficient semantic context.</p>
      <p>Stage 2: Refinement (Latent Space)
Each partial reasoning sequence generated in Stage 1 is treated as an extended prompt root (  ). This
partial reasoning is then refined independently using the Soft Reasoning method. The search is focused
on the latent space starting from the point of interruption, allowing Bayesian Optimization to specifically
address the ambiguity that triggered the halt.</p>
      <p>The process yields multiple candidate answers ( 1,  2, … ,   ), and the final solution (  ∗) is selected
based on the highest reward computed by the equation 1.</p>
      <p>This structure combines breadth exploration (through multiple partial reasoning paths) with
focused depth exploration (via latent space refinement), enabling a more adaptive and uncertainty-aware
reasoning process.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Configuration</title>
        <p>
          Experiments were conducted using three intermediate-scale (7–8 billion parameter) open-source LLMs
under a zero-shot setting: LLaMA-3-8B [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], Mistral-7B [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and Qwen2-7B [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. These models
present distinct architectural characteristics: LLaMA-3 emphasizes generalization, Mistral focuses on
architectural eficiency through Sliding Window Attention (SWA), and Qwen2 prioritizes robustness in
mathematical reasoning tasks.
Benchmarks. The methodology was evaluated against four reasoning benchmarks:
• GSM8K: Grade-school mathematical word problems [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
• GSM-Hard: A more challenging variant of GSM8K, requiring multi-step reasoning and
compositional inference [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
• SVAMP: Tests semantic robustness by introducing superficial structural variations in
mathematical problems [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
• StrategyQA: Evaluates strategic and commonsense reasoning, requiring the decomposition of
implicit sub-hypotheses [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          Baselines. We compare PREGU and Soft Reasoning against standard prompting and decoding
baselines on GSM8K, GSM-Hard, SVAMP, and StrategyQA. CoT (Chain-of-Thought) encourages step-by-step
reasoning by prompting the model to generate intermediate rationale before the final answer [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. SC
(Self-Consistency) samples multiple CoT solutions (e.g., using diferent temperatures) and selects the
ifnal answer by consensus (majority vote) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. FIRE increases diversity by applying a high temperature
only to the first generated token while decoding subsequent tokens with regular execution sampling
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. CoT-Decoding further diversifies reasoning by starting generation from the top-  most likely
ifrst tokens and decoding a completion from each start [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. For these baselines, we adopt the same
evaluation protocol and report the baseline results as presented in the Soft Reasoning study [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
Hyperparameters. PREGU was evaluated using a standard hyperparameter configuration informed
by preliminary studies of token-level entropy (see Sec. 5.1), including:
• Entropy threshold:  = 3.0 bits
• Token sample size for entropy estimation:  = 50
14.0 ± 0.0
25.7 ± 0.4
24.5 ± 1.1
21.8 ± 1.3
25.2 ± 3.0
24.8 ± 1.3
28.2 ± 1.8
35.2 ± 3.4
40.0 ± 0.0
47.5 ± 1.4
46.2 ± 1.9
47.3 ± 1.3
14.5 ± 0.0
19.5 ± 1.0
20.7 ± 1.5
19.1 ± 2.0
18.1 ± 1.9
16.6 ± 0.7
25.8 ± 1.8
32.7 ± 1.4
61.0 ± 0.0
79.1 ± 1.2
76.1 ± 3.9
69.6 ± 2.0
81.5 ± 0.8
83.2 ± 1.2
88.2 ± 1.3
87.4 ± 0.9
43.5 ± 0.0
72.3 ± 2.0
77.3 ± 1.2
78.6 ± 2.1
52.0 ± 0.0
67.4 ± 2.5
69.7 ± 1.6
68.3 ± 0.9
67.1 ± 1.9
69.4 ± 2.5
72.2 ± 2.2
74.8 ± 0.8
        </p>
        <p>Zero-Shot
58.5 ± 0.0
64.7 ± 0.7
59.9 ± 2.0
54.4 ± 2.6
63.0 ± 3.7
64.6 ± 1.6
67.2 ± 0.7
68.6 ± 0.7
63.0 ± 0.0
67.1 ± 1.5
67.5 ± 0.7
67.0 ± 1.0
62.0 ± 0.0
69.3 ± 1.5
64.2 ± 1.0
64.9 ± 1.0
64.2 ± 1.0
63.5 ± 1.5
66.1 ± 1.9
68.1 ± 1.9
• Search width:  = 5 partial reasoning paths
• Soft Reasoning parameters:  = 5 samples, projected latent dimension  = 50</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The results in Table 1 indicate that PREGU generally matches or improves upon Soft Reasoning across the
evaluated benchmarks. We observe small regressions in isolated settings (e.g., Qwen-2-7B on GSM8K and
LLaMA-3-8B on SVAMP), while maintaining gains on GSM-Hard, StrategyQA, and most remaining cases.
Across five runs, the method remained stable and did not exhibit systematic performance degradation,
suggesting that combining entropy-guided interruption with localized latent-space refinement may
improve reasoning consistency across domains, especially on multi-step inference chains such as
GSM-Hard.</p>
      <p>Notably, performance on the GSM-Hard dataset improved, where PREGU showed robustness on long,
multi-step inference chains. Even with the search being initiated only at the first point of uncertainty,
the method successfully refined reasoning trajectories, suggesting entropy is an efective critical-point
selection metric.</p>
      <sec id="sec-5-1">
        <title>5.1. Empirical Validation of Uncertainty</title>
        <p>An analysis of the detected points of uncertainty revealed a clear linguistic correlation. Tokens exhibiting
the highest average entropy—such as “For”, “Given”, and “Since”—frequently marked logical transitions,
the introduction of sub-problems, or causal relations. Other high-entropy tokens, including “First” and
“According”, were also observed to initiate new reasoning steps. These tokens correspond precisely
to structural moments in reasoning where the model must choose among multiple plausible logical
continuations.</p>
        <p>At these critical junctures, the probability distribution over output tokens becomes notably dispersed,
reflecting the model’s indecision regarding the most coherent semantic direction.</p>
        <p>These entropy spikes frequently coincide with discourse connectives (e.g., “for”, “since”, “given”),
suggesting that the model’s uncertainty aligns with structural transitions in linguistic reasoning. Such
correspondence supports the view that entropy can reveal moments of linguistic introspection, where
the model implicitly evaluates competing semantic paths before proceeding.</p>
        <p>This pattern supports the hypothesis that entropy functions as an indicator of local cognitive
complexity: points where fast, automatic generation (System 1) gives way to slower, more analytical reasoning
(System 2). Thus, the entropy-based mechanism efectively monitors internal signals of cognitive
hesitation, transforming them into opportunities for reflective refinement.</p>
        <p>The distribution of high-entropy tokens is visualized through a word cloud (Figure 2), highlighting
terms associated with logical transitions in reasoning. This linguistic correspondence suggests that the
method captures internal traces of hesitation within the model’s own generative process, approximating
a primitive form of introspection.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>PREGU successfully achieved its objective by developing an adaptive architecture that integrates partial
reasoning generation in the token space with localized refinement in the latent space, guided by entropy.
The approach validates the hypothesis that internal uncertainty can serve as a metacognitive control
mechanism to direct computational efort toward regions ofering the highest potential for informational
gain.</p>
      <sec id="sec-6-1">
        <title>Known Limitations</title>
        <p>PREGU inherits certain limitations from the original Soft Reasoning approach. First, the scope of
Bayesian Optimization remains restricted, as it operates only on the initial embedding following a
detected uncertainty point, thereby constraining refinement to a localized region of the latent space.
Second, the method’s reliance on the internal verifier (  verifier )—which employs the LLM itself to
assess the correctness of its own outputs—can introduce bias, particularly in complex mathematical
verification tasks. Finally, the fixed entropy threshold (  ) constitutes a sensitive hyperparameter:
inadequate calibration may either cause premature fragmentation of reasoning sequences or reduce the
frequency of uncertainty detection, efectively reverting the behavior toward that of the original Soft
Reasoning method.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Future Work</title>
        <p>Future research should aim to expand the scope of optimization within the latent space to encompass
a window of multiple embeddings following the point of uncertainty, rather than a single one. This
broader search may enable smoother and more contextual refinements. Another promising direction
involves designing new reward metrics that reduce dependence on the LLM’s self-evaluation, for instance
by integrating external or symbolic verifiers for mathematical reasoning. Additionally, dynamically
calibrating the entropy threshold ( )—adapting it to the problem’s complexity and context—and
quantifying explicit cost-benefit metrics (e.g., execution time, inference cost per correct solution) represent
important steps toward improving PREGU’s eficiency, interpretability, and robustness.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been fully/partially funded by the project Research and Development of Digital Agents
Capable of Planning, Acting, Cooperating and Learning supported by Advanced Knowledge Center
in Immersive Technologies (AKCIT), with financial resources from the PPI IoT/Manufatura 4.0 / PPI
HardwareBR of the MCTI grant number 057/2023, signed with EMBRAPII</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT (GPT-5.2) to assist with translation
and language editing (grammar, spelling, and clarity). The author(s) reviewed and edited the output
and take full responsibility for the final content.</p>
    </sec>
    <sec id="sec-9">
      <title>A. PREGU - Experiments</title>
      <sec id="sec-9-1">
        <title>Experimental Setup and Reproducibility</title>
        <p>All experiments were executed on an NVIDIA DGX-H100 system equipped with 8× H100 GPUs (80 GB
each).</p>
        <p>The models evaluated were LLaMA-3-8B, Mistral-7B, and Qwen2-7B, selected for their architectural
diversity and open accessibility. All experiments followed the same inference configuration, including
token truncation to the top-k ( = 50 ) logits for entropy estimation and a fixed entropy threshold of
 = 3.0 bits. Uncertainty detection followed  min = 5 (minimum prefix), and we trigger at most one
interruption per path (first token where   ≥  ).</p>
        <p>For the latent-space optimization phase, Bayesian Optimization was applied over a projected subspace
of  = 50 dimensions, using the Expected Improvement (EI) acquisition function. Each refinement
round sampled  = 5 latent perturbations per reasoning path, guided by the composite reward function
in Eq. 1.</p>
        <p>To account for stochasticity in both token sampling and latent perturbation, each experimental
configuration was executed across multiple independent runs with distinct random seeds. All results
were reported as mean and standard deviation to capture the natural variability of large language model
inference. Benchmarks included GSM8K, GSM-Hard, SVAMP, and StrategyQA under the zero-shot
setting, following the standard prompt structures described in the methodology.</p>
        <p>For each benchmark, a random subset of 200 problems was uniformly sampled from the full dataset
at the beginning of each run. This random selection ensured that diferent executions explored distinct
subsets while maintaining comparable task dificulty distributions. The random seed controlling the
sampling process was synchronized with the inference seed, guaranteeing consistency between data
selection and generation randomness across runs.</p>
        <p>Finally, representative examples from diferent benchmarks are included in the Appendix to illustrate
the full execution trace of PREGU. Each example shows both stages of operation—entropy-based
interruption and latent-space refinement—highlighting how uncertainty localization leads to structured
reasoning improvements. These examples provide a transparent depiction of the reasoning process and
facilitate independent verification of the experimental pipeline.</p>
        <p>Note. The examples reported here correspond to raw model outputs. Therefore, they may include
hallucinations or other generation artifacts (e.g., spurious claims or incoherent fragments), and are
provided solely to illustrate the behavior of the proposed pipeline.</p>
      </sec>
      <sec id="sec-9-2">
        <title>A.1. PREGU Execution Example — StrategyQA</title>
        <p>Initial Question. Mayor: head of municipal government such as a town or city. Are Mayors safe from
harm from the federal government?</p>
        <sec id="sec-9-2-1">
          <title>Ground Truth: False</title>
        </sec>
      </sec>
      <sec id="sec-9-3">
        <title>Phase 1: Partial Reasoning Generation</title>
        <p>(Entropy pause marked as [STOP]. Rewards refer to the partial prefix quality.)</p>
        <sec id="sec-9-3-1">
          <title>Path 0</title>
          <p>1) Mayors are[STOP]</p>
        </sec>
        <sec id="sec-9-3-2">
          <title>Path 1</title>
          <p>1) Mayor is a position of municipal government. 2) The municipal government,[STOP]
Path 2 (Selected)
1) The mayor is the head of a municipal government. 2) The federal government is a higher
level of government. 3) The constitution grants[STOP]</p>
        </sec>
        <sec id="sec-9-3-3">
          <title>Path 3</title>
          <p>The Mayor is the local leader of a city or town. A federal government[STOP]</p>
        </sec>
        <sec id="sec-9-3-4">
          <title>Path 4</title>
          <p>1) The Mayor is the leader of the local government and not part of the federal government.
2) No[STOP]</p>
        </sec>
      </sec>
      <sec id="sec-9-4">
        <title>Phase 2: Complete Reasoning Chains</title>
        <p>(Each path is refined in latent space and yields a complete answer.)</p>
        <sec id="sec-9-4-1">
          <title>Path 0 — Complete Chain</title>
          <p>... auge officials of municipal governments, which are part of the state government, not the federal government.
2) The federal government has no direct authority over municipal governments, but it can influence them through
various means such as funding or laws. 3) However, the federal government does not have the power to harm mayors
directly, as they are not federal officials.</p>
          <p>Answer:
True. Mayors are safe from harm from the federal government.</p>
          <p>Path 1 — Complete Chain
... oroften, has its own laws and regulations, but it is part of the larger national government. 3) The federal
government has the power to enforce laws and regulations, and it can intervene in the affairs of municipal
governments if necessary. 4) However, the federal government is not typically involved in the day-to-day
operations of municipal governments, and mayors are usually free to carry out their duties without interference
from the federal government. 5) Therefore, mayors are generally safe from harm from the federal government, but
there are exceptions if the mayor is breaking federal laws or if there is a national emergency.
Answer:
Mayors are generally safe from harm from the federal government, but there are exceptions.
Answer: mayors are enerally safe from harm from the federal overnment but there are exceptions</p>
        </sec>
        <sec id="sec-9-4-2">
          <title>Path 2 — Complete Chain (Selected)</title>
          <p>... kh the power to the federal government to regulate the states and municipalities. 4) The federal government
can use its power to investigate and prosecute individuals, including mayors, if they are suspected of violating
federal laws. 5) However, the federal government is not allowed to arbitrarily harm or threaten mayors.
Answer:
False. Mayors are not safe from harm from the federal government if they are suspected of violating federal laws,
but the federal government is not allowed to arbitrarily harm or threaten mayors.</p>
        </sec>
        <sec id="sec-9-4-3">
          <title>Path 3 — Complete Chain</title>
          <p>... iederates over the entire country, not just a single city or town. Therefore, the federal government has
the power to intervene in the affairs of a city or town, including the removal or replacement of the Mayor. So,
Mayors are not safe from harm from the federal government.</p>
          <p>Answer: False.</p>
        </sec>
        <sec id="sec-9-4-4">
          <title>Path 4 — Complete Chain</title>
          <p>... dit: The Mayor is not a federal official, so they are not under the protection of the federal government. 3)
The federal government has no direct authority over the Mayor unless there is a violation of federal law. 4) The
Mayor is not immune from prosecution if they break federal law.</p>
          <p>Answer:
False. The Mayor is not safe from harm from the federal government if they break federal law.</p>
        </sec>
      </sec>
      <sec id="sec-9-5">
        <title>A.2. PREGU Execution Example — GSM8k</title>
        <p>Initial Question. Ashley’s pizza delivery costs $15. What is the total amount that Ashley should give the
delivery man if she wants to give a tip that is equal to 1/5 of the amount she ordered?</p>
        <sec id="sec-9-5-1">
          <title>Ground Truth: 18</title>
        </sec>
      </sec>
      <sec id="sec-9-6">
        <title>Phase 1: Partial Reasoning Generation</title>
        <p>(Entropy pause marked as [STOP].)</p>
        <sec id="sec-9-6-1">
          <title>Path 0</title>
          <p>1/5 means Ashley should give 1/5 of the total price.[STOP]</p>
        </sec>
        <sec id="sec-9-6-2">
          <title>Path 1</title>
          <p>1/5 of the ordered amount is the tip Ashley wants to give. So, to find the amount of the
tip, we[STOP]
Path 2 (Selected)
Ashley's tip should be 1/5 of what she paid for a delivery, so the tip amount is 1/5 *
$15.[STOP]</p>
        </sec>
        <sec id="sec-9-6-3">
          <title>Path 3</title>
          <p>1/5 of Ashley's pizza delivery is her tip. So, to find the tip amount, we[STOP]</p>
        </sec>
        <sec id="sec-9-6-4">
          <title>Path 4</title>
          <p>1/5 (twentieth) of the[STOP]
(Each path is refined in latent space and yields a complete answer.)</p>
        </sec>
        <sec id="sec-9-6-5">
          <title>Path 2 — Complete Chain (Selected)</title>
          <p>... Now, to find the total amount Ashley should give the delivery man, we add the tip amount to the cost of the
delivery: $15 + (1/5 * $15).</p>
          <p>Answer:
$18</p>
        </sec>
        <sec id="sec-9-6-6">
          <title>Path 3 — Complete Chain</title>
          <p>... Luxury Real Estate in Dubai will divide the cost of the pizza delivery by 5. Then, we will add the tip amount
to the cost of the pizza delivery to find the total amount Ashley should give the delivery man.
Answer:
$15 + ($15 / 5) = $15 + $3 = $18.</p>
        </sec>
        <sec id="sec-9-6-7">
          <title>Path 4 — Complete Chain</title>
          <p>... imanager's cost is the tip. So, we need to find 1/5 of $15.</p>
          <p>Answer: $3
Selection Rule. Each partial reasoning sequence generated in Stage 1 serves only as an exploratory
prompt. The final selection among reasoning paths occurs after the complete refinements in the latent
space (Stage 2), based on the final evaluation scores. The chosen path thus corresponds to the reasoning
that achieved the highest overall reward according to the combined objective function.
Discussion. The examples provided demonstrate the operational dynamics of PREGU, showing how
uncertainty detection and localized latent-space optimization interact to enhance reasoning reliability.
Together, they ofer a transparent view of the model’s decision process and its transition from intuitive
to analytical reasoning.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Towards understanding chain-ofthought prompting: An empirical study of what matters</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2212. 10001. arXiv:
          <volume>2212</volume>
          .
          <fpage>10001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          , H. Ma,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Reasoning with language model is planning with world model</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.14992. arXiv:
          <volume>2305</volume>
          .
          <fpage>14992</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gui</surname>
          </string-name>
          ,
          <article-title>Soft reasoning: Navigating solution spaces in large language models through controlled embedding exploration</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505. 24688. arXiv:
          <volume>2505</volume>
          .
          <fpage>24688</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P. I.</given-names>
            <surname>Frazier</surname>
          </string-name>
          , A tutorial on bayesian optimization,
          <year>2018</year>
          . URL: https://arxiv.org/abs/
          <year>1807</year>
          .02811. arXiv:
          <year>1807</year>
          .02811.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kahneman</surname>
          </string-name>
          , Thinking, fast and slow, Penguin, London,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2201.11903. arXiv:
          <volume>2201</volume>
          .
          <fpage>11903</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Self-consistency improves chain of thought reasoning in language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2203. 11171. arXiv:
          <volume>2203</volume>
          .
          <fpage>11171</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          , I. Shafran,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          , React:
          <article-title>Synergizing reasoning and acting in language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2210.03629. arXiv:
          <volume>2210</volume>
          .
          <fpage>03629</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shafran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Grifiths</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <article-title>Tree of thoughts: Deliberate problem solving with large language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.10601. arXiv:
          <volume>2305</volume>
          .
          <fpage>10601</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grattafiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          , et al.,
          <source>The llama 3 herd of models</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/ abs/2407.21783. arXiv:
          <volume>2407</volume>
          .
          <fpage>21783</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bamford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chaplot</surname>
          </string-name>
          , D. de las Casas,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bressand</surname>
          </string-name>
          , G. Lengyel,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Lavaud</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>A. Lachaux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Stock</surname>
            ,
            <given-names>T. L.</given-names>
          </string-name>
          <string-name>
            <surname>Scao</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lavril</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>W. E.</given-names>
          </string-name>
          <string-name>
            <surname>Sayed</surname>
          </string-name>
          , Mistral 7b,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2310.06825. arXiv:
          <volume>2310</volume>
          .
          <fpage>06825</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          , G. Dong,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Ma, J.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ni</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Men</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Ge</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Fan</surname>
          </string-name>
          ,
          <source>Qwen2 technical report</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.10671. arXiv:
          <volume>2407</volume>
          .
          <fpage>10671</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kosaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bavarian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Plappert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tworek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nakano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          , Training verifiers to solve math word problems,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2110.14168. arXiv:
          <volume>2110</volume>
          .
          <fpage>14168</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , U. Alon, P. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          , G. Neubig, Pal:
          <article-title>Program-aided language models</article-title>
          ,
          <source>arXiv preprint arXiv:2211.10435</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattamishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <article-title>Are nlp models really able to solve simple math word problems</article-title>
          ?,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2103.07191. arXiv:
          <volume>2103</volume>
          .
          <fpage>07191</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Geva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Khashabi</surname>
          </string-name>
          , E. Segal,
          <string-name>
            <given-names>T.</given-names>
            <surname>Khot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <article-title>Did aristotle use a laptop? a question answering benchmark with implicit reasoning strategies</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2101. 02235. arXiv:
          <volume>2101</volume>
          .
          <fpage>02235</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kojima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iwasawa</surname>
          </string-name>
          ,
          <article-title>Large language models are zero-shot reasoners</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2205.11916. arXiv:
          <volume>2205</volume>
          .
          <fpage>11916</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          , L. Yan,
          <article-title>Flaming-hot initiation with regular execution sampling for large language models</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2410. 21236. arXiv:
          <volume>2410</volume>
          .
          <fpage>21236</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought reasoning without prompting</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/ abs/2402.10200. arXiv:
          <volume>2402</volume>
          .
          <fpage>10200</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Path</surname>
            1 -
            <given-names>Complete</given-names>
          </string-name>
          <string-name>
            <surname>Chain</surname>
          </string-name>
          ... RPC:
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <article-title>1. Multiply the cost of the pizza by 1/5</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>2. Add the tip to the cost of the pizza</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>