<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Soft Thinking: Enhancing Knowledge Base Completion through Trainable Prompts in the Chain of Thought</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aldan Creo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Guéret</string-name>
          <email>christophe.gueret@accenture.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Bernardi</string-name>
          <email>alberto.bernardi@accenture.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrianna Janik</string-name>
          <email>adrianna.janik@accenture.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Costabello</string-name>
          <email>luca.costabello@accenture.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Accenture Labs</institution>
          ,
          <addr-line>7 Hanover Quay, Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Independent Author</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Joint proceedings of KBC-LM and LM-KBC @ ISWC 2025</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Table 1 LM-KBC Challenge @ ISWC 2025 Final Rankings</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We introduce Soft Thinking, a parameter-efficient approach for knowledge base completion that inserts trainable soft prompts within the chain-of-thought reasoning space of language models. Unlike traditional soft prompting methods that prepend tokens to inputs, our approach embeds relation-specific trainable parameters between &lt;think&gt; and &lt;/think&gt; tokens, allowing models to develop specialized reasoning pathways for different relation types while keeping base model parameters frozen. We evaluated our method on a dataset of six different relations, achieving a macro F1-score of 0.3977, over two times higher than that of baseline prompting (0.186). Our approach particularly excels at entity-based relations, substantially improving geographic and organizational query results while maintaining parameter efficiency (less than 0.001% of total model parameters). Notably, we achieve optimal performance using only subject entities as input, demonstrating that soft prompts can autonomously develop effective reasoning strategies without manual prompt engineering. Our code is available at https://github.com/ACMCMC/soft-thinking.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Intuition.</title>
        <sec id="sec-1-1-1">
          <title>Several recent works have shown that LLMs can be prompted to think in a chain-of-thought manner, generating intermediate reasoning steps before arriving at the final answer.</title>
          <p>Those “thoughts” are expressed in words. But what if thoughts could go beyond words?</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>Like a painter who draws a picture without needing to write down the steps of how to do it, can we help the model express its reasoning in a more abstract way?</title>
        </sec>
        <sec id="sec-1-1-3">
          <title>We do that by inserting trainable embeddings in the model’s reasoning process, allowing it to</title>
          <p>
            learn relation-specific reasoning pathways without needing to express them in words.
(We formalize this intuition in the next sections.)
modifying all model parameters. Liu et al. [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] showed that prompt tuning can be as efective as
finetuning the model, while only training a fraction of the weights, typically 0.1%-3% of parameters; Qin
and Eisner [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] utilized mixtures of soft prompts for question answering on LMs.
          </p>
        <p>However, these approaches typically insert trainable parameters at the input level or across all layers, without a specific focus on enhancing the model’s reasoning process. Only very recently has soft prompting of the reasoning process itself started to be explored: Xu et al. [<xref ref-type="bibr" rid="ref7">7</xref>] propose utilizing an assistant language model trained to inform CoT reasoning processes. Similarly, COCONUT (Chain of Continuous Thought) [<xref ref-type="bibr" rid="ref8">8</xref>] directly feeds the last layer’s hidden states into the next time step, thus creating a continuous flow of information that does not need to be mapped back into a discrete representation in the form of tokens.</p>
        <p>Nonetheless, our work differs in that both approaches aim to generate intermediate thoughts that are only expressed in the latent space without any explicit tokenization, while we aim to find a set of soft input embeddings that are not generated on the fly, but rather trained to optimize the model’s reasoning process. We opt for this design choice to improve scalability and efficiency, as generating soft thoughts on the fly would require substantial additional computational resources. This is further motivated by the fact that predicting the object entity for a given subject, where the relation does not change, can be expected to follow similar reasoning patterns (in contrast to, e.g., solving mathematical problems, where there is a greater degree of variability). Therefore, a single chain of soft thoughts can be trained to cover all such cases while avoiding the additional cost of inference-time generation.</p>
        <p>In this paper, we introduce Soft Thinking, a novel approach that combines the strengths of CoT prompting with trainable soft prompts specifically inserted in the “thinking” section of the assistant response. Our approach creates relation-specific reasoning pathways that guide the model’s thought process while keeping the model’s parameters frozen. We implement this by:
1. Adding soft chain-of-thought tokens that create a dedicated reasoning space between the question and answer generation
2. Inserting trainable soft prompt embeddings in this thinking section
3. Training these embeddings for each relation type in a knowledge base completion task
Our contributions are as follows:
• A novel framework for knowledge base completion using trainable “soft thinking” prompts in the model’s reasoning space
• An automatic optimization approach that eliminates manual prompt engineering by maximizing answer likelihood, removing human bias and variability
• An efficient implementation requiring minimal trainable parameters (less than 0.001% of total model parameters)
• Empirical validation achieving third place in the LM-KBC Challenge @ ISWC 2025, with an average macro F1-score of 0.398 across six relations of different nature
• Insights into relation-specific reasoning pathways, showing particular effectiveness on entity-based relations</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Task definition</title>
        <p>Formally, we can define the set of all possible entities ℰ and the set of all possible relations ℛ. The Cartesian product ℰ × ℛ × ℰ defines the space 𝒜 of all possible assertions. Given a boolean function φ(a) that is true if and only if a is a “true fact”, a Knowledge Base ℬ ⊆ 𝒜 satisfies the condition ∀a ∈ ℬ : φ(a), i.e., the Knowledge Base is a subset of all possible assertions such that each assertion is a true fact.</p>
        <p>Our inputs are a set of tuples 𝒯, where each t = (s, r) ∈ ℰ × ℛ, and we aim to find a Knowledge Base ℬ such that for each input tuple t we can find a set of entities o ∈ ℰ such that the assertion a = (s, r, o) is true, i.e., φ(a) = True.</p>
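        <p>For reference, this objective can be restated compactly as follows; this is only a notational summary of the definitions above, and the symbol 𝒪(t) for the set of valid objects of a tuple is introduced here purely for brevity:</p>
        <disp-formula id="eq1">
          <tex-math>
\mathcal{O}(t) = \{\, o \in \mathcal{E} : \varphi\big((s, r, o)\big) = \text{True} \,\} \quad \text{for } t = (s, r) \in \mathcal{T}, \qquad
\mathcal{B} = \bigcup_{t \in \mathcal{T}} \{\, (s, r, o) : o \in \mathcal{O}(t) \,\}
          </tex-math>
        </disp-formula>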
      <sec id="sec-2-1">
        <title>Standard Prompting</title>
        <p>(Fulbright Prize, REL awardWonBy)</p>
        <p>apply manual template
Who won the Fulbright Prize?
generate with reasoning
&lt;think&gt; I need to find a winner for the
Fulbright Prize. This is an important international
award. It’s probably awarded to a significant
public figure. I believe Bill Gates received this
award for his philanthropic work. &lt;/think&gt;
Answer: Bill Gates</p>
      </sec>
      <sec id="sec-2-2">
        <title>Soft Thinking (ours)</title>
        <p>(Fulbright Prize, REL awardWonBy)</p>
        <sec id="sec-2-2-1">
          <title>Embedding Matrix</title>
        </sec>
        <sec id="sec-2-2-2">
          <title>Frozen LM [9]</title>
          <p>User: Fulbright Prize
...</p>
          <p>science? famous rich?
technology? important USA repwuhteodwon?</p>
          <p>philanthropy</p>
          <p>Assistant: &lt;think&gt; 1 2 3 . . . &lt;/think&gt; Bill Gates</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Relation-specific trainable parameters</title>
          <p>∈ R|ℛ|× ×</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>Available relations</title>
          <p>1: hasArea
2: hasCapacity
3: awardWonBy
: . . .</p>
          <p>
            For example, given the input tuple (Ireland, hasArea), we want to find all entities  such that
the assertion  = (Ireland, hasArea, ) is true. In this case, there is only one object  that
satisifes this condition: 84,421. Therefore, we can represent the knowledge base as  = {}, where
 = (Ireland, hasArea, 84,421).
2.2. System Architecture
Our approach builds upon the Qwen3-8B language model [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], which we keep frozen to maintain its
pretrained knowledge while introducing trainable parameters specifically for knowledge base completion
tasks. Figure 1 illustrates our architecture compared to standard prompting approaches.
          </p>
          <p>The base architecture consists of the frozen Qwen3-8B model with its standard tokenization pipeline.
We use the model’s existing embedding matrix to convert input tokens to continuous representations,
but we augment this process with relation-specific trainable embeddings inserted at strategic positions
in the input sequence.</p>
          <p>
            Unlike traditional soft prompting methods that prepend trainable tokens to the input [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], our approach
inserts soft prompt embeddings within the chain-of-thought reasoning section. We structure prompts
using special tokens: &lt;think&gt; to begin the reasoning phase, followed by relation-specific soft prompt
embeddings, and &lt;/think&gt; to conclude the reasoning before answer generation.
          </p>
          <p>For each relation type r in our knowledge base, we maintain a trainable parameter matrix P_r ∈ ℝ^(n × d), where n is the number of soft prompt tokens (typically 5-10) and d is the model’s embedding dimension (4096 for Qwen3-8B). These parameters are randomly initialized and optimized during training to develop relation-specific reasoning pathways.</p>
          <p>Our prompt templates follow a simple structure: we start with the subject entity, add the &lt;think&gt;
token, insert the relation-specific soft prompt embeddings, close with &lt;/think&gt;, and allow the model
to generate the answer. This creates a dedicated reasoning space where the soft prompts can guide the
model’s thought process without interfering with the input question or output answer.</p>
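          <p>To make this concrete, the following is a minimal PyTorch-style sketch of how such a sequence can be assembled and passed to a frozen causal LM via inputs_embeds. It is an illustrative sketch rather than our released implementation: the helper name build_soft_thinking_embeds, the initialization scale, and the single-relation soft prompt shown here are simplifying assumptions.</p>
          <preformat>
# Illustrative sketch (not the released code): build the sequence
# [subject tokens] &lt;think&gt; [soft prompt embeddings] &lt;/think&gt;
# and let the frozen LM generate the answer from it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
lm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
lm.eval()
lm.requires_grad_(False)                 # the base model stays frozen

n_soft = 10                              # soft prompt tokens per relation
d = lm.config.hidden_size                # 4096 for Qwen3-8B
soft_prompt = torch.nn.Parameter(torch.randn(n_soft, d) * 0.02)  # one relation

def build_soft_thinking_embeds(subject: str) -> torch.Tensor:
    """Return a (1, seq_len, d) embedding sequence for one subject entity."""
    emb = lm.get_input_embeddings()      # frozen embedding matrix
    subj = tok(subject, add_special_tokens=False, return_tensors="pt").input_ids
    open_think = tok("&lt;think&gt;", add_special_tokens=False, return_tensors="pt").input_ids
    close_think = tok("&lt;/think&gt;", add_special_tokens=False, return_tensors="pt").input_ids
    parts = [
        emb(subj),                                      # subject entity tokens
        emb(open_think),                                # open the reasoning section
        soft_prompt.unsqueeze(0).to(emb.weight.dtype),  # trainable "soft thoughts"
        emb(close_think),                               # close the reasoning section
    ]
    return torch.cat(parts, dim=1)

inputs_embeds = build_soft_thinking_embeds("Fulbright Prize")
out = lm.generate(inputs_embeds=inputs_embeds, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
          </preformat>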
          <p>During forward passes, we combine inputs from three sources: (1) the frozen word embeddings for input tokens and special tokens, (2) the trainable soft prompt embeddings for the current relation, and (3) position-specific attention masks to ensure proper sequence modeling. The combined embedding sequence is then processed by the frozen language model to generate answers.</p>
        </sec>
        <sec id="sec-2-3">
          <title>2.3. Training Methodology</title>
          <p>Our training approach focuses exclusively on optimizing the relation-specific soft prompt parameters while keeping the base language model in evaluation mode with frozen weights. This keeps training computationally efficient and avoids catastrophic forgetting of the pre-trained knowledge.</p>
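          <p>A minimal sketch of this setup is shown below, assuming a PyTorch ParameterDict keyed by relation name; the initialization scale is an illustrative assumption rather than a prescribed value.</p>
          <preformat>
# Sketch: only the per-relation soft prompts receive gradients; the LM is frozen.
import torch
from transformers import AutoModelForCausalLM

lm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
lm.eval()                    # evaluation mode (no dropout)
lm.requires_grad_(False)     # freeze every pretrained weight

relations = ["awardWonBy", "companyTradesAtStockExchange",
             "countryLandBordersCountry", "hasArea",
             "hasCapacity", "personHasCityOfDeath"]
n_soft, d = 10, lm.config.hidden_size
soft_prompts = torch.nn.ParameterDict({
    r: torch.nn.Parameter(torch.randn(n_soft, d) * 0.02) for r in relations
})

trainable = sum(p.numel() for p in soft_prompts.values())
total = sum(p.numel() for p in lm.parameters())
print(f"trainable parameters: {trainable} ({trainable / total:.6%} of the frozen LM)")
          </preformat>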
          <p>We organize training data by relation type and process each relation separately within each epoch. This allows the optimizer to focus on relation-specific patterns and develop specialized reasoning strategies for different types of knowledge queries. For each relation, we shuffle the training examples to prevent overfitting to example ordering.</p>
          <p>We employ the Adam optimizer with a learning rate of 0.05 and weight decay of 0.01 for regularization. Gradient clipping with a threshold of 2.5 prevents exploding gradients, which can occur with soft prompt training. We also implement gradient accumulation to simulate larger effective batch sizes when GPU memory constraints limit the actual batch size.</p>
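          <p>A compact sketch of the optimization loop under these settings follows; the accumulation factor and the placeholder loss below are illustrative stand-ins rather than values taken from our experiments.</p>
          <preformat>
# Sketch: Adam on the soft prompts only, gradient clipping at 2.5, and gradient
# accumulation to simulate a larger effective batch size.
import torch

soft_prompt = torch.nn.Parameter(torch.randn(10, 4096) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=0.05, weight_decay=0.01)
accum_steps = 4                                     # illustrative accumulation factor

for step in range(8):                               # stand-in for iterating real batches
    loss = soft_prompt.pow(2).mean() / accum_steps  # placeholder for the answer loss
    loss.backward()                                 # gradients accumulate across steps
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_([soft_prompt], max_norm=2.5)
        optimizer.step()
        optimizer.zero_grad()
          </preformat>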
          <p>Our training objective uses cross-entropy loss computed only on the answer tokens that follow the
&lt;/think&gt; token. This ensures the model learns to generate accurate answers based on the reasoning
process guided by the soft prompts, rather than simply memorizing the training examples. We mask
padding tokens and use the model’s end-of-sequence token for proper termination.</p>
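          <p>The sketch below illustrates this masking with toy token ids: every position up to and including &lt;/think&gt;, as well as padding, receives the ignore index, so the cross-entropy loss is computed on the answer tokens alone. The ids and vocabulary size are arbitrary placeholders.</p>
          <preformat>
# Sketch: cross-entropy restricted to the answer tokens that follow &lt;/think&gt;.
import torch
import torch.nn.functional as F

IGNORE, PAD = -100, 0
END_THINK = 7                                     # toy id standing in for &lt;/think&gt;
input_ids = torch.tensor([[5, 6, 9, 9, 7, 11, 12, 3, 0, 0]])  # ... &lt;/think&gt; answer eos pad pad
logits = torch.randn(1, 10, 32)                   # stand-in for the frozen LM's output

labels = input_ids.clone()
answer_start = (input_ids == END_THINK).int().argmax(dim=1) + 1   # first answer position
for b, start in enumerate(answer_start):
    labels[b, :start] = IGNORE                    # no loss on prompt, soft thoughts, &lt;/think&gt;
labels[input_ids == PAD] = IGNORE                 # no loss on padding

# Standard next-token shift: logits at position t predict the label at t + 1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)),
    labels[:, 1:].reshape(-1),
    ignore_index=IGNORE,
)
print(loss.item())
          </preformat>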
          <p>During training, we evaluate on a fixed validation set of 20 examples per relation to ensure consistent
performance tracking across epochs. We compute macro and micro F1 scores, precision, and recall
metrics, and save the best-performing soft prompts for each relation based on relation-specific macro
F1 scores.</p>
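          <p>For clarity, the following sketch shows a set-based per-example F1, macro-averaged over the examples of one relation. It mirrors the description above but is not the official challenge scoring script; in particular, the convention for empty gold sets is an assumption.</p>
          <preformat>
# Sketch: set-based precision/recall/F1 per example, macro-averaged for a relation.
def example_f1(pred: set, gold: set) -> float:
    if not pred and not gold:
        return 1.0                        # assumed convention for empty gold sets
    if not pred or not gold:
        return 0.0
    tp = len(pred.intersection(gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

examples = [
    ({"France", "Spain"}, {"France", "Spain", "Andorra"}),   # (predicted, gold)
    ({"Dublin"}, {"Dublin"}),
]
macro_f1 = sum(example_f1(p, g) for p, g in examples) / len(examples)
print(round(macro_f1, 3))
          </preformat>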
          <p>We implement several memory optimization strategies, including gradient checkpointing, careful tensor management, and periodic garbage collection. These techniques allow training on standard GPU hardware while maintaining training stability and speed.</p>
        </sec>
        <sec id="sec-2-4">
          <title>2.4. Experimental Setup</title>
          <p>We train our model for 300 epochs with a batch size of 10 and a prompt length of 10 trainable tokens per relation. While we observed diminishing returns after 5 tokens, we chose 10 to capture any remaining improvements, though results would likely be similar with 5 tokens. The model generates up to 15 new tokens during both training and inference. We set the random seed to 2262 for reproducibility and run validation after every epoch to monitor training progress.</p>
          <p>Our approach uses simple subject-entity prompt templates and processes 477 training examples
across 7 relation types. We position the soft prompts after the input prompt within the chain-of-thought
reasoning section, creating dedicated reasoning pathways for each relation type without external
knowledge access.</p>
          <p>With respect to the relation types considered in our experimental setup, we focus on six relation types
across diverse knowledge domains. While the shared task organizers provided detailed prompt templates
for each relation, we chose to use only the subject entity without additional prompt information to
allow the model more flexibility in reasoning.</p>
          <p>The relation types (denoted as REL relationName throughout this paper) are:
• REL awardWonBy: Recipients of awards
• REL companyTradesAtStockExchange: Financial market listings
• REL countryLandBordersCountry: International border relationships
• REL hasArea: Geographic area information (e.g., countries, islands)
• REL hasCapacity: Capacity measurements (e.g., stadiums, venues)
• REL personHasCityOfDeath: Biographical information about death locations</p>
        </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Competition Performance</title>
        <p>Our Soft Thinking approach achieved third place in the LM-KBC Challenge at the 24th International Semantic Web Conference (ISWC) [<xref ref-type="bibr" rid="ref10">10</xref>]. Table 1 shows the final competition rankings based on the average macro F1-score across all relations.</p>
      <sec id="sec-3-1">
        <title>Rank</title>
      </sec>
      <sec id="sec-3-2">
        <title>Team</title>
        <p>Average Macro F1</p>
        <p>The baseline row corresponds to directly prompting Qwen3-8B to generate a list of answers, without
any optimization or relation-specific adaptation. Our approach achieved an average macro F1-score
of 0.398, representing a 0.186 point improvement over the baseline and establishing our method as
competitive in the knowledge base completion domain.
1We denote relation types as REL relationName throughout this paper.
3.2. Detailed Performance Analysis</p>
        <p>Our approach demonstrates particularly strong performance on geographic and organizational
relations, achieving F1-scores above 0.5 for REL countryLandBordersCountry (0.771), REL
companyTradesAtStockExchange (0.555), and REL personHasCityOfDeath (0.540). These
relations benefit from the structured reasoning pathways that our soft prompts develop during training.</p>
        <p>Conversely, numeric relations ( REL hasArea, REL hasCapacity) show substantially lower
performance, with F1-scores of 0.190 and 0.090 respectively.</p>
        <p>The precision-recall balance varies significantly across relations, with higher precision than recall
for most entity-based relations (e.g., REL personHasCityOfDeath: 0.930 precision vs. 0.600 recall),
suggesting our method tends toward conservative predictions that favor accuracy over completeness.</p>
        <p>Compared to the baseline, our method also shows substantial improvements on several
relation types, with the most notable gains on REL personHasCityOfDeath (+0.460) and REL
companyTradesAtStockExchange (+0.388).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>Next, we analyze our experimental results and discuss the effectiveness of relation-specific soft prompts for knowledge base completion.</p>
      <p>Our approach achieved competitive performance compared to the other two best teams. We secured third place with a macro F1-score of 0.398. Notably, our score is practically identical to second place (0.405), with only a 0.007-point difference; the top-performing methods are very close to each other, which points to the inherent difficulty of the task.</p>
      <p>The dataset presents inherent challenges that affect all participating methods. We observed instances of incorrectly formatted subjects, questions with ambiguous or multiple valid answers, and entities that appear to be non-existent or poorly documented. These issues highlight the fundamental difficulty of the task: if human experts with internet access struggle to answer certain questions accurately, we cannot reasonably expect a language model without external knowledge access to perform better on such cases.</p>
      <p>It is interesting to consider the role of general knowledge in the performance of our model. We obtained our best results on relations whose answers a person with general knowledge, whose language abilities this general-purpose LM tries to model, would be able to predict or guess.</p>
      <p>To illustrate this, consider a persona P with general knowledge of the world, capable of making deductions even if they don’t know the specific answer to a question. We expect P to be able to answer most of the REL countryLandBordersCountry questions, and to make “reasonable” guesses based on the textual representation of the subject s. For instance, when asked “Where did Maeve Binchy die?” (s = Maeve Binchy, r = REL personHasCityOfDeath), P might assume that a person with a name of Irish origin died in Dublin or Cork, both Irish cities, even if P has no prior knowledge of that person. There is a nontrivial probability that such a guess would turn out to be correct, and indeed, our model achieves a high F1-score of 0.540 for this relation.</p>
      <p>Conversely, for relations that require specific numeric answers or names, the probability of guessing correctly is negligible, so performance is severely impacted (e.g., REL hasArea, REL hasCapacity, REL awardWonBy). We believe these results are promising, as our approach seems able to mimic the “common sense” reasoning of a person with general knowledge.</p>
      <p>Another key idea to highlight is that we observed the best performance even after removing the original prompt templates provided by the challenge organizers and using only the subject entity as input. While this requires more training steps to converge to the optimal solution, removing the prompt templates eliminates the human bias and variability expressed in the specific wording and structure of the prompts. This allows the model to “explore” the reasoning space more freely and develop its own understanding of how to approach the task. We see this as one of the key advantages of our approach: it removes the frequent dilemmas prompt engineers face when trying to find the best prompt structure.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We introduced Soft Thinking, a novel approach for knowledge base completion that combines Chain of Thought prompting with trainable soft prompts inserted in the model’s reasoning space. Our method automatically optimizes relation-specific reasoning pathways without manual prompt engineering, requiring less than 0.001% of total model parameters. We achieved third place in the LM-KBC Challenge @ ISWC 2025 with a macro F1-score of 0.398, more than doubling the baseline score of 0.186 and performing practically on par with second place.</p>
      <p>Our analysis revealed that the approach particularly excels at entity-based relations, with substantial improvements over baseline methods. Notably, we achieved optimal performance using only subject entities as input, demonstrating that our soft prompts develop effective reasoning strategies autonomously while avoiding human bias. We believe Soft Thinking represents a promising direction for parameter-efficient knowledge base completion that balances performance, efficiency, and automation.</p>
    </sec>
    <sec id="sec-6">
      <title>Limitations and Future Work</title>
      <p>While our approach shows promising results, several limitations warrant acknowledgment. Performance
varies significantly across relation types, with particularly poor results on numerical relations such
as REL hasArea (F1-score: 0.190) and REL hasCapacity (F1-score: 0.090). This suggests that our
current soft prompt configuration struggles to capture reasoning patterns for precise numerical queries
requiring exact knowledge retrieval.</p>
      <p>Future work could address these limitations through more sophisticated architectures for numerical reasoning or hybrid approaches combining soft thinking with external knowledge retrieval. Investigating the transferability of learned soft prompts across related relations could improve efficiency. Multi-task learning approaches may yield better performance and more generalizable reasoning patterns. We also see potential in exploring a mixture of a fixed set of soft prompts shared across multiple relations that are combined with relation-specific ones.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Claude 4.0 Sonnet and GPT-4.1 in order to: draft content, paraphrase and reword, improve writing style, draft the abstract, check grammar and spelling, and assist with formatting. The authors reviewed and edited the content to ensure its quality and integrity. The use of these tools was limited to specific tasks, and the final work is a product of the authors’ own efforts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 1877-1901. URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, D. Zhou, Chain of thought prompting elicits reasoning in large language models, in: A. H. Oh, A. Agarwal, D. Belgrave, K. Cho (Eds.), Advances in Neural Information Processing Systems, 2022. URL: https://openreview.net/forum?id=_VjQlMeSB_J.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] B. Lester, R. Al-Rfou, N. Constant, The power of scale for parameter-efficient prompt tuning, in: M.-F. Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 3045-3059. URL: https://aclanthology.org/2021.emnlp-main.243/. doi:10.18653/v1/2021.emnlp-main.243.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] X. L. Li, P. Liang, Prefix-tuning: Optimizing continuous prompts for generation, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp. 4582-4597. URL: https://aclanthology.org/2021.acl-long.353/. doi:10.18653/v1/2021.acl-long.353.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] X. Liu, K. Ji, Y. Fu, W. Tam, Z. Du, Z. Yang, J. Tang, P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 61-68. URL: https://aclanthology.org/2022.acl-short.8/. doi:10.18653/v1/2022.acl-short.8.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] G. Qin, J. Eisner, Learning how to ask: Querying LMs with mixtures of soft prompts, in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 5203-5212. URL: https://aclanthology.org/2021.naacl-main.410/. doi:10.18653/v1/2021.naacl-main.410.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Y. Xu, X. Guo, Z. Zeng, C. Miao, SoftCoT: Soft chain-of-thought for efficient reasoning with LLMs, in: W. Che, J. Nabende, E. Shutova, M. T. Pilehvar (Eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vienna, Austria, 2025, pp. 23336-23351. URL: https://aclanthology.org/2025.acl-long.1137/. doi:10.18653/v1/2025.acl-long.1137.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Hao, S. Sukhbaatar, D. Su, X. Li, Z. Hu, J. Weston, Y. Tian, Training large language models to reason in a continuous latent space, 2024. URL: https://arxiv.org/abs/2412.06769. arXiv:2412.06769.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Qwen Team, Qwen3 technical report, 2025. URL: https://arxiv.org/abs/2505.09388. arXiv:2505.09388.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] LM-KBC Challenge @ ISWC 2025, 2025. URL: https://lm-kbc.github.io/challenge2025/, accessed 2025-08-08.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>