=Paper=
{{Paper
|id=Vol-3894/paper3
|storemode=property
|title=Towards Optimizing with Large Language Model
|pdfUrl=https://ceur-ws.org/Vol-3894/paper3.pdf
|volume=Vol-3894
|authors=Pei-Fu Guo,Ying-Hsuan Chen,Yun-Da Tsai,Shou-De Lin
|dblpUrl=https://dblp.org/rec/conf/kil/GuoCTL24
}}
==Towards Optimizing with Large Language Model==
Pei-Fu Guo¹,∗, Ying-Hsuan Chen¹,∗, Yun-Da Tsai¹ and Shou-De Lin¹
¹ Department of Computer Science and Information Engineering, National Taiwan University
∗ Both authors contributed equally.
r12922217@csie.ntu.edu.tw (P. Guo); r12922044@csie.ntu.edu.tw (Y. Chen); f08946007@csie.ntu.edu.tw (Y. Tsai); sdlin@csie.ntu.edu.tw (S. Lin)
KiL'24: Workshop on Knowledge-infused Learning co-located with the 30th ACM KDD Conference, August 26, 2024, Barcelona, Spain
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.
Abstract
In this study, we evaluate the optimization capabilities of Large Language Models (LLMs) across diverse mathematical and combinatorial optimization tasks, where each task is described in natural language. These tasks require the LLM to iteratively generate and evaluate solutions through interactive prompting: each optimization step generates new solutions based on past results and then passes them to subsequent iterations. We demonstrate that LLMs can carry out various optimization algorithms and act as effective black-box optimizers, capable of intelligently optimizing unknown functions. We also introduce three simple yet informative metrics to evaluate optimization performance that are applicable across diverse tasks and less sensitive to test sample variations. Our findings reveal that LLMs excel at optimizing small-scale problems with limited data, and that their performance is significantly affected by problem dimension and value range, highlighting the need for further research in LLM optimization.
Keywords
LLM Reasoning, LLM Optimization
1. Introduction
Large Language Models have demonstrated exceptional capabilities in reasoning across a variety of natural language-based tasks [1]. However, their potential extends beyond multiple-choice questions or single-question answering. This work explores LLMs' effectiveness in optimization across diverse tasks and problem dimensions. Optimization involves iteratively generating and evaluating solutions to improve a given objective function. Our research assesses LLM performance in interactive optimization, where each step generates new solutions based on previous ones and their values.
We conduct our study with four different types of optimization algorithms: Gradient Descent, Hill Climbing, Grid Search, and Black-Box Optimization. To provide a comprehensive evaluation of LLM performance, we introduce three distinct metrics. These metrics offer a multifaceted view of task performance, are applicable across a broad spectrum of optimization tasks, and are less sensitive to sample variations.
Our findings suggest that LLMs show impressive optimization capabilities, especially on small-scale problems. However, their performance is notably affected by factors such as sample size and value range. These observations underscore the need for further research on optimization tasks tailored to LLMs. It is important to note that our work does not aim to outperform state-of-the-art algorithms for either mathematical or combinatorial optimization; instead, our goal is to showcase the potential of LLMs in these optimization domains and to identify their limitations in these settings.
Our contributions are summarized as follows:
• Exploring the potential of LLMs in mathematical and combinatorial optimization scenarios.
• Introducing three novel metrics for assessing LLM performance in optimization tasks.
• Delving into factors that influence LLM performance using our metrics, with a particular emphasis on the impact of problem dimension and task type.
The remainder of this paper is structured as follows. In Section 2, we present preliminary works on LLMs for addressing optimization challenges. In Section 3, we define the four optimization algorithms used in our case studies. In Section 4, we demonstrate that LLMs with an iterative prompting strategy can function as optimizers. In Section 5, we present the three metrics we designed to assess the overall performance of LLMs on optimization tasks. Section 6 details our experimental results, showcasing the effectiveness of using LLMs as optimizers. In Section 7, we consolidate noteworthy observations and points of discussion from the experiments. Finally, Section 8 summarizes and concludes the paper.
2. Related Works
In various optimization scenarios, Large Language Models (LLMs) have become indispensable for the development of optimization algorithms and agent systems capable of handling complex and informative text-based feedback. In this section, we summarize three significant related works that leverage LLMs to tackle optimization and reinforcement learning challenges. These works showcase the adaptability and effectiveness of LLMs in addressing optimization and learning challenges across various domains.
Optimization by PROmpting (OPRO) [2]. OPRO harnesses LLMs as versatile optimizers by describing optimization tasks in natural language prompts. It iteratively generates and evaluates solutions from these prompts, demonstrating superior performance on tasks like linear regression and the traveling salesman problem. OPRO outperforms human-designed prompts by up to 50% on challenging tasks.
Reflexion [3]. Reflexion introduces a novel framework for training language agents that rely on linguistic feedback rather than traditional reinforcement learning. This framework delivers outstanding results, boasting a remarkable 91% pass@1 accuracy on coding tasks, an 11% improvement over previous state-of-the-art models. Reflexion's success underscores the potential of linguistic feedback as a powerful training mechanism.
EvoPrompt [4]. EvoPrompt automates prompt optimization by connecting LLMs with evolutionary algorithms. This automated process surpasses human-designed prompts by up to 25% and outperforms existing automatic prompt generation methods by an impressive 14%. EvoPrompt's success highlights the relationship between Large Language Models and traditional algorithms, showcasing the potential for enhanced problem-solving capabilities through this synergistic fusion.
3. Problem Setting
We design four optimization tasks that require the model to algorithmically search for the optimal values of parameters. These tasks encompass Gradient-Descent, Hill-Climbing, Grid-Search, and Black-Box Optimization, each representing a distinct optimization domain: gradient-based, meta-heuristic, decision-theoretic, and Bayesian. In terms of parameter types, Grid-Search and Hill-Climbing involve discrete search spaces, while Gradient-Descent and Black-Box Optimization tackle continuous search spaces. Detailed information on each optimization task follows.
Gradient-Descent assesses the model's proficiency in advanced calculation and its grasp of the principles of gradient descent. We instruct LLMs to carry out a conventional gradient descent optimization process based on the loss function they have defined. LLMs need to compute the gradient and update the parameters using the gradient information and the given learning rate.
Hill-Climbing evaluates the LLM's capability to adhere to custom predefined rules it has not seen before. LLMs start with an initial solution and iteratively explore nearby solutions by making small incremental changes. In our task, neighboring solutions are generated by selecting a specific element within the solution and either increasing or decreasing it by one each time. The neighbor solution with the minimum loss is then chosen as the new solution and passed to the next iteration.
Grid-Search assesses the LLM's ability to conduct exhaustive searches and locate optimal solutions within a predefined search space. LLMs are tasked with generating all grid points and systematically searching for the point that yields the lowest loss according to the given loss function.
Black-Box Optimization evaluates the LLM's ability to make informed decisions and optimize in an abstract problem-solving context. We treat the LLM as a black box that tries to fit an unknown loss function. We provide the LLM with a limited set of solutions, each paired with its true loss value. The LLM's objective is to discover, on its own, new solutions with lower losses than the existing solutions in each iteration.
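To make the discrete tasks concrete, the following is a minimal Python sketch of the reference behavior the Hill-Climbing and Grid-Search tasks expect, assuming the MSE loss used in our prompts; the helper names are illustrative and are not part of our released code.

import itertools

def mse_loss(solution, targets):
    # Mean squared error between a candidate solution and the data points.
    return sum((s - t) ** 2 for s, t in zip(solution, targets)) / len(targets)

def hill_climbing_step(current, targets):
    # Neighbors differ from the current solution by +/-1 in exactly one element.
    neighbors = []
    for i in range(len(current)):
        for delta in (1, -1):
            candidate = list(current)
            candidate[i] += delta
            neighbors.append(tuple(candidate))
    # The lowest-loss neighbor becomes the solution passed to the next iteration.
    return min(neighbors, key=lambda n: mse_loss(n, targets))

def grid_search(targets, low, high):
    # Enumerate every integer grid point in [low, high]^d and keep the best one.
    grid = itertools.product(range(low, high + 1), repeat=len(targets))
    return min(grid, key=lambda p: mse_loss(p, targets))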
4. Methodologies
In this section, we show how LLMs, guided by iterative prompting, can effectively function as optimizers, akin to various optimization algorithms. To systematically navigate the search space, we introduce an iterative prompting framework that enables LLMs to incrementally reach better solutions within the search space.
We apply Chain-of-Thought and iterative prompting as our prompting method, so the LLM accomplishes each step with reasoning thoughts as intermediate outputs. In each of these tasks (optimization algorithms), LLMs are initially required to formulate the loss function based on the given samples. Each optimization iteration then consists of two steps: (1) generate a new solution based on the algorithm instructions and past search results; (2) calculate the loss of the new solution and add the result to the prompt of the next iteration. We repeat these two steps until the stop criteria are met. Figure 1 shows an overview of how the LLM performs optimization in this interactive setting.
To create an interactive environment, we utilize the chat mode of GPTs, where the entire conversation history serves as the prompt. This allows LLMs to retain memory of past search results and reasoning paths. New instructions are appended to the ongoing conversation record with each iteration. If the dialogue surpasses the token limit, earlier portions are removed.
Figure 1: Overview of our prompting strategy. (1) LLMs formulate the loss function based on given samples. (2) Given algorithm instructions and past results, the LLM generates a new solution. (3) Calculate the loss of the new solution and add the solution-score pairs to the prompt of the next iteration. (4) Repeat the second and third steps until the stop criteria are met.
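As a concrete illustration of the loop in Figure 1, the sketch below drives a chat-mode LLM through the two prompting steps while re-sending the whole conversation each turn; chat_completion is a placeholder for whatever chat API is used, and the instruction wording is abbreviated relative to the full templates in Appendix A.

def optimize_with_llm(chat_completion, data_points, algorithm_instruction, max_iters=10):
    # The entire conversation history serves as the prompt at every step.
    history = [{"role": "user",
                "content": f"Given the data points {data_points}, what is the MSE loss "
                           "function with respect to the predicted values?"}]
    history.append({"role": "assistant", "content": chat_completion(history)})
    search_results = []
    for _ in range(max_iters):  # stop criterion: a fixed iteration budget
        # Step 1: generate a new solution from the algorithm instructions and past results.
        history.append({"role": "user", "content": algorithm_instruction})
        solution = chat_completion(history)
        history.append({"role": "assistant", "content": solution})
        # Step 2: evaluate the loss of the new solution and feed it back next iteration.
        history.append({"role": "user", "content": "Calculate the loss of this solution."})
        loss_report = chat_completion(history)
        history.append({"role": "assistant", "content": loss_report})
        search_results.append((solution, loss_report))
        # If the dialogue exceeded the token limit, earlier turns would be dropped here.
    return search_results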
5. Evaluation
We devised three novel metrics for the comprehensive evaluation of LLM capabilities. In this section, we explain the design and objective of each metric. These metrics offer versatility in assessing LLM performance across diverse tasks, making concurrent evaluation easier. Their reliance on ratio measures, rather than differences, makes them less sensitive to sample variations.
5.1. Goal Metric
The goal metric evaluates how effectively LLMs perform optimization. It provides a quantitative measure of the degree to which the LLM contributes to minimizing the loss function values; in other words, it captures whether the ultimate solution loss is lower than the initial solution loss. We define the goal metric of a test sample j as:
$G_j = \frac{1}{N} \sum_{i=1}^{N} \frac{loss_{LLM,init} - loss_{LLM,i}}{loss_{LLM,init}}$    (1)
where $loss_{LLM,init}$ is the initial solution loss of sample j, $loss_{LLM,i}$ is the LLM output loss of trial i, and N is the number of trials per sample. The higher the metric value, the greater the progress in optimization. The goal metric plays a crucial role in our evaluation framework, particularly in scenarios where ground truth is absent, such as the Black-Box optimization scenario.
5.2. Policy Metric
The policy metric assesses the degree of alignment between the final model output and the ground truth. Beyond self-improvement, which is measured by the goal metric, it is also crucial to appraise the LLM's capability to operate in a manner consistent with our truth model algorithm. This metric serves as an indicator of the LLM's adeptness in adhering to task-specific instructions. We define the policy metric of a test sample j as:
$P_j = \frac{1}{N} \sum_{i=1}^{N} \frac{loss_{LLM,i} - loss_{truth}}{loss_{truth}}$    (2)
where $loss_{LLM,i}$ is the LLM output loss of trial i, $loss_{truth}$ is the ground-truth loss of sample j, and N is the number of trials. Since the policy metric measures the disparity between the ground truth and the LLM's output, a lower policy metric value indicates a more effective alignment of the LLM's actions with the prescribed guidelines. When the value is negative, the LLM's performance surpasses the ground truth.
5.3. Uncertainty Metric
The uncertainty metric quantifies the variability of the LLM's solutions under identical conditions. Stability is a crucial characteristic in optimization tasks: we hope that the LLM produces identical results in every trial involving the same sample, even under conditions with temperatures greater than zero. We define the uncertainty metric of a test sample j as:
$U_j = \frac{1}{N} \sum_{i=1}^{N} (loss_{LLM,i} - \overline{loss}_{LLM})^2$    (3)
where $loss_{LLM,i}$ is the LLM output loss of the i-th trial, $\overline{loss}_{LLM}$ is the mean of the trial outputs, and N is the number of trials. A stable LLM can be more trusted for tasks that demand consistent and reproducible results. In our case, if the language model truly understands the context of the problem, the final optimal output should be identical in every trial of the same sample.
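For concreteness, the three metrics can be computed from per-trial losses roughly as follows; this is a small sketch assuming the losses have already been collected, not an excerpt of our evaluation code.

def goal_metric(init_loss, trial_losses):
    # Eq. (1): average relative improvement of each trial over the initial solution loss.
    return sum((init_loss - l) / init_loss for l in trial_losses) / len(trial_losses)

def policy_metric(trial_losses, truth_loss):
    # Eq. (2): average relative gap between the LLM's losses and the ground-truth loss.
    return sum((l - truth_loss) / truth_loss for l in trial_losses) / len(trial_losses)

def uncertainty_metric(trial_losses):
    # Eq. (3): variance of the final losses across trials of the same sample.
    mean = sum(trial_losses) / len(trial_losses)
    return sum((l - mean) ** 2 for l in trial_losses) / len(trial_losses)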
6. Experiments
This section provides details of our experimental configuration and highlights the outcomes of the experiments. Subsection 6.1 outlines the process of generating synthetic datasets for all optimization tasks, subsection 6.2 elucidates the detailed settings of our experiment, and subsection 6.3 offers a concise summary of the outcomes derived from our experiments.
6.1. Dataset
In the experiments, we create five datasets with d values chosen from the set {3, 6, 12, 24, 48} and generate instances belonging to [0, 10]^d in each dataset to examine sensitivity to the number of parameters, which represents the dimension of the optimization problem. For instance, d = 3 indicates that there are 3 variables in the loss function and that the dimension of this optimization problem is 3. We then apply each instance to a loss function and find the true solution for each parameter search task. These verified solutions, coupled with their associated losses, not only serve as the ground truth for the tasks but also act as a pivotal benchmark against which the solutions derived by LLMs are systematically evaluated and compared in the ensuing analysis.
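A minimal sketch of such instance generation, assuming an MSE loss whose unconstrained optimum is the target vector itself; the number of instances per dataset used here (10) is illustrative, not a value prescribed by our setup.

import random

def make_dataset(d, n_instances=10, low=0.0, high=10.0):
    # Sample target vectors uniformly from [low, high]^d.
    dataset = []
    for _ in range(n_instances):
        targets = tuple(round(random.uniform(low, high), 2) for _ in range(d))
        # Under an MSE loss, the continuous optimum is the target vector with loss 0;
        # discrete tasks (Grid-Search, Hill-Climbing) would round to the nearest grid point.
        dataset.append({"targets": targets, "truth_solution": targets, "truth_loss": 0.0})
    return dataset

datasets = {d: make_dataset(d) for d in (3, 6, 12, 24, 48)}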
6.2. Detailed Settings
In our experiments, we set the LLM temperature to 0.8 and leave the rest of the settings at their defaults. We performed 5 repetitions of the test for each instance in the dataset, with the LLM conducting 10 iterations of parameter search in each repetition. We excluded excessively biased results to prevent our metrics from being skewed by a minority of poorly performing test outcomes. All experiments employ the GPT-3.5-turbo '0613' version as the language model.
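Schematically, the evaluation harness repeats the search several times per instance and collects the final losses that feed the metrics of Section 5; run_optimization and final_loss_of are hypothetical helpers standing in for the prompting loop and its transcript parsing.

TEMPERATURE = 0.8   # LLM sampling temperature
N_TRIALS = 5        # repetitions of the test per instance
N_ITERATIONS = 10   # parameter-search iterations per repetition

def evaluate_instance(instance, run_optimization, final_loss_of):
    # Collect one final loss per repetition; these values feed G_j, P_j, and U_j.
    final_losses = []
    for _ in range(N_TRIALS):
        transcript = run_optimization(instance, n_iters=N_ITERATIONS, temperature=TEMPERATURE)
        final_losses.append(final_loss_of(transcript))
    return final_losses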
6.3. Main Results
We summarize the outcomes of our experiments and subsequently examine the common trends observed across all of them. In every plot, the x-axis displays the dimension of the optimization problem. In the goal metric and policy metric plots, the y-axis illustrates the average metric value for the respective task, while the shaded area in a lighter color delineates the confidence interval of the metric, denoted as [value − std, value + std]. In the uncertainty metric plot, the y-axis shows the uncertainty metric value, which corresponds to the standard deviation of the LLM's final solution loss. It is worth noting that the Goal Metric graph excludes the Grid-Search task due to its non-iterative nature, while the Policy Metric graph omits the Black-Box task due to its unattainable ground truth.
Figure 2: Goal Metric and Policy Metric hover from positive to near zero, signifying substantial optimization capability and alignment between the LLM's output and the ground truth.
LLMs show strong optimization capabilities in small-scale problems. Our experiments test the comprehensive optimization capabilities of LLMs. Observing Figure 2, GPT-3.5-turbo showcases considerable optimization capabilities across various scenarios. Impressively, in the Gradient-Descent task, GPT-3.5-turbo even surpasses the ground truth, particularly in the case of sample dimension equal to six. It is also surprising that the model achieves respectable results in the Grid-Search task, considering it must compute a vast number of grid points, which increases exponentially as the dimension of the problem expands. The model faces challenges in the Hill-Climbing task, evident from a policy metric significantly exceeding zero. This suggests that meta-heuristics may pose greater difficulty for LLMs compared to other tasks.
LLMs show potential as Black-Box Optimizers. Favorable performance in the Black-Box experiments suggests that LLMs can be used as optimizers without being given any algorithm instructions. From Figure 3, we can see that GPT-3.5-turbo performs notably when the dimension of the problem is three, whereas GPT-4 excels when the dimensions are three and six. Interestingly, as the dimension increases, the performance of both models gradually diminishes. Eventually, GPT-4 edges out GPT-3.5-turbo by a slight margin in optimization and stability.
Figure 3: Goal Metric reflects the performance of LLMs as Black-Box optimizers, showing strong performance on instances of smaller dimensions.
LLMs exhibit strong performance in Gradient-Descent. The Gradient-Descent experiment tests the model's proficiency in advanced calculations and its grasp of mathematical principles. Figure 4 underscores this by revealing a policy metric that consistently hovers near zero, signifying a remarkable alignment between the LLM's output and the ground truth. Despite a decline in the goal metric as the sample size increases, the consistently low and stable value of the policy metric underscores the fact that GPT's performance in the gradient-descent task is nearly on par with the truth model.
Figure 4: Low values in the Policy Metric and high positive values in the Goal Metric indicate the robust performance of the LLM in the gradient descent task.
7. Analysis and Discussion
In this section, we consolidate several crucial insights derived from our experimental results and subject them to analysis.
Pretrained knowledge dominates the optimization capability of LLMs. Among all optimization tasks performed by LLMs, Gradient Descent emerges as the leading performer, while Hill-Climbing poses greater challenges. The main difference between the two tasks is that Hill-Climbing is a heuristic algorithm with more user-specific parameters, whereas gradient descent is an optimization algorithm that relies more on mathematical principles. This suggests that LLM optimization capabilities primarily stem from pretrained knowledge stored within the model parameters, rather than from context knowledge provided by users. Our findings align with previous research [5, 6, 7] showing that language models often prioritize their prior knowledge over new context. Achieving balanced attention to both prior and context knowledge is essential for further research to improve the optimization capability of language models.
LLMs are potential hybrid optimizers. The predominantly positive goal metric values across most tasks and datasets indicate LLMs' capability for optimization. This highlights their versatile capacity to optimize across different problem spaces, potentially allowing for switching between optimization methods within a single task. Such switching can help LLMs better explore the solution space and escape local optima where they might get stuck. This is a significant advantage of LLMs in optimization, as they can easily change methods through a simple natural language prompt during iterations. Furthermore, LLMs can act as agents (world models) that use different algorithms as tools (actions), switching methods by evaluating the optimization path from past to present (state). This adaptability underscores the potential of LLMs to enhance optimization processes through dynamic method selection and strategic problem-solving.
LLMs possess a richer solution space in small-scale problems. In our experiments, we observed high uncertainty metric values and significant variations in the policy and goal metrics when samples had smaller dimensions. Interestingly, LLMs tend to perform more effectively on smaller-dimension instances, suggesting a correlation between higher uncertainty and better performance. This consistent pattern across various tasks and models indicates that LLMs have a richer solution space when tackling small-scale problems. The expanded solution space leads to higher uncertainty, providing LLMs with a broader range of solutions to explore. This highlights the importance of dimension reduction in data preprocessing for effective optimization by LLMs. Figures 2 and 5 both highlight the pattern of uncertainty, which initially rises and then gradually decreases.
Figure 5: An initial rise followed by a decline in the Uncertainty Metric with instance dimension growth suggests LLMs may have a richer sample space for small-scale problems, consistent across tasks and models.
LLMs are sensitive to numerical values. It is worth considering that the aforementioned results may be influenced by the inherent randomness in the generation of test samples. Previous research has indicated that LLMs may demonstrate preferences for particular numbers, words, and symbols [8], which can introduce a level of bias into their responses. Given the high sensitivity of LLMs to the input prompt, the initial starting points and data provided can exert a significant influence on their outputs. In essence, the impact of instruction description and data initialization should be carefully considered when interpreting the results of LLM-based experiments to ensure a more accurate assessment of their performance.
Self-consistency prompting improves stability. In the Gradient-Descent task, we employ the self-consistency technique [9], where we conduct five repetitions of each iteration and select the solution that emerges most frequently. From Figure 6, we can see that GPT-4's performance increases considerably, and the confidence intervals for both the policy metric and the goal metric narrow, indicating improved stability and reliability. Nonetheless, this approach does not yield favorable outcomes when applied to GPT-3.5-turbo, which suggests the need for further investigation within the realm of variance reduction.
Figure 6: The confidence intervals for both the policy and goal metrics of GPT-4 narrow, indicating improved stability. A negative policy metric with a high goal metric signifies significant outperformance of the ground truth model on six-dimensional instances.
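The self-consistency selection step described above amounts to a majority vote over repeated generations; a minimal sketch, where generate_solution stands in for one sampled LLM answer at the current iteration:

from collections import Counter

def self_consistent_step(generate_solution, n_samples=5):
    # Sample the same iteration several times and keep the most frequent answer.
    candidates = [generate_solution() for _ in range(n_samples)]
    most_common_answer, _count = Counter(candidates).most_common(1)[0]
    return most_common_answer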
8. Conclusion and Future Directions
In this paper, we present an in-depth examination of Large Language Models within the realm of optimization, where the LLM progressively generates new solutions to optimize an objective function. We investigate LLMs' performance across four optimization tasks that require them to comprehend algorithmic instructions and to generate new solutions based on previous solutions and their corresponding values.
Our evaluation shows that LLMs exhibit optimization prowess across diverse domains. Among the four tasks we examined, LLMs show their greatest strength in the Gradient-Descent task, displaying remarkable proficiency in this area. However, they encounter more pronounced difficulties in the meta-heuristics task, where they must adhere to predefined rules that they have not encountered previously. Furthermore, LLMs demonstrate impressive skills in the grid search task, showcasing their ability to conduct exhaustive searches effectively. In the Black-Box task, LLMs excel, particularly when dealing with limited sample sizes, suggesting inherent optimization abilities within them.
We also consolidate several crucial insights derived from our experimental results and subject them to analysis. We find that pretrained knowledge dominates the optimization capability of LLMs, and that they possess a richer solution space in small-scale problems. Furthermore, we elaborate on the potential of LLMs as hybrid optimizers. These insights and analyses unveil a host of unresolved questions that warrant further research.
References
[1] J. Wei, X. Wang, D. Schuurmans, M. Bosma, E. Chi, Q. Le, D. Zhou, Chain of thought prompting elicits reasoning in large language models, arXiv preprint arXiv:2201.11903, 2022.
[2] C. Yang, X. Wang, Y. Lu, H. Liu, Q. V. Le, D. Zhou, X. Chen, Large language models as optimizers, arXiv preprint arXiv:2309.03409, 2023.
[3] N. Shinn, F. Cassano, B. Labash, A. Gopinath, K. Narasimhan, S. Yao, Reflexion: Language agents with verbal reinforcement learning, arXiv preprint arXiv:2303.11366, 2023.
[4] Q. Guo, R. Wang, J. Guo, B. Li, K. Song, X. Tan, G. Liu, J. Bian, Y. Yang, Connecting large language models with evolutionary algorithms yields powerful prompt optimizers, arXiv preprint arXiv:2309.08532, 2023.
[5] H.-T. Chen, M. Zhang, E. Choi, Rich knowledge sources bring complex knowledge conflicts: Recalibrating models to reflect conflicting evidence, in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, Association for Computational Linguistics, 2022, pp. 2292–2307.
[6] A. Pagnoni, V. Balachandran, Y. Tsvetkov, Understanding factuality in abstractive summarization with FRANK: A benchmark for factuality metrics, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, Association for Computational Linguistics, 2021, pp. 4812–4829.
[7] W. Zhou, S. Zhang, H. Poon, M. Chen, Context-faithful prompting for large language models, arXiv preprint arXiv:2303.11315, 2023.
[8] A. Renda, A. Hopkins, M. Carbin, Can LLMs generate random numbers? Evaluating LLM sampling in controlled domains, in: ICML 2023 Workshop: Sampling and Optimization in Discrete Space, 2023.
[9] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, D. Zhou, Self-consistency improves chain of thought reasoning in language models, arXiv preprint arXiv:2203.11171, 2022.
A. Prompt Templates
User Prompt:
Q:
Given the data points (y1, y2, ...) = {data}, what is the MSE loss function with respect to the ŷs for a hypothetical set of
predicted ŷs values?
A:
The MSE loss function for the given data points (y1, y2, ...) = {data} with respect to ŷs is:...
Figure 7: Example prompt for getting objective function.
User Prompt:
Q:
Please minimize the loss function using gradient descent with learning rate of 0.1 at point (ŷ1, ŷ2, ŷ3, .....) = {point}. What is
the point we eventually end up after one update? Your answer includes two parts an explanation with calculation and a short
answer of result.
A:
Explanation : Lets think step by step ...
Short Answer: After calculation, the next update point is (ŷ1𝑛𝑒𝑤 , ŷ2𝑛𝑒𝑤 , ŷ3𝑛𝑒𝑤 , .....) = ...
Figure 8: Example prompt for Gradient-Descent.
User Prompt :
Q:
I want to do grid search on the ŷs and the range of them are the integers of {low_bound} to {high_bound}. Generate all
possible combinations of ŷs values from the specify range.
What are the combinations? Your answer includes two parts an explanation with calculation and a list containing all the
combinations.
A:
Explanation : Lets think step by step ...
List : [write all the combinations here]
Figure 9: Example prompt for Grid-Search (Create Grid Points)
User Prompt :
Q:
For every combinations of ŷs, calculate its MSE loss. Which combination has the smallest MSE loss? Your answer includes two
parts an explanation with calculation and a list containing the combination with the smallest MSE loss.
A:
Explanation : Lets think step by step
List : [write the combination with smallest MSE loss]
Figure 10: Example prompt for Grid-Search (Select)
User Prompt :
Q:
I want to minimize the loss function using hill climbing. Generate neighboring solutions by either adding 1 or minus 1 to a
specific element in the current solution. The current solution is solution. Your answer includes two parts an explanation with
calculation and a list containing all neighbor solutions(eg. [(ŷ1, ŷ2,....), (ŷ1, ŷ2,....), ...]).
A:
Explanation : Let’s think step by step ...
List : [write neighbor solutions here]
Figure 11: Example prompt for Hill-Climbing Prompt (Generate Neighbors)
User Prompt :
Q:
For every neighbor solution, calculate its MSE loss. Which neighbor has the smallest MSE loss? Your answer includes two parts
an explanation with calculation and a list containing the smallest MSE loss neighbor solution(eg. [(ŷ1, ŷ2,....)]).
A:
Explanation : Lets think step by step ...
List : [write best neighbor solution here]
Figure 12: Example prompt for Hill-Climbing Prompt (Select)
User Prompt :
Q:
You want to minimize an unknown MSE loss function by guessing the values of the ŷs. When you guess, you should take
consider of the past guessing result so that your new guess will have smaller loss than the past results. Pass guessing result are
{pass_result}. Base on the previous guesses, what is your next guess?
A:
(ŷ1, ŷ2,....) = [your answer]
Figure 13: Example prompt for Black-Box
User Prompt :
*Define Loss function
User :
Given the data points (y1, y2, y3) = (2,6,0), what is the MSE loss function with respect to the ŷs for a hypothetical set of
predicted ŷs values?
LLM :
The MSE loss function for the given data points (y1, y2, y3) = (2,6,0) with respect to ŷs is: (1/3)[(ŷ1 − 2)² + (ŷ2 − 6)² + (ŷ3 − 0)²]
*First Iteration
User :
Please minimize the loss function using gradient descent with learning rate of 0.1 at point (ŷ1, ŷ2, ŷ3) = (2,3,4). What is the
point we eventually end up after one update? Your answer includes two parts an explanation with calculation and a short
answer of result.
LLM :
Explanation : To minimize the Mean Squared Error (MSE) loss function using gradient descent ...
Short Answer: After calculation, the next update point is (ŷ1𝑛𝑒𝑤 , ŷ2𝑛𝑒𝑤 , ŷ3𝑛𝑒𝑤 ) = (1.92, 3.05, 3.94)
*Second Iteration
User :
Please minimize the loss function using gradient descent with learning rate of 0.1 at point (ŷ1, ŷ2, ŷ3) = (1.92, 3.05, 3.94). What
is the point we eventually end up after one update? Your answer includes two parts an explanation with calculation and a short
answer of result.
LLM :
Explanation : To minimize the Mean Squared Error (MSE) loss function using gradient descent ...
Short Answer: After calculation, the next update point is (ŷ1𝑛𝑒𝑤 , ŷ2𝑛𝑒𝑤 , ŷ3𝑛𝑒𝑤 ) = (1.81, 3.12, 3.41)...
Figure 14: Iterative prompt for Gradient-Descent
User Prompt :
*Define Loss function
User :
Given the data points (y1, y2, y3) = (2,6,0), what is the MSE loss function with respect to the ŷs for a hypothetical set of
predicted ŷs values?
LLM :
The MSE loss function for the given data points (y1, y2, y3) = (2,6,0) with respect to ŷs is: (1/3)[(ŷ1 − 2)² + (ŷ2 − 6)² + (ŷ3 − 0)²]
*Create Grid Points
User :
I want to do grid search on the ŷs and the range of them are the integers of 2 to 3. Generate all possible combinations of ŷs
values from the specify range. What are the combinations? Your answer includes two parts an explanation with calculation and
a list containing all the combinations.
LLM :
Explanation : To generate all the grid points, you should ...
List : [(2, 2, 2), (2, 2, 3), (2, 3, 2), (2, 3, 3), (3, 2, 2), (3, 2, 3), (3, 3, 2), (3, 3, 3)]
*Select Best
User :
For every combinations of ŷs, calculate its MSE loss. Which combination has the smallest MSE loss? Your answer includes two
parts an explanation with calculation and a list containing the combination with the smallest MSE loss.
LLM :
Explanation : To select the grid points with smallest MSE value ...
List : [(2, 3, 3)]...
Figure 15: Iterative prompt for Grid-Search
User Prompt :
*Define Loss function
User :
Given the data points (y1, y2, y3) = (2,6,0), what is the MSE loss function with respect to the ŷs for a hypothetical set of
predicted ŷs values?
LLM :
The MSE loss function for the given data points (y1, y2, y3) = (2,6,0) with respect to ŷs is: (1/3)[(ŷ1 − 2)² + (ŷ2 − 6)² + (ŷ3 − 0)²]
*Generate Neighbors
User :
I want to minimize the loss function using hill climbing. Generate neighboring solutions by either adding 1 or minus 1 to a
specific element in the current solution. The current solution is (10,10,10). Your answer includes two parts an explanation with
calculation and a list containing all neighbor solutions(eg. [(ŷ1, ŷ2,....), (ŷ1, ŷ2,....), ...]).
LLM :
Explanation : To generate all the neighbors ...
List : [(11, 10, 10), (10, 11, 10), (10, 10, 11), (9, 10, 10), (10, 9, 10), (10, 10, 9)].
*Select Best
User :
For every neighbor solution, calculate its MSE loss. Which neighbor has the smallest MSE loss? Your answer includes two parts
an explanation with calculation and a list containing the smallest MSE loss neighbor solution(eg. [(ŷ1, ŷ2,....)]).
LLM :
Explanation : To select the neighbor solution with smallest MSE value ...
List : [(9, 10, 10)]
Figure 16: Iterative prompt for Hill-Climbing
User Prompt :
*Define Loss function
User :
Given the data points (y1, y2, y3) = (2,6,0), what is the MSE loss function with respect to the ŷs for a hypothetical set of
predicted ŷs values?
LLM :
The MSE loss function for the given data points (y1, y2, y3) = (2,6,0) with respect to ŷs is: (1/3)[(ŷ1 − 2)² + (ŷ2 − 6)² + (ŷ3 − 0)²]
First Iteration
User :
You want to minimize an unknown MSE loss function by guessing the values of the ŷs. When you guess, you should take
consider of the past guessing result so that your new guess will have smaller loss than the past results. Pass guessing result are
𝑓 (2, 4, 6) = 10, 𝑓 (1, 5, 0) = 2. Base on the previous guesses, what is your next guess?
LLM :
(ŷ1, ŷ2, ŷ3) = [1, 3, 5]
Second Iteration
User :
You want to minimize an unknown MSE loss function by guessing the values of the ŷs. When you guess, you should take
consider of the past guessing result so that your new guess will have smaller loss than the past results. Pass guessing result are
𝑓 (2, 4, 6) = 10, 𝑓 (1, 5, 0) = 2, 𝑓 (1, 3, 5) = 5. Base on the previous guesses, what is your next guess?
LLM :
(ŷ1, ŷ2, ŷ3) = [1, 1, 1]
Figure 17: Iterative prompt for Black-Box Optimization