=Paper=
{{Paper
|id=Vol-3812/paper13
|storemode=property
|title=Exploiting Large Language Models for the Automated Generation of Constraint Satisfaction Problems
|pdfUrl=https://ceur-ws.org/Vol-3812/paper13.pdf
|volume=Vol-3812
|authors=Lothar Hotz,Christian Bähnisch,Sebastian Lubos,Alexander Felfernig,Albert Haag,Johannes Twiefel
|dblpUrl=https://dblp.org/rec/conf/confws/HotzBLFHT24
}}
==Exploiting Large Language Models for the Automated Generation of Constraint Satisfaction Problems==
Lothar Hotz¹,*,†, Christian Bähnisch¹,†, Sebastian Lubos², Alexander Felfernig², Albert Haag³ and Johannes Twiefel⁴

¹ Hamburger Informatik Technologie-Center e.V., Hamburg, Germany
² Graz University of Technology, Graz, Austria
³ Product Management Haag GmbH, Bad Dürkheim, Germany
⁴ exXxa GmbH, Hamburg, Germany
Abstract
Constraint Satisfaction Problems (CSPs) are a core technology for solving many real-world problems, especially configuration tasks. A key success factor in this context is an efficient knowledge acquisition process in which domain experts and knowledge engineers (the developers of CSPs) reach an agreement on the correctness of the expanding knowledge base as early as possible. In this paper, we show how large language models (LLMs) can be applied to the automated generation of solutions for constraint satisfaction problems, thus reducing the overheads related to CSP development and maintenance in the future.

Keywords
Constraint Satisfaction Problems, Large Language Models, Knowledge Acquisition, Automated Generation
ConfWS'24: 26th International Workshop on Configuration, Sep 2–3, 2024, Girona, Spain
* Corresponding author.
† These authors contributed equally.
Contact: lothar.hotz@uni-hamburg.de (L. Hotz); christian.baehnisch@uni-hamburg.de (C. Bähnisch); slubos@ist.tugraz.at (S. Lubos); afelfern@ist.tugraz.at (A. Felfernig); albert@product-management-haag.de (A. Haag); twiefel@exxxa.ai (J. Twiefel). ORCID: 0000-0001-7370-7726 (L. Hotz).
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

Knowledge acquisition for knowledge-based systems, especially constraint-based systems, is a complex task. It includes the formalization of partly tangible knowledge with a knowledge model, such as a configuration, constraint, or feature model [1, 2]. Constraint-based systems, in particular, are often used to implement configuration systems due to their ability to compute possible values for configurations, or even configurations themselves, directly [3]. Hence, modeling a configuration problem as a constraint satisfaction problem (CSP) is a typical approach for computing configurations. This type of modeling is similar to developing programs in a high-level programming language but also incorporates logical semantic elements.

Currently, in software engineering, large language models (LLMs) are exploited to support programmers in their daily tasks, such as coding, code completion, reviewing code, API programming, generating test cases, writing documentation, or identifying design patterns, as well as learning programming languages or understanding legacy code [4]. This is achieved through general LLMs such as ChatGPT¹, LLMs specifically trained for coding such as CoPilot², software agents like CrewAI³ and AutoGenStudio⁴, or software-engineering assistants such as Devin AI⁵ and its open-source counterpart Devika⁶.

An example of developing a small game with a graphical user interface is GPT-Engineer⁷, which is prompted in natural language and provides the game logic and the graphical interface through HTML and CSS code. Especially the combination of an LLM with a compiler or interpreter in a chain leads to a self-evaluation loop in which code generated by the LLM is directly checked by the interpreter, and the resulting error message is taken as the next prompt for the LLM. By iterating this (e.g., with a supporting tool such as LangChain⁸), a syntactically correct program is created. However, broader tasks, such as reviewing a software architecture or refactoring code distributed over multiple files, are hindered by the limitation of a prompt (the context window) to a given number of tokens, such as 128,000 input tokens for GPT-4. In summary, applying LLMs to software development still has to cope with issues such as limited context windows, maintainability

¹ Chat Generative Pre-Trained Transformer
² https://github.com/features/copilot (all URLs accessed in May 2024)
³ https://www.crewai.com/
⁴ https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenStudio
⁵ https://preview.devin.ai/
⁶ https://github.com/stitionai/devika
⁷ https://github.com/gpt-engineer-org/gpt-engineer
⁸ https://python.langchain.com/docs/get_started/introduction
of AI-created code, and applicability in project planning or software architecture.

Inspired by these approaches, the following questions arise for constraint satisfaction problems:

• Which knowledge acquisition and engineering tasks can be supported by LLMs?
• Can knowledge models be generated by LLMs? How is quality ensured?
• Which alternative implementations of a knowledge model are the most comprehensible or maintainable?

However, a main ingredient of LLMs is of course the prompt that triggers the generation of text, here of knowledge models. Hence, for a knowledge model, the questions also include:

• What type of prompts do we need?
• What has to be presented to the LLM?
• Is a natural language text describing the problem enough?
• Should examples of the formal modeling language, which shall be used for the formalization, be given?
• Or the complete language specification?

This paper explores the innovative use of LLMs to automatically generate constraint models in constraint programming languages like PyChoco⁹. Constraint programming involves defining problems in terms of variables, related variable domains, and constraints that must be satisfied. Popular problems in this domain include, e.g., the N-queens problem, Magic Square, Map Coloring, and the Traveling Salesperson Problem (TSP). To tackle such problems, we propose a generalized approach (Section 3) where an LLM is leveraged to generate a constraint model based on a provided prompt.

We test our approach on several constraint satisfaction problem examples (Section 4). In this paper, we mainly concentrate on known examples from the constraint community, given by well-known descriptions, and not so much on new examples formulated in natural language. Although the task of starting with known problems is untypical for configuration problems, in our view, this step is necessary for exploring the possibilities of LLMs for knowledge modeling in general. This approach of using commonly known constraint and configuration problems builds a baseline for further work.

With our proposed approach, we aim to streamline the process of model creation in constraint programming, making it faster and more accessible by harnessing the capabilities of LLMs. The main idea is to define a general prompt once, which describes the problem context, and then reuse this general prompt for the specific knowledge engineering tasks at hand.

The remainder of this paper is organized as follows. Section 2 provides an overview of the state of the art in LLM-based knowledge modeling. In Section 3, we sketch our proposed LLM-based CSP generation approach. In Section 4, we provide details regarding the LLM prompting approach based on different example CSP tasks. A discussion (Section 5) and a conclusion (Section 6) complete the paper.

⁹ https://pypi.org/project/pychoco/

2. State of the Art in LLMs and Knowledge Modeling

Many real-world problems can be interpreted as constraint satisfaction problems (CSPs). Knowledge engineers have the task of formalizing domain-specific constraints into corresponding constraint-based representations. This formalization step is often effortful, and more assistance and automation are required in the modeling process, which can be regarded as a major challenge for constraint-based systems [5]. Due to recent developments in the area of large language models (LLMs), we can envision agent-based approaches that support the (semi-)automated generation of CSPs [6]. In the following, we discuss related work on the application of LLMs to knowledge modeling.

The idea of exploiting the synergy potential of knowledge-based systems and language models for knowledge-base generation has been proposed, among others, by Petroni et al. [7], Ding et al. [8], and Razniewski et al. [9]. Suchanek and Luu [10] motivate the integration of LLMs with data-driven and knowledge-based approaches to exploit the advantages of both worlds, specifically, to let data-driven approaches act as a basis for grounding the LLM output in reality. In the work presented in our paper, a simple form of grounding is the usage of test cases to assure the intended CSP semantics. A similar line of research is presented by Nayak and Timmapathini [11], where object and relation identification based on LLMs is investigated.

Ahmed and Choudhury [12] introduce the idea of applying LLMs to the generation of optimization problems. The motivation behind this is to make the related problem formulation tasks more accessible to domain experts, i.e., to decrease the need for specialized mathematical knowledge to make problem formulations feasible. Based on a given dataset comprising different example problem definitions (and solutions), LLMs are used to generate optimization problem definitions for new problems defined in a textual fashion. A major difference compared to our work is that model fine-tuning is primarily based on a "human in the loop" approach, whereas we focus more on automated fine-tuning, giving the LLM feedback on the
syntactical level (is the generated CSP correct, or what were the compilation issues?) and on the semantic level with test cases. A related approach is the application of LLMs to support different types of strategic reasoning tasks, for example, in the context of economic simulations and game theory [13].

A very similar approach is provided in [14]. This approach, named 'Program of Thoughts', also combines an LLM with an external Python interpreter, however, in [14] for computing numerical problems. Furthermore, the verification of the generated Python program is not discussed there. The adaptation of LLMs for logical reasoning tasks [15] can be performed either based on LLM fine-tuning, where LLMs are trained (and adapted) for specific problem settings [16], or based on (automated) prompt adaptation [17]. Aiming to improve the quality of code generation, Pan et al. [15] follow the idea of using constraint (and SAT) solver execution errors as feedback channels (in the prompting process) to increase the quality of code generation. The feedback approach presented in our paper extends the work of Pan et al. [15] by additionally providing test cases that help to improve the quality of the generated CSP on the semantic level as well.

In [18], various prompts are used to create source code with built-in variability. Specific prompts enable the generation of code in different programming languages, allowing for application configuration. However, a human developer would execute these prompts; no iterative process ensures the correctness of the resulting code.

The automated generation of CSPs can also be regarded as a specific type of ontology construction, where concept hierarchies are derived based on different LLM prompts representing concept-hierarchy-specific queries (see, for example, the work of Funk et al. [19]). An example of such a query is the following: What are all of the most important subcategories of the category A?

3. Methodology

Our approach for generating constraint models using LLMs is a structured, iterative process designed to create syntactically and semantically (almost) correct models. It begins by defining a general LLM prompt that can encompass the varying specifics of different constraint problems. For instance, the prompt can be tailored to a specific constraint problem but always requests a PyChoco program that solves the problem.

The process, furthermore, involves an iterative loop in which the LLM generates a constraint model based on the provided prompt. This generated model is then verified through a constraint interpreter or compiler. If the syntax of the model is incorrect, the result of the Python interpreter call is automatically taken as the prompt for the next iteration, the model is adjusted by the LLM, and the process repeats. This loop continues until the syntax is correct.

Once a syntactically correct model is produced, the next step is to verify its semantics. This is done by running a series of test cases specific to the problem domain. Successful completion of the test cases indicates a correct semantic implementation (at least with respect to the test cases).

This approach thus executes an iterative prompting of an LLM, where only a human user provides the first prompt, while the results of a Python interpreter are automatically used by the LLM as successive prompts. A complete example of such an iterative dialog, including the answers generated in between by the LLM ("Assistant"), is presented in Appendix Section 2.

This methodology can be further detailed through the following steps:

Leveraging Existing Knowledge Problems: To create useful constraint models, we use logical problems that are well understood and documented on the internet. Examples include classic problems such as the N-queens problem, Map Coloring, or the Traveling Salesperson Problem. These problems are selected due to their well-known constraints and solution strategies.

Integration with Constraint Solvers: Once the logical problems are defined, we connect these models to a constraint system, here PyChoco - other suitable solvers could be used. This involves mapping the logical problem's constraints to the syntax and structures of the chosen constraint language, thus facilitating automated solving.

Syntax Verification Loop: An iterative process is set up where the LLM generates a constraint model based on the initial prompt. This model is then verified for syntactic correctness using the constraint interpreter or compiler. If errors are identified, the prompt is automatically adjusted by the LLM, using the error message as the next prompt, and the LLM regenerates the model. This loop continues until a syntactically correct model is achieved.

Semantic Verification with Test Cases: After achieving a syntactically correct model, the focus shifts to ensuring semantic correctness. This involves introducing test cases specific to the defined problem. The LLM-generated model runs through these test cases to verify that it not only adheres to syntactic rules but also correctly implements the problem's logic. Iteration continues until the model passes all semantic verification tests.

Generalizability: The ultimate goal of our approach is to establish a generalizable method for generating constraint models using LLMs. By systematically using an iterative loop for both syntactic and semantic verification that employs examples and logical problem definitions, we aim to create a robust framework that can be adapted
tool Python_REPL, description: A Python shell. Use this to run and verify your solution to the constraint problem.
Input should be a valid Python script using pychoco. The last line should be "verify_solutions(solutions, ...)" in order
to verify your solutions
System:
You are an agent designed to solve constraint problems, you write python code to solve and verify constraint problems.
You have access to a python REPL, which you can use to execute python code and verify your solution. If you get an
error, debug your code and try again. You only use the constraint solver pychoco. You conclude your python code
with a variable "solutions" and verify it with the method call "verify_solutions(solutions, "<problem name>")", where <problem name>
is given to you as the first word of the instructions. "solutions" should be a list of all solutions to the
constraint problem. If verification fails, you correct your solution.
Use the following example as a template for solving constraint problems with pychoco:
User: ints10. find numbers one to 10.
Assistant:
```python
from pychoco.model import Model
from test_tool import verify_solutions

model = Model()
x = model.intvar(1, 10, "x")
solver = model.get_solver()
solutions = [s.get_int_val(x) for s in solver.find_all_solutions()]
verify_solutions(solutions, "ints10")
```
Figure 1: General prompt including an example for the LLM on how to call the verification method. The first three lines (the
tool description) are needed to avoid iterations related to wrongly calling the method verify_solutions (see Appendix
Section 7 for created errors without those lines).
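The prompt in Figure 1 refers to a problem-specific verification method but does not show its implementation at this point. The following is a minimal sketch of how such a `verify_solutions` helper might look; the function name and feedback string mirror Figure 1 and Figure 3, while the registry-based internals are our illustrative assumption, not the repository's `test_tool` module:

```python
# Illustrative sketch of the verification tool referenced in the prompt.
# The registry below is an assumption; the actual test_tool module in the
# paper's repository may be structured differently.

EXPECTED = {
    # problem name -> (check for a single solution, expected number of solutions)
    "ints10": (lambda s: isinstance(s, int) and 1 <= s <= 10, 10),
}

def verify_solutions(solutions, problem):
    check, expected_count = EXPECTED[problem]
    correct = sum(1 for s in solutions if check(s))
    # The returned string is fed back to the LLM as the tool response.
    return (f"{correct} of the {len(solutions)} solutions are correct. "
            f"Expected {expected_count} solutions")

print(verify_solutions(list(range(1, 11)), "ints10"))
```

For the "ints10" example of Figure 1, the call prints `10 of the 10 solutions are correct. Expected 10 solutions`, which is the feedback format the agent reads back.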
to various constraint languages and problems. By following these steps, our approach leverages the capabilities of LLMs to create accurate and efficient constraint models, providing a scalable solution for different problem types.

4. Prompting Constraint Models

We divide prompts into a general part, which is used with each constraint problem, i.e., the system prompt (see Section 4.1), and a query part, which points to a specific problem (see Section 4.2).

4.1. General Prompts

Prompting Syntactically Valid Models: Figure 1 shows the general prompt which we developed for querying for constraint solutions. The prompt provides the following explanations to the LLM:

1. what role it has ("You are an agent…"),
2. the explanation of the REPL¹⁰ as a tool ("You have access…"),
3. a hint of the behavior of the tool ("The python REPL…"),
4. a hint of how to use the tool ("Use this…"),
5. to focus on that tool ("use only").

Prompting Semantic Verification: To verify whether a semantically correct solution can be computed with the generated constraint model, we include a verification scheme that builds on tests. These are specific to the constraint problem at hand. The general prompt, however, is enhanced with a call to the verification method, which is parameterized with the name of the constraint problem. The verification scheme is given by an example that explains the parameterization, a constraint model, and the call to the verification method (Figure 1).

Prompt Refinement: To better understand why the LLM generates a certain code line in the model, we forced the LLM to provide a comment above each code line (see Figure 2). This is done by explaining in the prompt what a "Good python code example"¹¹ is, by repeating this text at various appropriate

¹⁰ Read-Eval-Print Loop
¹¹ The italic style in the prompt is only for the human reader, i.e., only the text is given as prompt.
You are an agent designed to solve constraint problems, you write and execute python code to answer questions.
You have access to a python REPL, which you can use to execute python code. The python REPL will keep its state
between usage. Use this to gradually approach a final solution. Divide the solution into meaningful parts run them
part after part verifying that each part runs correctly. If you get an error, debug your code and try again only the last
part that failed. You only use the constraint solver pychoco. Only use the output of your code to answer the question.
Write for each code line a comment as needed to justify your reasoning for that code line. I will give you an example
of good python code, please, follow the code convention of the good python code example. This good python code
example is only an example. This good python code example has nothing to do with the constraint problem solution.
You might know the answer without running any code, but you should still run the code to get the answer. If it does
not seem like you can write code to answer the question, just return "I don’t know" as the answer. The user may
ask questions or give follow-up instructions after you have presented your solution. You then have to adjust your solution
accordingly.
good python code example:
"# Import necessary modules
import os

# Define the input and output file paths
input_file_path = 'numbers.txt'
output_file_path = 'average.txt'

# Function to read numbers from a file
def read_numbers_from_file(file_path):
    # Initialize an empty list to store the numbers
    numbers = []
    # Open the file in read mode
    with open(file_path, 'r') as file:
        # Read each line in the file
        for line in file:
            # Strip any leading/trailing whitespace and convert to float
            number = float(line.strip())
            # Append the number to the list
            numbers.append(number)
    # Return the list of numbers
    return numbers

main()"
Figure 2: Refining the prompt to provide comments for the generated model (slightly differently prompted).
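The syntax-verification loop that these prompts drive can be sketched in a few lines. This is a simplified illustration of the iteration described in Section 3, not the authors' LangChain implementation; `llm` is a stand-in for a real chat-model call:

```python
# Minimal sketch of the iterative prompting loop: generate code, execute it,
# and feed any error back to the LLM as the next prompt.
import traceback

def refine(llm, initial_prompt, max_iterations=5):
    prompt = initial_prompt
    for _ in range(max_iterations):
        code = llm(prompt)
        try:
            exec(code, {})                    # syntax and runtime check
        except Exception:
            prompt = traceback.format_exc()   # error message becomes next prompt
        else:
            return code                       # syntactically correct model
    return None                               # give up after too many iterations

# Mocked LLM: the first answer contains a syntax error, the second is repaired.
answers = iter(["solutions = [1,", "solutions = [1, 2, 3]"])
model_code = refine(lambda prompt: next(answers), "solve the toy problem")
```

In the real system, the executed script ends with the `verify_solutions` call, so the same loop also carries the semantic (test-case) feedback.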
positions in the prompt, as well as by providing such an example in the prompt.

4.2. Prompting Specific Problems

The query part simply consists of the name of the constraint problem and the related specific problem description, see Listing 1¹².

N-Queens Problem
Problem: Position n queens, with the movements known from chess, on an n×n chessboard¹³. Figure 1 shows the used prompt.
Results: The LLM makes multiple Python REPL calls, one that prints the solution and one that follows the system prompt on how to verify the solution. This leads to the correct computation of the model, including the conversion of the solutions to a list of lists (see Figure 3).
Comments: The solution to the N-queens problem is provided in the PyChoco documentation on the internet; hence, the LLM was trained with it. However, a different solution was found. Also, the 3-queens problem, which has no solution, was correctly represented. A further observation was that the result is not deterministic, e.g., it varies the arithmetic constraints for the diagonals (see Listing 2). Fixing the seed and setting the temperature to zero does not make the results deterministic.

¹² The code in the repository contains further problems, such as Magic Square, that lead to similar observations.
¹³ See the files "chats/queen8_cs_agent_sol2_gpt-4-1106-preview_*" in the repository for the results.

The semantic verification test for the N-queens problem is shown in Listing 3. Through the method
Listing 1: Queries for specific constraint problems

queen3 = "solve the 3-queen problem"
queen8 = "solve the 8-queen problem"
queenn = "solve the n-queen problem, use n=8 as test instance"
----------------------------------------------------------------------------
coloring = """
solve the map coloring problem for four regions, three colors and the given adjacency:
regions = ['A', 'B', 'C', 'D']
adjacency_list = {
    'A': ['B', 'C'],
    'B': ['A', 'C', 'D'],
    'C': ['A', 'B', 'D'],
    'D': ['B', 'C']
}.
the solution should be a list of python dicts where each dict maps regions to color indices
"""
----------------------------------------------------------------------------
tsp = """solve the traveling salesman problem, use the following problem instance:
# Number of cities
C = 4
# Distance matrix
D = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
the solution should be a list of valid solutions, each solution being a list of integers
representing the cities to be visited.
"""
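For the coloring query above, the expected solution format is a list of dicts. A brute-force oracle in plain Python (our addition, independent of PyChoco and of the repository's test code) shows what the test cases have to check and how many solutions this instance has:

```python
# Brute-force semantic check for the map coloring instance of Listing 1.
from itertools import product

adjacency_list = {
    'A': ['B', 'C'],
    'B': ['A', 'C', 'D'],
    'C': ['A', 'B', 'D'],
    'D': ['B', 'C'],
}

def is_valid_coloring(assignment, colors=3):
    # every region gets a legal color index that differs from all neighbours
    return all(
        0 <= assignment[region] < colors
        and all(assignment[region] != assignment[n] for n in neighbours)
        for region, neighbours in adjacency_list.items()
    )

regions = ['A', 'B', 'C', 'D']
solutions = [dict(zip(regions, combo))
             for combo in product(range(3), repeat=len(regions))
             if is_valid_coloring(dict(zip(regions, combo)))]
```

Since A, B, and C are mutually adjacent, every solution colors them with all three colors, leaving exactly one color for D; the instance therefore has 6 solutions, which an enumeration-based `verify_solutions` check can compare against.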
verify_solutions, called by the prompt, each solution (a list of queen positions) is tested. This test is exhaustive because it analytically tests the queens' positions, not leaving out any constraint.

Map Coloring
Problem: Listing 1 describes the well-known Map Coloring problem as a query.
Results: A first model, which contains the constraint all_different, raised an error¹⁴ but was corrected through 2 iterations by using another constraint formulation (i.e., with != instead of all_different)¹⁵. A further run provided a one-shot success, which, however, was not recognized as such; instead, a solution was hallucinated for presentation to the user¹⁶. Yet another run first used the API wrongly but finally found a correct solution¹⁷.
Comments: The solution can also be found on the internet; however, this does not prevent the LLM from making errors. Finally, also because the iterative approach is used, a solution was found.

Traveling Salesperson Problem
Problem: Listing 1 describes the well-known TSP as a query.
Results: In general, there are several recurring issues when attempting to model constraints for the TSP (see the error logs in Appendix Section 8). One major problem is accurately modeling the distance constraints, which often leads to incorrect solutions. Similarly, the Hamiltonian path constraints are difficult to model correctly, compounding the problem. Additionally, there is frequent misuse of the API. These misuses typically involve providing arguments in the wrong order or supplying an incorrect number of arguments. As a result of these issues, the generated solutions are often flawed. With GPT-4, no solutions could be found with the resources at hand, but with GPT-4o, some solutions were correctly identified. However, no implementation was found that passes all the test cases provided for verification. Overall, these challenges underscore the complexity of effectively modeling TSP constraints and highlight areas that require significant improvements to find optimal solutions consistently.

¹⁴ It seems to be an implementation issue, see https://github.com/chocoteam/choco-solver/issues/705
¹⁵ coloring_cs_agent_sol2_gpt-4-1106-preview_20240529094847.txt
¹⁶ coloring_cs_agent_sol2_gpt-4-1106-preview_20240529095157.txt
¹⁷ coloring_cs_agent_sol2_gpt-4-1106-preview_20240529095612.txt
¹⁸ https://doi.org/10.5281/zenodo.12794211

4.3. Implementation

The implementation, which utilizes LangChain and Python, leverages ChatGPT-4 and ChatGPT-4o and is available in a repository at Zenodo¹⁸. As a constraint
Assistant:
There was a mistake in the code. It seems I incorrectly used the
`solver.find_solution()` method, which only finds one solution, and I
didn't convert the solution to the expected list format before passing
it to the verification function. I need to find all solutions and convert
each solution to a list of integers representing the positions of the
queens.
Let me correct the code.
Python_REPL (tool call 1/1):
from pychoco.model import Model
from test_tool import verify_solutions
model = Model("8-Queens")
# Create an array of 8 variables representing the positions of queens on a
# chess board
queens = model.intvars(8, 1, 8)
# Add constraints
for i in range(8):
for j in range(i + 1, 8):
# Queens cannot be on the same row
model.arithm(queens[i], "!=", queens[j]).post()
# Queens cannot be on the same diagonal
model.arithm(queens[i], "-", queens[j], "!=", i - j).post()
model.arithm(queens[i], "-", queens[j], "!=", j - i).post()
# Find all solutions
solver = model.get_solver()
solutions = []
for solution in solver.find_all_solutions():
solutions.append([solution.get_int_val(queens[i]) for i in range(8)])
verify_solutions(solutions, "queen8")
Python_REPL (tool response):
92 of the 92 solutions are correct. Expected 92 solutions
Assistant:
The 8-queens problem has been successfully solved, and all 92 solutions are correct.
tool calls: 6, text messages: 6
Figure 3: Constraint model for 8-queens problem and last output
solver, we used PyChoco as a Python API for the constraint solver Choco¹⁹. The resources of ChatGPT are limited with respect to context/prompt length. This causes the LLM to stop responding when the chat size exceeds the allowed number of tokens, which can easily occur because the chat includes the entire previous chat history. The resulting chats are included in the paper's appendix and the repository. The implementation can be used to rerun the experiments, and the data can be utilized for further investigation in prompt engineering.

¹⁹ https://choco-solver.org/

5. Discussion

Overall, across all problems, after initiating multiple chats, each with several iterations, a model that can compute a single solution could be found (see Appendix Sections 3 to 6). In Appendix Section 9, we document the number of iterations per experiment, i.e., the calls to the Python interpreter ("Tool Calls") and the answers given by the LLM ("Text Messages"). Nevertheless, for the TSP, we could not find a model that computes all solutions. The iteration was stopped by the LLM if no changes occurred or if too many tokens were used. So the paper's result is that, in principle, constraint models for well-known problems can be computed, however, not in all cases.

An interesting observation is that the generated variable names within the model are always semantically meaningful (e.g., "queens") and not arbitrary. This likely occurs because LLMs statistically favor names that have been previously encountered.

In this paper, we use well-known constraint problems,
Listing 2: Non-deterministic results

- correct
model.arithm(queens[i], "!=", queens[j] + (j - i)).post()
model.arithm(queens[i], "!=", queens[j] - (j - i)).post()
- correct
model.arithm(qs[i], "-", qs[j], "!=", j - i).post()
model.arithm(qs[j], "-", qs[i], "!=", j - i).post()
- wrong
model.abs(queens[i] - queens[j]).ne(i - j).post()  # Diagonal constraint
model.abs(queens[i] - queens[j]).ne(j - i).post()  # Diagonal constraint
- wrong
model.arithm(queens[i], "-", queens[j], "!=", i - j).post()
model.arithm(queens[i], "+", queens[j], "!=", i + j + 2).post()
- does not find all solutions
diag1 = [model.int_offset_view(queens[i], i) for i in range(n)]
diag2 = [model.int_offset_view(queens[i], -i) for i in range(n)]
model.all_different(diag1).post()
model.all_different(diag2).post()
- wrong
model.arithm(queens[i], "!=", queens[j]).post()  # Different columns
model.arithm(queens[i], "-", i, "!=", queens[j], "-", j).post()  # Different diagonals (left-top to right-bottom)
model.arithm(queens[i], "+", i, "!=", queens[j], "+", j).post()  # Different diagonals (left-bottom to right-top)
- wrong
model.all_different([queens[i], queens[j]]).post()
model.arithm(queens[i], "-", queens[j], "!=", j - i).post()
model.arithm(queens[i], "+", queens[j], "!=", j - i).post()
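Why the first variant in Listing 2 is correct can be checked mechanically: for i < j, the two posted constraints are exactly the usual diagonal test |q_i - q_j| ≠ |i - j|. A small brute-force check (our addition, in plain Python rather than PyChoco) confirms the equivalence:

```python
# Equivalence of the pairwise diagonal constraints from Listing 2 with the
# absolute-difference formulation, checked exhaustively on a small board.

def pairwise_variant(qi, qj, i, j):
    # the two constraints posted in the first "correct" variant (i < j)
    return qi != qj + (j - i) and qi != qj - (j - i)

def abs_variant(qi, qj, i, j):
    # the textbook diagonal condition, as used in Listing 3
    return abs(qi - qj) != abs(i - j)

equivalent = all(
    pairwise_variant(qi, qj, i, j) == abs_variant(qi, qj, i, j)
    for i in range(4)
    for j in range(i + 1, 4)
    for qi in range(1, 5)
    for qj in range(1, 5)
)
```

The variants marked wrong above fail exactly this kind of comparison against the diagonal condition, which explains the flawed or missing solutions.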
Listing 3: Verifying solutions for N-queens by testing the queens' positions.
def is_valid_solution(queens):
n = len(queens)
assert type(queens) == list, "expected list of ints"
for i in range(n):
for j in range(i + 1, n):
# Check if queens are in the same row
if queens[i] == queens[j]:
return False
# Check diagonals: if the difference between the column indices equals
# the difference between the row indices, they’re in the same diagonal
if abs(queens[i] - queens[j]) == abs(i - j):
return False
return True
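The analytic check of Listing 3 also yields the expected solution counts against which `verify_solutions` compares: enumerating all row permutations (one queen per column, distinct rows) visits every candidate placement. A small oracle built on that check (our addition, not the repository code):

```python
# Exhaustive oracle for N-queens built on the check from Listing 3.
from itertools import permutations

def is_valid_solution(queens):
    n = len(queens)
    for i in range(n):
        for j in range(i + 1, n):
            if queens[i] == queens[j]:                    # same row
                return False
            if abs(queens[i] - queens[j]) == abs(i - j):  # same diagonal
                return False
    return True

def count_all_solutions(n):
    # rows form a permutation, so only the diagonal checks can still fail
    return sum(1 for rows in permutations(range(n))
               if is_valid_solution(list(rows)))
```

This reproduces the counts reported in the paper: 92 solutions for n = 8 (Figure 3) and none for n = 3.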
which, of course, can only be a first step for leveraging LLMs in knowledge acquisition. The next steps would be to supply new, unknown problems, including industrial ones, e.g., by giving tables of correct variable combinations (configurations) as table constraints and generating abstracted constraint models. However, Listing 1 shows a potential way of formally representing a problem as a query, which can probably be reused for other tasks.

Several general problems were identified: the program persistently uses the PyChoco API incorrectly and often hallucinates PyChoco functions. This could not be prevented by providing the API definition in the prompt, because the complete definition exceeds the number of tokens we could send to the LLM.

Additionally, the chats are not stable and can vary significantly even with the same prompt. This instability persists even with the temperature set to 0 and a fixed seed, making chats non-deterministic.

The used prompts should be developed further to achieve the intended general generation of knowledge models; in other words, prompt engineering for knowledge modeling needs to be elaborated. Few-shot prompting [20] could be used to encourage the model to produce more consistent outputs and to adhere more closely to the given instructions.

The approach of first creating syntactically correct models and then verifying them to ensure semantic accuracy could be demonstrated successfully (see Listing 3). Because we use test-driven semantic verification, the verification depends on the quality of the used test cases. For example, the verification of the N-queens problem was exhaustive because it was based on an analytical check rather than an enumeration of test cases. Furthermore, writing tests for knowledge-based tasks is, in our experience, typically easier than writing knowledge models; hence, the experts' task of creating formal constraints and configuration models will, in our opinion, shift from writing models to writing tests, while the models themselves are written by an LLM. Thus, with the presented approach, the knowledge model development process can probably be carried out by domain experts, not only by knowledge engineers.

The paper furthermore offers an easy way to begin with iterative LLM prompting. Moreover, it is suggested to use the LLM for creating a new configuration language by exploiting its hallucination capabilities.

6. Conclusions

The paper presents first steps in supporting the formalization task in constraint modeling with LLMs, i.e., the mapping of a well-known problem into a constraint representation of a certain constraint language, here PyChoco. By taking various constraint problems, formulating generally applicable prompts, and doing so in an iterative manner that includes syntactic and semantic verification processes, the LLM could generate appropriate and correct constraint models. Through the iteration, intermediate errors related to syntax and semantics were corrected automatically. Further work will include a quantitative evaluation of the method, a comparison of different LLMs, as well as the generation of constraint models for unknown problems. In total, this paper provides a first step towards a new kind of constraint and configuration modeling approach with LLMs.

Acknowledgments

This project was funded through the Federal Institute for Research on Building, Urban Affairs and Spatial Development (Bundesinstitut für Bau-, Stadt- und Raumforschung), Germany, within the Zukunft Bau Forschungsförderung Maßnahme (Future of Construction research funding measure): "AI-supported modernization of rental housing stock" ("KI-gestützte Modernisierung an Mietwohnungsbaubeständen (KIMM)"), as well as through the Federal Ministry for Economic Affairs and Climate Action (Bundesministerium für Wirtschaft und Klimaschutz), Germany, within the "Joint project: EnOB: IntelMOD - Intelligent modernization platform based on functional cost splitting; sub-project: Infrastructure of the modernization platform" ("Verbundvorhaben: EnOB: IntelMOD - Intelligente Modernisierungsplattform auf Basis des Funktionalen Kostensplittings; Teilprojekt: Infrastruktur der Modernisierungsplattform").

References

[1] A. Felfernig, L. Hotz, C. Bagley, J. Tiihonen, Knowledge-based Configuration – From Research to Business Cases, Morgan Kaufmann, 2014.
[2] A. Felfernig, A. Falkner, D. Benavides, Feature Models: AI-driven Design, Analysis, and Applications, Springer, 2024.
[3] L. Hotz, A. Felfernig, M. Stumptner, A. Ryabokon, C. Bagley, K. Wolter, Configuration Knowledge Representation & Reasoning, in: A. Felfernig, L. Hotz, C. Bagley, J. Tiihonen (Eds.), Knowledge-based Configuration – From Research to Business Cases, Morgan Kaufmann Publishers, 2014, pp. 59–96.
[4] I. Ozkaya, A. Carleton, J. Robert, D. Schmidt, Application of large language models (LLMs) in software engineering: Overblown hype or disruptive change?, Carnegie Mellon University, Software Engineering Institute's Insights (blog), 2023. URL: https://doi.org/10.58012/6n1p-pw64, accessed: 2024-May-16.
[5] B. O'Sullivan, Automated modelling and solving in constraint programming, in: 24th AAAI Conference on Artificial Intelligence, AAAI'10, AAAI Press, 2010, pp. 1493–1497.
[6] E. C. Freuder, Conversational modeling for constraint satisfaction, AAAI Conference on Artificial Intelligence 38 (2024) 22592–22597. URL: https://ojs.aaai.org/index.php/AAAI/article/view/30268. doi:10.1609/aaai.v38i20.30268.
[7] F. Petroni, T. Rocktäschel, P. P. Lewis, A. Bakhtin, Y. Wu, A. Miller, S. Riedel, Language models as knowledge bases?, 2019. URL: https://arxiv.org/pdf/1909.01066.
[8] Y. Ding, A. Poudel, Q. Zeng, T. Weninger, B. Veeramani, S. Bhattacharya, EntGPT: Linking generative large language models with knowledge bases, 2024. URL: https://arxiv.org/abs/2402.06738.
[9] S. Razniewski, A. Yates, N. Kassner, G. Weikum, Language models as or for knowledge bases, 2021. URL: https://arxiv.org/abs/2110.04888.
[10] F. Suchanek, A. T. Luu, Knowledge bases and language models: Complementing forces, in: 7th International Joint Conference on Rules and Reasoning, Springer-Verlag, Berlin, Heidelberg, 2023, pp. 3–15. URL: https://doi.org/10.1007/978-3-031-45072-3_1.
[11] A. Nayak, H. P. Timmapathini, LLM2KB: Constructing knowledge bases using instruction tuned context aware large language models, 2023. URL: https://arxiv.org/abs/2308.13207.
[12] T. Ahmed, S. Choudhury, LM4OPT: Unveiling the potential of large language models in formulating mathematical optimization problems, 2024. URL: https://arxiv.org/abs/2403.01342.
[13] Y. Zhang, S. Mao, T. Ge, X. Wang, A. de Wynter, LLM as a mastermind: A survey of strategic reasoning with large language models, 2024. URL: https://arxiv.org/abs/2404.01230v1. arXiv:2404.01230.
[14] W. Chen, X. Ma, X. Wang, W. W. Cohen, Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks, Transactions on Machine Learning Research (2023). URL: https://openreview.net/forum?id=YfZ4ZPt8zd.
[15] L. Pan, A. Albalak, X. Wang, W. Y. Wang, Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning, 2023. arXiv:2305.12295.
[16] P. Clark, O. Tafjord, K. Richardson, Transformers as soft reasoners over language, in: 29th International Joint Conference on Artificial Intelligence, IJCAI'20, 2021, pp. 3882–3890.
[17] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. H. Chi, D. Zhou, Self-consistency improves chain of thought reasoning in language models, ArXiv (2022). URL: https://arxiv.org/abs/2203.11171.
[18] M. Acher, J. G. Duarte, J.-M. Jézéquel, On programming variability with large language model-based assistant, in: Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A, SPLC '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 8–14. URL: https://doi.org/10.1145/3579027.3608972.
[19] M. Funk, S. Hosemann, J. C. Jung, C. Lutz, Towards ontology construction with language models, 2023. URL: https://arxiv.org/abs/2309.09898.
[20] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901.