<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting Large Language Models for the Automated Generation of Constraint Satisfaction Problems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lothar Hotz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bähnisch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Lubos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Felfernig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Albert Haag</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Twiefel</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graz University of Technology</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hamburger Informatik Technologie-Center e.V.</institution>
          ,
          <addr-line>Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Product Management Haag GmbH</institution>
          ,
          <addr-line>Bad Dürkheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>exXxa GmbH</institution>
          ,
          <addr-line>Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Constraint Satisfaction Problems (CSPs) are a core technology for solving many real-world problems, especially configuration tasks. A key success factor in this context is an efficient knowledge acquisition process in which domain experts and knowledge engineers (developers of CSPs) reach an agreement on the correctness of the expanding knowledge base as early as possible. In this paper, we show how large language models (LLMs) can be applied to the automated generation of solutions for constraint satisfaction problems, thus reducing overheads related to CSP development and maintenance in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>Constraint Satisfaction Problems</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Knowledge Acquisition</kwd>
        <kwd>Automated Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Knowledge acquisition for knowledge-based systems, especially constraint-based systems, is a complex task. It includes the formalization of partly tangible knowledge with a knowledge model, such as a configuration, constraint, or feature model [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>]. Constraint-based systems, in particular, are often used to implement configuration systems, due to their ability to compute possible values for configurations or even configurations themselves directly [<xref ref-type="bibr" rid="ref3">3</xref>]. Hence, modeling a configuration problem as a constraint satisfaction problem (CSP) is a typical approach for computing configurations. This type of modeling is similar to developing programs in a higher-level programming language but also incorporates logical semantic elements.</p>
      <p>Currently, in software engineering, the use of large language models (LLMs) is exploited to support programmers in their daily tasks, such as coding, code completion, reviewing code, API programming, generating test cases, documentation, or identifying design patterns, as well as learning programming languages or understanding legacy code [<xref ref-type="bibr" rid="ref4">4</xref>]. This is achieved through general LLMs such as ChatGPT or specific, coding-trained LLMs such as CoPilot, software agents like CREWAI and AutoGenStudio, or software-engineering assistants such as Devin AI or its open-source counterpart Devika.</p>
      <p>An example of developing a small game with a graphical user interface is GPT-Engineer, which prompts in natural language and provides the game logic and the graphical interface through HTML and CSS coding. Especially the combination of an LLM with a compiler or interpreter in a chain leads to a self-evaluation where code generated by an LLM is directly checked through the interpreter, and the resulting error message is taken as the next prompt for the LLM. Through iterating this (e.g., with a supporting tool such as LangChain), a syntactically correct program is created. However, broader tasks, such as reviewing a software architecture or refactoring code distributed over multiple files, are hindered by the prompt (the context window) being limited to a given number of tokens, such as 128,000 input tokens for GPT-4. In summary, applying LLMs to software development still has to cope with issues such as limited context windows, maintainability of AI-created code, and applicability in project planning or software architecture.</p>
      <p>Inspired by these approaches, the following questions arise for constraint satisfaction problems. A main ingredient of LLMs is of course a prompt that triggers the generation of text, here of knowledge models. Hence, for a knowledge model, the questions are:
• What type of prompts do we need?
• What has to be presented to the LLM?
• Is natural language text describing the problem enough?
• Should examples of the formal modeling language, which shall be used for formalization, be given?
• Or the complete language specification?</p>
      <p>This paper explores the innovative use of LLMs to automatically generate constraint models in constraint programming languages like PyChoco (https://pypi.org/project/pychoco/). Constraint programming involves defining problems in terms of variables, related variable domains, and constraints that must be satisfied. Popular problems in this domain include, e.g., the N-queens problem, Magic Square, Map Coloring, and the Traveling Salesperson Problem (TSP). To tackle such problems, we propose a generalized approach (Section 3) where an LLM is leveraged to generate a constraint model based on a provided prompt.</p>
      <p>We test our approach based on several constraint satisfaction problem examples (Section 4). In this paper, we mainly concentrate on known examples from the constraint community given by well-known descriptions, not that much on new examples formulated in natural language. Although the task of starting with known problems is untypical for configuration problems, in our view, this step is necessary for exploring the possibilities of LLMs for knowledge modeling in general. This approach of using commonly known constraint and configuration problems builds a baseline for further work.</p>
      <p>With our proposed approach, we aim to streamline the process of model creation in constraint programming, making it faster and more accessible by harnessing the capabilities of LLMs. The main idea is to define a general prompt once, which describes the problem context, and then reuse this general prompt for specific knowledge engineering tasks at hand.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art in LLMs and Knowledge Modeling</title>
      <p>
        Many real-world problems can be interpreted as
constraint satisfaction problems (CSPs). Knowledge
engineers have the task of formalizing domain-specific
constraints into corresponding constraint-based
representations. This formalization step is often effortful,
and more assistance and automation are required in the
modeling process, which can be regarded as a major challenge
for constraint-based systems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Due to recent
developments in the area of large language models (LLMs), we
could envision agent-based approaches that support the
(semi-)automated generation of CSPs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the
following, we discuss related work on the application of LLMs
to knowledge modeling.
      </p>
      <sec id="sec-3-2">
        <p>The idea of exploiting synergy potentials of knowledge-based systems and language models for knowledge-base generation has been proposed, among others, by Petroni et al. [7], Ding et al. [8], and Razniewski et al. [9].</p>
        <p>
          Suchanek and Luu [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] motivate the integration of LLMs
with data-driven and knowledge-based approaches to
exploit the advantages of both worlds, specifically, to
let data-driven approaches act as a basis for grounding
the LLM output in reality. In the work presented in our
paper, a simple form of grounding is the usage of test
cases to assure intended CSP semantics. A similar line of
research is presented in Nayak and Timmapathini [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
where object and relation identification based on LLMs
is investigated.
        </p>
      </sec>
      <sec id="sec-3-3">
        <p>Ahmed and Choudhury [12] introduce the idea of applying LLMs to the generation of optimization problems. The motivation behind this is to make related problem formulation tasks more accessible to domain experts, i.e., to decrease the need for specialized mathematical knowledge to make problem formulations feasible. Based on a given dataset comprising different example problem definitions (and solutions), LLMs are used to generate optimization problem definitions for new problems defined in a textual fashion. A major difference compared to our work is that model fine-tuning is primarily based on a "human in the loop" approach, whereas we focus more on automated fine-tuning, giving the LLM feedback on the syntactical level (is the generated CSP correct, or what were the compilation issues?) and on the semantic level with test cases. A related approach is the application of LLMs to support different types of strategic reasoning tasks, for example, in the context of economic simulations and game theory [<xref ref-type="bibr" rid="ref13">13</xref>].</p>
        <p>A very similar approach is provided by [<xref ref-type="bibr" rid="ref14">14</xref>]. This approach, named ‘Program of Thoughts’, also combines an LLM with an external Python interpreter, however, in [<xref ref-type="bibr" rid="ref14">14</xref>] for computing numerical problems. Furthermore, the verification of the generated Python program is not discussed.</p>
        <p>The adaptation of LLMs for logical reasoning tasks [<xref ref-type="bibr" rid="ref15">15</xref>] can be performed either based on LLM fine-tuning, where LLMs are trained (and adapted) for specific problem settings [<xref ref-type="bibr" rid="ref16">16</xref>], or based on (automated) prompt adaptation [<xref ref-type="bibr" rid="ref17">17</xref>]. Intending to improve the quality of code generation, Pan et al. [<xref ref-type="bibr" rid="ref15">15</xref>] follow the idea of using constraint (and SAT) solver execution errors as feedback channels (in the prompting process) to increase the quality of code generation. The feedback approach presented in our paper extends the work of Pan et al. [<xref ref-type="bibr" rid="ref15">15</xref>] in terms of additionally providing test cases that help to improve the quality of the generated CSP also on the semantic level.</p>
        <p>In [18], various prompts are used to create source code with built-in variability. Specific prompts enable the generation of code in different programming languages, allowing for application configuration. However, a human developer would execute these prompts; no iterative process ensures the correctness of the resulting code.</p>
        <p>The automated generation of CSPs can also be regarded as a specific type of ontology construction where concept hierarchies are derived based on different LLM prompts representing concept hierarchy-specific queries (see, for example, the work of Funk et al. [<xref ref-type="bibr" rid="ref19">19</xref>]). An example of such a query is the following: What are all of the most important subcategories of the category A?</p>
      </sec>
      <sec id="sec-3-5">
        <title>3. Methodology</title>
        <p>Our approach for generating constraint models uses a structured, iterative process designed to create syntactically and semantically (almost) correct models. This begins by defining a general LLM prompt that can encompass the varying specifics of different constraint problems. For instance, the prompt can be tailored to a specific constraint problem but always solicits a PyChoco interpreter that solves the problem.</p>
        <p>The process, furthermore, involves using an iterative loop where the LLM generates a constraint model based on the provided prompt. This generated model is then verified through a constraint interpreter or compiler. If the syntax of the model is incorrect, the result of the Python interpreter call is automatically taken as a prompt for a next iteration, the model is adjusted by the LLM, and the process repeats. This loop continues until the syntax is correct.</p>
        <p>Once a syntactically correct model is produced, the next step is to verify its semantics. This is done by running a series of test cases specific to the problem domain. Successful completion of test cases indicates a correct semantic implementation (at least concerning the test cases).</p>
        <p>This approach, thus, executes an iterative prompting of an LLM, where only a human user provides the first prompt, while the results of a Python interpreter are automatically used by the LLM as successive prompts. A complete example of such an iterative dialog, including in-between generated answers of the LLM (”Assistant”), is presented in the Appendix Section 2.</p>
        <p>This methodology can be further detailed through the following steps:</p>
        <p>Leveraging Existing Knowledge Problems: To create useful constraint models, we use logical problems that are well understood and documented on the internet. Examples include classic problems such as the N-queens problem, Map Coloring, or the Traveling Salesperson Problem. These problems are selected due to their well-known constraints and solution strategies.</p>
        <p>Integration with Constraint Solvers: Once the logical problems are defined, we connect these models to a constraint system, here PyChoco; other suitable solvers could be used. This involves mapping the logical problem’s constraints to the syntax and structures of the chosen constraint language and, thus, facilitating automated solving.</p>
        <p>Syntax Verification Loop: An iterative process is set up where the LLM generates a constraint model based on the initial prompt. This model is then verified for syntactic correctness using the constraint interpreter or compiler. If errors are identified, the prompt is automatically adjusted by the LLM, by using an error message as a next prompt, and the LLM regenerates the model. This loop continues until a syntactically correct model is achieved.</p>
        <p>Semantic Verification with Test Cases: After achieving a syntactically correct model, the focus shifts to ensuring semantic correctness. This involves introducing test cases specific to the defined problem. The LLM-generated model runs through these test cases to verify that it not only adheres to syntactic rules but also correctly implements the problem’s logic. Iteration continues until the model passes all semantic verification tests.</p>
        <p>Generalizability: The ultimate goal of our approach is to establish a generalizable method for generating constraint models using LLMs. By systematically using an iterative loop for both syntactic and semantic verification that employs examples and logical problem definitions, we aim to create a robust framework that can be adapted to various constraint languages and problems.</p>
        <p>By following these steps, our approach leverages the capabilities of LLMs to create accurate and efficient constraint models, providing a scalable solution for different problem types.</p>
        <p>tool Python_REPL, description: A Python shell. Use this to run and verify your solution to the constraint problem. Input should be a valid Python script using pychoco. The last line should be "verify_solutions(solutions, ...)" in order to verify your solutions.
System:
You are an agent designed to solve constraint problems, you write python code to solve and verify constraint problems. You have access to a python REPL, which you can use to execute python code and verify your solution. If you get an error, debug your code and try again. You only use the constraint solver pychoco. You conclude your python code with a variable "solutions" and verify it with the method call "verify_solutions(solutions, &lt;problem_name&gt;)", where &lt;problem_name&gt; is given to you as the first word of the instructions. "solutions" should be a list of all solutions to the constraint problem. If verification fails, you correct your solution.</p>
        <p>Use the following example as a template for solving constraint problems with pychoco:
User: ints10. find numbers one to 10.</p>
        <p>Assistant:
‘‘‘python
from pychoco.model import Model
from test_tool import verify_solutions
model = Model()
x = model.intvar(1, 10, "x")
solver = model.get_solver()
solutions = [s.get_int_val(x) for s in solver.find_all_solutions()]
verify_solutions(solutions, "ints10")
‘‘‘</p>
      </sec>
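        <p>The syntax-verification loop described in this section can be sketched in a few lines of Python. This is only an illustrative skeleton, not the paper's implementation: fake_llm and syntax_verification_loop are invented names, fake_llm stands in for a real LLM call (e.g., through LangChain), and exec() plays the role of the Python REPL tool.

```python
# Minimal sketch of the syntax-verification loop (illustrative only):
# run the candidate model, and on failure feed the interpreter's error
# message back as the next prompt until the code runs.
import traceback

def fake_llm(feedback, attempt):
    # Stand-in for an LLM call. The first attempt contains a bug
    # (undefined name "rang"); after seeing the error, it is fixed.
    if attempt == 0:
        return "solutions = list(rang(1, 11))"
    return "solutions = list(range(1, 11))"

def syntax_verification_loop(llm, max_iterations=5):
    feedback = "solve the ints10 problem"  # initial user prompt
    for attempt in range(max_iterations):
        code = llm(feedback, attempt)
        namespace = {}
        try:
            exec(code, namespace)  # run the candidate model (REPL role)
            return namespace["solutions"], attempt + 1
        except Exception:
            # the interpreter output becomes the next prompt
            feedback = traceback.format_exc()
    raise RuntimeError("no syntactically correct model found")

solutions, iterations = syntax_verification_loop(fake_llm)
```

In the full approach, a second, semantic loop would additionally run verify_solutions-style test cases on the returned solutions.</p>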
    </sec>
    <sec id="sec-4">
      <title>4. Prompting Constraint Models</title>
      <sec id="sec-4-1">
        <p>We divide prompts into a general part which is used with each constraint problem, i.e., the system prompt (see Section 4.1), and a query part which points to a specific problem (see Section 4.2).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.1. General Prompts</title>
        <p>Prompting Syntactically Valid Models: Figure 1 shows the general prompt which we developed for querying for constraint solutions. The prompt provides the following explanations to the LLM:
1. what role it has (“You are an agent...“),
2. the explanation of the REPL (Read-Eval-Print-Loop) as a tool (“You have access...“),
3. a hint of the behavior of the tool (“The python REPL...“),
4. a hint of how to use the tool (“Use this...“),
5. to focus on that tool (“use only“).</p>
        <p>Prompting Semantic Verification: To verify whether a semantically correct solution can be computed with the generated constraint model, we include a verification scheme that builds on tests. Those are specific to the constraint problem at hand. However, the general prompt is enhanced with a call to the verification method, which is parameterized with the name of the constraint problem.</p>
      </sec>
      <sec id="sec-4-7">
        <p>The verification scheme is given by an example that explains the parameterization, a constraint model, and the call to the verification method (Figure 1).</p>
      </sec>
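        <p>For illustration, a verify_solutions-style harness might look as follows. The paper's actual test_tool implementation is in its repository and is not reproduced here; the checker registry and its entries below are assumptions made only to show the calling convention (problem name as second argument) and the report format.

```python
# Illustrative sketch of a verify_solutions-style test harness (assumed
# structure, not the paper's test_tool): a registry maps each problem
# name to a validity predicate and the expected number of solutions.
CHECKERS = {
    # problem name: (validity predicate, expected number of solutions)
    "ints10": (lambda s: s in range(1, 11), 10),
}

def verify_solutions(solutions, problem_name):
    predicate, expected = CHECKERS[problem_name]
    correct = sum(1 for s in solutions if predicate(s))
    # report format mirrors the tool response shown later in the paper
    report = (str(correct) + " of the " + str(len(solutions)) +
              " solutions are correct. Expected " + str(expected) +
              " solutions")
    print(report)
    return correct == len(solutions) and len(solutions) == expected

ok = verify_solutions(list(range(1, 11)), "ints10")
```

A real checker for, e.g., N-queens would analytically test each queen placement instead of comparing against an enumerated set.</p>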
      <sec id="sec-4-8">
        <p>Prompt Refinement: For more understanding of why the LLM generates a certain code line in the model, we have enforced the LLM to provide comments above each code line (see Figure 2). This is done by explaining in the prompt what a “good python code example“ is (the italic style in the prompt is only for the human reader, i.e., only the text is given as prompt).</p>
      </sec>
      <sec id="sec-4-9">
        <p>This is done by repeating this text at various appropriate positions in the prompt, as well as by providing such an example in the prompt:</p>
        <p>You are an agent designed to solve constraint problems, you write and execute python code to answer questions. You have access to a python REPL, which you can use to execute python code. The python REPL will keep its state between usage. Use this to gradually approach a final solution. Divide the solution into meaningful parts, run them part after part, verifying that each part runs correctly. If you get an error, debug your code and try again only the last part that failed. You only use the constraint solver pychoco. Only use the output of your code to answer the question. Write for each code line a comment as needed to justify your reasoning for that code line. I will give you an example of good python code, please, follow the code convention of the good python code example. This good python code example is only an example. This good python code example has nothing to do with the constraint problem solution. You might know the answer without running any code, but you should still run the code to get the answer. If it does not seem like you can write code to answer the question, just return "I don’t know" as the answer. The user may ask questions or give follow up instructions after you presented your solution. You then have to adjust your solution accordingly.
good python code example:
"# Import necessary modules
import os
# Define the input and output file paths
input_file_path = \’numbers.txt\’
output_file_path = \’average.txt\’
# Function to read numbers from a file
def read_numbers_from_file(file_path):
    # Initialize an empty list to store the numbers
    numbers = []
    # Open the file in read mode
    with open(file_path, \’r\’) as file:
        # Read each line in the file
        for line in file:
            # Strip any leading/trailing whitespace and convert to float
            number = float(line.strip())
            # Append the number to the list
            numbers.append(number)
    # Return the list of numbers
    return numbers
main()"</p>
      </sec>
      <sec id="sec-4-10">
        <title>4.2. Prompting Specific Problems</title>
        <p>The query part simply consists of the name of the constraint problem and related specific problem descriptions, see Listing 1. (The code in the repository contains further problems, such as Magic Square, that lead to similar observations.)</p>
        <p>N-Queens Problem</p>
        <p>Problem: Positioning n queens with possible movements known from chess on an n × n chessboard (see files “chats/queen8_cs_agent_sol2_gpt-4-1106-preview_*“ in the repository for results). Figure 1 shows the used prompt.</p>
        <p>Results: The LLM makes multiple Python REPL calls, one that prints the solution, and one that follows the system prompt on how to verify the solution. This leads to the correct computation of the model, including the conversion of the solutions to a list of lists (see Figure 3).</p>
        <p>Comments: The solution to the N-queens problem is provided in the PyChoco documentation on the internet, hence, the LLM was trained with it. However, a different solution was found. Also, the 3-queens problem, which has no solution, was correctly represented. A further observation was that the result is not deterministic, e.g., it varies the arithmetical constraints for the diagonals (see Listing 2). Fixing the seed and setting the temperature to zero does not make results deterministic.</p>
      </sec>
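        <p>The solution counts discussed here can be reproduced without any constraint solver by a small backtracking enumerator; the sketch below is an independent ground-truth check added for illustration, not the paper's PyChoco model.

```python
# Plain-Python backtracking enumerator for N-queens, used only as an
# independent ground truth for the counts discussed in the text.
# queens[i] holds the column of the queen placed in row i.
def n_queens_solutions(n):
    solutions = []

    def safe(queens, col):
        row = len(queens)
        for r, c in enumerate(queens):
            # conflict: same column, or same diagonal
            # (equal row and column differences)
            if c == col or abs(c - col) == row - r:
                return False
        return True

    def place(queens):
        if len(queens) == n:
            solutions.append(list(queens))
            return
        for col in range(n):
            if safe(queens, col):
                queens.append(col)
                place(queens)
                queens.pop()

    place([])
    return solutions
```

For n = 8 this yields the 92 solutions reported by the verification tool, and for n = 3 it yields none, matching the observation that the 3-queens problem has no solution.</p>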
      <sec id="sec-4-11">
        <p>The semantic verification test for the N-queens problem is shown in Listing 3. Through the method verify_solutions called by the prompt, each solution (queens) is tested. This test is exhaustive because it analytically tests the queens’ positions, not leaving out a constraint.</p>
        <p>Listing 1: Queries for specific constraint problems
queen3 = "solve the 3-queen problem"
queen8 = "solve the 8-queen problem"
queenn = "solve the n-queen problem, use n=8 as test instance"
---------------------------------------------------------------------------
coloring = """
solve the map coloring problem for four regions, three colors and the given adjacency:
regions = [’A’, ’B’, ’C’, ’D’]
adjacency_list = {
    ’A’: [’B’, ’C’],
    ’B’: [’A’, ’C’, ’D’],
    ’C’: [’A’, ’B’, ’D’],
    ’D’: [’B’, ’C’]
}.
the solution should be a list of python dicts where each dict maps regions to color indices
"""
---------------------------------------------------------------------------
tsp = """solve the traveling salesman problem, use the following problem instance:
# Number of cities
C = 4
# Distance matrix
D = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
the solution should be a list of valid solutions, each solution being a list of integers
representing the cities to be visited.
"""</p>
        <p>Map Coloring</p>
        <p>Problem: Listing 1 describes the well-known Map Coloring as a query.</p>
        <p>Results: A first model which contains the constraint all_different raised an error (this seems to be an implementation issue, see https://github.com/chocoteam/choco-solver/issues/705) but was corrected by using another constraint modeling (i.e., with != instead of all_different) through 2 iterations (coloring_cs_agent_sol2_gpt-4-1106-preview_20240529094847.txt). A further run provided a one-shot success, which was, however, not recognized as such; instead, a solution was hallucinated for presenting to the user (coloring_cs_agent_sol2_gpt-4-1106-preview_20240529095157.txt). A further run firstly used the API wrongly, but finally found a correct solution (coloring_cs_agent_sol2_gpt-4-1106-preview_20240529095612.txt).</p>
        <p>Comments: The solution can also be found on the internet; however, this does not prevent the LLM from making errors. Finally, also because the iterative approach is used, a solution was found.</p>
        <p>Traveling Salesperson Problem</p>
        <p>Problem: Listing 1 describes the well-known TSP as a query.</p>
        <p>Results: In general, there are several recurring issues when attempting to model constraints for the TSP (see the error logs in Appendix Section 8). One major problem is accurately modeling distance constraints, which often leads to incorrect solutions. Similarly, the Hamiltonian path constraints are difficult to model correctly, compounding the problem. Additionally, there is frequent misuse of the API. These misuses typically involve providing arguments in the wrong order or supplying an incorrect number of arguments. As a result of these issues, the solutions generated are often flawed. With GPT-4, no solutions could be found with the resources at hand, but with GPT-4o, some solutions were correctly identified. However, no implementation was found that computes all the test cases provided for verification. Overall, these challenges underscore the complexity of effectively modeling TSP constraints and highlight areas that require significant improvements to find optimal solutions consistently.</p>
      </sec>
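      <p>Both Listing 1 instances are small enough that ground-truth answers can be brute-forced in plain Python. The sketch below is a sanity check added for illustration (it is not part of the paper's pipeline): it enumerates all proper 3-colorings of the given map and all round trips of the given 4-city TSP instance.

```python
# Brute-force ground truth for the two Listing 1 instances, independent
# of any constraint solver (illustrative sanity check only).
from itertools import permutations, product

# Map coloring: try every assignment of 3 colors to the 4 regions and
# keep those where all adjacent regions get different colors.
regions = ["A", "B", "C", "D"]
adjacency = {"A": ["B", "C"], "B": ["A", "C", "D"],
             "C": ["A", "B", "D"], "D": ["B", "C"]}
colorings = []
for colors in product(range(3), repeat=len(regions)):
    assignment = dict(zip(regions, colors))
    if all(assignment[r] != assignment[m]
           for r in regions for m in adjacency[r]):
        colorings.append(assignment)

# TSP: enumerate every round trip starting at city 0 over the given
# distance matrix and take the cheapest tour cost.
D = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
def tour_cost(tour):
    legs = zip(tour, tour[1:] + tour[:1])  # close the cycle
    return sum(D[a][b] for a, b in legs)
best_cost = min(tour_cost([0] + list(p)) for p in permutations([1, 2, 3]))
```

Such exhaustive baselines are exactly what the verification test cases compare the LLM-generated PyChoco solutions against.</p>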
      <sec id="sec-4-12">
        <title>4.3. Implementation</title>
        <p>The implementation, which utilizes LangChain and Python, leverages ChatGPT-4 and ChatGPT-4o and is available in a repository at Zenodo (https://doi.org/10.5281/zenodo.12794211). As a constraint solver, we used PyChoco as a Python API for the constraint solver Choco. The resources of ChatGPT are limited with respect to context/prompt length. This causes the LLM to stop responding when the chat size exceeds the allowed number of tokens, which can easily occur because the chat includes the entire previous chat history. The resulting chats are included in the paper’s appendix and the repository. The implementation can be used to rerun the experiments, and the data can be utilized for further investigation in prompt engineering.</p>
        <p>Assistant:
There was a mistake in the code. It seems I incorrectly used the ‘solver.find_solution()‘ method, which only finds one solution, and I didn’t convert the solution to the expected list format before passing it to the verification function. I need to find all solutions and convert each solution to a list of integers representing the positions of the queens.</p>
        <p>Let me correct the code.</p>
        <p>Python_REPL (tool call 1/1):
from pychoco.model import Model
from test_tool import verify_solutions
model = Model("8-Queens")
# Create an array of 8 variables representing the positions of queens on a
# chess board
queens = model.intvars(8, 1, 8)
# Add constraints
for i in range(8):
    for j in range(i + 1, 8):
        # Queens cannot be on the same row
        model.arithm(queens[i], "!=", queens[j]).post()
        # Queens cannot be on the same diagonal
        model.arithm(queens[i], "-", queens[j], "!=", i - j).post()
        model.arithm(queens[i], "-", queens[j], "!=", j - i).post()
# Find all solutions
solver = model.get_solver()
solutions = []
for solution in solver.find_all_solutions():
    solutions.append([solution.get_int_val(queens[i]) for i in range(8)])
verify_solutions(solutions, "queen8")
Python_REPL (tool response):
92 of the 92 solutions are correct. Expected 92 solutions
Assistant:
The 8-queens problem has been successfully solved, and all 92 solutions are correct.
tool calls: 6, text messages: 6</p>
      </sec>
      <sec id="sec-4-13">
        <title>5. Discussion</title>
        <p>Overall, across all problems, after initiating multiple chats, each with several iterations, a model that can compute a single solution could be found (see Appendix Sections 3 to 6). In Appendix Section 9, we document the number of iterations per experiment, i.e., calls to the Python interpreter (”Tool Calls”) and answers given by the LLM (”Text Messages”). Nevertheless, for the TSP, we could not find a model that computes all solutions. The iteration was stopped by the LLM if no changes occurred or if too many tokens were used. So the paper’s result is that, in principle, constraint models for well-known problems can be computed, however, not in all cases.</p>
        <p>An interesting observation is that the generated variable names within the model are always semantically meaningful (e.g., "queens") and not arbitrary. This likely occurs because LLMs statistically favor names that have been previously encountered.</p>
      </sec>
      <sec id="sec-4-14">
        <title>In this paper, we use well-known constraint problems,</title>
        <p>Listing 2: Non-deterministic results
- correct
model.arithm(queens[i], "!=", queens[j] + (j - i)).post()
model.arithm(queens[i], "!=", queens[j] - (j - i)).post()
- correct
model.arithm(qs[i], "-", qs[j], "!=", j - i).post()
model.arithm(qs[j], "-", qs[i], "!=", j - i).post()
- wrong
model.abs(queens - queens).ne(i - j).post() # Diagonal constraint
model.abs(queens - queens).ne(j - i).post() # Diagonal constraint
- wrong
model.arithm(queens[i], "-", queens[j], "!=", i - j).post()
model.arithm(queens[i], "+", queens[j], "!=", i + j + 2).post()
- does not find all solutions
diag1 = [model.int_offset_view(queens[i], i) for i in range(n)]
diag2 = [model.int_offset_view(queens[i], -i) for i in range(n)]
model.all_different(diag1).post()
model.all_different(diag2).post()
- wrong
model.arithm(queens[i], "!=", queens[j]).post() # Different columns
model.arithm(queens[i], "-", i, "!=", queens[j], "-", j).post() # Different diagonals (left-top to right-bottom)
model.arithm(queens[i], "+", i, "!=", queens[j], "+", j).post() # Different diagonals (left-bottom to right-top)
- wrong
model.all_different([queens[i], queens[j]]).post()
model.arithm(queens[i], "-", queens[j], "!=", j - i).post()
model.arithm(queens[i], "+", queens[j], "!=", j - i).post()
Listing 3: Verifying solutions for N-queens by testing the queens' positions.</p>
        <p>def is_valid_solution(queens):
    n = len(queens)
    assert type(queens) == list, "expected list of ints"
    for i in range(n):
        for j in range(i + 1, n):
            # Check if queens are in the same row
            if queens[i] == queens[j]:
                return False
            # Check diagonals: if the difference between the column indices equals
            # the difference between the row indices, they're in the same diagonal
            if abs(queens[i] - queens[j]) == abs(i - j):
                return False
    return True</p>
        <p>
which, of course, can only be a first step for leveraging LLMs in knowledge acquisition. The next steps would be to supply new, unknown problems, also industrial ones, e.g., by giving tables of correct variable combinations (configurations) in table constraints and generating abstracted constraint models. However, Listing 1 shows a potential way of formally representing a problem as a query, which can probably be used for other tasks.</p>
        <p>
          There are several general problems identified: the program persistently uses the PyChoco API incorrectly and often hallucinates PyChoco functions. This could not be prevented by providing the API definition in the prompt, because the complete definition exceeds the number of tokens we could send to the LLM. Additionally, the chats are not stable and can vary significantly even with the same prompt. This instability persists even with temperature set to 0 and a fixed seed, making chats non-deterministic. The used prompts should be further developed to achieve the intended general generation of knowledge models; in other words, prompt engineering for knowledge modeling needs to be elaborated. Few-Shot Prompting [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] could be used to encourage the model to produce more consistent outputs and adhere more closely to given instructions.
        </p>
        <p>The approach of first creating syntactically correct models and then verifying them to ensure semantic accuracy could be successfully demonstrated (see Listing 3). Because we use test-driven semantic verification, this verification depends on the quality of the used test cases. For example, the N-queens verification was exhaustive because it was analytical rather than an enumeration of test cases. Furthermore, writing tests for knowledge-based tasks is, in our experience, typically easier than writing knowledge models; hence, the experts' tasks of creating formal constraint and configuration models will, in our opinion, shift from writing models to writing tests - and letting an LLM write the models. Thus, the knowledge model development process can, with the presented approach, probably be executed by domain experts, not only by knowledge engineers.</p>
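<p>That verification quality hinges on the tests can be illustrated by cross-checking the analytical test against one of the "correct" diagonal formulations of Listing 2 via brute-force enumeration; a stdlib-only sketch (0-based rows and one queen per column are our assumptions):</p>

```python
from itertools import permutations

def ok_analytical(b):
    # Listing 3 style: same row or same diagonal between any pair is forbidden
    n = len(b)
    return all(b[i] != b[j] and abs(b[i] - b[j]) != j - i
               for i in range(n) for j in range(i + 1, n))

def ok_pairwise(b):
    # First "correct" variant of Listing 2: queens[i] != queens[j] +/- (j - i)
    n = len(b)
    return all(b[i] != b[j] + (j - i) and b[i] != b[j] - (j - i)
               for i in range(n) for j in range(i + 1, n))

# One queen per column with pairwise-distinct rows: permutations of range(8)
boards = [list(p) for p in permutations(range(8))]
analytical = [b for b in boards if ok_analytical(b)]
pairwise = [b for b in boards if ok_pairwise(b)]
print(len(analytical), analytical == pairwise)  # 92 True
```

<p>Both formulations accept exactly the 92 known 8-queens solutions, whereas the "wrong" variants of Listing 2 would diverge from this reference under the same enumeration.</p>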
      </sec>
      <sec id="sec-4-15">
        <p>The paper offers, furthermore, an easy way to begin with iterative LLM prompting. Moreover, it is suggested to use the LLM for creating a new configuration language by exploiting its hallucination capabilities.</p>
      </sec>
    </sec>
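<p>The iterative generate-and-verify loop described in this paper (generate a model, execute it, feed errors or failed tests back to the LLM) can be sketched as follows; ask_llm is a stub standing in for the actual ChatGPT tool-calling setup, not the paper's implementation:</p>

```python
# Hedged sketch of the iterative generate-and-verify loop; ask_llm is a stub
# standing in for the real LLM backend used in the paper.
def ask_llm(prompt):
    # Stub: a real implementation would call an LLM API and return generated
    # solver code; we return a fixed candidate answer for illustration.
    return "solutions = [[0, 4, 7, 5, 2, 6, 1, 3]]"

def is_valid(board):
    # Analytical N-queens test as in Listing 3
    n = len(board)
    return all(board[i] != board[j] and abs(board[i] - board[j]) != j - i
               for i in range(n) for j in range(i + 1, n))

def run_and_verify(code):
    env = {}
    try:
        exec(code, env)  # syntactic check: does the generated code run?
    except Exception as e:
        return False, f"error: {e}"
    sols = env.get("solutions", [])
    ok = bool(sols) and all(is_valid(s) for s in sols)  # semantic check via tests
    return ok, "all solutions valid" if ok else "invalid or missing solutions"

def iterate(prompt, max_rounds=5):
    feedback = ""
    for _ in range(max_rounds):
        code = ask_llm(prompt + feedback)
        ok, msg = run_and_verify(code)
        if ok:
            return code
        feedback = f"\nPrevious attempt failed: {msg}. Please fix it."
    return None
```

<p>In the real setup the loop additionally stops when no changes occur between iterations or when the token budget is exhausted, as discussed above.</p>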
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <sec id="sec-5-1">
        <p>The paper presents the first steps in supporting the formalization task in constraint modeling with LLMs, i.e., the mapping of a well-known problem into a constraint representation of a certain constraint language, here PyChoco. By taking various constraint problems, formulating generally applicable prompts, and doing so in an iterative manner which includes syntactic and semantic verification processes, the LLM could generate appropriate and correct constraint models. Through the iteration, in-between errors related to syntax and semantics were automatically corrected.</p>
        <p>Further work will include a quantitative evaluation of the method, a comparison of different LLMs, as well as the generation of constraint models for unknown problems. In total, this paper provides a first step into a new kind of constraint and configuration modeling approach with LLMs.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <sec id="sec-6-1">
        <p>This project was funded through the Federal Institute for Research on Building, Urban Affairs and Spatial Development (Bundesinstitut für Bau-, Stadt- und Raumforschung), Germany, within the Zukunft Bau Forschungsförderung Maßnahme (Future of Construction research funding measure): "AI-supported modernization of rental housing stock" ("KI-gestützte Modernisierung an Mietwohnungsbaubeständen (KIMM)"), as well as through the Federal Ministry for Economic Affairs and Climate Action (Bundesministerium für Wirtschaft und Klimaschutz), Germany, within the "Joint project: EnOB: IntelMOD - Intelligent modernization platform based on functional cost splitting; sub-project: Infrastructure of the modernization platform." ("Verbundvorhaben: EnOB: IntelMOD - Intelligente Modernisierungsplattform auf Basis des Funktionalen Kostensplittings; Teilprojekt: Infrastruktur der Modernisierungsplattform.")</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bagley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiihonen</surname>
          </string-name>
          ,
          <article-title>Knowledge-based Configuration -</article-title>
          From Research to Business Cases, Morgan Kaufmann,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Falkner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Benavides</surname>
          </string-name>
          ,
          <source>Feature Models: AI-driven Design, Analysis, and Applications</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stumptner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ryabokon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bagley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolter</surname>
          </string-name>
          ,
          <source>Configuration Knowledge Representation &amp; Reasoning</source>
          , in: A.
          <string-name>
            <surname>Felfernig</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Hotz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Bagley</surname>
          </string-name>
          , J. Tiihonen (Eds.),
          <article-title>Knowledge-based Configuration -</article-title>
          From Research to Business Cases, Morgan Kaufmann Publishers,
          <year>2014</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ozkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Carleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Robert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Application of large language models (llms) in software engineering: Overblown hype or disruptive change?</article-title>
          , Carnegie Mellon University, Software Engineering Institute's
          <source>Insights (blog)</source>
          ,
          <year>2023</year>
          . URL: https://doi.org/10.58012/6n1p-pw64, accessed: 2024-May-16.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. O</given-names>
            <surname>'Sullivan</surname>
          </string-name>
          ,
          <article-title>Automated modelling and solving in constraint programming</article-title>
          ,
          <source>in: 24th AAAI Conference on Artificial Intelligence</source>
          , AAAI'10, AAAI Press,
          <year>2010</year>
          , pp.
          <fpage>1493</fpage>
          -
          <lpage>1497</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Freuder</surname>
          </string-name>
          ,
          <article-title>Conversational modeling for constraint satisfaction</article-title>
          ,
          <source>AAAI Conference on Artificial Intelligence</source>
          <volume>38</volume>
          (
          <year>2024</year>
          )
          <fpage>22592</fpage>
          -
          <lpage>22597</lpage>
          . URL: https://ojs.aaai.org/index.php/AAAI/article/view/30268. doi:10.1609/aaai.v38i20.30268.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/pdf/1909.01066.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poudel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Veeramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          , Entgpt:
          <article-title>Linking generative large language models with knowledge bases, 2024</article-title>
          . URL: https://arxiv.org/abs/2402.06738.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kassner</surname>
          </string-name>
          , G. Weikum,
          <article-title>Language models as or for knowledge bases</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2110.04888.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Luu</surname>
          </string-name>
          ,
          <article-title>Knowledge bases and language models: Complementing forces</article-title>
          ,
          <source>in: 7th International Joint Conference on Rules and Reasoning</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2023</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>15</lpage>
          . URL: https://doi.org/10.1007/978-3-031-45072-3_1.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. P.</given-names>
            <surname>Timmapathini</surname>
          </string-name>
          , Llm2kb:
          <article-title>Constructing knowledge bases using instruction tuned context aware large language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2308.13207.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          , S. Choudhury,
          <article-title>Lm4opt: Unveiling the potential of large language models in formulating mathematical optimization problems</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2403.01342.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Mao,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , A. de Wynter,
          <article-title>LLM as a mastermind: A survey of strategic reasoning with large language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2404.01230v1. arXiv:2404.01230.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <article-title>Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks</article-title>
          ,
          <source>Transactions on Machine Learning Research</source>
          (
          <year>2023</year>
          ). URL: https://openreview.net/forum?id=YfZ4ZPt8zd.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Albalak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Logiclm:
          <article-title>Empowering large language models with symbolic solvers for faithful logical reasoning</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.12295.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tafjord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <article-title>Transformers as soft reasoners over language</article-title>
          ,
          <source>in: 29th International Joint Conference on Artificial Intelligence, IJCAI'20</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>3882</fpage>
          -
          <lpage>3890</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Self-consistency improves chain of thought reasoning in language models</article-title>
          ,
          <source>ArXiv</source>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2203.11171.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Acher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Duarte</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.-M. Jézéquel</surname>
          </string-name>
          ,
          <article-title>On programming variability with large language model-based assistant</article-title>
          ,
          <source>in: Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A, SPLC '23</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , p.
          <fpage>8</fpage>
          -
          <lpage>14</lpage>
          . URL: https://doi.org/10.1145/3579027.3608972.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hosemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <article-title>Towards ontology construction with language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2309.09898.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>