<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Large Language Models as Knowledge Engineers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Brand</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukas Malburg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Bergmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence and Intelligent Information Systems, University of Trier</institution>
          ,
          <addr-line>54296 Trier</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>German Research Center for Artificial Intelligence (DFKI), Branch University of Trier</institution>
          ,
          <addr-line>54296 Trier</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <issue>2024</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Many Artificial Intelligence (AI) systems require human-engineered knowledge at their core to reason about new problems based on this knowledge, with Case-Based Reasoning (CBR) being no exception. However, the acquisition of this knowledge is a time-consuming and laborious task for the domain experts who provide the needed knowledge. We propose an approach to help in the creation of this knowledge by leveraging Large Language Models (LLMs) in conjunction with existing knowledge to create the vocabulary and case base for a complex real-world domain. We find that LLMs are capable of generating knowledge, with results improving by using natural language and instructions. Furthermore, permissively licensed models like CodeLlama and Mixtral perform similarly to or better than closed state-of-the-art models like GPT-3.5 Turbo and GPT-4 Turbo.</p>
      </abstract>
      <kwd-group>
        <kwd>Case-Based Reasoning</kwd>
        <kwd>Knowledge Engineering</kwd>
        <kwd>Knowledge Acquisition Bottleneck</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Prompting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Foundations and Related Work</title>
      <p>
        In this section, we give an overview of the concept of knowledge in CBR based on the knowledge
containers introduced by Richter [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (see Section 2.1). Subsequently, we introduce Large Language
Models in Section 2.2 and prompting in Section 2.3. Finally, the section closes with an overview of
related work (see Section 2.4).
      </p>
      <sec id="sec-2-1">
        <title>2.1. Knowledge in Case-Based Reasoning</title>
        <p>
          The most prominent way to represent knowledge in a CBR system is to view knowledge in the form of
knowledge containers, as proposed by Richter [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]:
        </p>
        <p>
          Vocabulary Container: The vocabulary container stores knowledge about the description of the
elements which describe the objects and the elements of the cases themselves, for example the word “position”
describing the locality of an object [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Therefore, the used vocabulary encodes a lot of knowledge and
is the basis for any knowledge-based system. Furthermore, different descriptions can (and should) be
used to describe objects depending on the given task.
        </p>
        <p>
          Similarity Container: The similarity container consists of the knowledge needed to determine
the similarity between two cases in order to approximate the utility of reusing a known solution for the new
problem. There are different possibilities to calculate the similarity between cases, which range from
simple symbolic similarities, like the equality of two objects, to the usage of weighted similarity measures
for complex objects. The similarity is used for the retrieval step in the CBR cycle and thus requires
knowledge about the problem at hand as well as possible solutions [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Case Base Container: As the name implies, the case base container contains the experiences, i.e., the
cases. The case base grows over time and can be created from experiences or be completely synthetic.
This container is therefore the cornerstone of any CBR system and the main source of knowledge [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Adaptation Container: To adapt existing cases to new problems, the knowledge stored inside the
adaptation container can be used. There are several algorithms that adapt cases to the given problem
in a semi-automatic or fully automatic manner [
          <xref ref-type="bibr" rid="ref13 ref14 ref7">13, 14, 7</xref>
          ]. Consequently, the adaptation
container stores the information needed to execute the given algorithm(s) in the required format.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Large Language Models</title>
        <p>
          Language Modeling is one of the biggest areas of Natural Language Processing, with LLMs being the
latest advancement in this area. While they can be fine-tuned to be suitable for any downstream task,
these tasks can also be treated as a “text-to-text” problem, i.e., the model produces text as an output
instead of the desired label for the downstream task [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This allows applying the same model and
training task to every downstream task imaginable and marks a paradigm shift. Instead of fine-tuning
a model for a given downstream task, a sufficiently capable model just needs to be prompted, which
in practice means changing the textual input to the model. The biggest advantage of prompting over
fine-tuning is the ease of use: Fine-tuning is a laborious task which requires the acquisition of a
suitable training dataset and the training process itself, whereas a change in the prompt instantly results
in a change of the output. Furthermore, prompting is computationally more effective than fine-tuning
an existing model [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. To make the prompting of LLMs easier for users, pre-trained models can be
fine-tuned to follow instructions, which yields an instruction-tuned LLM [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ]. Besides the easier
usage, instruction-tuned models are more controllable and significantly preferred by users over
the pre-trained base models [
          <xref ref-type="bibr" rid="ref16 ref18">18, 16</xref>
          ]. Furthermore, instruction-tuning improves the performance of the
model over a range of benchmarks, including held-out ones [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Structuring of Prompts</title>
        <p>
          As established in the previous section, prompting has become the preferred and superior usage pattern
of LLMs. However, the exact wording of the prompt itself has severe implications for the performance of
the model on the given task, which has resulted in various techniques for prompting [
          <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
          ]. There exist
several strategies for prompting, as seen in Figure 2.3. The simplest prompts are zero-shot prompts, in
which the prompt has no additional details about the task and the model completes the given input:
        </p>
        <p>[Figure 2.3: Overview of prompting strategies. Prompts range from zero-shot prompts (input only) over few-shot prompts (instruction/task plus few-shot examples) and zero-shot CoT to CoT prompts (few-shot examples with steps and solutions); the prompt style can be code only, natural language only, or natural language + code.]</p>
        <sec id="sec-2-3-1">
          <title>Zero-Shot Prompt</title>
          <p>Who wrote the book the origin of species?</p>
          <p>Charles Darwin</p>
          <p>In the above example, the first half of the box is the prompt, whereas the second half is the completion
of the LLM.</p>
          <p>
            LLMs are capable of in-context learning, which describes the conditioning of a model with natural
language instructions and a few demonstrations, also referred to as few-shot prompting [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]:
          </p>
        </sec>
        <sec id="sec-2-3-4">
          <title>Translate English to French:</title>
          <p>sea otter =&gt; loutre de mer
peppermint =&gt; menthe poivrée
plush girafe =&gt; girafe peluche
cheese =&gt;
fromage</p>
          <p>
            The given prompt consists of a natural language instruction (“Translate English to French”) and three
examples, making it a three-shot prompt. In-context learning, a meta-learning technique, is similar to
human intelligence, as argued by Lake et al. [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ]. Humans can learn a new concept from only a single
or a handful of examples, whereas many machine learning approaches need a lot of data points.
          </p>
          <p>
            Building upon in-context learning, Chain-of-Thought (CoT) prompting results in improvements
over a range of benchmarks [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ]. In this technique, the model is provided not only with the answer to a
question, but with the steps to derive the answer as well:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer
is 11.
          </p>
          <p>Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many
apples do they have?
A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They
bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.</p>
          <p>
            Kojima et al. [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] introduce zero-shot CoT, a prompting technique which results in similar outputs to
CoT prompting without the steps having to be provided. Their technique appends “Let’s think
step by step.” to a prompt:
          </p>
        </sec>
        <sec id="sec-2-3-5">
          <title>Zero-Shot Chain-of-Thought Prompt</title>
          <p>Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue.
How many blue golf balls are there?
A: Let’s think step by step.</p>
          <p>There are 16 balls in total. Half of the balls are golf balls. That means that there are 8 golf balls.
Half of the golf balls are blue. That means that there are 4 blue golf balls.</p>
          <p>
            While both CoT techniques improve the performance of LLMs over various benchmarks, the
explanations provided by the models may be wrong, with the final answer of the output not following
the reasoning of the explanation, indicating that the text and style of CoT is more important than the
factual correctness of the output [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ].
          </p>
          <p>
            Besides the content of the prompt itself, the style of the prompt has implications for the performance,
too. The style can be divided into three categories: code only, natural language only, or a mixture of both.
Depending on the task at hand, the choice of style is restricted. The code-only style can obviously
only be applied to code-related tasks, such as the generation of SQL statements. These prompts can also
be reworded into natural language, which should intuitively yield better results, as LLMs are trained
on a giant corpus of natural language. However, Sun et al. [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] found that prompting in a code-only
style with a natural language instruction yields superior results over natural language prompts for
generating SQL queries. A mixture of both is used by Zhang et al. [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] and Singh et al. [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ], where
natural language is used in a coding syntax, such as Python, without the code being executable or
supplying necessary information:
          </p>
          <p>Code-only Prompt
instructions = "Given a goal and two steps, predict the order to do the steps to achieve the goal"
goal = "Draw a Simple Teddy Bear"
step0 = "erase unnecessary lines"
step1 = "draw a shirt for the bear"
order_of_execution =
[step1, step0]</p>
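          <p>Such a code-style prompt can be assembled programmatically. The following is a minimal sketch; the helper name and the open completion marker are our own illustration, not the exact format used in the cited works.</p>

```python
# Sketch: rendering an instruction and task fields as non-executable,
# Python-like code, mirroring the code-only prompt shown above.
# build_code_style_prompt is an illustrative helper, not from the cited works.
def build_code_style_prompt(instruction, goal, steps):
    lines = [f'instructions = "{instruction}"', f'goal = "{goal}"']
    lines += [f'step{i} = "{step}"' for i, step in enumerate(steps)]
    lines.append("order_of_execution =")  # left open for the LLM to complete
    return "\n".join(lines)

prompt = build_code_style_prompt(
    "Given a goal and two steps, predict the order to do the steps to achieve the goal",
    "Draw a Simple Teddy Bear",
    ["erase unnecessary lines", "draw a shirt for the bear"],
)
```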
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Related Work</title>
        <p>
          LLMs have been successfully applied in the generation of knowledge in various domains. Guan et al.
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and Oswald et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] use LLMs to generate knowledge for AI planning methods, in particular
PDDL domains. Both first translate actions from an existing domain from PDDL into natural language
and task the LLM to re-create the given action for the domain. Liu et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] use LLMs to generate
PDDL problems by describing the problem in natural language and supplying an exemplary PDDL
domain. Gestrin et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] use LLMs to generate both a PDDL domain and the matching PDDL problem
by supplying a task description in natural language. Furthermore, LLMs have also been successfully
applied in the generation of process models in the form of BPMN, which is also an important topic in
the field of Process-Oriented Case-Based Reasoning [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Kourani et al. [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] utilize LLMs with different
prompting techniques to generate workflows from textual descriptions. Feedback from users can be
used to iteratively refine the workflows, which can be transformed into Petri nets or BPMN models.
Bernardi et al. [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] fine-tune LLMs to generate process models, with the prompts being further enhanced
by retrieving relevant chunks of the BPMN representation format.
        </p>
        <p>
          In CBR, the main focus lies on the acquisition of adaptation knowledge, which is one of the hardest
challenges in CBR [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. To combat this, a wide range of methods to automatically adapt cases has been
proposed for numerous domains [
          <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4, 5, 6, 7, 8</xref>
          ]. These methods range from algorithmic approaches, which
utilize the case base to apply patterns to new problems or to generalize and specialize
cases, to Machine Learning-based methods and an approach based on Reinforcement Learning. LLMs
have not yet been applied to the creation of CBR knowledge, which will be addressed in this paper.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Engineering Knowledge for Case-Based Reasoning with Large Language Models</title>
      <p>As laid out in the previous chapter, prompting is the superior approach for using LLMs. This
is especially apparent in the process of creating new knowledge, where the data for the new domain
is sparse or even non-existent. We present a general, domain-agnostic approach to create knowledge in
CBR and show the applicability of the approach in an applied domain. Figure 2 gives an
overview of the approach. Knowledge, which can be in any format, is given as part of the prompt to the
LLM, which then generates knowledge in the desired output format.</p>
      <p>[Figure 2: Overview of the approach. Existing knowledge from a domain expert, a knowledge-based system, or an existing CBR system is inserted into a prompt for a Large Language Model, which generates new knowledge for the case base, vocabulary, similarity, or adaptation container.]</p>
      <p>The concept has four different parameters which are changeable: the new knowledge to be generated,
the existing knowledge, the prompt, and the LLM.</p>
      <p>New Knowledge: The knowledge to generate should represent the knowledge from any of the
knowledge containers introduced in Section 2.1. We expect that every container is suitable to be
generated by this method: The vocabulary container can be created and extended with an LLM, but also
parts of the similarity container or the case base can be created. For the adaptation container, the LLM
can be used to generate knowledge for any semi-automatic or fully automatic adaptation processes.
The output from the LLM should be in a format which is directly usable by the CBR system, i.e., the
LLM should generate knowledge in the form of code to be used by the CBR system, such as knowledge
in the form of XML.</p>
      <p>Existing Knowledge: The existing knowledge used for the prompt can be in various formats,
stemming from different sources. For example, the prompt can be written entirely by domain experts,
with the needed knowledge provided by them. Alternatively, a knowledge base can be queried, with
the outputs being inserted into the prompt. It is also possible to use knowledge from an existing CBR
system to either expand the CBR system or to use another CBR system from a similar domain as a
reference point, similar to transfer learning. Furthermore, it is also possible to include multiple sources
of knowledge: Knowledge from an existing CBR system can be used and supplemented with knowledge
by a domain expert to further specify the desired goal.</p>
      <p>Prompt: There exist several strategies to design the prompt itself, as shown in Section 2.3. However,
not all of these strategies are useful in this context. Prompts for the generation of knowledge must
involve few-shot examples which denote the input, i.e., existing knowledge, and the desired solution,
i.e., the new knowledge from any knowledge container. Therefore, every prompt contains at least one
example with existing knowledge and the desired target, e.g., a part of the vocabulary container.</p>
      <p>In theory, a zero-shot prompt is possible, but the LLM would be unable to generate valid knowledge
due to the lack of information about the scheme. Thus, zero-shot prompts are not a valid strategy for
the generation of valid knowledge.</p>
      <sec id="sec-4-1">
        <title>Zero-Shot Prompt Example</title>
        <p># Example 1
## Conditions:
- &lt;Existing Knowledge&gt;
Generate a valid CBR model.</p>
        <p>Aside from the given examples, a prompt may include an instruction and use Zero-Shot CoT at the
end of the prompt. For the style, natural language only is not useful, as the goal is to generate CBR
knowledge directly, which is supplied as part of the few-shot example. The other styles encode the
inserted knowledge either in the form of code or describe it using natural language.</p>
        <p>Large Language Model: There are no special requirements for the LLM and thus, every LLM could
possibly be used for the task. However, as the prompts contain several examples due to the
few-shot setting, the LLM should have a larger context size to fit the examples.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Exemplary Application Scenario for Generating Knowledge in a Cyber-Physical Domain</title>
      <p>As a case study, the concept is implemented in the cyber-physical domain, which is concerned with
the production of workpieces in a learning factory. The CBR system of the domain stores the processes
to execute in the factory and adapts these in case a machine breaks. The subsequent sections introduce
the domain and the deployed CBR system, followed by the implementation of the approach for this
domain.</p>
      <sec id="sec-6-1">
        <title>4.1. Introduction to the Cyber-Physical Domain</title>
        <p>
          The smart factory, deployed at the University of Trier, is depicted in Figure 3. As an abstraction over a
real-world factory, it can be used as a Learning Factory [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] that provides insights on a smaller scale,
which can then be transferred to solve real-world problems. The factory at hand consists of two shop
floors, which contain five identical workstations in a mirrored layout: a sorting machine with color
detection, a high-bay warehouse, and an oven connected to a milling machine through a workstation transport
robot and a vacuum gripper robot. Additionally, there are machines unique to a single shop floor:
the first shop floor features a human workstation, which is capable of performing a range of possible
tasks, and a punching machine, while the second floor possesses a drilling machine. The machines are
connected through various conveyor belts, providing a route from the milling machine through the
sorting machine to the drilling (or punching) machine, the vacuum gripper and the human workstation,
which can reach almost all positions by simulating the transport done by a human. Furthermore, the
machines are equipped with various sensors, including NFC readers/writers, light barriers, switches,
and pressure sensors to provide insights about the current state of the machines [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. To control the
factory, a service-based approach is used, with each service representing multiple actions of one or
multiple machines. As an example, a transport service combines several motor actuations and sensor
observations into a single function. These services and the machines with their motors and sensors are
modelled in an ontology [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. Furthermore, the services are semantically enriched in the ontology with
pre- and postconditions. For transporting a workpiece, the preconditions state that the machine needs to
be ready to execute any service, a workpiece must lie at the starting position, and the ending position must
be empty. These web services are also modeled in a CBR vocabulary in ProCAKE [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ], which supports
all knowledge containers [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ].
        </p>
        <p>[Figure 4: A cyber-physical workflow as a semantic graph: task nodes (white squares, e.g., Drill Holes, Transport) with semantic descriptions (gray squares, e.g., holes: 8 x 35, quantity: 8, size: 35mm), data nodes (white circles, e.g., Sheet Metal), and control-flow and data-flow edges.]</p>
        <p>
          The vocabulary container is represented by a domain model in ProCAKE.
For the factory, the model, which is engineered entirely by domain experts, contains the representation
of the workpiece properties to be manufactured and the basic properties of the physical factory, such as
the positions of each machine. Furthermore, the semantic web services are described, albeit on a more
general level. To control the factory, the services can be used as part of semantic graphs, which model
not only the tasks to be executed in the factory, but also the state of the workpiece. Such a workflow is
depicted in Figure 4. Task nodes, represented by white squares, denote the services to execute in the
real-world factory. These task nodes represent, together with the semantic descriptors (gray squares),
the semantic web services introduced earlier. Data nodes, shown as white circles, represent the state of
the workpiece during this point in the workflow and are subsequently consumed and produced by each
task node. The given semantic graph therefore shows the transport of a metal sheet, into which eight
holes are drilled before it is transported again.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Implementing the Knowledge-Generation Process</title>
        <p>Algorithm 1 shows the overall algorithm that is applied to the selected application scenario, with the
steps being described in more detail in subsequent sections.</p>
        <p>Algorithm 1: Generate and validate knowledge from a LLM</p>
        <p>Input : Examples, LLM and Prompt Template
used_examples ← SELECT_RANDOM(Examples, n)
if use_natural_language then</p>
        <p>used_examples ← TRANSLATE(used_examples)
end</p>
        <p>prompt ← PromptTemplate ∪ used_examples
output ← LLM(prompt)
target ← EXTRACT_TARGET(output)</p>
        <p>VERIFY_TARGET(target)</p>
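        <p>The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: llm, translate, and verify are stand-in callables for the model query, the natural-language translation (Section 4.4), and the software-based verifier (Section 4.6).</p>

```python
import random
import re

# Sketch of Algorithm 1; llm, translate and verify are stand-in callables.
def generate_and_validate(examples, llm, prompt_template, verify,
                          n_shots=3, use_natural_language=False, translate=None):
    used_examples = random.sample(examples, n_shots)             # SELECT_RANDOM
    if use_natural_language and translate is not None:
        used_examples = [translate(ex) for ex in used_examples]  # TRANSLATE
    prompt = prompt_template + "\n\n".join(used_examples)        # template plus examples
    output = llm(prompt)                                         # query the model
    match = re.search(r"`{3}(?:\w+\n)?(.*?)`{3}", output, re.DOTALL)  # EXTRACT_TARGET
    target = match.group(1).strip() if match else output.strip()
    return target, verify(target)                                # VERIFY_TARGET
```

        <p>In the actual implementation, verify corresponds to the ProCAKE- and XMLUnit-based checks described in Section 4.6.</p>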
      </sec>
      <sec id="sec-6-3">
        <title>4.3. Selection of LLMs</title>
        <p>For the implementation, different LLMs, which are considered state-of-the-art at the time of writing,
are used. The following models are selected:
GPT-3.5 Turbo from OpenAI, specifically gpt-3.5-turbo-1106.</p>
        <p>
          GPT-4 Turbo [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] from OpenAI, specifically gpt-4-1106-preview, which is considered
state-of-the-art at the time of writing.
        </p>
        <p>
          CodeLlama [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ] from Meta, which is the best permissively licensed coding model. The evaluation
uses two variants: the raw, pre-trained version, referred to as CodeLlama, and the instruction-tuned
version, referred to as CodeLlama-Instruct. Both models are used in their 34B parameter variant.
Mixtral [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], which is considered the best open, general model at the time of writing. The
instruction-tuned variant, Mixtral-8x7B-Instruct-v0.1, is used.
        </p>
      </sec>
      <sec id="sec-6-4">
        <title>4.4. Selecting and Translating Examples</title>
        <p>
          At first, n examples are chosen from the knowledge base, i.e., semantic web services from the ontology
or semantic graphs from the case base, with n being the number of shots in the prompt (a 5-shot
prompt uses 5 examples, a 3-shot prompt 3). To retrieve the knowledge from the
ontology, SPARQL [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ] queries are used. SPARQL is a query language with a SQL-like syntax to retrieve
information from an ontology or to even construct a new RDF graph. If a web service is retrieved, the
matching representation of this web service in the CBR model is supplied to the prompt as well. As an
optional step, the knowledge can be translated into natural language to change the style of the prompt,
similar to the approach by Guan et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>(?Name = vgr_transport_to_pm_1_sink_pos)
(?Type = &lt;#preconditionsImplyOverAll&gt;)
(?subjectName = VGR_1) (?predicateName = isReady)
(?objectName = true) (?paramValue = )
(?Type = &lt;#postconditionsImplyAtEnd&gt;)
(?subjectName = BusinessKey) (?predicateName = at)
(?objectName = pm_1_sink_pos) (?paramValue = pm_1_sink_pos)</p>
        <sec id="sec-6-4-1">
          <title>Listing 1: SPARQL result of a query. Names shortened for brevity.</title>
          <p>Over all, the VGR_1 should be ready.</p>
          <p>At the end, be at the pm_1_sink_pos.</p>
          <p>The name of the service is vgr_transport_to_pm_1_sink_pos</p>
        </sec>
        <sec id="sec-6-4-2">
          <title>Listing 2: SPARQL result of a query, translated into natural language</title>
          <p>Listing 1 shows an excerpt of the results of a SPARQL query of the ontology, whereas Listing 2 shows
the same information, but in natural language. The same web service as modeled in the vocabulary
container of the CBR model is shown in Listing 3.</p>
          <p>&lt;StringClass name="end"
superClass="ShopFloorPositions"&gt;
&lt;InstanceEnumerationPredicate&gt;</p>
          <p>&lt;Value v="pm_1_sink_pos"/&gt;
&lt;/InstanceEnumerationPredicate&gt;
&lt;/StringClass&gt;
&lt;AggregateClass name="vgr_transport_to_pm_1_sink_pos"&gt;
&lt;Property name="url"&gt;/vgr/transport_to_pm_1_sink_pos&lt;/Property&gt;
&lt;Property name="outputProperties"&gt;</p>
          <p>&lt;Property name="position" value="pm_1_sink_pos"/&gt;
&lt;/Property&gt;
&lt;/Property&gt;
&lt;/AggregateClass&gt;</p>
          <p>Listing 3: CBR representation of the query from the previous Listings</p>
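          <p>The translation from the bindings of Listing 1 to the natural language of Listing 2 can be sketched with simple rules. The dictionary keys and sentence templates below are our own simplification, not the exact ontology vocabulary or translation rules used in the implementation.</p>

```python
# Sketch: rule-based translation of (simplified) SPARQL bindings into natural
# language, in the spirit of Listings 1 and 2. Keys/templates are illustrative.
TEMPLATES = {
    "preconditionsImplyOverAll": "Over all, the {subject} should be {obj}.",
    "postconditionsImplyAtEnd": "At the end, be at the {obj}.",
}

def translate_bindings(service_name, conditions):
    sentences = []
    for cond in conditions:
        template = TEMPLATES[cond["type"]]
        sentences.append(template.format(subject=cond.get("subject", ""),
                                         obj=cond["object"]))
    sentences.append(f"The name of the service is {service_name}")
    return "\n".join(sentences)

text = translate_bindings(
    "vgr_transport_to_pm_1_sink_pos",
    [
        {"type": "preconditionsImplyOverAll", "subject": "VGR_1", "object": "ready"},
        {"type": "postconditionsImplyAtEnd", "object": "pm_1_sink_pos"},
    ],
)
```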
          <p>As mentioned in Section 2, the targets are not translated, as the LLM-generated output should be
in the format of the knowledge, i.e., directly usable XML for the CBR model. However, the workflows
stored in the case base are an exception to this rule: In ProCAKE, cases are stored in XML and result in
many tokens for the prompts when using the chosen models from Section 4.3. Therefore, workflows
from the case base are converted into a custom format based on YAML. This format encodes the task
nodes of a cyber-physical workflow and the data flow of the workflow. The workflow from Figure 4 as
represented in YAML is shown in Listing 4.</p>
          <p>resource: dm_2
start: dm_2_pos
end: dm_2_sink_pos
quantity: 8
size: 35
3. Transport:
  production_step: vgr_transport
  parameters:
    resource: vgr_2
    start: dm_2_sink_pos
    end: hbw_2_pos</p>
        </sec>
        <sec id="sec-6-4-3">
          <title>Listing 4: A NEST Workflow in YAML</title>
          <p>The conversion of NEST graphs from XML to YAML has substantial effects: A single workflow with
10 tasks encoded in XML uses roughly 10,000 tokens with the (Code)Llama and Mixtral tokenizers and
8,000 tokens when using the tokenizer for GPT-3.5 and GPT-4. Converting the workflow into YAML
results in roughly 1,300 tokens for Llama and Mixtral, whereas GPT-based models use 1,000 tokens. A
similar effect can be observed when translating the ontology into natural language, which results in
roughly 75% fewer tokens.</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>4.5. Prompt Templates</title>
        <p>As shown in Section 3, the acquisition of knowledge can be prompted in different styles, by (not) using
instructions and by (not) utilizing CoT. Therefore, there are three different prompting styles, with each
style being represented by a template.</p>
        <p>The basic template uses no instructions and directly inserts the examples.</p>
        <p>Prompt with no Instructions for the CBR model
# Example 1
## Conditions:
- &lt;Knowledge from Ontology&gt;
## Resulting CBR model:
&lt;CBR model in XML&gt;
&lt;Name of the service to generate&gt;
## Conditions:
- &lt;Knowledge from Ontology to generate&gt;
## Resulting CBR model:</p>
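Assembling the basic template is a plain string concatenation of few-shot examples followed by the open-ended query. The sketch below illustrates this; the function and argument names are hypothetical, not taken from the paper's implementation.

```python
def build_prompt(examples, service_name, query_conditions):
    """Assemble the no-instruction template: few-shot examples, then the query.

    examples: list of (conditions, cbr_model_xml) pairs from the existing knowledge.
    """
    parts = []
    for i, (conditions, cbr_model_xml) in enumerate(examples, start=1):
        parts.append(f"# Example {i}")
        parts.append("## Conditions:")
        parts.extend(f"- {c}" for c in conditions)
        parts.append("## Resulting CBR model:")
        parts.append(cbr_model_xml)
    # The query: name of the service to generate, its conditions, and the
    # trailing header that the LLM is expected to complete.
    parts.append(service_name)
    parts.append("## Conditions:")
    parts.extend(f"- {c}" for c in query_conditions)
    parts.append("## Resulting CBR model:")
    return "\n".join(parts)
```

The instruction and CoT variants would simply prepend their respective instruction text to the same assembled string.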
        <p>
          The template for the instruction prompt adds an instruction at the beginning of the prompt (“Your
task is to generate a CBR model for the given conditions”), whereas the CoT prompt contains the
instruction at the beginning and the zero-shot CoT trigger from Kojima et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] (“Let’s think step by step”).3
For the case base, workflow(s) are supplied in the prompt template.
        </p>
        <sec id="sec-6-5-1">
          <title>Prompt with no Instructions for the case base</title>
          <p># Example 1
&lt;Workflow in YAML&gt;
# New Workflow:</p>
          <p>3All prompts are also in the repository at https://gitlab.rlp.net/iot-lab-uni-trier/iccbr-2024-cbr-llm-workshop</p>
        </sec>
      </sec>
      <sec id="sec-6-6">
        <title>4.6. Verifying LLM Outputs</title>
        <p>To evaluate the output from the LLM, the relevant target, i.e., the YAML workflow or the CBR model,
is extracted. This is done automatically by using regular expressions to find the relevant part(s). If
the extraction fails, the target is extracted manually. Then, the target is syntactically and semantically
validated using software-based verifiers. A software-based verifier receives the expected gold-label target
and the LLM-generated output, i.e., the CBR model or the workflow, and checks them for syntactic and
semantic equality. For the CBR model, syntactic equality is checked using the ProCAKE parser,
whereas semantic equality is checked using XMLUnit4. The YAML workflows are first converted
into their XML representation and then verified by ProCAKE parsers for syntactic and semantic equality.</p>
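The extraction-and-verification pipeline can be sketched as below. The regular expression is an assumption (the paper does not give its patterns), and the two checks are Python stand-ins for the actual ProCAKE parser (syntax) and XMLUnit (semantics), using only the standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical extraction pattern: grab the first (optionally tagged)
# fenced block from the LLM completion.
FENCE = re.compile(r"```(?:xml|yaml)?\n(.*?)```", re.DOTALL)

def extract_target(llm_output):
    """Pull the relevant target out of an LLM completion, if any."""
    match = FENCE.search(llm_output)
    return match.group(1).strip() if match else None

def syntactically_valid(xml_text):
    """Syntactic check: does the output parse? (stand-in for the ProCAKE parser)"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def semantically_equal(gold_xml, generated_xml):
    """Semantic check via C14N canonical forms (stand-in for XMLUnit):
    ignores attribute order, quoting style, and surrounding whitespace."""
    canon = lambda s: ET.canonicalize(s, strip_text=True)
    return canon(gold_xml) == canon(generated_xml)
```

As in the paper's setup, the semantic check is binary: any deviation from the gold label counts as a failure, even if a human could repair it quickly.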
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Evaluation</title>
      <p>The implementation from the previous section is evaluated in an experiment. Section 5.1 introduces
the setup for the experiment, with Section 5.2 and Section 5.3 showing the results for the generation of
the vocabulary container and the case base, respectively. Section 5.4 discusses the results.</p>
      <p>5.1. Setup
The overall setup follows the implementation from the previous section. To assess the impact of prompts
and various parameters, different configurations for the prompts are used.</p>
      <p>Prompt Template
As introduced in Section 4.5, there are three different prompt templates, which use no instructions,
instructions only, or instructions combined with CoT.</p>
      <p>Number of Shots
Furthermore, the number of examples is varied for the prompts. The examples stay the same across all
prompts, so every 5-shot prompt uses the same five examples every time.</p>
      <p>Usage of Natural Language
For the CBR model, the conditions from the ontology are either inserted directly or translated into
natural language. To meaningfully assess the impact of each variable, all variations of the possible
input variables must be tested. For example, when testing the impact of using natural language, it is
necessary to also test 1-, 3-, and 5-shot prompts to assess the impact of natural language in various
settings, mitigating a possible bias from the selection of a specific prompting technique. Finally, this
setup is applied when testing every parameter described in the following.</p>
      <p>For the evaluation, each combination is run five times due to the probabilistic nature of LLM outputs.
We use a temperature of 0.2 and do not vary this parameter, as preliminary experiments showed a
negligible impact when changing it.</p>
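The variables above span a full factorial grid. A sketch of how such a grid could be enumerated (the variable names are illustrative, not taken from the paper's code):

```python
from itertools import product

# Hypothetical encoding of the evaluation grid described above.
TEMPLATES = ["no_instructions", "instructions", "instructions_cot"]
SHOTS = [1, 3, 5]
NATURAL_LANGUAGE = [False, True]   # raw SPARQL conditions vs. translated text
RUNS_PER_CONFIG = 5                # repetitions; temperature fixed at 0.2

configurations = list(product(TEMPLATES, SHOTS, NATURAL_LANGUAGE))
total_runs = len(configurations) * RUNS_PER_CONFIG
print(len(configurations), total_runs)  # 18 configurations, 90 runs per model
```

Testing every cell of the grid, rather than one variable at a time, is what allows the effect of each variable to be assessed across all settings of the others.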
      <sec id="sec-7-1">
        <title>5.2. Generation of a Vocabulary Container for CBR</title>
        <p>Figure 5 shows the impact of using natural language as the input compared to using the SPARQL outputs
directly. Using natural language results in more syntactically and semantically correct
outputs for all models, including the coding-focused CodeLlama models. This could
be explained by the fact that CodeLlama is based on Llama 2, a general-purpose LLM, which is further
trained on code samples. Overall, the GPT models outperform the permissively licensed models, which
becomes especially apparent when not using natural language, where Mixtral generates only 53.33%
semantically valid outputs.</p>
        <p>Figure 6 shows the impact of the different prompting templates introduced in Section 4.5. With
the exception of GPT-4, using instructions in the prompt yields significantly better results, even for
the base, non-instruction-tuned CodeLlama. The CodeLlama models also show the biggest boost in
performance when using instructions, with results increasing by 70%. Interestingly, using CoT
results in worse performance for every model, both syntactically and semantically. For all models,
using no instructions at all yields better results than using CoT.</p>
        <p>The number of examples, as depicted in Figure 7, has little impact on the overall performance, except
for Mixtral, which is unable to generate valid CBR models with a single example; with multiple
examples, its performance improves greatly.</p>
      </sec>
      <sec id="sec-7-2">
        <title>5.3. Generation of a Case Base for CBR</title>
        <p>For the case base, using no instructions results in the best performance, as can be seen in Figure 8.
The most surprising result is the performance of GPT-4 Turbo, which is unable to generate a single
semantically valid example. Its outputs fall into roughly
two categories: either the model requests further clarification, or the output is too creative,
hallucinating machines and services that do not exist, such as a non-existent “cooling station”. A
possible explanation could lie in the instruction-tuning and RLHF phases of
GPT-4 Turbo’s training, which may be focused on human conversations and creativity, resulting in
such outputs for this task.</p>
        <p>The number of examples affects performance, as shown in Figure 9. More examples
result in better performance for GPT-3.5 Turbo and Mixtral, whereas both CodeLlama variants perform
worse when using three examples. In line with the other results, GPT-4 Turbo is unable to generate valid
example cases.</p>
      </sec>
      <sec id="sec-7-3">
        <title>5.4. Discussion</title>
        <p>
          The results from the previous sections give numerous insights. Overall, LLMs are largely able
to generate knowledge for the cyber-physical domain. As the approach is general and not specific to
the domain, we expect the results to be transferable to other domains as well. This assumption on the
transferability of the general approach has already been demonstrated experimentally in other settings,
such as AI planning [
          <xref ref-type="bibr" rid="ref43 ref9">9, 43</xref>
          ]. As a prerequisite for this, the domain needs at least some hand-engineered
knowledge to be used as example(s) in the given prompts. The usage of natural language improves
the results across multiple tasks, especially when the wording of the prompts is meaningful relative to
the modeled domain and not obfuscated, as shown by similar works [
          <xref ref-type="bibr" rid="ref11 ref43 ref9">9, 43, 11</xref>
          ]. This also applies to
CodeLlama, a coding-focused model, in which the base model being a general-purpose LLM likely plays
a role. Furthermore, adding instructions to prompts results in improved performance, whereas the usage
of zero-shot CoT gives mixed results, highlighting the need to experiment with different prompting
techniques when creating knowledge. The number of examples in the prompt plays an important role,
albeit a smaller one. However, the creation of the vocabulary container involved similar examples:
when the prompt is supplied with similar examples while the task is the creation of vocabulary that
is dissimilar, the performance drops significantly, with models being unable to successfully create the
target5. Furthermore, permissively licensed models are close to the performance of closed models, even
surpassing it when generating further examples for a case base. However, the evaluation
depends on the software-based verifiers, which can only check for equality due to their respective
implementations, resulting in overly harsh scores due to the binary feedback they provide.
When inspecting failure cases manually, numerous failures could be fixed by a human fairly quickly, as most
errors are wrong variables and missing or superfluous XML tags.
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. Conclusion and Future Work</title>
      <p>
        We present an approach using Large Language Models (LLMs) to generate knowledge for Case-Based
Reasoning (CBR) systems. Similar to related work [
        <xref ref-type="bibr" rid="ref10 ref9">10, 9</xref>
          ], we propose the utilization of existing
knowledge with examples to generate new but similar knowledge that is ready to use in CBR. Our findings
indicate that LLMs are capable of generating knowledge in this setting, but results vary considerably
with the prompting techniques used. Furthermore, permissively licensed models are
capable of generating knowledge with similar or even better performance than closed models.
      </p>
      <p>
        For future work, different domains besides the used cyber-physical domain should be investigated.
In this context, we will examine further, more practice-oriented application scenarios in which LLMs can
help to remedy the high effort of knowledge acquisition and engineering. From a technical standpoint,
different prompting strategies can be explored, with a focus on the selection of the provided examples.
It could also be researched whether it is possible to create knowledge without providing
(many) examples in the prompt itself. Moreover, there is an emerging trend to eliminate the need to
manually craft prompts, with frameworks such as DSPy [
        <xref ref-type="bibr" rid="ref44">44</xref>
          ]. Furthermore, an interesting direction for
research is to incorporate the feedback from the software-based verifiers into the knowledge-creation
process. One possible approach is to prompt the LLM with the initial prompt, the initial
completion of the LLM, and the feedback from the verifier, thus forming a multi-turn conversation,
similar to the back-prompting approach by Guan et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Furthermore, our case study uses two of the
four knowledge containers. While the proposed approach is general, future work could research its
applicability to the similarity and adaptation containers. Finally, we want to integrate an LLM into the
ProCAKE framework [
        <xref ref-type="bibr" rid="ref37">37</xref>
          ], helping CBR users to develop customized CBR applications more easily by
adding a chat-based assistant that guides domain experts through the process of knowledge acquisition
and modeling. Moreover, an LLM could augment existing CBR domains by using the existing vocabulary
or case base to extend or generate new knowledge. In this context, an LLM can help users to generate
suitable adaptation rules [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ] in a domain, which is often a laborious and time-consuming task.
5All results are provided in the repository (https://gitlab.rlp.net/iot-lab-uni-trier/iccbr-2024-cbr-llm-workshop)
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Expert System Methodologies and Applications - a Decade Review from 1995 to 2004</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>28</volume>
          (
          <year>2005</year>
          )
          <fpage>93</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aamodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <source>Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches</source>
          <volume>7</volume>
          (
          <year>1994</year>
          )
          <fpage>39</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hanney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <article-title>The Adaptation Knowledge Bottleneck: How to Ease it by Learning from Cases</article-title>
          ,
          <source>in: 2nd ICCBR</source>
          , Springer,
          <year>1997</year>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          , G. Müller,
          <article-title>Similarity-Based Retrieval and Automatic Adaptation of Semantic Workflows, in: Synergies Between Knowledge Engineering</article-title>
          and Software Engineering,
          <source>Advances in Intelligent Systems and Computing</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Brand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          , et al.,
          <article-title>Using Deep Reinforcement Learning for the Adaptation of Semantic Workflows</article-title>
          ,
          <source>in: 31st ICCBR Workshops</source>
          , volume
          <volume>3438</volume>
          , CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <source>Adaptive Management of Cyber-Physical Workflows by Means of Case-Based Reasoning and Automated Planning, in: 26th EDOC Workshops, LNBIP</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jalali</surname>
          </string-name>
          , et al.,
          <source>Learning Adaptations for Case-Based Classification: A Neural Network Approach, in: 29th ICCBR</source>
          , volume
          <volume>12877</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zeyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Adaptation of Scientific Workflows by Means of ProcessOriented Case-Based Reasoning</article-title>
          ,
          <source>in: 27th ICCBR</source>
          , volume
          <volume>11680</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>388</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Valmeekam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sreedharan</surname>
          </string-name>
          , et al.,
          <source>Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2305</volume>
          .
          <fpage>14909</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>LLM+P: Empowering Large Language Models with Optimal Planning Proficiency</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2304</volume>
          .
          <fpage>11477</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kuhlmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seipp</surname>
          </string-name>
          ,
          <article-title>NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions</article-title>
          ,
          <year>2024</year>
          . arXiv:
          <volume>2405</volume>
          .
          <fpage>04215</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Richter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <source>Case-Based Reasoning: A Textbook</source>
          , Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <source>Experience Management: Foundations, Development Methodology, and Internet-Based Applications</source>
          , volume
          <volume>2432</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <source>Workflow Modeling Assistance by Case-based Reasoning</source>
          , Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          , et al.,
          <article-title>Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <source>Instruction Tuning for Large Language Models: A Survey</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2308</volume>
          .
          <fpage>10792</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , et al.,
          <article-title>Finetuned Language Models Are Zero-Shot Learners</article-title>
          ,
          <source>in: 10th ICLR</source>
          , OpenReview.net,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , et al.,
          <source>Training Language Models to Follow Instructions with Human Feedback</source>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2203</volume>
          .
          <fpage>02155</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Pre-Train</surname>
          </string-name>
          , Prompt, and
          <article-title>Predict: A Systematic Survey of Prompting Methods in Natural Language Processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bartolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moore</surname>
          </string-name>
          , et al.,
          <article-title>Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity</article-title>
          ,
          <source>in: 60th ACL, ACL</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>8086</fpage>
          -
          <lpage>8098</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Razeghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan IV</surname>
          </string-name>
          , et al.,
          <article-title>AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts</article-title>
          , in: EMNLP Conference, ACL,
          <year>2020</year>
          , pp.
          <fpage>4222</fpage>
          -
          <lpage>4235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          , et al.,
          <article-title>Language Models Are Few-Shot Learners</article-title>
          , in: NeurIPS Conference,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <article-title>Human-Level Concept Learning through Probabilistic Program Induction</article-title>
          ,
          <source>Science</source>
          <volume>350</volume>
          (
          <year>2015</year>
          )
          <fpage>1332</fpage>
          -
          <lpage>1338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          , et al.,
          <source>Chain-of-Thought Prompting Elicits Reasoning in Large Language Models</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2201</volume>
          .
          <fpage>11903</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kojima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reid</surname>
          </string-name>
          , et al.,
          <article-title>Large Language Models Are Zero-Shot Reasoners</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2205</volume>
          .
          <fpage>11916</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Madaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yazdanbakhsh</surname>
          </string-name>
          ,
          <article-title>Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango</article-title>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2209</volume>
          .
          <fpage>07686</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Arik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakhost</surname>
          </string-name>
          , et al.,
          <source>SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2306</volume>
          .
          <fpage>00739</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dugan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          , et al.,
          <source>Exploring the Curious Case of Code Prompts</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2304</volume>
          .
          <fpage>13250</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>I.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Blukis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mousavian</surname>
          </string-name>
          , et al.,
          <source>ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models</source>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2209</volume>
          .
          <fpage>11302</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Oswald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kokel</surname>
          </string-name>
          , et al.,
          <article-title>Large Language Models as Planning Domain Generators (Student Abstract)</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Minor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Recio-García</surname>
          </string-name>
          ,
          <article-title>Process-oriented case-based reasoning</article-title>
          ,
          <source>Information Systems</source>
          <volume>40</volume>
          (
          <year>2014</year>
          )
          <fpage>103</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kourani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuster</surname>
          </string-name>
          , et al.,
          <source>ProMoAI: Process Modeling with Generative AI</source>
          ,
          <year>2024</year>
          . arXiv:
          <volume>2403</volume>
          .
          <fpage>04327</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Bernardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casciani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cimitile</surname>
          </string-name>
          , et al.,
          <article-title>Conversing with Business Process-Aware Large Language Models: The BPLLM Framework</article-title>
          , Preprint, In Review,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Seiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <source>Using Physical Factory Simulation Models for Business Process Management Research</source>
          , volume
          <volume>397</volume>
          <source>of LNBIP</source>
          , Springer,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>E.</given-names>
            <surname>Abele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chryssolouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sihn</surname>
          </string-name>
          , et al.,
          <article-title>Learning Factories for Future Oriented Research and Education in Manufacturing</article-title>
          ,
          <source>CIRP Annals</source>
          <volume>66</volume>
          (
          <year>2017</year>
          )
          <fpage>803</fpage>
          -
          <lpage>826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Converting semantic web services into formal planning domain descriptions to enable manufacturing process planning and scheduling in industry 4.0</article-title>
          ,
          <source>EAAI</source>
          <volume>126</volume>
          (
          <year>2023</year>
          )
          <fpage>106727</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Grumbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          , et al.,
          <article-title>ProCAKE: A Process-Oriented Case-Based Reasoning Framework</article-title>
          ,
          <source>in: 27th ICCBR Workshops</source>
          , volume
          <volume>2567</volume>
          , CEUR-WS.org,
          <year>2019</year>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultheis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zeyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>An Overview and Comparison of Case-Based Reasoning Frameworks</article-title>
          ,
          <source>in: 31st ICCBR</source>
          , volume
          <volume>14141</volume>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          , New Models and Developer Products Announced at DevDay, https://openai.com/blog/new-models-and-developer-products-announced-at-devday
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gloeckle</surname>
          </string-name>
          , et al.,
          <source>Code Llama: Open Foundation Models for Code</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2308</volume>
          .
          <fpage>12950</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roux</surname>
          </string-name>
          , et al.,
          <source>Mixtral of Experts</source>
          ,
          <year>2024</year>
          . arXiv:
          <volume>2401</volume>
          .
          <fpage>04088</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <source>SPARQL Query Language for RDF</source>
          , https://www.w3.org/TR/rdf-sparql-query/,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>K.</given-names>
            <surname>Valmeekam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sreedharan</surname>
          </string-name>
          , et al.,
          <source>On the Planning Abilities of Large Language Models - A Critical Investigation</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2305</volume>
          .
          <fpage>15771</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Maheshwari</surname>
          </string-name>
          , et al.,
          <source>DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2310</volume>
          .
          <fpage>03714</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>L.</given-names>
            <surname>Malburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Improving Complex Adaptations in Process-Oriented Case-Based Reasoning by Applying Rule-Based Adaptation</article-title>
          ,
          <source>in: 32nd ICCBR, Lecture Notes in Computer Science</source>
          , Springer,
          <year>2024</year>
          . Accepted for Publication.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>