<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Model⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Rachmann</string-name>
          <email>a.rachmann@cbs.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CBS International Business School</institution>
          ,
          <addr-line>Hammer Landstraße 89, 41460 Neuss</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>AI-supported creativity is a hotly debated topic due to the rise of LLMs. This paper explores a prototypical implementation of the classic Six Thinking Hats method via an LLM, using the OpenAI API. The results suggest that such AI support is useful in a creative process, but will not replace human creativity.</p>
      </abstract>
      <kwd-group>
        <kwd>ai supported creativity</kwd>
        <kwd>six thinking hats</kwd>
        <kwd>large language model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Published in: Joint Proceedings of REFSQ-2024 Workshops, Doctoral Symposium, Posters &amp; Tools Track, CEUR Workshop Proceedings (ceur-ws.org).</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-3">
      <title>2. Related works</title>
      <p>Related work from the requirements engineering perspective, such as [11], describes the usage of LLMs in each major stage of requirements engineering according to [12]. For the elicitation phase, a single-shot prompt is used; this is comparable to the web browser prompt in section 4.2 of this paper. This perspective on the topic is not yet well explored; first experiments such as [13] are being carried out.</p>
      <p>Related work from the perspective of LLM usage is in full bloom. Amid the ChatGPT hype, the development of prompting strategies such as [14], [15], [16], [17] provides standardized approaches to prompting. These approaches address the technical side of prompting, such as “write clear instructions” and “provide reference texts” [17]. Though they are very useful, these approaches often do not tackle the complex situations of human conversations.</p>
      <p>The main new idea of this paper is the simulation of a human workshop, using the LLM in
multiple, programmed steps.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Design</title>
      <sec id="sec-4-1">
        <title>3.1. Design Considerations for a programmed Six Thinking Hats workshop</title>
        <p>The Six Thinking Hats methodology is well known in Requirements Engineering. Although it entails several ways to use the different hats 2, this paper explores only one application, i.e. the identification of one alternative from a group of three.</p>
        <p>A practical workshop needs an introduction by the moderator (blue hat). With the OpenAI API, this type of introduction is unnecessary, or rather it is covered by providing appropriate information before the prompt: before the actual prompt, a system content is given that contains the content of an introduction to a classic workshop. The order of colors is white, green, yellow, black, red and blue.</p>
        <p>To limit the necessary computing power, the contribution from each hat is limited to three bullet points, without an explicit limit on words or characters.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Program flow</title>
        <p>Python 3 was used as the implementation environment 3. A simple object-oriented design was chosen for the prototype. The source code is available at https://GitHub.com/rachmann-alexander/sixthinkingchatbots.</p>
        <p>ChatGPT-4 was used as the LLM. ChatGPT-4 is generally considered the market leader and is therefore suitable as an implementation tool for the prototype. It is unclear whether other LLMs would show a substantially different result. Choosing ChatGPT entails the use of the corresponding API, which in turn entails certain prompt forms.</p>
        <p>Footnote 2: E.g. generating initial ideas (hat sequence: blue, white, green, blue), identifying solutions (blue, white, green, yellow, black, red, blue), etc. See also [18].</p>
        <p>Footnote 3: In the opinion of the author, almost any other modern programming language could have been chosen, as no exclusive Python-specific capabilities were used.</p>
        <p>Six Thinking Chatbots follows this flow:</p>
        <p>1. The user delivers a problem statement.
2. The messages for white, green, yellow, black, and red are computed. The message to each hat follows this structure:
• System content gives the LLM the context in which to interpret the prompt. System content is a necessary parameter of the ChatGPT API.
• User content, also a necessary parameter of the ChatGPT API, is subdivided into two entities: the problem statement, a description of the topic to be discussed by the hats, and the task, a description of what the hat should do.
The messages are submitted to the ChatGPT API and the responses are received.
3. The prompt for the blue hat is computed and submitted to the ChatGPT API. It follows the same structure as above, but incorporates the responses of the other hats. The response for blue is received.
4. All prompts and responses are exported to a Markdown-formatted document.</p>
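        <p>The flow above can be sketched in Python as follows. This is an illustrative sketch, not code from the repository: the OpenAI call is abstracted behind a submit callable so that the orchestration is visible without an API key, and the function names and the exact prompt wording (beyond what Tables 1-3 show) are assumptions.</p>

```python
from typing import Callable, Dict

# System content per hat, following Table 1; prefixing the participant sentence
# from Table 2 to the non-blue hats is an assumption.
SYSTEM_CONTENT: Dict[str, str] = {
    "white": "You are a participant of a workshop. Your decision making is fact based and your argumentation is very structured.",
    "green": "You are a participant of a workshop. Your decision making is based on the possibilities of innovations.",
    "yellow": "You are a participant of a workshop. You are an optimist.",
    "black": "You are a participant of a workshop. You are a pessimist.",
    "red": "You are a participant of a workshop. Your decision making is emotion based.",
    "blue": "You are the moderator of a workshop, using the Six Thinking Hats methodology. You are neutral to the contributions from the participants and value each contribution.",
}
TASK = "Describe your point of view to this problem, using only three bullet points."

def run_workshop(problem: str, submit: Callable[[str, str], str]) -> Dict[str, str]:
    """Steps 1-3: query the five colored hats, then let the blue hat summarize."""
    responses: Dict[str, str] = {}
    for hat in ("white", "green", "yellow", "black", "red"):
        # User content = problem statement + task.
        responses[hat] = submit(SYSTEM_CONTENT[hat], f"{problem}\n\n{TASK}")
    # The blue hat's prompt incorporates the responses of the other hats.
    contributions = "\n".join(
        f"Contribution from {hat} hat: {text}" for hat, text in responses.items()
    )
    blue_task = ("Summarize the contributions of the participants, as listed beneath. "
                 "Think about the problem statement and decide, based on the contributions "
                 "of the participants, which alternative should be chosen.")
    blue_user = (f"{blue_task}\n\nThe problem statement of the workshop is as follows: "
                 f"{problem}\n\n{contributions}")
    responses["blue"] = submit(SYSTEM_CONTENT["blue"], blue_user)
    return responses

def export_markdown(responses: Dict[str, str]) -> str:
    """Step 4: export all responses to a Markdown-formatted document."""
    return "\n".join(f"## {hat.capitalize()} hat\n\n{text}\n" for hat, text in responses.items())
```

        <p>In the actual script, submit would wrap the ChatGPT API call; here it can equally be a stub, which is also how the flow can be tested offline.</p>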
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Prompts to the hats</title>
        <p>OpenAI expects the user to give context to a prompt, which is given in the system content. This context gives the LLM a better understanding of what the task should be. Each hat receives a different system content, depending on its nature as defined in the Six Thinking Hats methodology. Table 1 lists the system content of each hat. The prompts of all hats, except for the blue hat, contain a problem statement and a generic task. Table 2 gives the content for the white hat, table 3 for the blue hat.</p>
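        <p>As a minimal sketch, one hat's prompt maps onto the OpenAI chat format as two message entries: the system content and the user content (problem statement plus task). The helper name is illustrative, and the commented-out API call is an assumption depending on the installed openai client version.</p>

```python
from typing import Dict, List

def build_messages(system_content: str, problem_statement: str, task: str) -> List[Dict[str, str]]:
    # One "system" entry (the hat's system content, Table 1) and one "user"
    # entry (problem statement followed by task, Table 2).
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": f"{problem_statement}\n\n{task}"},
    ]

white = build_messages(
    "You are a participant of a workshop. Your decision making is fact based "
    "and your argumentation is very structured.",
    "The goal of the workshop is to evaluate three alternatives to a problem.",
    "Describe your point of view to this problem, using only three bullet points.",
)
# The list would then be passed to the chat-completions endpoint, e.g.
# client.chat.completions.create(model="gpt-4", messages=white)  # requires an API key
```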
        <p>The blue hat has the moderator role and therefore the task of summarizing the outputs of the other hats. The prompt to the blue hat is accordingly more complex: first the context is given, then the task itself; after that, all other contributions are given. Table 4 gives an example of a response to the blue hat.</p>
        <sec id="sec-4-3-1">
          <title>Table 1: System content of each hat</title>
          <p>White: Your decision making is fact based and your argumentation is very structured.</p>
          <p>Green: Your decision making is based on the possibilities of innovations.</p>
          <p>Yellow: You are an optimist.</p>
          <p>Black: You are a pessimist.</p>
          <p>Red: Your decision making is emotion based.</p>
          <p>Blue: You are the moderator of a workshop, using the Six Thinking Hats methodology. You are neutral to the contributions from the participants and value each contribution.</p>
        </sec>
        <sec id="sec-4-3-9">
          <title>Table 2: Prompt to the white hat</title>
          <p>System content: You are a participant of a workshop. Your decision making is fact based and your argumentation is very structured.</p>
          <p>Prompt (Problem statement): The goal of the workshop is to evaluate three alternatives to a problem. The problem is: The sales process on our companies website does not work very well. Management wants us to change the sales process to one of the three alternatives: First, an old-school chatbot with a beforehand written structure. Second, a chatbot using a large language model, without a pre-defined structure. Third, an old-school contact form.</p>
          <p>Prompt (Task): Describe your point of view to this problem, using only three bullet points.</p>
        </sec>
        <sec id="sec-4-3-16">
          <title>Table 3: Prompt to the blue hat</title>
          <p>System content: You are the moderator of a workshop, using the Six Thinking Hats methodology. You are neutral to the contributions from the participants and value each contribution.</p>
          <p>Prompt (Task): Summarize the contributions of the participants, as listed beneath. Think about the problem statement and decide, based on the contributions of the participants, which alternative should be chosen.</p>
          <p>Prompt (Problem statement): The problem statement of the workshop is as follows: …</p>
          <p>Prompt (Contribution White): Contribution from a fact-oriented participant: …</p>
          <p>Prompt (Other Contributions): Contribution from …hat: …</p>
        </sec>
        <sec id="sec-4-3-21">
          <title>Table 4: Response from the LLM to the blue hat prompt</title>
          <p>After analyzing all of these contributions, it appears that the majority lean towards investing in a large language model chatbot for the company. Those in favor of this option highlighted several benefits including its adaptability to changing customer needs, the potential for learning and improvement over time, and the ability to provide tailored, efficient service on the fly. They also lauded its innovative approach and the potential for the drastically improved client interaction it could offer.</p>
          <p>It’s recommended to consider the concerns raised by the pessimist too, ensuring to mitigate potential issues like consistency, the maintenance of brand tone, and avoiding technical glitches. Availability of resources, cost implications and consideration for the long-term impact are also factors that should be taken into account when implementing a large language model chatbot solution.</p>
          <p>The participants showed less enthusiasm for the old-school chatbot and the contact form, identifying limitations like lack of flexibility and evolution, impersonal service, slow turnaround, and inability to handle complex inquiries. These options could result in a process that may feel outdated and inefficient, potentially deterring customers rather than facilitating sales and improving the customer experience.</p>
          <p>Therefore, based on these contributions, the large language model chatbot stands out as the most promising alternative.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Evaluation</title>
      <sec id="sec-5-1">
        <title>4.1. Evaluation question and methods</title>
        <p>One can evaluate the Six Thinking Chatbots in terms of:
• Q1: Would a regular prompt deliver the same results with less effort?
• Q2: Are the arguments by the colored hats disjoint? Did the hats take different positions?
• Q3: Did the blue hat summarize the contributions from the other chatbots correctly?
• Q4: How “creative” is the script? How much do the responses differ from each other if the script runs several times?
• Q5: Are the contributions valid, do they represent the state of the art, and are they comparable to a workshop with people?</p>
        <p>To answer Q1 and Q4, one needs to generate material for comparison. As the comparison for Q1, the StrategyQA prompting style was chosen [19]. The prompt is structured in a context part (see table 5), which informs the LLM about the workshop. The second part is the task itself. The third part defines the output format 4. This last part was not necessary in Six Thinking Chatbots, because the export is handled by the script.</p>
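        <p>The three-part structure just described can be sketched as a single function; the function name is illustrative and the wording is abbreviated (the full prompt is in the repository).</p>

```python
# Sketch of the StrategyQA-style comparison prompt: context part, task part,
# and output-format part joined into one single-shot prompt.
def strategyqa_prompt(context: str, task: str, output_format: str) -> str:
    return "\n\n".join([context, task, output_format])

prompt = strategyqa_prompt(
    "The goal of the workshop is to evaluate three alternatives to a problem.",
    "Evaluate which solution one would prefer. Give at most 15 arguments. After "
    "listing the arguments, write a conclusion including a selection on a solution.",
    "Use markdown syntax.",
)
```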
        <sec id="sec-5-1-1">
          <title>Table 5: StrategyQA-style prompt</title>
          <p>The goal of the workshop is to evaluate three alternatives to a problem. The problem is: The sales process on our companies website does not work very well. Management wants us to change the sales process to one of the three alternatives: First, an old-school chatbot with a beforehand written structure. Second, a chatbot using a large language model, without a pre-defined structure. Third, an old-school contact form.</p>
          <p>Evaluate which solution one would prefer. Give at most 15 arguments. After listing the arguments, write a conclusion including a selection on a solution. Use markdown syntax.</p>
          <p>The response to this prompt is structured along the three alternatives (table 6); each alternative is structured in a pro and a contra section with three to five arguments each.</p>
          <p>To answer Q4, one needs to compute several results. After developing the Six Thinking Chatbots script (and thereby generating several preliminary results), five results were computed. All five results are available at GitHub. Only one text (29.12.2023, 13.06.md in the GitHub repo) was used as the standard analysis material in tables 7 and 8.</p>
          <p>Footnote 4: The prompt and its response are available in GitHub, exactly at https://GitHub.com/rachmann-alexander/sixthinkingchatbots/blob/main/opt/StrategyQA</p>
        </sec>
        <sec id="sec-5-1-7">
          <title>Table 6: Response to the StrategyQA-style prompt (excerpt)</title>
          <p>The current sales process on the company’s website is underperforming, and management is considering three alternatives to improve it:
1. Old-school chatbot with predefined structure
2. Chatbot using a large language model
3. Old-school contact form</p>
          <p>Evaluation criteria: The evaluation will consider various factors such as user experience, cost, implementation time, maintenance, and effectiveness among others.
…
Chatbot using a large language model
- Pros:
1. Flexibility: Can handle a wide range of queries with natural, conversational responses.
2. Scalability: Can improve over time with more data and interactions.
…
- Cons:
1. Cost: Generally more expensive to implement and maintain.
2. Complexity: Requires more sophisticated development and maintenance. …</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Semantic cohesion of the Responses (Q1-Q3)</title>
        <p>To answer Q1 and Q2, table 7 shows the arguments made by the chatbots. One sees that the arguments are mostly disjoint; this holds within the results of the colored hats as well as between the Six Thinking Chatbots result and the web browser result. The chatbots did contribute different aspects and therefore fulfilled the expected outcome. To answer Q3, see table 8. The blue hat voted for the AI chatbot, which represents the majority of the votes of the other chatbots.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Similarity between exports (Q4)</title>
        <p>To answer Q4, five texts were generated. To compute how similar these texts are, one can use the shingle algorithm to compute the Jaccard similarity. The Jaccard similarity describes how similar two documents are to each other: a similarity of 1 means the two documents are identical, a 0 means the documents have no intersection. There is no established threshold for the Jaccard similarity above which two documents count as the same or nearly the same [20]. The average similarity of the different texts is 0.608. Considering that there is no common understanding of how similar this is, one may conclude that the outputs are always distinctly different, yet always alike.</p>
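        <p>The measure used above can be sketched as follows, assuming word-level shingles; the shingle size k is a free parameter not specified in the text (see [20] for the underlying approach).</p>

```python
from itertools import combinations

def shingles(text: str, k: int = 3) -> set:
    """Word-level k-shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the shingle sets: size of intersection over size of union."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two (near-)empty documents count as identical
    return len(sa.intersection(sb)) / len(sa.union(sb))

def average_similarity(texts, k: int = 3) -> float:
    """Mean pairwise Jaccard similarity over all exports."""
    pairs = list(combinations(texts, 2))
    return sum(jaccard(a, b, k) for a, b in pairs) / len(pairs)
```

        <p>Applied to the five Markdown exports, the reported 0.608 would then be the mean of the ten pairwise similarities for the chosen shingle size.</p>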
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Relation to a real world workshop (Q5)</title>
        <p>The answer to Q5 is not as easy to compute as the other answers. First, the quality of the outcomes of a workshop depends highly on the participants, their motivation, the moderator, the conduct of the workshop, etc. Whether a workshop with people would provide different or better outcomes is not easy to determine. However, the arguments made by the chatbots are mostly common knowledge and practice. It would be no surprise if these outcomes were elaborated by non-experts.</p>
        <p>The chosen application scenario was generic. It is also unclear whether the same results would arise with very specific use cases or with domain experts.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>This paper introduces a prototypical implementation of the Six Thinking Hats method, called Six Thinking Chatbots. The source code is available at GitHub, as are all evaluation cases. The experiment showed that the Six Thinking Chatbots deliver distinctly different results than regular prompting (Q1). The different hats play out their specific roles (Q2 and Q3). The script always delivers distinctly different outputs (Q4). It is unclear whether, and in what way, the responses of the chatbots differ from the responses of humans (Q5). However, the author would use the script only to support human creativity (such as in preparation for a Six Thinking Hats workshop), not as a substitute for human creativity.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[5] N. Jia, X. Luo, Z. Fang, C. Liao, When and how artificial intelligence augments employee creativity, Academy of Management Journal (2023).
[6] E. Creely, Conceiving creativity and learning in a world of artificial intelligence: A thinking model, in: Creative provocations: Speculations on the future of creativity, technology &amp; learning, Springer, 2023, pp. 35–50.
[7] D. H. Cropley, K. E. Medeiros, A. Damadzic, The intersection of human and artificial creativity, in: Creative provocations: Speculations on the future of creativity, technology &amp; learning, Springer, 2023, pp. 19–34.
[8] K. Liu, S. Reddivari, K. Reddivari, Artificial intelligence in software requirements engineering: State-of-the-art, in: 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI), IEEE, 2022, pp. 106–111.
[9] J. J. Y. Chung, Artistic user expressions in AI-powered creativity support tools, in: Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 2022, pp. 1–4.
[10] Y. Jeon, S. Jin, P. C. Shih, K. Han, FashionQ: an AI-driven creativity support tool for facilitating ideation in fashion design, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–18.
[11] C. Arora, J. Grundy, M. Abdelrazek, Advancing requirements engineering through generative AI: Assessing the role of LLMs, arXiv preprint arXiv:2310.13976 (2023).
[12] A. v. Lamsweerde, Systematic requirements engineering: from system goals to UML models to software specifications, 2008.
[13] A. Brand, LinkedIn post: Was wurde denn nun eigentlich aus dem Vortrag in Leipzig zu meinem Selbstversuch? [What became of the talk in Leipzig about my self-experiment?], 2023. https://www.linkedin.com/posts/anja-brand_selbstversuch-covid-chatgpt-activity-7136342918503489537-W0Jm/.
[14] J. Zamfirescu-Pereira, R. Y. Wong, B. Hartmann, Q. Yang, Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts, in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–21.
[15] Z. Yu, L. He, Z. Wu, X. Dai, J. Chen, Towards better chain-of-thought prompting strategies: A survey, arXiv preprint arXiv:2310.04959 (2023).
[16] J. Kim, S. Park, K. Jeong, S. Lee, S. H. Han, J. Lee, P. Kang, Which is better? Exploring prompting strategy for LLM-based metrics, arXiv preprint arXiv:2311.03754 (2023).
[17] OpenAI, Prompt engineering, 2024. https://platform.openai.com/docs/guides/prompt-engineering.
[18] E. De Bono, Six Thinking Hats: The multi-million bestselling guide to running better meetings and making faster decisions, Penguin UK, 2017.
[19] M. Geva, D. Khashabi, E. Segal, T. Khot, D. Roth, J. Berant, Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies, Transactions of the Association for Computational Linguistics 9 (2021) 346–361.
[20] C. C. Aggarwal, Information Retrieval and Search Engines, Springer International Publishing, Cham, 2022, pp. 257–302.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shanahan</surname>
          </string-name>
          ,
          <article-title>Talking about large language models</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>67</volume>
          (
          <year>2024</year>
          )
          <fpage>68</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Perri</surname>
          </string-name>
          ,
          <article-title>What's new in artificial intelligence from the 2023 Gartner hype cycle</article-title>
          ,
          <year>2023</year>
          . URL: https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ye</surname>
          </string-name>
          , T. Liu,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Hua, W. Jia,
          <article-title>Cognitive mirage: A review of hallucinations in large language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.06794.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Karamolegkou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Søgaard</surname>
          </string-name>
          ,
          <article-title>Copyright violations and large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2310.13771</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>