<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Large Language Models verbal creativity to human verbal creativity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anca Dinu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andra Maria Florescu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bucharest</institution>
          ,
          <addr-line>S</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study investigates the verbal creativity differences and similarities between Large Language Models and humans, based on their answers to the integrated verbal creativity test in [1]. Since that article reported a very small difference in scores in favour of the machines, the aim of the present work is to thoroughly analyse the data through four methods: scoring the uniqueness of the answers of one human or one machine compared to all the others, semantic similarity clustering, binary classification, and manual inspection of the data. The results showed that humans and machines are on a par in terms of uniqueness scores, that humans and machines group into two well-defined clusters based on semantic similarities between documents comprising all the answers of an individual (human or machine), per task and overall, and that the separate answers can be automatically classified as human answers or LLM answers with traditional machine learning methods, with F1 scores ranging from 68 to 74. The manual analysis supported the insight gained from the automated methods in that LLMs behave human-like while performing creativity tasks, but there are still some important distinctive features that tell them apart.</p>
      </abstract>
      <kwd-group>
        <kwd>creativity assessment</kwd>
        <kwd>LLM creativity</kwd>
        <kwd>verbal creativity</kwd>
        <kwd>semantic similarity clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Creativity has made it possible for humanity to survive and develop since prehistoric times. Despite the perception that some people are more creative than others, many psychologists argue that everyone has the capacity for creativity or that creativity is innate and encoded in human nature [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>Creativity is inherently interdisciplinary, involving domains like psychology, cognitive sciences, philosophy, arts, engineering, mathematics, or computer science. Recently, it has become a field of interest in Generative AI (GenAI) [<xref ref-type="bibr" rid="ref3">3</xref>] in general, and in Large Language Models (LLMs) [<xref ref-type="bibr" rid="ref4">4</xref>] in particular.</p>
      <p>However, much of the current research in generative models [<xref ref-type="bibr" rid="ref5">5</xref>] is concerned with constraining them so they do not harm people, so they are well-behaved, factual, non-hallucinating, non-biased, non-negative, non-misleading, non-toxic, etc., and for a good reason. In contrast, fewer studies (see section 2) focus on encouraging them to be original, unconstrained, or creative, although computational creativity, as a research field, dates back to the late ’90s [<xref ref-type="bibr" rid="ref6">6</xref>], [<xref ref-type="bibr" rid="ref7">7</xref>], with various disciplines, including creative writing, music, or graphics, utilizing artificial intelligence, particularly neural networks, heuristics, and so on. A good survey on LLMs’ verbal creativity is [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
      <p>Since work on LLMs’ creativity is just at the beginning, there is a need for methods, resources, and evaluation to better understand LLMs’ creative abilities and their differences and similarities with human creative traits. In a recent article, [<xref ref-type="bibr" rid="ref1">1</xref>] designed a verbal creativity test, integrating a wide range of tasks and criteria inspired from psychological creativity testing, and administered it to both humans and LLMs. The aim of this paper is to analyze the answers given by LLMs and human respondents in this previous study, for a direct comparison of human and machine verbal creativity. To this end, we will compute uniqueness scores, cluster the individual answers per task and overall, perform supervised binary classification with classic machine learning methods on all answers, and manually analyze some of the data particularities.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy.
* Corresponding author.
† These authors contributed equally.
anca.dinu@lls.unibuc.ro (A. Dinu); andra-maria.florescu@s.unibuc.ro (A. M. Florescu)
0000-0002-4611-3516 (A. Dinu); 0009-0007-1949-9867 (A. M. Florescu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <sec id="sec-1-1">
        <title>2. Theoretical background and previous work</title>
        <p>The formal study of creativity and of its mechanisms and processes started with J.P. Guilford’s plea for creativity in the 1950s [<xref ref-type="bibr" rid="ref9">9</xref>]. Since then, thousands of articles and books have been published on different aspects of creativity [<xref ref-type="bibr" rid="ref10">10</xref>].</p>
        <p>Creativity is a notoriously hard-to-define notion, because it is trans-disciplinary, branching into a variety of domains. It can also be of many kinds, like verbal, graphical, musical, or kinetic creativity. While the last three kinds of creativity are related to arts, verbal creativity is the most general kind, expressing the overall creativity of ideas.</p>
        <p>
          Regardless of the domain perspective and of the kind
of creativity, a basic idea in defining it, common to most
of the definitions, is that creativity represents the ability
of an individual to come up with something original
or innovative, of good quality, and appropriate, based
on prior knowledge [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. One can be creative, but lack
appropriateness of the idea or artifact produced, hence
diminishing its quality in terms of creativity.
        </p>
        <p>
          Another related aspect of creativity, as stated by [<xref ref-type="bibr" rid="ref12">12</xref>], is represented by two types of thinking during the creative process:
• divergent thinking, which concentrates on the numerous ideas appearing during a creative task, and
        </p>
        <p>5. the Consequences, for which one should guess the effects of a specified situation, and
6. Divergent Association (DAT), where the respondent has to produce seven nouns that are maximally semantically different, in all their senses and uses.</p>
        <p>In [<xref ref-type="bibr" rid="ref1">1</xref>], ten LLMs and ten humans were tested on this verbal creativity test, including the six tasks above. The authors stated that their goal was to test the creativity of the selected LLMs in their default architecture, and, thus, they did not change any settings that could have modified the creativity level, such as temperature or top-K. The collected answers given to this test are the input data for the present article.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Analysis</title>
      <p>
        Creativity assessment is usually performed with human evaluators who take into account the four creativity criteria formulated by [<xref ref-type="bibr" rid="ref12 ref9">9, 12</xref>]:
1. originality: uniqueness of the creative answers,
2. flexibility: how semantically distant the answers are,
3. elaboration: how detailed the answers are, and
4. fluency: how many answers are given.
      </p>
      <p>
• convergent thinking, which restricts the generated ideas to only the best-fitted or appropriate ones. So, even if an idea or artifact might seem creative from a divergent perspective, if it is unreasonable to the point of being completely unrelated to the initial creativity task, the overall creativity level drastically diminishes.
      </p>
      <sec id="sec-2-1">
        <p>With the recent rise of generative models, in particular LLMs such as ChatGPT<sup>1</sup> or Copilot, the interest in computational creativity peaked, in an attempt to harvest the creative potential of the machines, in spite of many challenges such as safety, ethical problems, methodological norms, evaluating standards, etc.</p>
        <p>Previous studies on machine creativity are fragmented: some are task-specific, like, for instance, using just role-plays [<xref ref-type="bibr" rid="ref13">13</xref>], or just storytelling [<xref ref-type="bibr" rid="ref14">14</xref>], while others focus on just one LLM [<xref ref-type="bibr" rid="ref4">4</xref>], or on just one type of creativity assessment [<xref ref-type="bibr" rid="ref15">15</xref>].</p>
        <p>In this study, we address this research gap by analyzing the creative responses of a considerable number of LLMs to a wide range of tasks, taken from [<xref ref-type="bibr" rid="ref1">1</xref>], who proposed a comprehensive assessment benchmark for testing the verbal creativity of both LLMs and humans alike. It consists of six tasks, inspired from human psychology:
1. Alternative Uses (AUT), where the test taker is asked to come up with uncommon uses for an ordinary object,
2. Instances, for which the aim is to name as many things as one can think of that have a common feature,
3. the Similarities, which consists of stating as many commonalities as possible of two specified objects,
4. the Causes, where the aim is to guess the cause of a given situation,</p>
        <p>[<xref ref-type="bibr" rid="ref1">1</xref>] automatically evaluated the verbal creativity by using the Open Creativity Scoring with AI (OCSAI) tool [<xref ref-type="bibr" rid="ref16">16</xref>], an open-source software that uses traditional semantic distance and a fine-tuned GPT model to score the creativity of an answer relative to its prompt. The results showed a slightly better overall verbal creativity score, computed as the mean of the scores for all the 6 tasks, for the machines, with a value of 0.58, compared to humans, with 0.51. Given that the difference is just 0.07, one of our goals for this study is to analyze more in-depth the differences and similarities of the answers of humans and machines to the verbal creativity test, looking specifically for distinctive features, rather than raw scores. The ten selected LLMs from the previous study were accessed via HuggingChat<sup>2</sup> (Llama-3-70B, Mixtral-8x7B<sup>3</sup>), via Hugging Space<sup>4</sup> (Cohere c4ai-command-r-plus, Yichat34B), locally (Falcon through GPT4All<sup>5</sup>), or directly from their web pages (Copilot (Balanced Mode)<sup>6</sup>, Gemini free version<sup>7</sup>, Jais-30B<sup>8</sup>, YouChat from You.com in Smart mode<sup>9</sup>, and Character AI (Character Assistant)<sup>10</sup>).</p>
        <p><sup>2</sup> https://huggingface.co/chat/models/
<sup>3</sup> No longer supported
<sup>4</sup> https://huggingface.co/spaces
<sup>5</sup> https://gpt4all.io/index.html
<sup>6</sup> https://www.bing.com/chat?form=NTPCHB
<sup>7</sup> https://gemini.google.com/app
<sup>8</sup> https://auth.arabic-gpt.ai/
<sup>9</sup> https://you.com/?chatMode=default
<sup>10</sup> https://c.ai/c/YntB_ZeqRq2l_aVf2gWDCZl4oBttQzDvhj9cXafWcF8</p>
      </sec>
      <sec id="sec-2-2">
        <p>The humans were non-native fluent English speakers who responded to the verbal creativity test as volunteers, either in a lab or at their homes, by completing a Google Form. Their background was all academic (students, undergraduates, graduates, and professors), the average age being 26.</p>
        <p>We implemented all the experiments in Google Colab<sup>11</sup> and used three LLMs to assist us with the code: Claude<sup>12</sup>, Copilot<sup>13</sup>, and Gemini<sup>14</sup>, in a setting of mostly zero-shot prompt engineering, with the standard settings and parameters.</p>
        <p>For data analysis, we used Python and the following libraries: spaCy<sup>15</sup>, scikit-learn<sup>16</sup>, Matplotlib<sup>17</sup>, NumPy<sup>18</sup>, and pandas<sup>19</sup>.</p>
        <p>Figure 1: Ranking of uniqueness scores for humans and machines</p>
        <p>3.1. Data</p>
        <p>The database of verbal creativity answers contains 4530 answers, totalling 13714 words. The test was organized in 6 tasks. Five out of the six tasks have five items each and a maximum of 10 answers per item. An answer can have a maximum of 5 words. The sixth task, DAT, consists of only one item of 10 single-word answers, but only the 7 most semantically different words out of the ten given by a respondent were taken into account by the DAT web page<sup>20</sup>. That amounts to 2570 answers for the machines, which always responded with the maximum number of answers, 10, even though the instruction, the same for both humans and machines, was to give between 1 and 10 answers per task. The human respondents gave any number of answers in the range 1 to 10, thus producing 1960 human answers. As such, the database is unbalanced, with about 31% more machine answers than human answers (2570 versus 1960).</p>
        <p>3.2. Uniqueness scores for the answers of humans and machines to the verbal creativity test</p>
        <p>The previous study [<xref ref-type="bibr" rid="ref1">1</xref>] did not score the originality of one individual's answers against those of all the others, since one of its goals was to evaluate the answers fully automatically. Nevertheless, the uniqueness of the answers of an individual constitutes an important clue to their creativity. Hence, to better understand the uniqueness trait of both humans and machines, we computed uniqueness scores as follows.</p>
        <p>We grouped the creativity test answers of both
humans and machines in separate files, each containing all
the answers of a particular individual. We thus obtained
20 answer files, 10 for humans and 10 for LLMs. After
removing the stop words, we generated embeddings for
each file, and then we computed their pairwise semantic
similarity, using the spaCy library. The uniqueness scores
were obtained as the inverse of the average semantic
similarity scores between an individual and all the others.</p>
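        <p>The scoring step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes that each respondent is already represented by one embedding vector, it reads "inverse of the average similarity" as one minus the mean cosine similarity (a reciprocal would give the same ranking for positive similarities), and the function names and toy vectors are hypothetical.</p>
        <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def uniqueness_scores(vectors):
    """Map each respondent to 1 - mean cosine similarity with all the others.

    Assumption: "inverse" is interpreted here as 1 - mean similarity.
    """
    scores = {}
    for name, vec in vectors.items():
        sims = [cosine(vec, other)
                for other_name, other in vectors.items()
                if other_name != name]
        scores[name] = 1.0 - sum(sims) / len(sims)
    return scores


def rank_by_uniqueness(vectors):
    """Respondent names sorted from most to least unique."""
    scores = uniqueness_scores(vectors)
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional "embeddings" (hypothetical; real document vectors
# produced by an embedding model have hundreds of dimensions).
toy_vectors = {
    "Human 1": [1.0, 0.0, 0.0],
    "Human 2": [0.9, 0.1, 0.0],
    "LLM 1": [0.0, 1.0, 0.0],
}
scores = uniqueness_scores(toy_vectors)
ranking = rank_by_uniqueness(toy_vectors)
```
        </preformat>
        <p>In this toy example the respondent whose vector points away from the other two receives the highest uniqueness score, which is the behaviour the ranking relies on.</p>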
        <p>The ranking obtained, in decreasing order of
uniqueness, is depicted in figure 1, where one can see that the
humans (in green) and the machines (in red) are mostly
intermingled.</p>
        <p>This even mix of humans and machines across the uniqueness ranking suggests that humans and machines are on a par in this respect.</p>
        <p><sup>11</sup> https://colab.research.google.com/
<sup>12</sup> https://claude.ai/chat/
<sup>13</sup> https://www.microsoft.com/en-us/microsoft-copilot
<sup>14</sup> https://gemini.google.com/app/
<sup>15</sup> https://spacy.io/
<sup>16</sup> https://scikit-learn.org/stable/
<sup>17</sup> https://matplotlib.org/
<sup>18</sup> https://numpy.org/
<sup>19</sup> https://pandas.pydata.org/
<sup>20</sup> https://www.datcreativity.com/</p>
      </sec>
      <sec id="sec-2-3">
        <p>One of the criteria for assessing creativity in psychology is the degree of originality of the answers of one individual, compared to the answers of all the other individuals. The evaluation of this criterion is done manually and is time-consuming, since it includes assessing not only word similarities, but also similarities between ideas of the different individuals. [<xref ref-type="bibr" rid="ref1">1</xref>] did not use this criterion.</p>
        <p>The aim of the clustering experiment was to investigate whether individual humans and individual machines cluster together, based on the semantic similarity of their answers to the creativity test. We used the word embeddings of the 20 individual files described in subsection 3.2. To reduce the dimensionality of the vector space for the 2D plot, we applied Principal Component Analysis (PCA) to these spaCy embeddings.</p>
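        <p>The 2D projection step can be illustrated with a small, self-contained sketch. The study's own code is not shown in the paper, so everything below is an assumption made for illustration: the pure-Python power-iteration PCA, the function names, and the toy vectors. In practice a library routine such as the PCA class of scikit-learn would normally be preferred.</p>
        <preformat>
```python
import math


def center(rows):
    """Subtract the per-dimension mean from every vector."""
    dim = len(rows[0])
    means = [sum(row[j] for row in rows) / len(rows) for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered vectors."""
    n, dim = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def leading_axis(cov, iterations=200):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    dim = len(cov)
    # Assumption: this fixed start vector is not orthogonal to the leading axis.
    start = [float(i + 1) for i in range(dim)]
    start_norm = math.sqrt(sum(x * x for x in start))
    vec = [x / start_norm for x in start]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * cov[i][j] * vec[j]
                for i in range(dim) for j in range(dim))
    return vec, value


def pca_2d(rows):
    """Project vectors onto their first two principal axes."""
    centered = center(rows)
    cov = covariance(centered)
    dim = len(cov)
    axes = []
    for _ in range(2):
        vec, value = leading_axis(cov)
        axes.append(vec)
        # Deflation: remove the variance already explained by this axis.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(row[j] * axis[j] for j in range(dim)) for axis in axes]
            for row in centered]


# Toy vectors (hypothetical): the first two stand for humans, the last two for LLMs.
points = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.1, 1.0, 0.0], [0.2, 0.9, 0.1]]
coords = pca_2d(points)
```
        </preformat>
        <p>With these toy vectors the two groups land on opposite sides of the first principal axis, which is the kind of separation a 2D scatter plot of the projected points would make visible.</p>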
        <sec id="sec-2-3-1">
          <title>3.3. Semantic similarity clustering of the answers of humans and machines</title>
          <p>In figure 2 we can see how the LLMs (dots in red) perfectly cluster together, just as the humans (dots in green) do, considering all responses to the six tasks. This result indicates that, from a semantic perspective, humans and LLMs generate creative answers differently, or at least that there are discriminating features to distinguish between the two.</p>
          <p>We also plotted the clusters of the answers to each specific task, for all the 6 tasks, in figures 3, 4, 5, 6, 7, and 8. Generally, the answers of the humans and of the machines clearly clustered by their kind, with the exception of the Instances task, where the humans and the LLMs were interposed, meaning that the semantic content of their answers was not specific to either of the two classes. A bit of mixing also appeared in the Divergent Association Task (DAT). The less clear separation of humans and machines for the Instances and DAT tasks might result from the fact that the responses to these particular tasks are inherently very short: just one or two words for the Instances task and just one word for the DAT.</p>
          <p>3.4. Binary classification of human and machine creativity answers</p>
          <p>As the clustering experiment suggested, the answers to the verbal creativity test are almost linearly separable in two classes (humans and machines) at individual level. In this binary classification experiment, we investigated if they also have distinctive features at the answer level. For this, we trained several traditional machine learning (ML) classifiers to discriminate between the answers of humans and of LLMs to the verbal creativity test. The two classes were represented by all the answers of the humans and, separately, by all the answers of the LLMs, with one answer per line, excluding the DAT task, since it only required enumerating words. As the LLMs always gave the maximum number of answers required in the test, the dataset was unbalanced (2500 answers for LLMs and 1890 for humans). To address this imbalance, we implemented a simple random under-sampling technique, thus obtaining 1890 answers for each class, humans and LLMs. We then employed the Term Frequency-Inverse Document Frequency (TF-IDF) vectorization technique to convert the text data into numerical features. The vectorizer used a maximum of 1000 features, to capture the most important terms while keeping the computational cost manageable. Stratified sampling was used to split the dataset into training and testing sets with an 80/20 ratio.</p>
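          <p>The balancing and splitting steps can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names, the fixed random seed, and the placeholder answers are hypothetical, and the TF-IDF vectorization and the classifiers themselves are left out.</p>
          <preformat>
```python
import random


def undersample(answers_by_class, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    target = min(len(items) for items in answers_by_class.values())
    return {label: rng.sample(items, target)
            for label, items in answers_by_class.items()}


def stratified_split(balanced, test_fraction=0.2, seed=0):
    """Split each class separately so both sets keep the same class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in balanced.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(text, label) for text in shuffled[:n_test]]
        train += [(text, label) for text in shuffled[n_test:]]
    return train, test


# Placeholder answers with the class sizes reported in the text
# (2500 LLM answers and 1890 human answers, DAT excluded).
data = {
    "llm": ["llm answer %d" % i for i in range(2500)],
    "human": ["human answer %d" % i for i in range(1890)],
}
balanced = undersample(data)
train, test = stratified_split(balanced)
per_class_test = sum(1 for _, label in test if label == "human")
```
          </preformat>
          <p>Under-sampling to 1890 answers per class and holding out 20% of each class leaves 1512 training and 378 testing answers per class.</p>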
        </sec>
        <sec id="sec-2-3-2">
          <title>3.5. General considerations</title>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <p>With the 80/20 training and testing ratio, the training and testing sets contained the same number of samples for each category, namely 1512 answers for training and 378 answers for testing.</p>
        <p>In table 1, we give the three best classifiers, with their precision, recall, accuracy, and F1 scores. The Naïve Bayes classifier obtained the highest accuracy, 0.74, followed closely by both the Support Vector Machine (SVM) classifier and the Random Forest classifier, each with an accuracy of 0.71, that is, 0.03 lower.</p>
        <p>This moderate performance of the ML models suggests that either the dataset is too small for the models to perform better, or that there is a fair amount of sim-</p>
        <p>Figure 7: Semantic similarity clusters of answers for Consequences</p>
        <p>We manually inspected the two most unique LLMs and humans to see what makes their answers so different from the others, and also investigated how the uniqueness scores correlate with quality and creativity.</p>
        <p>The first positioned on the uniqueness ranking, the LLM Jais, had the tendency to respond to the Similarity task with words obtained by nominalization (deriving nouns from verbs), like, for instance, "dependency", "curiosity", "belonging", and "growth", as opposed to all the other LLMs, which responded with regular nouns. It also tended to use answers that started with the same prefix: "Unfiltered", "Unmatched", "Unrestricted", and "Unyielding", and to use the same word followed by other words, like in, for instance, "Thought policing", "Thoughtful shopping", and "Thought clones". In this respect, Jais gave the most unique answers, which, obviously, were not also the most creative.</p>
        <p>The second positioned on the uniqueness ranking, Human 3, started the majority of their answers with "use" or "use it as". This respondent also repeated the starting point of most of their answers, like in "what...", "getting a ...", "where ...", "in a...". These features seem enough to score highly w.r.t. uniqueness, but fail to correlate with the quality of the creativity.</p>
        <p>This inspection shows that the most unique answers are not necessarily the most creative. If the bulk of the respondents give good-quality answers, that might result in a high uniqueness score for lower-quality or less creative responses.</p>
        <p>We also checked the appropriateness of the answers given by both humans and machines, which is an important requirement of genuine creativity, as mentioned in section 1. Creativity requires divergent thinking, but true creativity emerges when convergent thinking also restricts the divergence to only those responses that are appropriate for the creative assignment [<xref ref-type="bibr" rid="ref12">12</xref>].</p>
        <p>In general, humans gave fairly suitable answers. In contrast, not all the LLMs managed to generate all the answers in an appropriate manner. For instance, for the Consequences task, for the item "There is a virus and only children survive", Gemini, although it responded creatively, failed to also respond suitably. This model gave four out of the ten answers that are either paradoxical or nonsensical, in a situation that clearly implies that only children are alive, so there are no adults around: "Toy Factories booming", "Geriatric Theme Parks", "Grandparents raise parents", "Parents taught by Tablets".</p>
        <p>Another manual scrutiny focused on analyzing the similar or the different patterns of LLMs and humans when responding to a particular task. We found that several LLMs answered the Divergent Association Task with the same word among the seven required ones. For instance, "Serendipity" was used by three models. This phenomenon is not specific only to the machines. For the Guessing Causes Task, Human 3 and Human 4 produced similar answers: for instance, both gave the answer "earthquake", or produced the same idea, like "green lights"/"because of green lights", "eating something bad"/"they ate something bad", "St Patrick’s Day"/"St. Patrick’s day party", "poor construction"/"faulty structural integrity", "looking at screens too much"/"too much screen time".</p>
        <p>Also, we noticed some peculiarities of individual LLMs, such as Falcon’s generation of only words starting with the letter "a" for DAT, or Cohere’s generation of only opposite words for this task: "love", "hate", "peace", "chaos".</p>
        <p>Moreover, humans seem more personally involved in answering than LLMs, which tend to give only general answers to the tasks, with some exceptions. Some LLMs seem to respond "humanly", even producing humor and figurative speech, while others respond in a quite standard or "robotic" way.</p>
        <p>Overall, the LLMs’ distribution is similar to the humans’ distribution, varying from one individual to another.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Ethical considerations</title>
      <p>We did not use or disclose any personal data from the human participants, who remained completely anonymous and took part in this research as volunteers. There are no ethical concerns with regard to publishing this research.</p>
      <p>5. Limitations</p>
      <p>The dataset for this research was small and slightly unbalanced, since the humans answered based on their mood or capabilities, while the LLMs answered strictly with a maximum of ten answers per task.</p>
      <p>Also, the sample pool is quite small, as there were only ten humans and ten LLMs involved, so the results might be unstable when enlarging the dataset.</p>
      <p>Due to lack of space, this study focuses more on automated methods of analysis than on manual analysis, thus lacking a more in-depth insight into the patterns of the collected answers to the verbal creativity test from both humans and machines.</p>
      <p>Finally, this study compares the creativity answers of humans and LLMs in English, but the human participants in the test were non-native (fluent) English speakers, which can potentially decrease their creativity score, compared to scores they could obtain in their own native language.</p>
      <p>6. Conclusion and future works</p>
      <p>This study showed that there are some differences between human and machine answers given to a verbal creativity test, but also plenty of similarities.</p>
      <p>The LLMs’ answers vary much like the humans’ answers. Individual, unique answers, w.r.t. the set of all answers, are produced by both humans and machines alike, with no noticeable difference.</p>
      <p>Still, at a semantic level, humans and machines generally group with their own kind as individuals.</p>
      <p>The performance of automatic classification between human and machine answers is moderate and leaves room for improvement.</p>
      <p>The general findings of this study indicate that LLMs’ creative capabilities are comparable with human abilities and, as such, they could be put to good use in the creative domain. Humans "just" need to adapt to their usage, mind the ethics and safety issues, and discern the information at every step, instead of blindly using them.</p>
      <p>In future work, we will focus on expanding the dataset, by adding more LLMs’ and humans’ answers to the test, for a better statistical coverage.</p>
      <p>Also, we aim to manually investigate the database more in-depth, to look for more systematic patterns for both humans and machines.</p>
      <p>As creativity remains a domain with endless possibilities, we also plan to investigate other aspects of LLMs’ creativity, such as language or image.</p>
      <p>Another future approach worthy of pursuing is using Deep Learning approaches instead of traditional Machine Learning approaches for the binary classification task, or using metrics specific to LLM-generated tasks.</p>
      <sec id="sec-3-1">
        <title>7. Appendix Verbal Creativity Test</title>
        <p>There are 6 types of creativity assessments in this test. Note: Be as creative, original, and innovative as possible. Pay attention to the word and answer limit! Try to think of as many answers as possible within the limit!</p>
      </sec>
      <sec id="sec-3-2">
        <title>1. Alternative uses Test</title>
        <p>Name up to ten unusual uses for the following five items. Use a maximum of five words. Give one answer per line.</p>
        <p>1. Lipstick
2. Avocado
3. Whistle
4. Chalk
5. Pantyhose</p>
      </sec>
      <sec id="sec-3-3">
        <title>2. Instances</title>
        <p>Use a maximum of five words per answer. Give one answer per line. Name up to 10 things that:</p>
        <p>1. Things that can harm one’s self-esteem
2. Things that you have control of in your life
3. Situations where it is good to be loud
4. Things that can flow
5. Things that you can mark on a map</p>
      </sec>
      <sec id="sec-3-4">
        <title>3. Similarities</title>
        <p>How are the following 2 terms alike? Use a maximum of three words to describe a common feature of the following pair of words. Give one answer per line. Give up to ten answers:</p>
        <p>1. Prison &amp; School
2. Eyes &amp; Ears
3. House &amp; Den
4. Earthquake &amp; Tornado
5. Baby &amp; Cub</p>
      </sec>
      <sec id="sec-3-5">
        <title>4. Causes</title>
        <p>1. Crash of a building
2. Everybody turns green at a party
3. Social media disappears
4. Humanity becomes shortsighted
5. Your hat does not fit you anymore</p>
      </sec>
      <sec id="sec-3-6">
        <title>5. Consequences</title>
        <p>1. There is a mutation and men are the ones giving birth</p>
      </sec>
      <sec id="sec-3-7">
        <title>6. Divergent Association Task (DAT)</title>
        <p>Write ten words that are as different from each other as possible, in all meanings and uses of the words.</p>
        <p>Rules: Only single words in English. Only nouns (e.g., things, objects, concepts). No proper nouns (e.g., no specific people or places). No specialized vocabulary (e.g., no technical terms). Think of the words on your own (e.g., do not just look at objects in your surroundings).</p>
      </sec>
      <sec id="sec-3-8">
        <title>Acknowledgments</title>
        <p>This work was supported by a mobility project of the Romanian Ministry of Research, Innovation and Digitization, CNCS - UEFISCDI, project number PN-IV-P2-2.2MC-2024-0589, within PNCDI IV.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Florescu</surname>
          </string-name>
          ,
          <article-title>An integrated benchmark for verbal creativity testing of LLMs and humans</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Knowledge-Based and Intelligent Information &amp; Engineering Systems (KES 2024)</source>
          ,
          <year>2024</year>
          . Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Csikszentmihalyi</surname>
          </string-name>
          ,
          <article-title>Creativity: Flow and the Psychology of Discovery and Invention</article-title>
          , first ed., HarperCollins Publishers, New York, NY,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Doshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hauser</surname>
          </string-name>
          ,
          <article-title>Generative artificial intelligence enhances creativity but reduces the diversity of novel content</article-title>
          ,
          <source>Science Advances</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <elocation-id>eadn5290</elocation-id>
          . URL: https://ssrn.com/abstract=4535536. doi:10.2139/ssrn.4535536.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Guzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Byrge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gilde</surname>
          </string-name>
          ,
          <article-title>The originality of machines: AI takes the Torrance Test</article-title>
          ,
          <source>Journal of Creativity</source>
          <volume>33</volume>
          (
          <year>2023</year>
          )
          <elocation-id>100065</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S2713374523000249. doi:10.1016/j.yjoc.2023.100065.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Boden</surname>
          </string-name>
          ,
          <source>The Creative Mind: Myths and Mechanisms</source>
          , Routledge,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Anantrasirichai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bull</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in the creative industries: a review</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>55</volume>
          (
          <year>2021</year>
          )
          <fpage>589</fpage>
          -
          <lpage>656</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>A survey on large language model hallucination via a creativity perspective</article-title>
          ,
          <year>2024</year>
          . arXiv:2402.06647.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Guilford</surname>
          </string-name>
          ,
          <article-title>Creativity</article-title>
          ,
          <source>American Psychologist</source>
          (
          <year>1950</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Carayannis</surname>
          </string-name>
          (Ed.),
          <source>Encyclopedia of Creativity, Invention, Innovation and Entrepreneurship</source>
          , Springer International Publishing,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
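          <string-name>
            <given-names>J.</given-names>
            <surname>Kaufman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sternberg</surname>
          </string-name>
          (Eds.),
          <source>The Cambridge Handbook of Creativity</source>
          , Cambridge Handbooks in Psychology, Cambridge University Press,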
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Guilford</surname>
          </string-name>
          ,
          <source>The Nature of Human Intelligence</source>
          , McGraw-Hill Series in Psychology, McGraw-Hill, New York,
          <year>1967</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Assessing and understanding creativity in large language models</article-title>
          ,
          <year>2024</year>
          . arXiv:2401.12491.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakrabarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Art or artifice? Large language models and the false promise of creativity</article-title>
          ,
          <year>2024</year>
          . arXiv:2309.14556.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cropley</surname>
          </string-name>
          ,
          <article-title>Is artificial intelligence more creative than humans?: ChatGPT and the Divergent Association Task</article-title>
          ,
          <source>Learning Letters</source>
          <volume>2</volume>
          (
          <year>2023</year>
          )
          <elocation-id>13</elocation-id>
          . URL: https://learningletters.org/index.php/learn/article/view/13. doi:10.59453/ll.v2.13.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Organisciak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Acar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berthiaume</surname>
          </string-name>
          ,
          <article-title>Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models</article-title>
          ,
          <source>Thinking Skills and Creativity</source>
          <volume>49</volume>
          (
          <year>2023</year>
          )
          <elocation-id>101356</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S1871187123001256. doi:10.1016/j.tsc.2023.101356.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>