<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Comparing Large Language Models verbal creativity to human verbal creativity</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Anca</forename><surname>Dinu</surname></persName>
							<email>anca.dinu@lls.unibuc.ro</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<addrLine>Șoseaua Panduri 90, Sector 5</addrLine>
									<postCode>050663</postCode>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andra</forename><forename type="middle">Maria</forename><surname>Florescu</surname></persName>
							<email>andra-maria.florescu@s.unibuc.ro</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<addrLine>Șoseaua Panduri 90, Sector 5</addrLine>
									<postCode>050663</postCode>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Comparing Large Language Models verbal creativity to human verbal creativity</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7F8608F35753A93D437B211D9C45C35D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>creativity assessment</term>
					<term>LLM creativity</term>
					<term>verbal creativity</term>
					<term>semantic similarity clustering</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This study investigates the differences and similarities in verbal creativity between Large Language Models and humans, based on their answers to the integrated verbal creativity test in <ref type="bibr" target="#b0">[1]</ref>. Since that article reported a very small difference in scores in favour of the machines, the aim of the present work is to analyse the data thoroughly through four methods: scoring the uniqueness of the answers of each human or machine compared to all the others, semantic similarity clustering, binary classification, and manual inspection of the data. The results show that humans and machines are on a par in terms of uniqueness scores; that humans and machines form two well-defined clusters based on the semantic similarity between documents comprising all the answers of an individual (human or machine), both per task and overall; and that separate answers can be automatically classified as human or LLM answers with traditional machine learning methods, with accuracies from 0.71 to 0.74 and per-class F1 scores from 0.68 to 0.76. The manual analysis supported the insight gained from the automated methods that LLMs behave in a human-like way on creativity tasks, but that there are still some important distinctive features that tell them apart.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Creativity has made it possible for humanity to survive and develop since prehistoric times. Despite the perception that some people are more creative than others, many psychologists argue that everyone has the capacity for creativity, or that creativity is innate and encoded in human nature <ref type="bibr" target="#b1">[2]</ref>.</p><p>Creativity is inherently interdisciplinary, involving domains such as psychology, cognitive science, philosophy, the arts, engineering, mathematics, and computer science. Recently, it has become a field of interest in Generative AI (GenAI) <ref type="bibr" target="#b2">[3]</ref> in general and in Large Language Models (LLMs) <ref type="bibr" target="#b3">[4]</ref> in particular.</p><p>However, much of the current research on generative models <ref type="bibr" target="#b5">[5]</ref> is concerned, for good reason, with constraining them so that they do not harm people: so that they are well-behaved, factual, non-hallucinating, unbiased, non-negative, non-misleading, non-toxic, etc. In contrast, fewer studies (see section 2) focus on encouraging them to be original, unconstrained, or creative, although computational creativity as a research field dates back to the late '90s <ref type="bibr" target="#b6">[6]</ref>, <ref type="bibr" target="#b7">[7]</ref>, with applications in disciplines such as creative writing, music, and graphics that rely on artificial intelligence techniques, particularly neural networks and heuristics. A good survey on LLMs' verbal creativity is <ref type="bibr" target="#b8">[8]</ref>. 
Since work on LLM creativity is only beginning, there is a need for methods, resources, and evaluations to better understand LLMs' creative abilities and how they differ from, and resemble, human creative traits.</p><p>In a recent article, <ref type="bibr" target="#b0">[1]</ref> designed a verbal creativity test integrating a wide range of tasks and criteria inspired by psychological creativity testing, and administered it to both humans and LLMs. The scope of this paper is to analyze the answers that LLMs and human respondents gave in that study, for a direct comparison of human and machine verbal creativity. To this end, we compute uniqueness scores, cluster the individual answer sets per task and overall, perform supervised binary classification of all answers with classic machine learning methods, and manually analyze some particularities of the data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Theoretical background and previous work</head><p>The formal study of creativity, its mechanisms, and its processes started with J.P. Guilford's plea for creativity research in the 1950s <ref type="bibr" target="#b9">[9]</ref>. Since then, thousands of articles and books have been published on different aspects of creativity <ref type="bibr" target="#b10">[10]</ref>.</p><p>Creativity is a notoriously hard-to-define notion because it is transdisciplinary, spanning a variety of domains. It also comes in many kinds, such as verbal, graphical, musical, or kinetic creativity. While the last three kinds are related to the arts, verbal creativity is the most general kind, expressing the overall creativity of ideas.</p><p>Regardless of the domain and of the kind of creativity, a basic idea common to most definitions is that creativity is the ability of an individual to come up with something original or innovative, of good quality, and appropriate, based on prior knowledge <ref type="bibr" target="#b11">[11]</ref>. An idea or artifact can be original yet inappropriate, which diminishes its quality in terms of creativity.</p><p>Another related aspect of creativity, as stated by <ref type="bibr" target="#b12">[12]</ref>, is the involvement of two types of thinking in the creative process:</p><p>• divergent thinking, which generates the numerous ideas that appear during a creative task, and</p><p>• convergent thinking, which restricts them to the best-fitting or most appropriate ones.</p><p>So, even if an idea or artifact seems creative from a divergent perspective, its overall creativity level drastically diminishes if it is so unreasonable as to be completely unrelated to the initial creativity task.</p><p>With the recent rise of generative models, in particular LLMs such as ChatGPT<ref type="foot" target="#foot_0">1</ref> or Copilot, interest in computational creativity has peaked, in an attempt to harness the creative potential of machines despite many challenges, such as safety, ethical problems, methodological norms, and evaluation standards.</p><p>Previous studies on machine creativity are fragmented: some are task-specific, for instance using only role-play <ref type="bibr" target="#b13">[13]</ref> or only storytelling <ref type="bibr" target="#b14">[14]</ref>, while others focus on just one LLM <ref type="bibr" target="#b3">[4]</ref> or on just one type of creativity assessment <ref type="bibr" target="#b15">[15]</ref>.</p><p>In this study, we address this research gap by analyzing the responses of a considerable number of LLMs to a wide range of creativity tasks, collected by <ref type="bibr" target="#b0">[1]</ref>, who proposed a comprehensive benchmark for testing the verbal creativity of both LLMs and humans. It consists of six tasks inspired by human psychology:</p><p>1. Alternative Uses (AUT), where the test taker is asked to come up with uncommon uses for an ordinary object;</p><p>2. Instances, where the aim is to name as many things as possible that share a common feature;</p><p>3. Similarities, which consists of stating as many commonalities as possible between two specified objects;</p><p>4. Causes, where the aim is to guess the causes of a given situation;</p><p>5. Consequences, where one should guess the effects of a specified situation; and</p><p>6. Divergent Association (DAT), where the respondent has to produce ten nouns that are as semantically different as possible, in all their senses and uses, seven of which are scored.</p><p>In <ref type="bibr" target="#b0">[1]</ref>, ten LLMs and ten humans took this verbal creativity test, comprising the six tasks above. The authors stated that their goal was to test the creativity of the selected LLMs in their default configuration; thus, they did not change any settings that could have modified the creativity level, such as temperature or top-k. The answers collected in that study are the input data for the present article.</p></div>
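<div xmlns="http://www.tei-c.org/ns/1.0"><p>The DAT score can be illustrated with a small sketch. As we understand the public DAT scoring, the score is the mean cosine distance between the word vectors of all pairs of scored words, multiplied by 100; we assume that scheme here. The function name and the three-dimensional toy vectors are hypothetical, whereas the real test relies on pretrained word embeddings, and the choice of which words are scored is left to the caller.</p><eg>
```python
import itertools

import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def dat_style_score(words: list, vectors: dict) -> float:
    """Hypothetical DAT-style score: mean pairwise cosine distance x 100.

    Assumes `vectors` maps every word to an embedding; real scoring would use
    pretrained embeddings, not the toy dict below.
    """
    pairs = itertools.combinations(words, 2)
    distances = [cosine_distance(vectors[w1], vectors[w2]) for w1, w2 in pairs]
    return 100.0 * float(np.mean(distances))


# Hypothetical 3-d "embeddings", chosen only to make the arithmetic easy to follow.
toy_vectors = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "kitten": np.array([0.9, 0.1, 0.0]),
    "volcano": np.array([0.0, 1.0, 0.0]),
    "justice": np.array([0.0, 0.0, 1.0]),
}

print(dat_style_score(["cat", "volcano", "justice"], toy_vectors))  # prints 100.0
print(dat_style_score(["cat", "kitten"], toy_vectors))  # near-synonyms score close to 0
```
</eg><p>Mutually orthogonal toy vectors reach the maximum distance of 1 per pair, hence a score of 100, while near-synonyms score close to 0.</p></div>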
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Analysis</head><p>Creativity assessment is usually performed by human evaluators who take into account the four creativity criteria formulated by <ref type="bibr" target="#b9">[9,</ref><ref type="bibr" target="#b12">12]</ref>:</p><p>1. originality: how unique the creative answers are;</p><p>2. flexibility: how semantically distant the answers are from one another;</p><p>3. elaboration: how detailed the answers are; and</p><p>4. fluency: how many answers are given.</p><p><ref type="bibr" target="#b0">[1]</ref> evaluated verbal creativity automatically, using the Open Creativity Scoring with AI (OCSAI) tool <ref type="bibr" target="#b16">[16]</ref>, open-source software that uses traditional semantic distance and a fine-tuned GPT model to score the creativity of an answer relative to its prompt. The results showed a slightly higher overall verbal creativity score, computed as the mean of the scores for all 6 tasks, for the machines (0.58) than for humans (0.51). Given that the difference is just 0.07, one of our goals in this study is to analyze in more depth the differences and similarities between the answers of humans and machines to the verbal creativity test, looking specifically for distinctive features rather than raw scores. 
The ten LLMs selected in the previous study were accessed via HuggingChat<ref type="foot" target="#foot_1">2</ref> (Llama-3-70B, Mixtral-8x7B<ref type="foot" target="#foot_2">3</ref>), via Hugging Face Spaces<ref type="foot" target="#foot_3">4</ref> (Cohere-c4ai-command-r-plus, Yichat-34B), locally (Falcon, through GPT4All<ref type="foot" target="#foot_4">5</ref>), or directly from their web pages (Copilot in Balanced Mode<ref type="foot" target="#foot_5">6</ref>, the free version of Gemini<ref type="foot" target="#foot_6">7</ref>, Jais-30B<ref type="foot" target="#foot_7">8</ref>, YouChat from You.com in Smart mode<ref type="foot" target="#foot_8">9</ref>, and Character AI's Character Assistant<ref type="foot" target="#foot_9">10</ref>).</p><p>The human respondents were non-native but fluent English speakers who took the verbal creativity test as volunteers, either in a lab or at home by completing a Google Form. All had an academic background, ranging from undergraduate students and graduates to professors, with an average age of 26.</p><p>We implemented all the experiments in Google Colab<ref type="foot" target="#foot_10">11</ref> and used three LLMs to assist us with the code: Claude<ref type="foot" target="#foot_11">12</ref>, Copilot<ref type="foot" target="#foot_12">13</ref>, and Gemini<ref type="foot" target="#foot_13">14</ref>, mostly through zero-shot prompting with their default settings and parameters.</p><p>For data analysis, we used Python and the following libraries: spaCy<ref type="foot" target="#foot_14">15</ref>, scikit-learn<ref type="foot" target="#foot_15">16</ref>, Matplotlib<ref type="foot" target="#foot_16">17</ref>, NumPy<ref type="foot" target="#foot_17">18</ref>, and pandas<ref type="foot" target="#foot_18">19</ref>.</p></div>
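<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two of the four criteria listed above, fluency and elaboration, can be approximated without any semantic model. The sketch below is a hypothetical illustration: the function names and the word-count proxy for elaboration are our assumptions, not the OCSAI implementation. Originality and flexibility require semantic representations, such as the embeddings used in the following subsections.</p><eg>
```python
from statistics import mean


def fluency(answers: list) -> int:
    """Fluency proxy (hypothetical): number of non-empty answers for one item."""
    return sum(1 for answer in answers if answer.strip())


def elaboration(answers: list) -> float:
    """Elaboration proxy (hypothetical): mean number of words per non-empty answer."""
    lengths = [len(answer.split()) for answer in answers if answer.strip()]
    return mean(lengths) if lengths else 0.0


# Hypothetical answers of one respondent to one Alternative Uses item.
answers = ["use it as a doorstop", "paperweight", "", "a tiny boat for ants"]
print(fluency(answers))      # prints 3
print(elaboration(answers))  # mean of 5, 1 and 5 words
```
</eg><p>Empty lines are ignored, so a respondent who leaves answer slots blank is not credited for them.</p></div>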
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data</head><p>The database of verbal creativity answers contains 4530 answers, totalling 13714 words. The test consists of 6 tasks. Five of the six tasks have five items each, with a maximum of 10 answers per item; an answer can have at most 5 words (3 for the Similarities task). The sixth task, DAT, consists of a single item answered with 10 single-word answers, but only the 7 most semantically different of the ten words given by each respondent were taken into account by the DAT web page<ref type="foot" target="#foot_19">20</ref>. The machines always responded with the maximum number of answers, 10, even though the instructions, identical for humans and machines, asked for between 1 and 10 answers per item; this amounts to 2570 machine answers (10 LLMs × (25 items × 10 answers + 7 DAT words)). The human respondents gave any number of answers between 1 and 10, for a total of 1960 human answers. The database is therefore unbalanced, with about 31% more machine answers than human answers.</p></div>
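<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reported counts can be checked with a few lines of arithmetic, assuming each LLM gave ten answers to all 25 items of the first five tasks and contributed seven scored DAT words; the constant names are ours, and the human total is taken as reported, since it cannot be derived from the task structure.</p><eg>
```python
# Assumed task structure, as described in the Data subsection.
N_LLMS = 10
N_REGULAR_TASKS = 5       # AUT, Instances, Similarities, Causes, Consequences
ITEMS_PER_TASK = 5
MAX_ANSWERS_PER_ITEM = 10
DAT_WORDS_SCORED = 7      # DAT words taken into account per respondent

answers_per_llm = N_REGULAR_TASKS * ITEMS_PER_TASK * MAX_ANSWERS_PER_ITEM + DAT_WORDS_SCORED
machine_answers = N_LLMS * answers_per_llm
human_answers = 1960      # reported count; humans gave 1-10 answers per item
total_answers = machine_answers + human_answers
excess = machine_answers / human_answers - 1  # relative surplus of machine answers

print(answers_per_llm)    # prints 257
print(machine_answers)    # prints 2570
print(total_answers)      # prints 4530
print(f"{excess:.1%}")    # prints 31.1%
```
</eg><p>The surplus of machine answers is therefore slightly under one third, which is the imbalance the classification experiments have to cope with.</p></div>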
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Uniqueness scores for the answers of humans and machines to the verbal creativity test</head><p>One of the criteria for assessing creativity in psychology is the originality of the answers of one individual compared to the answers of all the other individuals. This criterion is evaluated manually, which is time-consuming, since it involves assessing not only word similarities but also similarities between the ideas of different individuals. <ref type="bibr" target="#b0">[1]</ref> did not use this criterion. We grouped the creativity test answers of humans and machines into separate files, each containing all the answers of one individual, thus obtaining 20 answer files, 10 for humans and 10 for LLMs. After removing stop words, we generated an embedding for each file and computed the pairwise semantic similarity between files using the spaCy library. The uniqueness score of an individual was obtained as the inverse of the average semantic similarity between that individual and all the others. The resulting ranking, in decreasing order of uniqueness, is depicted in Figure <ref type="figure" target="#fig_0">1</ref>, where humans (in green) and machines (in red) are mostly intermingled.</p><p>This interleaving of humans and machines in the uniqueness ranking indicates that humans and machines are on a par in this respect.</p></div>
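<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal NumPy sketch of the uniqueness score, assuming each individual's file is already represented by one document vector. spaCy's Doc.similarity is, by default, the cosine similarity of averaged token vectors, so cosine similarity is used here; the function names, the names list, and the toy vectors are hypothetical stand-ins for the 20 real files.</p><eg>
```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


def uniqueness_scores(vectors: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 1 / mean similarity of each individual to all others.

    Assumes the mean similarities are positive (true for typical document vectors);
    the inverse would be meaningless for zero or negative means.
    """
    sim = cosine_similarity_matrix(vectors)
    n = sim.shape[0]
    np.fill_diagonal(sim, 0.0)  # exclude each individual's similarity to itself
    mean_to_others = sim.sum(axis=1) / (n - 1)
    return 1.0 / mean_to_others


# Hypothetical 3-d document vectors standing in for the per-individual files.
names = ["Human 1", "Human 2", "LLM 1", "LLM 2"]
vectors = np.array([
    [1.0, 0.2, 0.1],
    [0.9, 0.3, 0.2],
    [0.2, 1.0, 0.1],
    [0.1, 0.9, 0.9],
])
scores = uniqueness_scores(vectors)
ranking = [names[i] for i in np.argsort(-scores)]  # most unique first
print(ranking)
```
</eg><p>Taking the inverse of the mean similarity preserves the ordering given by the mean similarity itself (reversed), so the ranking would be the same with, for example, one minus the mean similarity.</p></div>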
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Semantic similarity clustering of the answers of humans and machines</head><p>The aim of this experiment was to investigate whether individual humans and individual machines cluster together, based on the semantic similarity of their answers to the creativity test. We used the embeddings of the 20 individual files described in subsection 3.2. To reduce the dimensionality of the vector space for the 2D plot, we used Principal Component Analysis (PCA) from the scikit-learn library.</p><p>In Figure <ref type="figure" target="#fig_1">2</ref> we can see how the LLMs (red dots) cluster together perfectly, just as the humans (green dots) do, when all responses to the six tasks are considered. This result indicates that, from a semantic perspective, humans and LLMs generate creative answers differently, or at least that there are discriminating features that distinguish them.</p><p>We also plotted the clusters of the answers to each of the 6 tasks separately, in Figures 3, 4, 5, 6, 7, and 8. Generally, the answers of the humans and of the machines clearly clustered by their kind, with the exception of the Instances task, where humans and LLMs were interspersed, meaning that the semantic content of their answers was not specific to either class. Some mixing also appeared in the Divergent Association Task (DAT). The less clear separation of humans and machines for the Instances and DAT tasks might result from the fact that the responses to these tasks are inherently very short: one or two words for Instances and a single word for DAT.</p></div>
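<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal scikit-learn sketch of the 2D projection step, assuming the 20 document vectors are already stacked into one matrix. Random vectors with an artificial offset between the two groups stand in for the real embeddings, so the separation printed here is built into the toy data and says nothing about the actual results.</p><eg>
```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-ins for the 20 per-individual document vectors
# (10 human files, 10 LLM files); real vectors would come from spaCy.
rng = np.random.default_rng(0)
dim = 50
human_vecs = rng.normal(0.0, 0.1, size=(10, dim))
llm_vecs = rng.normal(0.0, 0.1, size=(10, dim)) + 1.0  # synthetic group offset

X = np.vstack([human_vecs, llm_vecs])
is_llm = np.array([False] * 10 + [True] * 10)

# Project the vectors onto their first two principal components for a 2D plot.
coords = PCA(n_components=2).fit_transform(X)

human_pc1_mean = float(coords[~is_llm, 0].mean())
llm_pc1_mean = float(coords[is_llm, 0].mean())
print(f"PC1 mean, humans: {human_pc1_mean:.2f}; LLMs: {llm_pc1_mean:.2f}")
```
</eg><p>The two columns of coords can then be passed to Matplotlib's scatter function, coloring humans green and LLMs red as in the figures. PCA only provides the 2D view; any visible separation reflects the structure of the embeddings, not the projection.</p></div>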
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Binary classification of human and machine creativity answers</head><p>As the clustering experiment suggested, when all the answers of an individual are pooled, humans and machines are almost linearly separable into two classes. We therefore tested whether separate answers can also be classified as human or LLM answers with traditional machine learning methods. In Table <ref type="table" target="#tab_0">1</ref>, we give the three best classifiers, with their precision, recall, F1 scores, and accuracies. The Naïve Bayes classifier obtained the highest accuracy, 0.74, followed, just 0.03 lower, by both the Support Vector Machine (SVM) and the Random Forest classifiers, with an accuracy of 0.71.</p><p>This moderate performance suggests either that the dataset is too small for the models to perform better, or that there is a fair amount of similarity between the answers of humans and machines that prevents the models from learning to discriminate between them. Further experiments are needed to see whether enlarging the dataset or using state-of-the-art (SOTA) transformers raises performance considerably.</p></div>
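<div xmlns="http://www.tei-c.org/ns/1.0"><p>The section does not specify the features used for classification. The sketch below assumes TF-IDF features over the answer text, with MultinomialNB, LinearSVC, and RandomForestClassifier as the Naïve Bayes, SVM, and Random Forest variants, and shows how the three classifiers of Table 1 could be trained and scored with scikit-learn. The toy answers are hypothetical, and the tiny split makes the printed scores meaningless beyond checking that the pipeline runs.</p><eg>
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy answers; the real data are the 4530 test answers.
answers = [
    "use it as a doorstop", "use it as a hat", "earthquake", "they ate something bad",
    "because of green lights", "too much screen time", "a pillow for cats", "getting a new job",
    "Serendipity", "Whimsical paperweight", "Miniature greenhouse", "Unyielding resilience",
    "Geriatric theme parks", "Toy factories booming", "Thoughtful shopping", "Ephemeral sculpture",
]
labels = ["human"] * 8 + ["llm"] * 8

X_train, X_test, y_train, y_test = train_test_split(
    answers, labels, test_size=0.25, stratify=labels, random_state=42
)

# Assumed classifier variants; the section only names the families.
classifiers = {
    "NaiveBayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, clf in classifiers.items():
    model = make_pipeline(TfidfVectorizer(lowercase=True), clf)  # assumed TF-IDF features
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    acc = accuracy_score(y_test, pred)
    macro_f1 = f1_score(y_test, pred, average="macro", zero_division=0)
    results[name] = (acc, macro_f1)
    print(name, results[name])
```
</eg><p>On the real data, a stratified cross-validation would give more stable estimates than a single split, given the class imbalance described in subsection 3.1.</p></div>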
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">General considerations</head><p>We manually inspected the answers of the two individuals ranked highest for uniqueness, to see what makes their answers so different from the others, and we also investigated whether uniqueness scores correlate with the quality of the creative answers.</p><p>The first in the uniqueness ranking, the LLM Jais, tended to respond to the Similarities task with words obtained by nominalization (deriving nouns from verbs or adjectives), for instance "dependency", "curiosity", "belonging", and "growth", whereas all the other LLMs responded with regular nouns. It also tended to give answers starting with the same prefix, such as "Unfiltered", "Unmatched", "Unrestricted", and "Unyielding", and answers starting with the same word, such as "Thought policing", "Thoughtful shopping", and "Thought clones". These patterns made Jais's answers the most unique, although they were clearly not the most creative.</p><p>The second in the uniqueness ranking, Human 3, started the majority of their answers with "use" or "use it as". This respondent also reused the same openings across many answers, as in "what... ", "getting a ...", "where ...", and "in a...". These features seem sufficient for a high uniqueness score, but they do not reflect the quality of the creative answers.</p><p>This inspection shows that the most unique answers are not necessarily the most creative. If most respondents give good-quality answers, lower-quality or less creative responses may stand out and thus obtain high uniqueness scores.</p><p>We also checked the appropriateness of the answers given by both humans and machines, which is an important requirement of genuine creativity, as mentioned in section 2. 
Creativity requires divergent thinking, but true creativity emerges when convergent thinking also restricts the divergence to only those responses that are appropriate for the creative assignment <ref type="bibr" target="#b12">[12]</ref>.</p><p>In general, humans gave fairly suitable answers. By contrast, not all the LLMs managed to generate all their answers in an appropriate manner. For instance, for the Consequences item "There is a virus and only children survive", Gemini responded creatively but not suitably: four of its ten answers are either paradoxical or nonsensical in a situation that clearly implies that only children are alive and there are no adults around: "Toy Factories booming", "Geriatric Theme Parks", "Grandparents raise parents", "Parents taught by Tablets".</p><p>Another manual inspection focused on the similar or different patterns of LLMs and humans when responding to a particular task. We found that several LLMs included the same word among their answers to the Divergent Association Task; for instance, "Serendipity" was used by three models. This phenomenon is not specific to the machines. For the Causes task, Human 3 and Human 4 produced similar answers: both gave the answer "earthquake", for instance, and they expressed the same ideas in pairs such as "green lights"/"because of green lights", "eating something bad"/"they ate something bad", "St Patrick's Day"/"St. Patrick's day party", "poor construction"/"faulty structural integrity", and "looking at screens too much"/"too much screen time". 
We also noticed some peculiarities of individual LLMs, such as Falcon generating only words starting with the letter "a" for the DAT, or Cohere generating only pairs of opposites for this task: "love", "hate", "peace", "chaos".</p><p>Moreover, humans seem more personally involved in answering than LLMs, which, with some exceptions, tend to give only general answers to the tasks. Some LLMs seem to respond "humanly", even producing humor and figurative speech, while others respond in a rather standard or "robotic" way.</p><p>Overall, the variation across LLMs is similar to the variation across humans, with answer styles differing from one individual to another.</p></div>
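<div xmlns="http://www.tei-c.org/ns/1.0"><p>Some of the manual observations above, such as answers sharing the same opening word or the same DAT word recurring across respondents, could be partly automated. The sketch below is our illustration, not part of the reported analysis; the function names and the answer lists are hypothetical, loosely modelled on the patterns described in this subsection.</p><eg>
```python
from collections import Counter


def dominant_opening(answers: list) -> tuple:
    """Most frequent first word among one individual's answers, and its share.

    Hypothetical helper; assumes at least one non-empty answer.
    """
    first_words = [a.split()[0].lower() for a in answers if a.strip()]
    word, count = Counter(first_words).most_common(1)[0]
    return word, count / len(first_words)


def shared_dat_words(dat_answers: dict) -> dict:
    """Words (case-insensitive) used in the DAT answers of more than one respondent."""
    counts = Counter(
        word for words in dat_answers.values() for word in {w.lower() for w in words}
    )
    return {word: n for word, n in counts.items() if n > 1}


# Hypothetical answers, loosely modelled on the patterns described above.
human_answers = ["use it as a shelf", "use as a hammer", "getting a ride", "use it as a drum"]
dat_answers = {
    "LLM A": ["Serendipity", "volcano", "algebra"],
    "LLM B": ["serendipity", "harbor", "whisper"],
    "LLM C": ["Serendipity", "tornado", "velvet"],
    "Human 1": ["cat", "justice", "volcano"],
}

print(dominant_opening(human_answers))  # prints ('use', 0.75)
print(shared_dat_words(dat_answers))
```
</eg><p>Such counts only flag candidate patterns; judging whether a repeated opening or a shared word lowers the creativity of an answer still requires manual inspection.</p></div>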
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Ethical considerations</head><p>We did not use or disclose any personal data from the human participants, who remained completely anonymous and took part in this research as volunteers. There are no ethical concerns with regard to publishing this research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Limitations</head><p>The dataset for this research was small and unbalanced, since the humans gave as many answers as their mood or capabilities allowed, while the LLMs always gave the maximum of ten answers per item.</p><p>Also, the sample pool is quite small, as only ten humans and ten LLMs were involved, so the results might change when the dataset is enlarged.</p><p>Due to lack of space, this study focuses more on automated than on manual analysis, and thus lacks a more in-depth insight into the patterns of the answers collected from humans and machines.</p><p>Finally, this study compares the creativity answers of humans and LLMs in English, but the human participants in the test were non-native (though fluent) English speakers, which may have lowered their creativity scores compared to the scores they could obtain in their native language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and future work</head><p>This study showed that there are some differences between human and machine answers to a verbal creativity test, but also plenty of similarities.</p><p>The LLMs' answers vary much like the humans' answers. Individually unique answers, w.r.t. the set of all answers, are produced by humans and machines alike, with no noticeable difference.</p><p>Still, at a semantic level, the answer sets of individual humans and those of individual machines generally form two separate groups.</p><p>The performance of automatic classification between human and machine answers is moderate and leaves room for improvement.</p><p>The general findings of this study indicate that LLMs' creative capabilities are comparable with human abilities and, as such, LLMs could be put to good use in the creative domain. Humans "just" need to adapt to their usage, mind the ethics and safety issues, and critically evaluate the generated information at every step instead of using it blindly.</p><p>In future work, we will focus on expanding the dataset by adding more LLMs' and humans' answers to the test, for better statistical coverage.</p><p>We also aim to investigate the database manually in more depth, to look for more systematic patterns for both humans and machines.</p><p>As creativity remains a domain with endless possibilities, we also plan to investigate other aspects of LLMs' creativity, such as language or image.</p><p>Another future direction worth pursuing is using Deep Learning instead of traditional Machine Learning approaches for the binary classification task, or using metrics specific to LLM-generated tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Appendix: Verbal Creativity Test</head><p>There are 6 types of creativity assessments in this test. Note: Be as creative, original, and innovative as possible. Pay attention to the word and answer limit! Try to think of as many answers as possible within the limit!</p><p>1. Alternative Uses Test. Name up to ten unusual uses for the following five items. Use a maximum of five words. Give one answer per line.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Consequences</head><p>1. There is a mutation and men are the ones giving birth 2. There is a virus and only children survive 3. People can read each other's thoughts 4. You wake up as your child self 5. AI replaces teachers and professors</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Divergent Association Task (DAT)</head><p>Write ten words that are as different from each other as possible, in all meanings and uses of the words.</p><p>Rules: Only single words in English. Only nouns (e.g., things, objects, concepts). No proper nouns (e.g., no specific people or places). No specialized vocabulary (e.g., no technical terms). Think of the words on your own (e.g., do not just look at objects in your surroundings).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Ranking of uniqueness scores for humans and machines</figDesc><graphic coords="3,302.62,84.19,203.37,120.41" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Semantic similarity clusters of answers for all tasks</figDesc><graphic coords="4,89.29,84.19,203.36,140.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Semantic similarity clusters of answers for Alternative Uses</figDesc><graphic coords="4,89.29,260.26,203.36,140.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Semantic similarity clusters of answers for Instances</figDesc><graphic coords="4,302.62,84.19,203.36,140.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Semantic similarity clusters of answers for Similarities</figDesc><graphic coords="4,302.62,260.26,203.36,140.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Semantic similarity clusters of answers for Causes</figDesc><graphic coords="5,89.29,179.80,203.36,140.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Semantic similarity clusters of answers for Consequences</figDesc><graphic coords="5,89.29,355.87,203.36,140.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Semantic similarity clusters of answers for DAT</figDesc><graphic coords="5,302.62,179.80,203.36,139.56" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>2 . 1 .</head><label>21</label><figDesc>2. Instances. Use a maximum of five words per answer. Give one answer per line. Name up to 10 things that: 1. Things that can harm one's self-esteem 2. Things that you have control of in your life 3. Situations where it is good to be loud 4. Things that can flow 5. Things that you can mark on a map. 3. Similarities. How are the following 2 terms alike? Use a maximum of three words to describe a common feature of the following pair of words. Give one answer per line. Give up to ten answers. 4. Causes. 1. Crash of a building 2. Everybody turns green at a party 3. Social media disappears 4. Humanity becomes shortsighted 5. Your hat does not fit you anymore</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Binary classification scores</figDesc><table><row role="label"><cell></cell><cell cols="4">SVM</cell><cell cols="4">Naïve Bayes</cell><cell cols="4">Random Forest</cell></row><row role="label"><cell></cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell><cell>Acc.</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell><cell>Acc.</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell><cell>Acc.</cell></row><row><cell>Humans</cell><cell>0.78</cell><cell>0.60</cell><cell>0.68</cell><cell rows="2">0.71</cell><cell>0.70</cell><cell>0.83</cell><cell>0.76</cell><cell rows="2">0.74</cell><cell>0.67</cell><cell>0.80</cell><cell>0.73</cell><cell rows="2">0.71</cell></row><row><cell>LLMs</cell><cell>0.67</cell><cell>0.83</cell><cell>0.74</cell><cell>0.79</cell><cell>0.65</cell><cell>0.71</cell><cell>0.76</cell><cell>0.61</cell><cell>0.68</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://help.openai.com/en/articles/6825453-chatgpt-release-notes</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/chat/models/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">No longer supported</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://huggingface.co/spaces</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://gpt4all.io/index.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://www.bing.com/chat?form=NTPCHB</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://gemini.google.com/app</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://auth.arabic-gpt.ai/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">https://you.com/?chatMode=default</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">https://c.ai/c/YntB_ZeqRq2l_aVf2gWDCZl4oBttQzDvhj9cXafWcF8</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">https://colab.research.google.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11">https://claude.ai/chat/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_12">https://www.microsoft.com/en-us/microsoft-copilot</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_13">https://gemini.google.com/app/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_14">https://spacy.io/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_15">https://scikit-learn.org/stable/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="17" xml:id="foot_16">https://matplotlib.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="18" xml:id="foot_17">https://numpy.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="19" xml:id="foot_18">https://pandas.pydata.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="20" xml:id="foot_19">https://www.datcreativity.com/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by a mobility project of the Romanian Ministry of Research, Innovation and Digitization, CNCS-UEFISCDI, project number PN-IV-P2-2.2-MC-2024-0589, within PNCDI IV.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An integrated benchmark for verbal creativity testing of LLMs and humans</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Florescu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on Knowledge-Based and Intelligent Information &amp; Engineering Systems (KES 2024)</title>
				<meeting>the 28th International Conference on Knowledge-Based and Intelligent Information &amp; Engineering Systems (KES 2024)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>Accepted at KES 2024</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Creativity: Flow and the Psychology of Discovery and Invention</title>
		<author>
			<persName><forename type="first">M</forename><surname>Csikszentmihalyi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1996">1996</date>
			<publisher>HarperCollins Publishers</publisher>
			<pubPlace>New York, NY</pubPlace>
		</imprint>
	</monogr>
	<note>first ed</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Generative artificial intelligence enhances creativity but reduces the diversity of novel content</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Doshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hauser</surname></persName>
		</author>
		<idno type="DOI">10.2139/ssrn.4535536</idno>
		<ptr target="https://ssrn.com/abstract=4535536" />
	</analytic>
	<monogr>
		<title level="j">Science Advances</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">5290</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The originality of machines: AI takes the Torrance Test</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">E</forename><surname>Guzik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Byrge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gilde</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.yjoc.2023.100065</idno>
		<ptr target="https://doi.org/10.1016/j.yjoc.2023.100065" />
	</analytic>
	<monogr>
		<title level="j">Journal of Creativity</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page">100065</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">The Creative Mind: Myths and Mechanisms</title>
		<author>
			<persName><forename type="first">M</forename><surname>Boden</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<publisher>Routledge</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Artificial intelligence in the creative industries: a review</title>
		<author>
			<persName><forename type="first">N</forename><surname>Anantrasirichai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bull</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence Review</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="589" to="656" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.06647</idno>
		<title level="m">A survey on large language model hallucination via a creativity perspective</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Creativity</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Guilford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">American Psychologist</title>
		<imprint>
			<date type="published" when="1950">1950</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m">Encyclopedia of Creativity, Invention, Innovation and Entrepreneurship</title>
				<editor>
			<persName><forename type="first">E</forename><surname>Carayannis</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">The Cambridge Handbook of Creativity</title>
		<title level="s">Cambridge Handbooks in Psychology</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Kaufman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Sternberg</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>Cambridge University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">The nature of human intelligence</title>
		<title level="s">McGraw-Hill series in psychology</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Guilford</surname></persName>
		</author>
		<imprint>
			<publisher>McGraw-Hill</publisher>
			<pubPlace>New York</pubPlace>
			<date type="published" when="1967">1967</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2401.12491</idno>
		<title level="m">Assessing and understanding creativity in large language models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Chakrabarty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Laban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-S</forename><surname>Wu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.14556</idno>
		<title level="m">Art or artifice? Large language models and the false promise of creativity</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Is artificial intelligence more creative than humans? ChatGPT and the divergent association task</title>
		<author>
			<persName><forename type="first">D</forename><surname>Cropley</surname></persName>
		</author>
		<idno type="DOI">10.59453/ll.v2.13</idno>
		<ptr target="https://learningletters.org/index.php/learn/article/view/13" />
	</analytic>
	<monogr>
		<title level="j">Learning Letters</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page">13</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models</title>
		<author>
			<persName><forename type="first">P</forename><surname>Organisciak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Acar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dumas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Berthiaume</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.tsc.2023.101356</idno>
		<ptr target="https://doi.org/10.1016/j.tsc.2023.101356" />
	</analytic>
	<monogr>
		<title level="j">Thinking Skills and Creativity</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page">101356</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
