<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.48550/arXiv</article-id>
      <title-group>
        <article-title>Why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”?⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ibukun Olatunji</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Foundry, Swansea University</institution>
          ,
          <addr-line>Crymlyn Burrows, Skewen, Swansea</addr-line>
          ,
          <country country="UK">United Kingdom</country>
          ,
          <addr-line>SA1 8EN</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>796</fpage>
      <lpage>796</lpage>
      <abstract>
        <p>The paper examines co-creative writing systems and argues that existing Large Language Models could potentially reduce human capacity. Furthermore, existing sociocultural inequalities might be exacerbated by the widespread adoption of such generative systems. The paper instead suggests a custom approach, using co-creative poetry writing as an example. The proposed system modifies the architecture of typical language models to better support poetry. It also uses rap lyrics as part of the training data in order to help reduce sociocultural bias. A high-level system implementation is proposed along with some evaluation methods. Evaluation is based on expert judgement of final outputs and on user performance in language tasks associated with human creativity. The final section of the paper explores how and why alternatives to existing co-creative systems could benefit individual users as well as wider society.</p>
      </abstract>
      <kwd-group>
        <kwd>Creativity</kwd>
        <kwd>poetry</kwd>
        <kwd>co-creativity</kwd>
        <kwd>natural language processing</kwd>
        <kwd>language models</kwd>
        <kwd>writing support tools</kwd>
        <kwd>data sets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This paper examines co-creative systems using poetry writing as an example. Within the paper, ‘poetry’ includes song lyrics. Section one explores poetry in terms of human creativity. Poetry is chosen because it is a creative task on which non-expert humans can still outperform machines, in contrast to creative outputs such as image generation. After introducing the case for poetry, the section explores recent work in generative computational systems. As well as being the technical state of the art, these systems provide a conceptual framework to explore sociocultural issues such as bias and inclusion. Section one then explores a range of poetry-specific systems and ends with a more detailed case study, which examines a system that combines elements of more powerful general models with custom architectural features specific to poetry writing. Section two details the evaluation issues and methods that might be employed for the proposed co-creative system; the emphasis of this section is on how to evaluate human improvement over time. Section three explores a high-level implementation of the system. It builds on the evaluation to propose both an architecture and a method for testing whether the proposed system has, in principle, any benefits over and above those described in section one. Section four is a discussion of the social and cultural limitations of current generative systems; it expands on section one in exploring bias and proposes a mitigation through the use of rap lyrics. Section five describes the theoretical and practical limitations of the paper as well as future work. Section six provides a summary of the paper’s contribution and ends with answers to the question: why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”?</p>
      <sec id="sec-1-1">
        <title>1.1. Human Creativity</title>
        <p>Human creativity is the ability to come up with ideas or artefacts that are new, surprising, and valuable. Rather than a solitary act, it results from the interaction of social elements: a culture that contains symbolic rules, a person who brings novelty into the symbolic domain, and people who recognise and validate the innovation [<xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>].</p>
        <p>Boden makes a further distinction between psychological and historical creativity (P-creativity and H-creativity). P-creativity involves coming up with an idea that is new to the person who comes up with it. H-creativity means that (so far as we know) no-one else has had the idea before: it has arisen for the first time in human history [<xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>].</p>
        <p>Machine learning models have the potential to support human creativity [<xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>]. However, questions remain about their design and their influence in augmenting human capacity as opposed to reducing it [<xref ref-type="bibr" rid="ref8 ref9">8, 9, 10</xref>]. Shneiderman suggests that “researchers’ goals shape the questions they raise, collaborators they choose, methods they use, and outcomes of their work” [11]. This leads to the question: how can designers of programming interfaces, interactive tools, and rich social environments enable more people to be more creative more often? [12]</p>
        <p>Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia.</p>
        <p>⋆ Ibukun Olatunji. 2023. Why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”? In Joint Proceedings of the ACM IUI 2023 Workshops. Sydney, Australia, 13 pages.</p>
        <p>* Corresponding author. i.o.olatunji.2030349@swansea.ac.uk (I. Olatunji)</p>
        <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>1.2. Computational Systems</title>
      <p>In computational terms, automated systems are now capable of writing poetry approaching human levels [13, 14, 15]. Karimi et al. consider three main strategies by which the role of humans in creative systems can be characterized: fully autonomous systems, creativity support tools, and co-creative systems [16]. Although the paper is primarily concerned with co-creative systems, it will blend the categories where necessary. The reasoning for this is that human users do not make the same distinctions; also, features and usage are often blended in the real world, e.g. an autonomous system that is used by a creator as an input and thus becomes a support tool and/or co-creative system [10]. The next section briefly outlines the state of the art in computational writing systems.</p>
    </sec>
    <sec id="sec-2">
      <title>1.3. General Purpose Language Generation</title>
      <p>Language Model Characteristics</p>
      <p>Language models (LMs) refer to systems that are trained on string prediction tasks: predicting the likelihood of a token (character, word or string) given either the preceding context or its surrounding context. Such systems are unsupervised and, when deployed, take text as input and output scores or string predictions [17]. Large Language Models (LLMs) trained on sufficiently large and diverse data sets are able to perform well across domains, and there is a correlation between model performance and size [18]. State-of-the-art models are able to generate text that approaches or surpasses that of some humans [13, 14, 15, 19]. The emphasis on some humans is important with respect to user characteristics; in broad terms, humans co-creating poetry can be considered as either inexperienced or advanced users. Research on creative tasks such as improvisation suggests that users vary in cognitive processing based in part on their experience and skill levels [20, 21]. A well-designed co-creative system should therefore take differences in user support needs into account [<xref ref-type="bibr" rid="ref8 ref9">8, 9, 22</xref>].</p>
      <p>LLMs are trained to predict the next word, or series of words, in a text sequence. They model text corpora as probability distributions. Users write short text prompts to tell the system what to generate. Depending on how many examples are provided in the text prompt, the setting is referred to as zero-, one-, or few-shot learning [13, 15, 17]. Pretrained language models have become a cornerstone of modern natural language processing (NLP) pipelines because they often produce better performance from smaller quantities of labeled data [23]. Within general LLMs, the transformer has established itself as the best-performing architecture on benchmark language processing tests [13, 15, 24]. As well as being able to perform tasks such as text summarising and question answering, LLMs have the potential to support creative writing [<xref ref-type="bibr" rid="ref6 ref8 ref9">6, 8, 9</xref>]. Current state-of-the-art LLMs are summarized in table 1. However, despite impressive technical achievements, LLMs have limitations, including: (a) as models scale, they might eventually run into the limits of their pretraining objectives; (b) the models are expensive and difficult to perform inference on; (c) model decisions are not easily interpretable; (d) the majority of the research community, and by extension disadvantaged social groups, have been excluded from the development of LLMs as they are proprietary (see table 1); and (e) most LLMs are primarily trained on English-language text that contains data biases [13].</p>
    </sec>
    <sec id="sec-3">
      <title>1.4. Poetry Specific Language Generation</title>
      <p>Creating poetry is a creative skill that requires an extensive vocabulary, the phonemic awareness to produce complex rhyme patterns, and enough general knowledge about the world to tell interesting stories about a range of topics [20, 25, 26, 27].</p>
    </sec>
    <sec id="sec-4a">
      <title>1.5. Overview of Poetry Support Tools</title>
      <p>Historically, poetry creation systems tended to be built on the model of an AI writing a full poem by itself, thus writing in a closed system [28, 29, 30, 31]. Early systems tended to be rule-based [32]. More recently, some approaches have started to explore human interaction when composing poems [33, 34]. Table 2 provides a broad summary of selected systems, including autonomous systems, support tools and co-creative systems as defined by Karimi et al. [16]. The category distinction helps frame a range of (human) creative processes and (technology) interactions. It is also a useful way to consider how the proposed system differs from those that currently exist and, as importantly, how it is similar. At a high level, autonomous systems are designed to create finished works (sometimes called ‘products’ or ‘artefacts’). Support tools are used as part of the creative workflow; for instance, RhymeZone or Rhymer help a user find words that sound similar to those they might use in a poem [35, 36]. Co-creative systems enable humans and computational systems to make shared products. That said, the distinction is not fixed. For example, Rytr contains text editing, display and other features that allow it to operate as both a co-creative and an autonomous system [37]. Having looked at the computational systems, it is instructive to briefly consider poetry writing from a human perspective, as this will help inform the design of a new poetry writing system.</p>
      <p>Writing poetry requires a range of general creative skills that can be framed in terms of divergent and convergent thinking; these are used in varying ways throughout a multi-stage writing process. For simplicity, the stages include (a) exploration, which is characterised by divergent thinking [21, 38, 39, 40]; (b) focused work, which uses convergent thinking [21, 41]; and (c) re-drafting. Across these stages it is useful to distinguish between internal and external co-creation system activities. Internal activity is when the user interacts with the system in real time, e.g. writing or redrafting text; external activity is when the user participates in activities such as browsing, reading or other things that do not use the system. The framing of internal and external system activities is based on the following reasoning: (a) skill: inexperienced users are unlikely to possess the improvisational skill required to create full poems in real time due to cognitive processing constraints [20, 42]; (b) speed: users might choose to write poems over multiple sessions, in which case external system stimuli could support the writing; (c) knowledge: advanced writers are usually familiar with a body of existing work that informs their own [<xref ref-type="bibr" rid="ref1">1</xref>]; and (d) process: reflecting and redrafting is an important part of writing, and the reflecting stage often takes place separately from the creation of the work itself [<xref ref-type="bibr" rid="ref1">1, 10</xref>].</p>
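      <p>Support tools such as RhymeZone help a writer find words that sound alike. As an illustration only, the sketch below shows a common textbook test for a perfect rhyme. Two different words rhyme when their pronunciations match from the last stressed vowel to the end. The sketch rests on several assumptions. It uses a tiny hand-written ARPAbet-style lexicon (TOY_LEXICON is hypothetical and stands in for a full pronouncing dictionary such as CMUdict). It does not check that the consonants before the stressed vowel differ. It does not describe how RhymeZone or Rhymer actually work. The same rhyme key can also label the rhyme scheme of a stanza.</p>
      <preformat preformat-type="code">
```python
# Toy pronunciation lexicon in ARPAbet style: vowels carry a stress digit
# (1 = primary, 2 = secondary, 0 = unstressed). Hypothetical entries only;
# a real tool would load a full pronouncing dictionary such as CMUdict.
TOY_LEXICON = {
    "night": ["N", "AY1", "T"],
    "light": ["L", "AY1", "T"],
    "delight": ["D", "IH0", "L", "AY1", "T"],
    "day": ["D", "EY1"],
    "away": ["AH0", "W", "EY1"],
    "moon": ["M", "UW1", "N"],
    "tune": ["T", "UW1", "N"],
    "river": ["R", "IH1", "V", "ER0"],
    "shiver": ["SH", "IH1", "V", "ER0"],
}


def rhyme_key(word):
    """Return the phonemes from the last stressed vowel to the end of the word."""
    phones = TOY_LEXICON[word.lower()]
    # Walk backwards to find the last vowel with primary or secondary stress.
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(phones[i:])
    # No stressed vowel found: fall back to the whole pronunciation.
    return tuple(phones)


def is_perfect_rhyme(a, b):
    """Two different words rhyme if their rhyme keys are identical."""
    return a.lower() != b.lower() and rhyme_key(a) == rhyme_key(b)


def rhyme_scheme(lines):
    """Label each line by its final word; lines sharing a rhyme key share a letter."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    seen = {}  # maps rhyme key to its assigned letter
    scheme = []
    for line in lines:
        last_word = line.split()[-1].strip(".,;:!?").lower()
        key = rhyme_key(last_word)
        if key not in seen:
            seen[key] = letters[len(seen)]
        scheme.append(seen[key])
    return "".join(scheme)


if __name__ == "__main__":
    print(is_perfect_rhyme("night", "delight"))
    print(rhyme_scheme([
        "The river runs beneath the moon,",
        "It hums a slow and silver tune,",
        "I watch it through the quiet night",
        "And wait for dawn's returning light.",
    ]))
```
      </preformat>
      <p>Run as a script, it prints True for night/delight and AABB for the four-line stanza. Words missing from the lexicon raise a KeyError. A real tool would need a fallback, such as a letter-to-sound model, for unknown words.</p>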
    </sec>
    <sec id="sec-5">
      <title>1.6. Case Study: Verse by Verse</title>
      <p>Google Research’s Verse by Verse is a relevant case study as it is arguably the most technically advanced poetry-specific generative system. As well as using a transformer model architecture, it also uses information retrieval and considers bias within its design. Verse by Verse augments user poetry composition by offering suggestions to a user as they compose a poem. The authors of the system argue that, relative to creating full poems, “this is a much more challenging task, as one needs be able to offer suggestions with minimal latency while meeting constraints of the poem structure and handle the challenges of user input” [34]. Figure 1 shows part of the system’s user interface (for PC). From a user’s point of view, the experience is as follows: (a) the user selects poet(s) to inform the suggestions; (b) the user designs the poem structure as illustrated in figure 2; (c) the user writes the first line of text; and (d) the system offers suggestions in the style of the poet(s) the user selected earlier. The user can then work with or modify the suggestions, or have the system create new verses.</p>
      <p>Figure 1: Screenshot from the Verse by Verse application by Google.</p>
      <p>The Verse by Verse design has an external system context that, in general, LLMs do not. To some extent, the system helps poetry writers become better readers. In his work on creativity, it was suggested to Csikszentmihalyi that “the only way you become a poet...is because you’ve read a poem...poetry depends on the whole poetic tradition of the past...you have to decide...out of all that previous poetry, what is most interesting to me?” [<xref ref-type="bibr" rid="ref1">1</xref>]. By making users aware of the work of other poets, Verse by Verse helps users become readers in order to inform their own poetic development.</p>
    </sec>
    <sec id="sec-5-1">
      <title>2. Experiment Design</title>
      <sec id="sec-5-1-1">
        <p>Verse by Verse ran comparative evaluations of the system against poems written by classic poets. Although the system was intended to be used as an interactive co-creator for the human writing a poem, the authors stated it was still worth evaluating how the system could perform on its own in writing a poem given a first line of verse [34]. This approach has been adopted within the proposed system’s experimental design, implementation and evaluation. The next subsection explores evaluation before looking at implementation. The rationale is that evaluation is perhaps the harder problem, as it involves an intersection of multiple disciplines (e.g. computational sciences, arts, linguistics, and pedagogy), whereas implementation can mostly be restricted to computational science domains.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>2.1. Evaluation Overview</title>
      <sec id="sec-6-1">
        <p>Evaluating co-creative systems is still an open research question and there is no standard metric for measuring computational co-creativity [16, 43]. Karimi et al. describe the limited research investigating how co-creative systems can be evaluated. They present four questions as a way to compare how (existing) co-creative systems evaluate creativity: who is evaluating the creativity, what is being evaluated, when evaluation occurs, and how the evaluation is performed [16]. Calderwood et al. point out that “writers engaged with co-creative systems are looking for creative insight, something not measured by perplexity or by a language model’s ability to solve the canonical downstream NLP tasks” [<xref ref-type="bibr" rid="ref5">5</xref>]. For the evaluation of the proposed system to be effective, it is useful to restate its goals in more detail. The co-creative poetry system’s goal is “making people feel that they have ‘creative superpowers’”. To achieve this, the system supports users to create better poetry than they might otherwise have done without the system. The terms supports and better will be further explored as they form the basis of evaluation.</p>
        <p>Augmenting human users is central to human-centred AI (HCAI) and a contrast to a closed model that creates on behalf of the user [<xref ref-type="bibr" rid="ref8">8, 34, 44</xref>]. This point is made in recent work that refers to pitfalls when designing human-AI co-creative systems, as well as in other work which asserts that generative models can help writers without writing for them [<xref ref-type="bibr" rid="ref5 ref9">5, 9, 22</xref>]. The argument these and similar works make is that too much automated creation can come at the expense of human users [<xref ref-type="bibr" rid="ref9">9, 22</xref>]. Adopting this thinking, it is useful to evaluate the system and its users independently, as well as in combination. In theory, this allows internal (system) measurement as well as internal and external (human) measurement.</p>
      </sec>
      <sec id="sec-6-2">
        <p>The end goal is that human users develop their own capacity; this development could take place external to the system, with the system having acted as a creative prompt. A description of how this could work in principle follows; a later section describes the system implementation.</p>
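        <p>One way to keep Karimi et al.’s four questions (who, what, when and how) explicit for every planned measurement is to record each one as a small data structure. Each record also carries the internal/external distinction described above. The sketch below is a hypothetical organisation of such a plan. The field values paraphrase this section and are illustrative; they are not a published schema.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """One planned measurement, described by Karimi et al.'s four questions."""
    who: str    # who is evaluating the creativity
    what: str   # what is being evaluated
    when: str   # when evaluation occurs
    how: str    # how the evaluation is performed
    scope: str  # "internal" (with the system) or "external" (without it)


# Hypothetical plan: the system and its users are evaluated independently
# as well as in combination.
PLAN = [
    Evaluation("expert judges", "system-only poems", "after generation",
               "Likert scale and free-text summary", "internal"),
    Evaluation("users", "co-created poems", "after each session",
               "Likert scale self-report", "internal"),
    Evaluation("researchers", "user performance on creativity tasks",
               "before and after a period of use", "task scores", "external"),
]


def by_scope(plan, scope):
    """Return the evaluations that belong to the given scope."""
    return [e for e in plan if e.scope == scope]


if __name__ == "__main__":
    for e in by_scope(PLAN, "external"):
        print(e.who, "|", e.what, "|", e.how)
```
        </preformat>
        <p>Freezing the dataclass keeps a plan immutable once agreed. Filtering by scope separates measurements that need the system running from those that do not.</p>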
      </sec>
    </sec>
    <sec id="sec-7">
      <title>2.2. Process and Objectives</title>
      <sec id="sec-7-1">
        <p>The system would run a number of experiments with the purpose of establishing which system components most support users to write “better” poetry. In terms of the system’s goal, better is evaluated (a) subjectively by users via a Likert scale [45] and (b) by performance on related tasks such as the Divergent Association Task, the Bridge-the-Associative-Gap Task, or rhyme creation and identification [46, 47]. These tasks would be completed external to the system.</p>
        <p>The goals of the evaluation are to measure the extent to which users are actually improving their poetry writing abilities, and the degree to which any improvement is a result of internal system features. For a user, improvement is concerned with “the writer’s goals or their desire to have an individual voice” [<xref ref-type="bibr" rid="ref9">9</xref>]. With this as a basis, the evaluation process takes the form of a number of hypotheses and related experiments, the purpose of which is to explore: (a) how well general vs poetry-specific language models can write full poems; (b) whether poetry-specific language models can represent an individual user’s style better than generalised language models; and (c) the extent to which users benefit from system recommendations when writing poems. The hypotheses and experiments are concerned with poetic text style, which describes the ways an author uses language, including prosody, word choice, sentence structure and use of figurative language [48, 49].</p>
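        <p>The divergent-thinking task cited above has a published form, the Divergent Association Task. As usually described, it is scored as the mean pairwise semantic (cosine) distance between the words a participant names, multiplied by 100. The published version uses pretrained GloVe vectors, the first seven valid words, and validity rules that are omitted here. The sketch below shows only the arithmetic. TOY_VECTORS is a hypothetical three-dimensional stand-in for real embeddings.</p>
        <preformat preformat-type="code">
```python
import itertools
import math

# Hypothetical 3-d word vectors, for illustration only. The published task
# uses pretrained GloVe embeddings with hundreds of dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "volcano": [0.0, 0.9, 0.3],
    "algebra": [0.1, 0.0, 0.95],
}


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_score(words, vectors, max_words=7):
    """Mean pairwise cosine distance x 100 over the first words with known vectors."""
    valid = [w for w in words if w in vectors][:max_words]
    if len(valid) in (0, 1):
        raise ValueError("need at least two words with known vectors")
    pairs = list(itertools.combinations(valid, 2))
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return 100.0 * total / len(pairs)


if __name__ == "__main__":
    print(dat_score(["cat", "dog"], TOY_VECTORS))
    print(dat_score(["cat", "volcano", "algebra"], TOY_VECTORS))
```
        </preformat>
        <p>With these toy vectors, the three unrelated words score far higher than the related pair cat/dog. That contrast is the behaviour the task relies on to indicate divergent thinking.</p>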
        <p>A central challenge for the proposed system is that the development and attainment of an individual poetic voice is highly subjective. Beyond subjectivity, from a societal perspective poetry is often a question of cultural value, which may well change over time. Referring to Kendrick Lamar’s 2018 Pulitzer Prize, a first for a rap album, the administrator of the prizes said, “..this is not a genre we’ve seen celebrated before, so that in that sense it’s historical.” [50] Furthermore, as Boden states, “...even in science, values are often elusive and sometimes changeable...because values are highly variable, it follows that many arguments about creativity are rooted in disagreements about value. This applies to human activities no less than to computer performance.” [<xref ref-type="bibr" rid="ref2">2</xref>]</p>
        <p>1. Hypothesis-A: poetry-specific language generation could outperform general language generation with respect to creating poems. Experiment A: each system state generates complete poetic texts. The prompts would also be given to users (inexperienced and advanced) with the same constraints as the system in terms of keywords, topics, character limits etc. The evaluation for experiment A is by humans who judge the quality of the (anonymised) poems using a Likert scale and a free-text summary.</p>
        <p>2. Hypothesis-B: poetry-specific language generation customised for a given user could outperform vanilla poetry-specific generation with respect to creating poems. Experiment B: each system state generates complete poetic text, but some states are pretrained to customise characteristics with respect to given users and their poetic styles. The evaluation for experiment B is by humans who judge the quality of the poems using a Likert scale and a free-text summary. It focuses on how well the poems represent the given users’ individual style.</p>
        <p>3. Hypothesis-C: external recommendations (full or partial poems) based on given user characteristics are supportive with respect to users writing their poems. Experiment C: for given user-generated poetic text inputs, the system state generates poetic text recommendations (external to the system) that the user reads and reflects on before completing their poem. The evaluation for experiment C is by humans who judge how well the poem recommendations helped them write poems in the theme, topic or style they were attempting to achieve.</p>
        <p>The approach described provides a sense of how user activities (internal and external) with respect to the system can be evaluated. In practice, more fine-grained evaluation criteria would be required, based on further research and operational or implementation design. As far as possible, a complete system would have an awareness of all relevant evaluation data, including, for instance, external system reading of poems. At this stage, the evaluation proposed is limited to the extent necessary to support the explanation of how and why the system might work. A later section (Limitations and Future Work) will explore the limitations and suggest possible remedies.</p>
        <sec id="sec-7-1-1">
          <title>3. Proposed Implementation</title>
          <p>The system would have a number of states that range from full automation to text prompts acting as a starting point for the user. The support states envisaged are:</p>
          <p>1. State-A: general language system implemented as standard.</p>
          <p>2. State-B: general language system implemented with modified architecture to include user-generated content within the training set and/or network architecture preferences.</p>
          <p>3. State-C: poetry-specific system implemented with standard architecture.</p>
          <p>4. State-D: poetry-specific system implemented with modified architecture to include user-generated content within the training set and/or network architecture preferences.</p>
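          <p>The four support states vary along two axes. One axis is whether the base system is general or poetry-specific; the other is whether it is modified to include user-generated content. A hypothetical configuration table, sketched below, makes that two-by-two design explicit and lets each experiment select the states it compares. The class, enum and field names are assumptions for illustration only.</p>
          <preformat preformat-type="code">
```python
from dataclasses import dataclass
from enum import Enum


class BaseModel(Enum):
    GENERAL = "general language system"
    POETRY = "poetry specific system"


@dataclass(frozen=True)
class SystemState:
    name: str
    base: BaseModel
    # True if user-generated content is added to the training set
    # and/or network architecture preferences.
    personalised: bool


STATES = [
    SystemState("State-A", BaseModel.GENERAL, personalised=False),
    SystemState("State-B", BaseModel.GENERAL, personalised=True),
    SystemState("State-C", BaseModel.POETRY, personalised=False),
    SystemState("State-D", BaseModel.POETRY, personalised=True),
]


def states_for(base=None, personalised=None):
    """Return states matching the given base model and/or personalisation flag."""
    return [
        s for s in STATES
        if (base is None or s.base == base)
        and (personalised is None or s.personalised == personalised)
    ]


if __name__ == "__main__":
    for s in states_for(personalised=False):
        print(s.name, "-", s.base.value)
```
          </preformat>
          <p>For example, states_for(personalised=False) returns State-A and State-C, one possible pairing for the general vs poetry-specific comparison in Hypothesis-A. states_for(base=BaseModel.POETRY) returns State-C and State-D, the vanilla vs customised pairing in Hypothesis-B.</p>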
          <p>The LLM component of the system would use publicly available APIs and, where possible, modify the network architecture directly [51, 52, 53]. In most cases (table 1) LLMs are closed black-box systems, as illustrated in figure 3. Partly for this reason, a custom poetry and lyric language model would ideally be implemented. Aside from practicalities (which will be discussed), there is a technical challenge: a poetry and lyric LM would be far smaller than a general LLM. Given the research on LLM size and performance, a custom poetry and lyric LM would therefore, in theory, underperform against state-of-the-art LLMs [18, 54, 15]. In line with a recent study that experimented with user experiences of language models, the system could be implemented with a combination of JavaScript, React, Python and Flask [<xref ref-type="bibr" rid="ref8">8</xref>]. The system would then be deployed as a web application for mobile phones. Mobile is preferred to PC on the basis of its greater reach as a device for both reading and creating contemporary poetry [55, 56].</p>
          <p>Figure legend: (1) Text input by the user is returned as partially completed poetic text and/or poetic and lyrical recommendations for the user to consider. (2) User personalised data is submitted as poems or lyrics and/or recommendations of favourite artists and their work. These are used to create a corpus of user text from prior examples of user-generated text uploaded to the system. Recommender and/or database search then enhances the user text with additional poetic texts (e.g. from a web crawl). (3) A database of poetic texts (and song lyrics) from a web crawl; clean text is included as well as metadata such as rhyme scheme and Parts of Speech (PoS).</p>
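          <p>Section 1.3 notes that prompts containing zero, one or a few examples correspond to zero-, one- and few-shot learning. The figure legend above describes a corpus of prior user-generated text. A general LLM could be conditioned on that sparse user text without retraining by placing a few of the user’s own poems in the prompt as style examples. The function below only assembles the prompt string. build_prompt, the instruction wording and the example poems are hypothetical, and no particular LLM API is assumed.</p>
          <preformat preformat-type="code">
```python
def build_prompt(user_examples, first_line, k=2, instruction=None):
    """Assemble a k-shot prompt: up to k of the user's poems, then the new opening line.

    k=0 gives a zero-shot prompt, k=1 a one-shot prompt, and any larger k a
    few-shot prompt (limited by how many examples the user has supplied).
    """
    if instruction is None:
        instruction = "Continue the poem in the same style as the examples."
    parts = [instruction, ""]
    for i, poem in enumerate(user_examples[:k], start=1):
        parts.append(f"Example {i}:")
        parts.append(poem.strip())
        parts.append("")  # blank line between examples
    parts.append("New poem:")
    parts.append(first_line.strip())
    return "\n".join(parts)


if __name__ == "__main__":
    corpus = [
        "Salt on the window,\nthe sea still talking.",
        "Buses hum their low hymn\nat the end of the line.",
    ]
    print(build_prompt(corpus, "Morning folds the city like a letter", k=1))
```
          </preformat>
          <p>With k=0 the prompt contains only the instruction and the opening line. Raising k adds more of the user’s own poems, trading prompt length for a stronger style signal.</p>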
        </sec>
      </sec>
      <sec id="sec-7-2">
        <p>The system (figure 2) operates as a Sparse And Dense Network (SPAD). The name refers to the system being sparse with respect to user input tokens, as compared to the tokens contained in the LMs/LLMs. Against this, the system is dense in terms of leveraging transformer models and their associated attention layers (table 1). The intuition is to use a small amount of personalised user text to attempt to customise the output of powerful LMs/LLMs. This differs from existing approaches in the following ways.</p>
        <p>• State-of-the-art LLMs form part of the SPAD in order to help improve the SPAD’s performance; in other words, the LLMs are a source of input training data, and as such multiple LLMs could in theory be included in the SPAD architectural design.</p>
        <p>• A poetry-specific LLM (GPT-NeoX) forms part of the design. Here, poetry-specific refers to adaptations to the underlying model architecture so that token processing and output are better suited to poetry than to prose. An example of this might be applying additional linguistic layers within the network to favour text strings with syllable frequencies found more regularly in poems than in, say, news articles or web pages. Although architecture is referred to, much of any benefit at this stage might come from modifying the training data and associated recipes. The poetry-specific LLM would also leverage data from the general LLM (for simplicity, any interaction between the two elements is not included in figure 2).</p>
        <p>• The poetry and lyric LM is a custom model whose network architecture and training data are specific to poetry. In practical terms it is not an LLM, as the available training data is not likely to be sufficiently extensive compared with the current state of the art. As well as providing a data contrast to the LLMs, this part of the network will also act as a style transfer layer, in so far as it identifies and tries to modify input text to create poetic styles. These styles will be mapped onto user styles upstream within the system.</p>
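        <p>The poetry-specific LLM item above suggests favouring text strings whose syllable patterns are more typical of poems than of prose. A learned network layer is beyond a short example, but a similar signal can be approximated outside the network, for instance to filter or re-rank candidate lines. The sketch below makes two assumptions of its own; the paper does not specify a measure. First, it uses a crude vowel-group syllable counter, which miscounts many English words. Second, it treats regularity of per-line syllable counts as the poetic signal.</p>
        <preformat preformat-type="code">
```python
import re
import statistics

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Rough heuristic: count vowel groups, drop a silent final 'e', minimum 1."""
    word = word.lower().strip(".,;:!?'\"")
    if not word:
        return 0
    count = len(VOWEL_GROUPS.findall(word))
    # Treat a final 'e' as silent (e.g. "time"), except in "-le" endings ("little").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def line_syllables(line):
    """Approximate syllable count of a whole line."""
    return sum(count_syllables(w) for w in line.split())


def regularity_score(lines):
    """Higher means more regular: the negative population std dev of syllables per line."""
    counts = [line_syllables(line) for line in lines if line.strip()]
    if len(counts) in (0, 1):
        return 0.0
    return -statistics.pstdev(counts)


if __name__ == "__main__":
    verse = ["the cat sat on the mat", "the dog lay by the door"]
    prose = ["the cat sat", "a long and winding sentence about nothing in particular"]
    print(regularity_score(verse), regularity_score(prose))
```
        </preformat>
        <p>Because the syllable counter is approximate, the scores are only meaningful for comparing candidates against each other, not as absolute measures of metre.</p>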
        <p>The result of the models described above is a system that contains information on generalized poetic style as well as individual style preference(s) unique to each user. This allows the system to support users with specific co-writing tasks (e.g. text generation) and to offer personalised recommendations for further reading of relevant poems and/or poets. In user experience terms, this might be delivered via an interface that allows the user to switch between (a) writing text; (b) editing generated text; and (c) reading and reflecting on specific poetic recommendations made by the system.</p>
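        <p>The SPAD intuition, in which a small amount of personalised user text steers the output of larger models, can be illustrated without any neural network. The sketch below builds a bag-of-words style profile from a user’s own poems. It then ranks candidate texts by cosine similarity to that profile; the candidates could be lines generated by any model or poems held in a database. The same ranking could therefore serve both generated-text selection and reading recommendations. Word-count vectors are a deliberately simple stand-in, an assumption of this example, for the learned style representations described above.</p>
        <preformat preformat-type="code">
```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def profile(texts):
    """Bag-of-words counts over a collection of texts (lower-cased)."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN.findall(text.lower()))
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[t] * q[t] for t in p if t in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rank_by_style(user_texts, candidates, top_k=3):
    """Return up to top_k (score, candidate) pairs, most similar to the user's style first."""
    user = profile(user_texts)
    scored = [(cosine(user, profile([c])), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:top_k]


if __name__ == "__main__":
    user_poems = ["the sea the salt the gulls", "salt wind over the harbour"]
    candidates = ["quarterly earnings beat forecasts", "gulls over the salt harbour", "the cat sleeps"]
    for score, text in rank_by_style(user_poems, candidates):
        print(round(score, 3), text)
```
        </preformat>
        <p>In a fuller system, the profile would come from the corpus of user text and the candidates from the LLMs or the poetry database. Richer features, such as rhyme or syllable statistics, could be added to the profile.</p>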
        <p>At this stage, the proposed model is high-level. There are open questions relating to issues such as real-world implementation, customisation of user text, acquisition of training data and other areas. The penultimate section will revisit some of the open design questions and attempt to provide answers. The next section explores the social significance of poetry and how a system design could use this to enhance cultural inclusiveness.</p>
        <sec id="sec-7-2-1">
          <title>4. Discussion</title>
          <p>An important goal for poetry is for each writer to discover or develop their own unique style, or artistic voice. Part of a writer’s development will be a result of the poetry they have previously been exposed to. Robert Graves stated that “only a poet of experience...can hope to put himself in the shoes of his predecessors, or contemporaries, and judge their poems by recreating technical or emotional dilemmas which they faced while at work on them.” [57] It can be argued that, in contemporary terms, this statement is gender-biased given the assumption that the ‘poet’ is male. Graves’s central argument about experience, however, is echoed in recent studies on language models. A study by Cheng and Uthus made the point that “as creative works are often shaped by the lived experiences and timely issues of the creator’s life, a poetry composition system trained on poems from different authors of different eras may reflect a variety of societal biases.” [58] Within computer science, social bias is a subject gathering more research attention [17, 59, 60]. However, as well as attempting to mitigate negative impacts for disadvantaged groups, considering bias also offers the possibility of designing systems that leverage cultural, poetic and linguistic resources that would otherwise be missed. This can benefit all user groups. The next section provides a more concrete example.</p>
          <p>racist, sexist or ableist) [17]. Studies have shown that harms can also exist because of (a) exclusionary social norms within language. For example, ‘family’ is often defined as a basic social unit consisting of a married woman, man and their children; language models internalizing such social norms could be highly discriminatory towards people outside this definition [60]. Further causes are (b) a greater propensity to label the language of marginalized or underrepresented groups as toxic in hate speech detection (e.g. the ‘angry black woman’ stereotype) [60]; and (c) over-representation of certain groups, such as white males aged 18-34, within widely used training data (e.g. Reddit posts) [17]. Bender et al. assert that, “in the case of US and UK English...white supremacist and misogynistic, ageist, etc. views are over represented in the training data, not only exceeding their prevalence in the general population.” [17]. The authors go on to say that the data underpinning LMs stands to “misrepresent social movements and disproportionately align with existing regimes of power.”</p>
          <p>A number of studies explore bias mitigation through computational techniques such as (a) augmentation of the training data using style transfer [58] or (b) using counterfactuals to reduce sentiment bias [59]. However, in their study describing GPT-3, the authors caution against over-reliance on computational solutions. They instead ask for “...more research that engages with the literature outside NLP, better articulates normative statements about harm, and engages with the lived experience of communities affected by NLP systems...mitigation work should not be approached purely with a metric driven objective to ‘remove’ bias...but in a holistic manner” [15]. For the use case of a poetry co-creation system, bias could potentially be mitigated by including rap lyrics as a key part of the training data set.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4.2. Towards Culturally Responsive</title>
    </sec>
    <sec id="sec-9">
      <title>Models</title>
      <sec id="sec-9-1">
        <title>Emerging from a hobby of African American youth in the</title>
        <p>4.1. Bias in Language Models 1970s, rap (as an element of hip-hop) has quickly evolved
It has been recognised and accepted in recent years that into a mainstream culture and is the most popular music
LLM used for text generation contain bias [17, 60] A genre in the U.S and many other territories [61, 62, 63, 64].
study by Uthus suggests that “biases in creative language Writing rap lyrics requires both creativity to construct
applications are under explored”; it goes on to say it is im- a meaningful, interesting story and lyrical skills to
proportant to examine biases in these applications because duce complex rhyme patterns [26, 48, 65]; within the
they intended for contexts such as self-expression, collec- culture of rap, writers are evaluated by peers on the
bative social enjoyment, and education [58]. One of the key sis of their wordplay, linguistic complexity and ability
sources of bias in LLM is in the training data sets. LLM to use multiple rhyme types (perfect and imperfect) as
retains the biases of the data they have been trained on well as multi-syllabic rhymes [26, 66]. In many ways,
[15]. Typically the model’s pick up on, or reflect, biases the writer within the hip-hop tradition sets language
and overtly abusive language patterns in training data. puzzles for their audience. In a recent BBC
documenThis can lead to harms for some users such as encounter- tary, Chuck D, the founder of Public Enemy remarked
ing derogatory language or discriminatory language (e.g. that "poets were always...going to give you everything
the truth...that’s very important not only in the realm of
hip hop...but in the realm of artistry.” [67] Recent
computational studies have explored rap on account it its
complexity and cultural significance [ 65, 68, 69]. Rap
has historically been excluded from most mainstream
discussions on co-creative systems and poetry writing.</p>
<p>There may well be valid reasons for this, such as
concerns about language appropriateness, perceptions of
negative sentiment, offensive content, and difficulties in
accessing material under copyright. However, despite these
challenges, the benefits of using extensive rap lyrics within
LM data sets include:</p>
        <list list-type="bullet">
          <list-item>
            <p>Training data that represents the concerns, thoughts and feelings of a wider audience.</p>
          </list-item>
          <list-item>
            <p>Training data that is dynamic and reflects contemporary sociopolitical issues.</p>
          </list-item>
          <list-item>
            <p>The possibility of bringing voices from excluded communities into the NLP community.</p>
          </list-item>
          <list-item>
            <p>LMs enhanced by a linguistically rich and varied source of data.</p>
          </list-item>
          <list-item>
            <p>Lyrics becoming part of a wider conversation, which potentially generates new research insights (for computational, language and social researchers).</p>
          </list-item>
        </list>
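        <p>Because peers judge rap writers partly on their command of perfect, imperfect and multi-syllabic rhymes, a lyric-aware system would need some way to recognise these patterns. The following is a minimal, hypothetical sketch and is not part of the proposed system. It is also far simpler than the automated rhyme detection described by Hirjee and Brown [48]. The sketch assumes that every word already has a phonetic transcription. Here this is a tiny hand-written <monospace>LEXICON</monospace> in an ARPAbet-style notation, standing in for a full pronouncing dictionary. Two words count as a perfect rhyme when everything from the last stressed vowel onwards matches. They count as an imperfect rhyme when only those vowel sounds match. Multi-syllabic rhyme is measured as the number of vowel sounds that match when two lines are read backwards from their ends.</p>
        <preformat preformat-type="code">
```python
# Hypothetical sketch: classify end rhymes and measure multi-syllabic rhyme.
# LEXICON is a hand-written, ARPAbet-style toy (an assumption, not a real
# dictionary); vowel phones end in a stress digit (0, 1 or 2).
LEXICON = {
    "power": ["P", "AW1", "ER0"],
    "tower": ["T", "AW1", "ER0"],
    "time": ["T", "AY1", "M"],
    "rhyme": ["R", "AY1", "M"],
    "mine": ["M", "AY1", "N"],
    "story": ["S", "T", "AO1", "R", "IY0"],
    "glory": ["G", "L", "AO1", "R", "IY0"],
}


def is_vowel(phone):
    """Vowel phones carry a trailing stress digit, e.g. 'AY1'."""
    return phone[-1].isdigit()


def rhyme_part(phones):
    """Phones from the last primary-stressed vowel to the end of the word."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return phones[i:]
    return phones  # no primary stress marked: fall back to the whole word


def vowel_sounds(phones):
    """Vowel sounds with the stress digit stripped ('AY1' becomes 'AY')."""
    return [p[:-1] for p in phones if is_vowel(p)]


def classify_rhyme(word_a, word_b):
    """'perfect' if the rhyme parts match exactly, 'imperfect' if only vowels match."""
    part_a = rhyme_part(LEXICON[word_a])
    part_b = rhyme_part(LEXICON[word_b])
    if part_a == part_b:
        return "perfect"
    if vowel_sounds(part_a) == vowel_sounds(part_b):
        return "imperfect"
    return "none"


def multisyllabic_length(line_a, line_b):
    """Count matching vowel sounds, reading backwards from the ends of two lines."""
    vowels_a = [v for w in line_a.split() for v in vowel_sounds(LEXICON[w])]
    vowels_b = [v for w in line_b.split() for v in vowel_sounds(LEXICON[w])]
    count = 0
    for a, b in zip(reversed(vowels_a), reversed(vowels_b)):
        if a != b:
            break
        count += 1
    return count


print(classify_rhyme("power", "tower"))  # perfect
print(classify_rhyme("time", "mine"))    # imperfect: same vowel, different final consonant
print(multisyllabic_length("time story", "rhyme glory"))  # 3
```
        </preformat>
        <p>The exact-match rules above are simplifying assumptions. Rap practice also rewards near-matches between similar vowels and consonants, so a real implementation would need to score similarity more flexibly.</p>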
        <p>Ultimately, as rap is contemporary music’s biggest genre, and the one most concerned with rhyme and wordplay, there are multiple reasons to explore using rap lyrics as training data.</p>
      </sec>
      <sec id="sec-9-2">
        <sec id="sec-9-2-1">
          <title>5. Limitations and Future Work</title>
<p>The paper has a number of limitations. Some of these
are described below, along with suggested directions for
future work. System Design and Implementation: the
paper does not fully explore how the proposed system
could be built. In particular, there are challenges around
the following:</p>
          <list list-type="bullet">
            <list-item>
              <p>Building custom LLMs: one of the design
limitations is how to effectively experiment with
models of varying degrees of openness (for
convenience referred to as black, grey and white box).
For black box models (e.g. GPT-3) there is at present
no way to modify the architecture. What might
instead be possible is to tune the model’s behaviour
via custom queries over a period of time, that is, to
establish which combinations of prompts generate the
most favourable outputs. Grey box models (BLOOM
or GPT-NeoX) offer the possibility of powerful
models with open-source training and evaluation
code plus model weights [53]. However, the costs
of running and/or adapting these models could
be substantial, and the paper has not explored them.</p>
            </list-item>
            <list-item>
              <p>Customizing models for individuals: this is a
system objective but has not been tested. Technically,
there is a conflict between the scale and
performance benefits of LLMs/LMs and the comparatively
small datasets of individual users. However, as
Vigliensoni et al. argue, working with small-scale
datasets is an overlooked but powerful
mechanism for enabling greater human influence over
generative AI systems within creative
contexts [70]. The authors describe an
experimental project, ReRites by Johnston, which involved
fine-tuning GPT-2 on the artist’s custom poetry
corpus to generate poems. An approach such as
this could be taken, although using models
such as GPT-2 (for which source code is
available) clearly has the limitation of lower performance
compared with current state-of-the-art LLMs. The
personalization of LLMs to individual users is an open
topic that requires further research.</p>
            </list-item>
            <list-item>
              <p>Acquiring training data: training data for poetry
and rap lyrics would not be as readily available as
the Pile or equivalent datasets used for
general LLMs [19]. One solution would be
to source lyrics by scraping the web, or
directly from services such as Musixmatch [71].
Poetry training data, much of which will be out
of copyright, can be acquired via sites such as
Project Gutenberg and the Poetry Foundation. This
approach to training data was used in a 2019
experiment to create a poetry-specific LLM based
on the GPT-2 model [72].</p>
            </list-item>
          </list>
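          <p>For black box models, the search over prompt combinations described in the first item above can be framed as a small grid search. The sketch below is hypothetical throughout. The prompt fragments are invented examples. <monospace>generate</monospace> stands in for whatever hosted model API would be called, and no real API is shown. <monospace>score</monospace> stands in for expert or user ratings of each output. An offline echo stub replaces the model so that the code runs as written.</p>
          <preformat preformat-type="code">
```python
import itertools

# Hypothetical prompt fragments; a real study would design and test its own.
PERSONAS = ["You are a poetry tutor.", "You are a rap lyricist."]
TASKS = [
    "Suggest rhymes for '{word}'.",
    "Suggest an image connected to '{word}'.",
]
CONSTRAINTS = ["", " Do not write complete lines for the user."]


def build_prompts(word):
    """Yield every persona/task/constraint combination for one seed word."""
    for persona, task, constraint in itertools.product(PERSONAS, TASKS, CONSTRAINTS):
        yield f"{persona} {task.format(word=word)}{constraint}"


def rank_prompts(word, generate, score):
    """Return (score, prompt) pairs sorted best first.

    `generate` stands in for a call to a black-box model (hypothetical here);
    `score` stands in for a rating of the output, e.g. from an expert panel.
    """
    results = [(score(generate(prompt)), prompt) for prompt in build_prompts(word)]
    return sorted(results, key=lambda pair: pair[0], reverse=True)


def echo_model(prompt):
    """Offline stub so the sketch runs without any API access."""
    return prompt


ranking = rank_prompts("river", echo_model, score=len)
for value, prompt in ranking[:3]:
    print(value, prompt)
```
          </preformat>
          <p>Here <monospace>len</monospace> is only a placeholder score that makes the ranking deterministic. The number of prompts is the product of the fragment list sizes. The lists would therefore need to stay small, or be sampled, once each query costs money or time.</p>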
<p>Evaluation: the literature on evaluating creativity in
co-creative systems considers a wide range of factors,
such as who evaluates the creativity (e.g. the system itself or
human users), what is being evaluated (e.g. user interaction
or output), when evaluation occurs (e.g. in real time
or at the end of a session) and how the evaluation is
performed (e.g. methods and related metrics) [16]. A broad
set of metrics exists for developing computational
models that evaluate creativity. With respect to the system
described, the most relevant include a computational
model proposed by Agres et al., which reflects human
conceptualization of musical and poetic creativity [73].</p>
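          <p>One language-based measure associated with creativity is the Divergent Association Task (DAT) of Olson et al. [46]. In this task, people name words that are as unrelated as possible, and the response is scored by the average semantic distance between the words. The sketch below is a hypothetical illustration of that scoring idea, not the published implementation. It assumes the score is the cosine distance between word vectors, averaged over all pairs and scaled by 100. In place of pretrained word embeddings, it uses a tiny hand-made table (<monospace>TOY_VECTORS</monospace>), so its numbers only show the mechanics. The cap of seven words (<monospace>max_words</monospace>) is also an assumption.</p>
          <preformat preformat-type="code">
```python
import itertools
import math

# Hypothetical 3-d vectors standing in for pretrained word embeddings.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "volcano": [0.0, 0.9, 0.3],
    "sonnet": [0.1, 0.0, 0.9],
}


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def dat_style_score(words, vectors, max_words=7):
    """Mean pairwise cosine distance, times 100, over the first known words."""
    known = [w for w in words if w in vectors][:max_words]
    if len(known) in (0, 1):
        raise ValueError("need at least two words with vectors")
    pairs = list(itertools.combinations(known, 2))
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return 100 * total / len(pairs)


related = dat_style_score(["cat", "dog"], TOY_VECTORS)
unrelated = dat_style_score(["cat", "volcano", "sonnet"], TOY_VECTORS)
print(round(related, 1), round(unrelated, 1))
```
          </preformat>
          <p>Related words (cat, dog) receive a low score and unrelated words a high one. Real embeddings would be needed before such a score could be compared with any published norms.</p>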
<p>Future work could explore the model by Agres et al.
alongside other linguistically based metrics such as the
Divergent Association Task, the Bridge-the-Associative-Gap
Task, or rhyme creation and identification tasks [46, 47].
Additionally, building on machine learning practice, metrics
could be derived for accuracy in terms of the degree to
which generated output matches a reference dataset. For
example, if the user has a target poetic style, it might
be possible to computationally determine the extent to
which the completed poem matches that style. The
paper has not explored these kinds of evaluation in detail;
they would form part of future work. Finally, although
the evaluations proposed are limited, they could
nevertheless contribute to the wider discussion around
the topic. As Karimi et al. assert, “evaluating co-creative
systems is still an open research question and there is no
standard metric that can be used across specific systems.”
[16].</p>
          </sec>
          <sec id="sec-9-2-2">
          <title>6. Conclusion</title>
          <p>Artistic creativity is a process in which an initial
improvisational phase is followed by a period of focused
re-evaluation and revision [20]. Spontaneous
improvisation is a complex cognitive process that shares features
with what has been characterized as a ‘flow’ state [
          <xref ref-type="bibr" rid="ref1">1, 20</xref>
          ]. Much current work on co-creative settings focuses on the
role of the system as a generator that augments what
people can achieve in creative tasks [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. There are problems with this, such as aligning
system capabilities with user expectations, language model
bias, system interpretability, and user interaction design [
          <xref ref-type="bibr" rid="ref8">8, 22, 74</xref>
          ]. Studies have found that users’ differing mental
expectations affect their strategies and their perception of
the system’s role in the co-writing process [
          <xref ref-type="bibr" rid="ref9">9, 74</xref>
          ].</p>
          <p>This position paper explored the recent background
to co-creative writing systems, with poetry as a use case.
Poetry was defined as including song lyrics, among which
the paper argued that rap was the most relevant genre.
The paper then proposed a system that, as far as the
author is aware, has novel features relative to the state
of the art. The system, and how it could be evaluated
and implemented, were then described. Importantly, the
design includes recommendations for user activities
external to the system. The rationale for this is that the
system’s priority is to help the human user develop
an artistic style rather than to create text on the user’s
behalf. Issues around mitigating some system bias
using rap lyrics were also discussed. Future work could
include more detailed analysis of evaluation methods, as
well as how these could be delivered internally to the
system. Further work on user interface design is also a topic
to develop. Additionally, the implementation proposal
is high level, and constraints such as latency, database
design and other factors have not been considered. In
order to build a viable prototype, software architecture
would most likely form the next stage of the research.</p>
          <p>Finally, to revisit the title of the paper: why build a
co-creative poetry system that makes people feel that they
have “creative superpowers”? Studies demonstrate that
poetry is emotionally powerful and capable of engaging the
brain’s primary reward areas [75]. It is a form of
communication that has existed throughout human history and
across cultures. In modern society, poetry has become a
central part of the most popular music genre. Poetry matters
to society. By extension, it is worth building systems that
can help people experience it firsthand and connect with
its traditions. The aim, though, should not be to make
people feel they have “creative superpowers”; instead, a
system should support people to actually build “creative
superpowers”.</p>
          </sec>
          <sec id="sec-9-2-3">
          <title>7. Acknowledgements</title>
          <p>This work was supported by the Engineering and Physical
Sciences Research Council. The author would also like to
acknowledge the support of Swansea Council.</p>
          </sec>
      </sec>
      <sec id="sec-9-3">
        <title>References</title>
        <p>
[9] K. Arnold, A. Volzer, N. Madrid, Generative models can help writers without writing for them, in: Joint Proceedings of the ACM IUI 2021 Workshops, 2021. URL: https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-1.pdf.</p>
        <p>[10] A. Ploin, R. Eynon, I. Hjorth, M. A. Osborne, AI and the arts: How machine learning is changing artistic work. Report from the Creative Algorithmic Intelligence Research Project, 2022. URL: https://www.oii.ox.ac.uk/news-events/reports/ai-the-arts/.</p>
        <p>[11] B. Shneiderman, Design lessons from AI’s two grand goals: Human emulation and useful applications, IEEE Transactions on Technology and Society 1 (2020) 73–82. doi:10.1109/tts.2020.2992669.</p>
        <p>[12] B. Shneiderman, Creativity support tools, Communications of the ACM 45 (2002) 116–120.</p>
        <p>[13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, 2019. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.</p>
        <p>[14] S. Black, S. Biderman, E. Hallahan, Q. Anthony, L. Gao, L. Golding, H. He, C. Leahy, K. McDonell, J. Phang, M. Pieler, U. S. Prashanth, S. Purohit, L. Reynolds, J. Tow, B. Wang, S. Weinbach, GPT-NeoX-20B: An open-source autoregressive language model, arXiv.org (2022). URL: https://arxiv.org/abs/2204.06745. doi:10.48550/arXiv.2204.06745.</p>
        <p>[15] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020. URL: https://arxiv.org/abs/2005.14165.</p>
        <p>[16] P. Karimi, K. Grace, M. L. Maher, N. Davis, Evaluating creativity in computational co-creative systems, CoRR abs/1807.09886 (2018). URL: http://arxiv.org/abs/1807.09886. arXiv:1807.09886.</p>
        <p>[17] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 610–623. URL: https://doi.org/10.1145/3442188.3445922. doi:10.1145/3442188.3445922.</p>
        <p>[18] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei, Scaling laws for neural language models, CoRR abs/2001.08361 (2020). URL: https://arxiv.org/abs/2001.08361. arXiv:2001.08361.</p>
        <p>[19] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima, S. Presser, C. Leahy, The Pile: An 800GB dataset of diverse text for language modeling, arXiv.org (2020). URL: https://arxiv.org/abs/2101.00027. doi:10.48550/arXiv.2101.00027.</p>
        <p>[20] S. Liu, H. M. Chow, Y. Xu, M. G. Erkkinen, K. E. Swett, M. W. Eagle, D. A. Rizik-Baer, A. R. Braun, Neural correlates of lyrical improvisation: An fMRI study of freestyle rap, Scientific Reports 2 (2012). URL: https://www.nature.com/articles/srep00834. doi:10.1038/srep00834.</p>
        <p>[21] W. Zhang, Z. Sjoerds, B. Hommel, Metacontrol of human creativity: The neurocognitive mechanisms of convergent and divergent thinking, NeuroImage 210 (2020).</p>
        <p>[22] D. Buschek, L. Mecke, F. Lehmann, H. Dang, Nine potential pitfalls when designing human-AI co-creative systems, 2021. URL: https://arxiv.org/abs/2104.00358. doi:10.48550/ARXIV.2104.00358.</p>
        <p>[23] T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ilić, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon, M. Gallé, J. Tow, A. M. Rush, S. Biderman, A. Webson, P. S. Ammanamanchi, T. Wang, B. Sagot, N. Muennighoff, d. Moral, O. Ruwase, R. Bawden, S. Bekman, A. McMillan-Major, I. Beltagy, H. Nguyen, L. Saulnier, S. Tan, P. O. Suarez, V. Sanh, H. Laurençon, Y. Jernite, J. Launay, M. Mitchell, C. Raffel, A. Gokaslan, A. Simhi, A. Soroa, A. F. Aji, A. Alfassy, A. Rogers, A. K. Nitzav, C. Xu, C. Mou, C. Emezue, C. Klamm, C. Leong, v. Strien, D. I. Adelani, D. Radev, E. G. Ponferrada, E. Levkovizh, E. Kim, E. B. Natan, D. Toni, G. Dupont, G. Kruszewski, G. Pistilli, H. Elsahar, H. Benyamina, H. Tran, I. Yu, I. Abdulmumin, I. Johnson, I. Gonzalez-Dios, R. Javier, J. Chim, J. Dodge, J. Zhu, J. Chang, J. Frohberg, J. Tobing, J. Bhattacharjee, K. Almubarak, K. Chen, K. Lo, V. Werra, L. Weber, L. Phan, L. B. allal, L. Tanguy, M. Dey, M. R. Muñoz, M. Masoud, M. Grandury, M. Šaško, M. Huang, M. Coavoux, M. Singh, M. T.-J. Jiang, M. C. Vu, M. A. Jauhar, M. Ghaleb, N. Subramani, N. Kassner, N. Khamis, O. Nguyen, O. Espejel, d. Gibert, P. Villegas, P. Henderson, P. Colombo, P. Amuok, Q. Lhoest, R. Harliman, R. Bommasani, R. L. López, R. Ribeiro, S. Osei, S. Pyysalo, S. Nagel, S. Bose, S. H. Muhammad, S. Sharma, S. Longpre, S. Nikpoor, S. Silberberg, S. Pai, S. Zink, T. T. Torrent, T. Schick, T. Thrush, V. Danchev, V. Nikoulina, V. Laippala, V. Lepercq, V. Prabhu, Z. Alyafeai, Z. Talat, A. Raja, B. Heinzerling, C. Si, E. Salesky, S. J. Mielke, W. Y. Lee, A. Sharma, A. Santilli, A. Chaffin, A. Stiegler, D. Datta, E. Szczechla, G. Chhablani, H. Wang, H. Pandey, H. Strobelt, J. A. Fries, J. Rozen, L. Gao, L. Sutawika, B. M. Saiful, M. S. Al-shaibani, M. Manica, N. Nayak, R. Teehan, S. Albanie, S. Shen, S. Ben-David, S. H. Bach, T. Kim, T. Bers, T. Fevry, T. Neeraj, U. Thakker, V. Raunak, X. Tang, Z.-X. Yong,
</p>
        <p>[35] WriteExpress, Rhymer, 2023. URL: https://www.rhymer.com/.</p>
        <p>[36] Datamuse, Rhymezone rhyming dictionary and thesaurus, 2023. URL: https://www.rhymezone.com/.</p>
        <p>[37] Rytr, Rytr - best AI writer, content generator writing assistant, 2022. URL: https://rytr.me/.</p>
        <p>[38] M. A. Runco, Divergent thinking, creativity, and ideation. (2010).</p>
        <p>[39] C. Lewis, P. J. Lovatt, Breaking away from set patterns of thinking: Improvisation and divergent thinking, Thinking Skills and Creativity 9 (2013) 46–58.</p>
        <p>[40] M. A. Runco, S. Acar, Divergent thinking as an indicator of creative potential, Creativity Research Journal 24 (2012) 66–75.</p>
        <p>[41] A. Cropley, In praise of convergent thinking, Creativity Research Journal 18 (2006) 391–404. doi:10.1207/s15326934crj1803_13.</p>
        <p>[42] A. T. Landau, C. J. Limb, The neuroscience of improvisation, Music Educators Journal 103 (2017) 27–33. URL: https://doi.org/10.1177/0027432116687373. doi:10.1177/0027432116687373.</p>
        <p>[43] Studying the Impact of AI-based Inspiration on Human Ideation in a Co-Creative Design System, 2021. URL: https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-7.pdf.</p>
        <p>[44] B. Shneiderman, Human-Centered AI, Oxford University Press, 2022.</p>
        <p>[45] A. Joshi, S. Kale, S. Chandel, D. Pal, Likert scale: Explored and explained, British Journal of Applied Science Technology 7 (2015) 396–403. URL: https://eclass.aspete.gr/modules/document/file.php/EPPAIK269/5a7cc366dd963113c6923ac4a73c3286ab22.pdf. doi:10.9734/bjast/2015/14975.</p>
        <p>[46] J. A. Olson, J. Nahas, D. Chmoulevitch, S. J. Cropper, M. E. Webb, Naming unrelated words predicts creativity, Proceedings of the National Academy of Sciences 118 (2021). URL: https://www.pnas.org/content/118/25/e2022340118. doi:10.1073/pnas.2022340118.</p>
        <p>[47] J. Ocumpaugh, M. Mercedes, T. Rodrigo, K. Porayska-Pomsta, I. Olatunji, R. Luckin, Becoming better versed: Towards the design of a popular music-based rhyming game for disadvantaged youths, Proceedings of the 26th International Conference on Computers in Education. Philippines: Asia-Pacific Society for Computers in Education (2018). URL: https://apsce.net/icce/icce2018/wp-content/uploads/2018/12/C6-04.pdf.</p>
        <p>[48] H. Hirjee, D. Brown, Using automated rhyme detection to characterize rhyming style in rap music, Empirical Musicology Review 5 (2010) 121–145. doi:10.18061/1811/48548.</p>
        <p>[49] Z. Hu, R. K.-W. Lee, C. C. Aggarwal, A. Zhang, Text style transfer: A review and experimental evaluation (2020). URL: https://arxiv.org/abs/2010.12742. doi:10.48550/ARXIV.2010.12742.</p>
        <p>[50] R. Roberts, Kendrick Lamar’s Pulitzer Prize sparks lively — and at times snobby — conversations on the aesthetics of music, 2018. URL: https://www.latimes.com/entertainment/music/la-et-ms-kendrick-pulitzer-reactions-20180420-story.html.</p>
        <p>[51] OpenAI, OpenAI API, 2021. URL: https://openai.com/api/.</p>
        <p>[52] Amazon, AlexaTM 20B is now available in Amazon SageMaker JumpStart | Amazon Web Services, 2022. URL: https://tinyurl.com/amazonGPT.</p>
        <p>[53] HuggingFace, GPT-NeoX, 2022. URL: https://huggingface.co/docs/transformers/main/en/model_doc/gpt_neox#overview.</p>
        <p>[54] A. Komatsuzaki, Current limitations of language models: What you need is retrieval, 2020. URL: https://www.researchgate.net/publication/344261335_Current_Limitations_of_Language_Models_What_You_Need_is_Retrieval.</p>
        <p>[55] F. Hill, K. Yuan, How Instagram saved poetry: Social media is turning an art form into an industry, 2018. URL: https://www.theatlantic.com/technology/archive/2018/10/rupi-kaur-instagram-poet-entrepreneur/572746/.</p>
        <p>[56] H. Oliver, Instagram is the future of poetry, 2021. URL: https://unherd.com/2021/10/instagram-is-the-future-of-poetry/.</p>
        <p>[57] M. Schmidt, Lives of the Poets, Phoenix, 1999.</p>
        <p>[58] E. Sheng, D. C. Uthus, Investigating societal biases in a poetry composition system, ACL Anthology (2020) 93–106. URL: https://aclanthology.org/2020.gebnlp-1.9/.</p>
        <p>[59] P.-S. Huang, H. Zhang, R. Jiang, R. Stanforth, J. Welbl, J. Rae, V. Maini, D. Yogatama, P. Kohli, Reducing sentiment bias in language models via counterfactual evaluation, 2019. URL: https://arxiv.org/abs/1911.03064. doi:10.48550/ARXIV.1911.03064.</p>
        <p>[60] A. K., M. P. Gangan, D. P., L. V. L., Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias, Springer Nature, Singapore, 2022.</p>
        <p>[61] J. Lynch, Hip-hop passes rock to become most popular music genre for first time in history: Nielsen, 2018. URL: https://www.businessinsider.com/hip-hop-passes-rock-most-popular-music-genre-nielsen-2018-1?r=US&amp;IR=T.</p>
        <p>[62] A. Texas, Hip-hop is the most listened to genre in the world, 2015. URL: https://www.nme.com/news/music/various-artists-1151-1214849.</p>
        <p>[63] Wikipedia, Hip hop, 2021. URL: https://en.wikipedia.org/wiki/Hip_hop.
</p>
        <p>[64] T. Ingham, Nearly a third of all streams in the US last year were of hip-hop and rnb artists as rock beat pop to second, 2021. URL: https://www.musicbusinessworldwide.com/nearly-a-third-of-all-streams-in-the-us-last-year-were-of-hip-hop-and-rb-music/.</p>
        <p>[65] E. Malmi, P. Takala, H. Toivonen, R. Tapani, A. Gionis, DopeLearning: A computational approach to rap lyrics generation, KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). doi:10.1145/2939672.2939679.</p>
        <p>[66] J. Eastwood, E. Hinton, We wrote an algorithm to unravel the rhymes of hit musical ‘Hamilton’, 2016. URL: http://graphics.wsj.com/hamilton/.</p>
        <p>[67] C. D, Fight the power: How hip hop changed the world. URL: https://www.bbc.co.uk/programmes/p0dj70yd.</p>
        <p>[68] N. Condit-Schultz, MCFlow: A Digital Corpus of Rap Flow, Ph.D. thesis, 2016. URL: https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=osu1461250949&amp;disposition=inline.</p>
        <p>[69] J. Eastwood, E. Hinton, How WSJ used an algorithm to analyze ‘Hamilton’ the musical, 2016. URL: http://graphics.wsj.com/hamilton-methodology/.</p>
        <p>[70] A Small-Data Mindset for Generative AI Creative Work, 2022.</p>
        <p>[71] Musixmatch developer API, 2023. URL: https://developer.musixmatch.com/.</p>
        <p>[72] S. Presser, GPT-2 neural network poetry, 2019. URL: https://www.gwern.net/GPT-2.</p>
        <p>[73] S. McGregor, K. Agres, M. Purver, G. Wiggins, From distributional semantics to conceptual spaces: A novel computational method for concept creation, Journal of Artificial General Intelligence 6 (2015) 55–86. doi:10.1515/jagi-2015-0004.</p>
        <p>[74] D. Yang, Y. Zhou, Z. Zhang, T. Jia, J. Li, R. Lc, AI as an active writer: Interaction strategies with generated text in human-AI collaborative fiction writing, 2019. URL: https://ceur-ws.org/Vol-3124/paper6.pdf.</p>
        <p>[75] E. Wassiliwizky, S. Koelsch, V. Wagner, T. Jacobsen, W. Menninghaus, The emotional power of poetry: neural circuitry, psychophysiology and compositional principles, Social Cognitive and Affective Neuroscience 12 (2017) 1229–1240. doi:10.1093/scan/nsx069.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Csikszentmihalyi</surname>
          </string-name>
          ,
          <article-title>Creativity: the psychology of discovery and invention</article-title>
          ,
          <source>Harper Perennial Modern Classics</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Boden</surname>
          </string-name>
          , Creativity in a nutshell,
          <source>Think</source>
          <volume>5</volume>
          (
          <year>2009</year>
          )
          <fpage>83</fpage>
          -
          <lpage>96</lpage>
          . doi:10.1017/S147717560000230X.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Guilford</surname>
          </string-name>
          ,
          <article-title>The nature of human intelligence</article-title>
          . (
          <year>1967</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Boden</surname>
          </string-name>
          ,
          <article-title>The creative mind: myths and mechanisms</article-title>
          , Routledge,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Calderwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Chilton</surname>
          </string-name>
          ,
          <article-title>How novelists use generative language models: An exploratory user study</article-title>
          ,
          <source>in: HAIGEN+user2agent@IUI</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          , Y.-h. Sung,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lukacs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Miklos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          ,
          <article-title>Efficient natural language response suggestion for smart reply</article-title>
          , arXiv.org (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1705.00652. doi:10.48550/arXiv.1705.00652.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonçalo Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boavida</surname>
          </string-name>
          ,
          <article-title>Copoetryme: a co-creative interface for the composition of poetry</article-title>
          ,
          <source>Proceedings of the 10th International Conference on Natural Language Generation</source>
          (
          <year>2017</year>
          ). URL: https://aclanthology.org/W17-3508/. doi:10.18653/v1/w17-3508.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Markert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buschek</surname>
          </string-name>
          ,
          <article-title>Suggestion lists vs. continuous generation: Interaction design for writing with generative models on mobile devices affect text length, wording and perceived authorship</article-title>
          ,
          <source>in: Proceedings of Mensch Und Computer 2022</source>
          , MuC '22, Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>192</fpage>
          -
          <lpage>208</lpage>
          . URL: https://doi.org/10.1145/3543758.3543947. doi:10.1145/3543758.3543947.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Volzer</surname>
          </string-name>
          , N. Madrid,
          <article-title>Generative models can help writers without writing for them</article-title>
          ,
          <source>in: Joint Proceedings of the ACM IUI 2021</source>
          Work-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>