=Paper=
{{Paper
|id=Vol-3359/paper8
|storemode=property
|title=Why try to build try to build a co-creative poetry system that makes people feel that they have "creative superpowers"?
|pdfUrl=https://ceur-ws.org/Vol-3359/paper8.pdf
|volume=Vol-3359
|authors=Ibukun Olatunji
|dblpUrl=https://dblp.org/rec/conf/iui/Olatunji23
}}
==Why try to build try to build a co-creative poetry system that makes people feel that they have "creative superpowers"?==
Why try to build try to build a co-creative poetry system that
makes people feel that they have “creative superpowers”?⋆
Ibukun Olatunji*
Computational Foundry, Swansea University, Crymlyn Burrows, Skewen, Swansea, United Kingdom, SA1 8EN
Abstract
The paper examines co-creative writing systems, and argues that existing Large Language Models could potentially reduce
human capacity. Furthermore, existing sociocultural inequalities might be exacerbated by the widespread adoption of such
generative systems. The paper instead suggests a custom approach, using co-creative poetry writing as an example. The
system has architectural changes from typical language models to better support poetry. It also uses rap lyrics as part of the
training data in order to help reduce sociocultural bias. A high level system implementation is proposed along with some
evaluation methods. Evaluation is based on expert judgement on final outputs, and user performance on language tasks
associated with human creativity. The final section of the paper explores how and why alternatives to existing co-creative
systems could benefit individual users as well as wider society.
Keywords
Creativity, poetry, co-creativity, natural language processing, language models, writing support tools, data sets
1. Introduction
This paper examines co-creative systems using poetry writing as an example. Within the paper, ’poetry’ includes song lyrics. Section one of the paper explores poetry in terms of human creativity. Poetry is chosen as it is a creative task on which non-expert humans can outperform machines, unlike creative outputs such as image generation. After introducing the case for poetry, there is an exploration of recent work in generative computational systems. As well as being the technical state of the art, these systems provide a conceptual framework to explore sociocultural issues such as bias and inclusion. Section one then explores a range of poetry-specific systems and ends with a more detailed case study. The case study examines a system that combines elements of more powerful general models and custom architectural features specific to poetry writing. Section two details the evaluation issues and methods that might be employed for the proposed co-creative system. The emphasis in this section is on how to evaluate human improvement over time. Section three explores a high-level implementation of the system. It builds on the evaluation to propose both an architecture and a method for testing whether the proposed system has, in principle, any benefits over and above those described in section one. Section four is a discussion of the social and cultural limitations of current generative systems. It expands on section one in exploring bias and proposes a mitigation through the use of rap lyrics. Section five describes the theoretical and practical limitations of the paper as well as future work. Section six provides a summary of the paper’s contribution. The section ends with answers to the question: why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”?
1.1. Human Creativity
Human creativity is the ability to come up with ideas or artefacts that are new, surprising, and valuable. Rather than a solitary act, it results from the interaction of social elements: a culture that contains symbolic rules, a person who brings novelty into the symbolic domain, and people who recognise and validate the innovation [1, 2, 3]. Boden makes a further distinction between psychological and historical creativity (P-creativity and H-creativity). P-creativity involves coming up with an idea that’s new to the person who comes up with it. H-creativity means that (so far as we know) no-one else has had it before: it has arisen for the first time in human history [2, 4]. Machine learning models have the potential to support human creativity [5, 6, 7]. However, questions remain on their design and influence in augmenting human capacity as opposed to reducing it [8, 9, 10]. Shneiderman suggests that "researchers’ goals shape the questions they raise, collaborators they choose, methods they use, and outcomes of their work." [11] This leads to the question: how can designers of programming interfaces, interactive tools, and rich social environments enable more people to be more creative more often? [12]
Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia
⋆ Ibukun Olatunji. 2023. Why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”? In Joint Proceedings of the ACM IUI 2023 Workshops. Sydney, Australia, 13 pages.
* Corresponding author.
i.o.olatunji.2030349@swansea.ac.uk (I. Olatunji)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Table 1: Language Model Characteristics. Summary of State-of-the-Art Language Models by Size, Model Type and Ownership
{| class="wikitable"
! Model Name !! Parameters !! Model Type !! Owner
|-
| BERT || 110 - 340 million || Transformer || Google
|-
| GPT-2 || 1.5 billion || Transformer || OpenAI
|-
| LaMDA || 137 billion || Transformer || Google
|-
| GPT-3 || 175 billion || Transformer || OpenAI
|-
| ChatGPT/InstructGPT || 175 billion || Transformer || OpenAI
|-
| BLOOM || 176 billion || Transformer || BLOOM Project
|-
| Megatron-Turing NLG || 530 billion || Transformer || Microsoft and NVIDIA
|-
| PaLM || 540 billion || Transformer || Google
|-
| GLaM || 1 trillion || Mixture of Experts || Google
|}
1.2. Computational Systems
In computational terms, automated systems are now capable of writing poetry approaching human levels [13, 14, 15]. Karimi et al. consider three main strategies by which the role of humans in creative systems can be characterized: fully autonomous systems, creativity support tools, and co-creative systems [16]. Although the paper is primarily concerned with co-creative systems, it will blend the categories where necessary. The reasoning for this is that human users do not make the same distinctions; also, the features and usage are often blended in the real world, e.g. an autonomous system that is used by a creator as an input and thus becomes a support tool and/or co-creative system [10]. The next section briefly outlines the state of the art in computational writing systems.
Language models (LMs) refer to systems that are trained on string prediction tasks: predicting the likelihood of a token (character, word or string) given either the preceding context or its surrounding context. Such systems are unsupervised and, when deployed, take text as input and output scores or string predictions [17]. Large Language Models (LLMs) trained on sufficiently large and diverse data sets are able to perform well across domains, and there is a correlation between model performance and size [18]. State-of-the-art models are able to generate text that approaches or surpasses that of some humans [13, 14, 15, 19]. The emphasis on some humans is important with respect to user characteristics; in broad terms, humans co-creating poetry can be considered as either inexperienced or advanced users. Research on creative tasks such as improvisation suggests that users vary in cognitive processing based in part on their experience and skill levels [20, 21]. A well-designed co-creative system should therefore take differences in user support needs into account [8, 9, 22].
1.3. General Purpose Language Generation
LLMs are trained to predict the next word, or series of words, in a text sequence. They model text corpora as probability distributions. Users write short text prompts to tell the system what to generate. Depending on how many examples are provided in the text prompt, the approach is referred to as zero-, one-, or few-shot learning [13, 15, 17]. Pretrained language models have become a cornerstone of modern natural language processing (NLP) pipelines because they often produce better performance from smaller quantities of labeled data [23]. Within general LLMs, the transformer has established itself as the best performing architecture on benchmark language processing tests [13, 15, 24]. As well as being able to perform tasks such as text summarising and question answering, LLMs have the potential to support creative writing [6, 8, 9]. Current state-of-the-art LLMs are summarized in table 1. However, despite impressive technical achievements, LLMs have limitations including: (a) models, as they scale, might eventually run into the limits of any pretraining objectives; (b) the models are expensive and difficult to perform inference on; (c) model decisions are not easily interpretable; (d) the majority of the research community, and by extension disadvantaged social groups, have been excluded from the development of LLMs as they are proprietary (see table 1); and (e) most LLMs are primarily trained on English-language text that contains data biases [13].
1.4. Poetry Specific Language Generation
Creating poetry is a creative skill that requires extensive vocabulary, phonemic awareness to produce complex rhyme patterns, and general knowledge of enough subjects about the world to be able to tell interesting stories about a range of topics [20, 25, 26, 27].
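The description in Section 1.3 of language models as probability distributions over text, trained to predict the next word, can be illustrated with a toy bigram model. This is a minimal sketch, not the architecture of any model in table 1: the function names `train_bigram` and `next_word_probs`, the `<s>`/`</s>` markers and the three-line corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(lines):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for line in lines:
        # <s> and </s> mark the start and end of a line (illustrative markers).
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Return the estimated distribution P(next | prev) as a dict."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the preceding word never appeared in the training text
    return {word: n / total for word, n in followers.items()}


# Toy corpus, invented for illustration only.
toy_corpus = [
    "the rose is red",
    "the rose is sweet",
    "the sky is blue",
]
model = train_bigram(toy_corpus)
probs = next_word_probs(model, "the")
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))
```

An LLM replaces the count table with a transformer conditioned on a much longer context, but the output has the same shape: a probability for each candidate next token.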
Table 2: Poetry Creation Systems. An Overview of Selected Poetry Writing Tools by Type
{| class="wikitable"
! Type !! Example !! Key Features !! Constraints
|-
| Autonomous || ChatGPT || Natural language input; Generates poems and lyrics || Plain text output; Customisation by text prompt
|-
| Autonomous || co:here || Natural language input; Generates poems and lyrics || Plain text output; High latency
|-
| Autonomous || Rytr || UI has song lyric option; Extensive text processing || Uses GPT-3 models; Not trained on song data
|-
| Support || RhymeZone || Rhyming dictionary/thesaurus; Generates rhyme suggestions || Single word only; Cannot be used to write text
|-
| Support || Rhymer || Rhyming dictionary; Generates range of word types || Single word only; Cannot be used to write text
|-
| Support || Poetry Foundation || Poetry archives and tutorials; Guides user to external resources || No support for real-time creation; No user customisation options
|-
| Co-creativity || Poem Generator || Customise inputs to create poem; Variety of formal poetic outputs || Input variables fixed; Limited user interaction or feedback
|-
| Co-creativity || DeepBeat || Generates and/or suggests lyrics; Displays sources of lyric inspiration || Confusing user interface; Unoriginal output vs GPT-3 models
|-
| Co-creativity || Verse by Verse || Suggests stanzas in style of known poets; Language model accounts for bias || Limited forms of poetry; Trained on selected U.S poets
|}
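The "Support" rows of Table 2 describe rhyming dictionaries that take a single word and return rhyme suggestions. One plausible way such a lookup could work is sketched below: two words are treated as rhyming when their phonemes match from the last stressed vowel onward. This is an assumption for illustration, not a description of how RhymeZone or Rhymer are implemented; the pronunciation table is hand-written in an ARPAbet-like notation, and the names `PRONUNCIATIONS`, `rhyme_part` and `rhymes` are hypothetical.

```python
# Hand-written ARPAbet-style pronunciations, for illustration only.
# A digit on a vowel marks stress (1 = primary stress, 0 = unstressed).
PRONUNCIATIONS = {
    "light": ["L", "AY1", "T"],
    "night": ["N", "AY1", "T"],
    "bright": ["B", "R", "AY1", "T"],
    "delight": ["D", "IH0", "L", "AY1", "T"],
    "moon": ["M", "UW1", "N"],
    "june": ["JH", "UW1", "N"],
    "stone": ["S", "T", "OW1", "N"],
}


def rhyme_part(word):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    phonemes = PRONUNCIATIONS[word]
    stressed = [i for i, p in enumerate(phonemes) if p.endswith("1")]
    return tuple(phonemes[stressed[-1]:])


def rhymes(word):
    """List other known words that share the same rhyme part, alphabetically."""
    target = rhyme_part(word)
    return sorted(
        other for other in PRONUNCIATIONS
        if other != word and rhyme_part(other) == target
    )


print(rhymes("light"))
print(rhymes("stone"))
```

As the Constraints column notes, a lookup of this kind works on one word at a time and cannot write text; it only narrows the user's own word choice.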
1.5. Overview of Poetry Support Tools
Historically, poetry creation systems tended to be built on the model of an AI writing a full poem by itself, thus writing in a closed system [28, 29, 30, 31]. Early systems tended to be rule-based [32]. More recently, some approaches have started to explore human interaction when composing poems [33, 34]. Table 2 provides a broad summary of selected systems including autonomous systems, support tools and co-creative systems as defined by Karimi et al. [16]. The category distinction helps frame a range of (human) creative processes and (technology) interactions. It is also a useful way to consider ways in which the proposed system is different to those that currently exist and, as importantly, ways in which it is similar. At a high level, the autonomous systems are designed to be able to create finished works (sometimes called ’products’ or ’artefacts’). The support tools are used as part of the creative workflow. For instance, RhymeZone or Rhymer help a user find words that sound similar to those they might use in a poem [35, 36]. Co-creative systems enable humans and computational systems to make shared products. That said, the distinction is not fixed. For example, Rytr contains text editing, display and other features that allow it to operate as both a co-creative and autonomous system [37]. Having looked at the computational systems, it is instructive to briefly consider poetry writing from a human perspective. It will help inform the design of a new poetry writing system.
Writing poetry requires a range of general creative skills that can be framed in terms of divergent and convergent thinking; these are used in varying ways throughout a multi-stage writing process. For simplicity, the stages include (a) exploration, which is characterised by divergent thinking [21, 38, 39, 40]; (b) focused work, which uses convergent thinking [21, 41]; and (c) re-drafting. It is useful in the stages to distinguish between internal and external co-creation system activities. Internal is when the user interacts with the system in real time, e.g. writing or redrafting text; external is when the user participates in activities such as browsing, reading or other things that do not use the system. The framing of internal and external system activities is based on the following reasoning: (a) skill: inexperienced users are unlikely to possess the improvisational skill required to create full poems in real time due to cognitive processing constraints [20, 42]; (b) speed: users might choose to write poems over multiple sessions, in which case external system stimuli could have supported the writing; (c) knowledge: advanced writers are usually familiar with a body of existing work that informs their own [1]; and (d) process: reflecting and redrafting is an important part of writing. The reflecting stage often takes place separately to the creation of the work itself [1, 10].
1.6. Case Study: Verse by Verse
Figure 1: Google’s Verse by Verse: users select from a range of US poets and custom design a poem by choosing from features including the number of syllables per line and the number of stanzas.
Google Research's Verse by Verse is a relevant case study as it is arguably the most technically advanced poetry-specific generative system. As well as using a transformer model architecture, it also uses information retrieval, and considers bias within its design. Verse by Verse augments user poetry composition by offering suggestions to a user as they compose a poem. The authors of the system argue that, relative to creating full poems, "this is a much more challenging task, as one needs be able to offer suggestions with minimal latency while meeting constraints of the poem structure and handle the challenges of user input" [34]. Figure 1 shows part of the system’s user interface (for PC). From a user’s point of view, the experience is as follows: (a) the user selects poet(s) to inform the suggestions; (b) the user designs poem structure as illustrated in figure 2; (c) the user writes the first line of text; and (d) the system offers suggestions in the style of the poet(s) the user selected earlier. The user can then work with, modify or have the system create new verses. The Verse by Verse design has an external system context that, in general, LLMs do not. To some extent, the system helps poetry writers become better readers. In his work on creativity it was suggested to Csikszentmihalyi that "the only way you become a poet...is because you’ve read a poem...poetry depends on the whole poetic tradition of the past...you have to decide...out of all that previous poetry, what is most interesting to me?" [1] Verse by Verse, by making users aware of the work of other poets, helps users become readers in order to inform their own poetic development.
2. Experiment Design
Verse by Verse ran comparative evaluations of the system against poems written by classic poets. Although the system was intended to be used as an interactive co-creator for the human writing a poem, the authors stated it was still worth evaluating how the system could perform on its own in writing a poem given a first line of verse [34]. This approach has been adopted within the proposed system's experimental design, implementation and evaluation. The next subsection explores evaluation prior to looking at implementation. The rationale is that the evaluation is perhaps a harder problem as it involves an intersection of multiple disciplines (e.g. computational sciences, arts, linguistics, and pedagogy). Implementation can mostly be restricted to computational science domains.
2.1. Evaluation Overview
Evaluating co-creative systems is still an open research question and there is no standard metric for measuring computational co-creativity [16, 43]. Karimi et al. describe the limited research investigating how co-creative systems can be evaluated. They present four questions as a way to compare how (existing) co-creative systems evaluate creativity: who is evaluating the creativity, what is being evaluated, when does evaluation occur, and how the evaluation is performed [16]. Calderwood et al. point out that "writers engaged with co-creative systems are looking for creative insight, something not measured by perplexity or by a language model’s ability to solve the canonical downstream NLP tasks" [5]. For the evaluation of the proposed system to be effective, it is useful to restate its goals in more detail. The co-creative poetry system’s goal is making people feel that they have “creative superpowers”. To achieve this, the system supports users to create better poetry than they might otherwise have done without the system. The terms supports and better will be further explored as they form the basis of evaluation.
Augmenting human users is central to HCAI and a contrast to a closed model that creates on behalf of the user [8, 34, 44]. This point is made in recent work that refers to pitfalls when designing human-AI co-creative systems, as well as other work which asserts that generative models can help writers without writing for them [5, 9, 22]. The argument these, and similar works, make is that too much automated creation can be at the expense of human users [9, 22]. Adopting this thinking, it is useful to evaluate the system and its users independently, as well as in combination. This in theory allows (system) internal and (human) internal and external measurement.
The end goal here is that human users develop their capacity; this could be external to the system, whereby the system has acted as a creative prompt. A description of how this could work in principle follows. A later section describes system implementation.
2.2. Process and Objectives
The system would run a number of experiments with the purpose of establishing which system components most support users to write “better” poetry; in goal terms, better is evaluated (a) subjectively by users via a Likert scale [45] and (b) by performance on related tasks such as the Divergent Action Task, Bridge-the-Associative-Gap Task, or rhyme creation and identification [46, 47]. The tasks would be completed external to the system. The goals of the evaluation are to measure to what extent users are actually improving their poetry writing abilities, and the degree to which any improvement is a result of internal system features. For a user, improvement is concerned with "the writer’s goals or their desire to have an individual voice" [9]. With this as a basis, the evaluation process takes the form of a number of hypotheses and related experiments, the purpose of which is to explore: (a) how well general vs poetry specific language models can write full poems; (b) if poetry specific language models can better represent individual users' style than generalised language models; and (c) the extent to which users benefit when writing poems from system recommendations. The hypotheses and experiments are concerned with poetic text style, which describes the ways (an author) uses language, including prosody, word choice, sentence structure and use of figurative language [48, 49].
A central challenge for the proposed system is that the development and attainment of an individual poetic voice is highly subjective. Beyond subjectivity, poetry is, from a societal perspective, often a question of cultural value which over time may well change. In reference to Kendrick Lamar’s 2018 Pulitzer Prize, a first for a rap album, the administrator of the prizes said, "...this is not a genre we’ve seen celebrated before, so that in that sense it’s historical." [50] Furthermore, as Boden states, "...even in science, values are often elusive and sometimes changeable...because values are highly variable, it follows that many arguments about creativity are rooted in disagreements about value. This applies to human activities no less than to computer performance." [2]
1. Hypothesis-A that poetry specific language generation could outperform general language generation with respect to creating poems. Experiment A: each system-state generates complete poetic texts. The prompts would also be given to users (inexperienced and advanced) with the same constraints as the system in terms of keywords, topics, character limits etc. The evaluation for experiment A is by humans who judge the quality of the poems (which are anonymous) by a Likert scale and free text summary.
2. Hypothesis-B that poetry specific language generation customised for a given user could outperform vanilla poetry specific generation with respect to creating poems. Experiment B: each system state generates complete poetic text but some states are pretrained to customise characteristics with respect to given users and their poetic styles. The evaluation for experiment B is by humans who judge the quality of the poems by a Likert scale and free text summary. The evaluation is focused on how well the poems represent the given users’ individual style.
3. Hypothesis-C that external recommendations, full or part poems, based on given user characteristics are supportive with respect to users writing their poems. Experiment C: for a given user's generated text inputs, the system state generates (external to system) poetic text recommendations that the user reads and reflects on before completing their poem. The evaluation for experiment C is by humans who judge how well the poem recommendations helped them write poems in the theme, topic or style they were attempting to achieve.
The approach described provides a sense of how user activities (internal and external) with respect to the system can be evaluated. In practice, more fine-grained evaluation criteria would be required based on further research and operational or implementation design; as far as possible, a complete system would have an awareness of all relevant evaluation data including, for instance, external system reading of poems. At this stage, the evaluation proposed is limited to the extent necessary in order to support the explanation of how and why the system might work. A later section (Limitations and Future Work) will explore the limitations and suggest possible remedies.
3. Proposed Implementation
The system would have a number of states that range from full automation to text prompts acting as a starting point for the user. The support states envisaged are:
1. State-A: general language system implemented as standard.
2. State-B: general language system implemented with modified architecture to include user generated content within training set and/or network architecture preferences.
3. State-C: poetry specific system implemented with standard architecture.
4. State-D: poetry specific system implemented with modified architecture to include user generated content within training set and/or network architecture preferences.
The LLM component of the system would use publicly available APIs and, where possible, modify network architecture directly [51, 52, 53]. In most cases (table 1) LLMs are closed black box systems as illustrated in figure 3. In part for this reason, ideally a custom poetry and lyric language model would be implemented; aside from practicalities (which will be discussed) there is a technical challenge in that a poetry and lyric LM would be far smaller than a general LLM. Given the research on LLM size and performance, a custom poetry and lyric LM would in theory therefore underperform against state-of-the-art LLMs [18, 54, 15]. In line with a recent study, which experimented with user experiences of language models, the system could be implemented with a combination of JavaScript, React, Python and Flask [8]. The system would then be deployed as a web application for mobile phones. Mobile is preferred to PC on the basis of its greater reach as a device for both reading and creating contemporary poetry [55, 56].
Figure 2: SParse And Dense Network Model Elements. 1. Text input by user is returned as partially completed poetic text and/or poetic and lyrical recommendations for the user to consider. 2. User personalised data submitted as poems or lyrics and/or recommendations of favourite artists and their work. These are used to create a corpus of user text. Prior examples of user generated text uploaded to system; recommender and/or database search to enhance user text with additional poetic texts (e.g. from web crawl). 3. Database of poetic texts (and song lyrics) from web crawl. Clean text is included as well as metadata such as rhyme scheme and Parts of Speech (PoS).
3.1. Sparse And Dense Network Model
The system (figure 2) operates as a Sparse And Dense Network (SPAD). The name refers to the system being sparse with respect to user input tokens as compared to tokens contained in the LM/LLMs. Against this, the system is dense in terms of leveraging transformer models and their associated attention layers (table 1). The intuition is to use a small amount of personalised user text to attempt to customise the output of powerful LMs/LLMs. This differs from existing approaches in the following ways.
• State-of-the-Art LLMs form part of the SPAD in order to help improve the SPAD's performance; in other words, the LLMs are a source of input training data and as such multiple LLMs could in theory be included in the SPAD architectural design.
• A poetry specific LLM (GPT-NeoX) forms part of the design; poetry specific refers to adaptations to the underlying model architecture so that token processing and output are better suited to poetry than prose. An example of this might be applying additional linguistic layers within the network to favour text strings with syllable frequencies found more regularly in poems than, say, news articles or web pages. Although architecture is referred to, much of any benefit at this stage might come from modifying the training data and associated recipes. The poetry specific LLM would also leverage data from the general LLM (for simplicity any interaction between the two elements is not included in figure 2).
• Poetry and lyric LM is a custom model whose network architecture and training data are specific to poetry. In practical terms it is not an LLM as the available training data is not likely to be sufficiently extensive vs the current state of the art. As well as providing a data contrast to the LLMs, this part of the network will also act as a style transfer layer in so far as it identifies and tries to modify input text to create poetic styles. These styles will be mapped onto user styles upstream within the system.
The result of the models described above is a system that contains information on generalized poetic style as well as individual style preference(s) unique for each user. This allows the system to support users with specific co-writing tasks (e.g. text generation) as well as offer personalised recommendations for further reading of relevant poems and/or poets. In user experience terms, this might be delivered via an interface that allows the user to switch between (a) writing text; (b) editing generated
text; and (c) reading and reflecting on specific poetic recommendations made by the system.
At this stage, the proposed model is high-level. There are open questions relating to issues such as real world implementation, customisation of user text, acquisition of training data and other areas. The penultimate section will revisit some of the open design questions and attempt to provide answers. The next section explores the social significance of poetry and how a system design could use this to enhance cultural inclusiveness.
4. Discussion
An important goal for poetry is for each writer to discover or develop their own unique style, or artistic voice. Part of a writer's development will be a result of what poetry they have previously been exposed to. Robert Graves stated that, “only a poet of experience...can hope to put himself in the shoes of his predecessors, or contemporaries, and judge their poems by recreating technical or emotional dilemmas which they faced while at work on them." [57] It can be argued that this statement is, in contemporary terms, biased in gender terms given the assumption of ‘poet’ being male. Graves’s central argument about experience, however, is echoed in recent studies on language models. A study by Cheng and Uthus made the point that “as creative works are often shaped by the lived experiences and timely issues of the creator’s life, a poetry composition system trained on poems from different authors of different eras may reflect a variety of societal biases." [58] Within computer science, social bias is a subject gathering more research attention [17, 59, 60]. However, as well as attempting to mitigate negative impacts for disadvantaged groups, considering bias also offers the possibility of designing systems that leverage cultural, poetic and linguistic resources that would otherwise be missed. This can benefit all user groups. The next section provides a more concrete example.
4.1. Bias in Language Models
It has been recognised and accepted in recent years that LLMs used for text generation contain bias [17, 60]. A study by Uthus suggests that “biases in creative language applications are under explored”; it goes on to say it is important to examine biases in these applications because they are intended for contexts such as self-expression, collective social enjoyment, and education [58]. One of the key sources of bias in LLMs is the training data sets. LLMs retain the biases of the data they have been trained on [15]. Typically the models pick up on, or reflect, biases and overtly abusive language patterns in training data. This can lead to harms for some users such as encountering derogatory language or discriminatory language (e.g. racist, sexist or ableist) [17]. Studies have shown that harms can also exist because of (a) exclusionary social norms within language. For example, ‘family’ is often defined as a basic social unit consisting of a married woman, man and their children; language models internalizing such social norms could be highly discriminatory towards people outside this definition [60]; (b) a greater propensity to label the language of marginalized or underrepresented groups as toxic in hate speech detection (e.g. the ‘angry black woman’ stereotype) [60]; and (c) over representation of certain groups such as white males 18-34 within widely used training data (e.g. Reddit posts) [17]. Bender et al. assert that, “in the case of US and UK English...white supremacist and misogynistic, ageist, etc. views are over represented in the training data, not only exceeding their prevalence in the general population.” [17] The authors go on to say that the data underpinning LMs stands to “misrepresent social movements and disproportionately align with existing regimes of power.”
There are a number of studies that explore bias mitigation through computational techniques such as (a) augmentation of the training data using style transfer [58] or (b) using counterfactuals to reduce sentiment bias [59]. However, in their study describing GPT-3 the authors caution against an over-reliance on computational solutions. They instead ask for “...more research that engages with the literature outside NLP, better articulates normative statements about harm, and engages with the lived experience of communities affected by NLP systems...mitigation work should not be approached purely with a metric driven objective to ‘remove’ bias...but in a holistic manner” [15]. For the use case of a poetry co-creation system, bias could potentially be mitigated by including rap lyrics as a key part of the training data set.
4.2. Towards Culturally Responsive Models
Emerging from a hobby of African American youth in the 1970s, rap (as an element of hip-hop) has quickly evolved into a mainstream culture and is the most popular music genre in the U.S. and many other territories [61, 62, 63, 64]. Writing rap lyrics requires both creativity to construct a meaningful, interesting story and lyrical skills to produce complex rhyme patterns [26, 48, 65]; within the culture of rap, writers are evaluated by peers on the basis of their wordplay, linguistic complexity and ability to use multiple rhyme types (perfect and imperfect) as well as multi-syllabic rhymes [26, 66]. In many ways, the writer within the hip-hop tradition sets language puzzles for their audience. In a recent BBC documentary, Chuck D, the founder of Public Enemy, remarked that "poets were always...going to give you everything the truth...that’s very important not only in the realm of
hip hop...but in the realm of artistry.” [67] Recent computational studies have explored rap on account of its complexity and cultural significance [65, 68, 69]. Rap has historically been excluded from most mainstream discussions on co-creative systems and poetry writing. There may well be valid reasons for this, such as language appropriateness, perceptions of negative sentiment, offensive content, and difficulties in accessing material under copyright. However, although there are challenges, the benefits of using extensive rap lyrics within LM data sets include:
• Training data that represents wider audience concerns, thoughts and feelings.
• Training data will be dynamic and reflect contemporary sociopolitical issues.
• Opens up the possibility of bringing voices from excluded communities into the NLP community.
• LMs would be enhanced by a linguistically rich and varied source of data.
• Allows lyrics to be part of a wider conversation which potentially generates new research insight (for computational, language and social researchers).
Ultimately, as rap is contemporary music’s biggest genre, and the one most concerned with rhyme and wordplay, there are multiple reasons to explore using rap lyrics as training data.
5. Limitations and Future Work
The paper has a number of limitations. Below some of these are described along with suggested directions for future work.
System Design and Implementation: the paper does not fully explore how the proposed system could be built. In particular, there are challenges around the following:
• Building custom LLMs. One of the design limitations is how to effectively experiment with models of varying degrees of openness (for convenience referred to as black, grey and white box). For black box models (e.g. GPT-3) there is no way at present to modify the architecture. What might be possible instead is to fine-tune the model via custom queries over a period of time, that is, to establish which combinations of prompts generate the most favourable outputs. Grey box models (BLOOM or GPT-NeoX) offer the possibility of powerful models with open-source training and evaluation code plus model weights [53]. However, the costs of running and/or adapting these models could be substantial and are not something the paper has explored.
• Customizing models for individuals: this is a system objective but has not been tested. Technically, there is a conflict between the scale and performance benefits of LLM/LM and the comparatively small datasets of individual users. However, as Vigliensoni et al. argue, working with small-scale datasets is an overlooked but powerful mechanism for enabling greater human influence over generative AI systems within creative contexts [70]. The authors describe an experimental project, ReRites by Johnston, which involved fine-tuning GPT-2 on the artist’s custom poetry corpus to generate poems. An approach such as this could be taken, although clearly using models such as GPT-2 (for which source code is available) has the limitation of lower performance than current state-of-the-art LLMs. The personalizing of LLMs to individual users is an open topic that requires further research.
• Acquiring training data: training data for poetry and rap lyrics would not be readily available in the way that the Pile or equivalents are used for general LLMs [19]. A possible solution would be to source data by scraping the web for lyrics, or directly from services such as MusixMatch [71]. Poetry training data, much of which will be out of copyright, can be acquired via sites such as Project Gutenberg and Poetry Foundation. This approach to training data was used in a 2019 experiment to create a poetry-specific LLM based on the GPT-2 model [72].
Evaluation: literature on evaluating creativity in co-creative systems considers a wide range of factors, such as who evaluates the creativity (e.g. the system itself or human users), what is being evaluated (e.g. user interaction or output), when evaluation occurs (e.g. in real time or at the end of a session) and how the evaluation is performed (e.g. methods and related metrics) [16]. There is a broad set of metrics for developing computational models for evaluating creativity. With respect to the system described, the most relevant include a proposed computational model by Agres et al. The model reflects human conceptualization of musical and poetic creativity [73]. Future work could explore the kind of model described alongside other linguistic-based metrics such as the Divergent Association Task, Bridge-the-Associative-Gap Task, or rhyme creation and identification tasks [46, 47]. Additionally, building on machine learning practices, metrics could be derived for accuracy in terms of the degree to which generated output matches a reference dataset. For example, if the user has a target poetic style, it might be possible to computationally determine how closely the completed poem matches that style. The
paper has not explored these kinds of evaluation in detail and they would form part of future work. Finally, though the evaluations proposed are limited, they could nevertheless contribute to the wider discussion around the topic. As Karimi et al. assert, “evaluating co-creative systems is still an open research question and there is no standard metric that can be used across specific systems.” [16]
6. Conclusion
Artistic creativity is a process in which an initial improvisational phase is followed by a period of focused re-evaluation and revision [20]. Spontaneous improvisation is a complex cognitive process that shares features with what has been characterized as a ‘flow’ state [1, 20]. Much current work on co-creative settings focuses on the role of the system as a generator that augments what people can achieve in creative tasks [9]. There are problems with this, such as aligning the system capabilities and user expectations, language model bias, system interpretability, and user interaction design [8, 22, 74]. Studies have found that users’ differing expectations affect their strategies and their perception of the system’s role in the co-writing process [9, 74].
This position paper explored the recent background to co-creative writing systems, with poetry as a use case. Poetry was defined as including song lyrics, for which the paper argued that rap was the most relevant genre. The paper then proposed a system that, as far as the author is aware, has novel features relative to the state of the art. The system and how it could be evaluated and implemented were then described. Importantly, the design includes recommendations for user activities external to the system. The rationale for this is that the system priority is to help the human user to develop an artistic style rather than to create text on the user’s behalf. Issues around mitigating some system bias using rap lyrics were also discussed. Future work could include more detailed analysis of evaluation methods as well as how these could be delivered internally to the system. Further work on user interface design is also a topic to develop. Additionally, the implementation proposal is high level and constraints such as latency, database design, and other factors have not been considered. In order to build a viable prototype, software architecture would most likely form the next stage of the research.
Finally, to revisit the title of the paper: why build a co-creative poetry system that makes people feel that they have “creative superpowers”? Studies demonstrate that poetry is an emotional stimulus capable of engaging the brain’s areas of primary reward [75]. It is a form of communication that has existed throughout human history and across cultures. In modern society, poetry has become a central part of the most popular music genre. Poetry matters to society. By extension, it is worth building a system that can help people experience it firsthand and connect with its traditions. The aim, though, should not be to make people feel they have “creative superpowers”; instead a system should support people to actually build “creative superpowers”.
7. Acknowledgements
This work was supported by the Engineering and Physical Sciences Research Council. The author would also like to acknowledge the support of Swansea Council.
References
[1] M. Csikszentmihalyi, Creativity: the psychology of discovery and invention, Harper Perennial Modern Classics, 2013.
[2] M. Boden, Creativity in a nutshell, Think 5 (2009) 83–96. doi:10.1017/S147717560000230X.
[3] J. P. Guilford, The nature of human intelligence. (1967).
[4] M. A. Boden, The creative mind: myths and mechanisms, Routledge, 2005.
[5] A. Calderwood, V. Qiu, K. Gero, L. B. Chilton, How novelists use generative language models: An exploratory user study, in: HAI-GEN+user2agent@IUI, 2020.
[6] M. Henderson, R. Al-Rfou, B. Strope, Y.-h. Sung, L. Lukacs, R. Guo, S. Kumar, B. Miklos, R. Kurzweil, Efficient natural language response suggestion for smart reply, arXiv.org (2017). URL: https://arxiv.org/abs/1705.00652. doi:10.48550/arXiv.1705.00652.
[7] H. Gonçalo Oliveira, T. Mendes, A. Boavida, Co-poetryme: a co-creative interface for the composition of poetry, Proceedings of the 10th International Conference on Natural Language Generation (2017). URL: https://aclanthology.org/W17-3508/. doi:10.18653/v1/w17-3508.
[8] F. Lehmann, N. Markert, H. Dang, D. Buschek, Suggestion lists vs. continuous generation: Interaction design for writing with generative models on mobile devices affect text length, wording and perceived authorship, in: Proceedings of Mensch Und Computer 2022, MuC ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 192–208. URL: https://doi.org/10.1145/3543758.3543947. doi:10.1145/3543758.3543947.
[9] K. Arnold, A. Volzer, N. Madrid, Generative models can help writers without writing for them, in: Joint Proceedings of the ACM IUI 2021 Workshops, 2021. URL: https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-1.pdf.
[10] A. Ploin, R. Eynon, I. Hjorth, M. A. Osborne, Ai and the arts: How machine learning is changing artistic work. report from the creative algorithmic intelligence research project, 2022. URL: https://www.oii.ox.ac.uk/news-events/reports/ai-the-arts/.
[11] B. Shneiderman, Design lessons from ai’s two grand goals: Human emulation and useful applications, IEEE Transactions on Technology and Society 1 (2020) 73–82. doi:10.1109/tts.2020.2992669.
[12] B. Shneiderman, Creativity support tools, Communications of the ACM 45 (2002) 116–120.
[13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, 2019. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[14] S. Black, S. Biderman, E. Hallahan, Q. Anthony, L. Gao, L. Golding, H. He, C. Leahy, K. McDonell, J. Phang, M. Pieler, U. S. Prashanth, S. Purohit, L. Reynolds, J. Tow, B. Wang, S. Weinbach, Gpt-neox-20b: An open-source autoregressive language model, arXiv.org (2022). URL: https://arxiv.org/abs/2204.06745. doi:10.48550/arXiv.2204.06745.
[15] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020. URL: https://arxiv.org/abs/2005.14165.
[16] P. Karimi, K. Grace, M. L. Maher, N. Davis, Evaluating creativity in computational co-creative systems, CoRR abs/1807.09886 (2018). URL: http://arxiv.org/abs/1807.09886. arXiv:1807.09886.
[17] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 610–623. URL: https://doi.org/10.1145/3442188.3445922. doi:10.1145/3442188.3445922.
[18] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei, Scaling laws for neural language models, CoRR abs/2001.08361 (2020). URL: https://arxiv.org/abs/2001.08361. arXiv:2001.08361.
[19] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima, S. Presser, C. Leahy, The pile: An 800gb dataset of diverse text for language modeling, arXiv.org (2020). URL: https://arxiv.org/abs/2101.00027. doi:10.48550/arXiv.2101.00027.
[20] S. Liu, H. M. Chow, Y. Xu, M. G. Erkkinen, K. E. Swett, M. W. Eagle, D. A. Rizik-Baer, A. R. Braun, Neural correlates of lyrical improvisation: An fmri study of freestyle rap, Scientific Reports 2 (2012). URL: https://www.nature.com/articles/srep00834. doi:10.1038/srep00834.
[21] W. Zhang, Z. Sjoerds, B. Hommel, Metacontrol of human creativity: The neurocognitive mechanisms of convergent and divergent thinking, NeuroImage 210 (2020).
[22] D. Buschek, L. Mecke, F. Lehmann, H. Dang, Nine potential pitfalls when designing human-ai co-creative systems, 2021. URL: https://arxiv.org/abs/2104.00358. doi:10.48550/ARXIV.2104.00358.
[23] T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ilić, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon, M. Gallé, J. Tow, A. M. Rush, S. Biderman, A. Webson, P. S. Ammanamanchi, T. Wang, B. Sagot, N. Muennighoff, d. Moral, O. Ruwase, R. Bawden, S. Bekman, A. McMillan-Major, I. Beltagy, H. Nguyen, L. Saulnier, S. Tan, P. O. Suarez, V. Sanh, H. Laurençon, Y. Jernite, J. Launay, M. Mitchell, C. Raffel, A. Gokaslan, A. Simhi, A. Soroa, A. F. Aji, A. Alfassy, A. A. Rogers, A. K. Nitzav, C. Xu, C. Mou, C. Emezue, C. Klamm, C. Leong, v. Strien, D. I. Adelani, D. Radev, E. G. Ponferrada, E. Levkovizh, E. Kim, E. B. Natan, D. Toni, G. Dupont, G. Kruszewski, G. Pistilli, H. Elsahar, H. Benyamina, H. Tran, I. Yu, I. Abdulmumin, I. Johnson, I. Gonzalez-Dios, R. Javier, J. Chim, J. Dodge, J. Zhu, J. Chang, J. Frohberg, J. Tobing, J. Bhattacharjee, K. Almubarak, K. Chen, K. Lo, V. Werra, L. Weber, L. Phan, L. B. allal, L. Tanguy, M. Dey, M. R. Muñoz, M. Masoud, M. Grandury, M. Šaško, M. Huang, M. Coavoux, M. Singh, M. T.-J. Jiang, M. C. Vu, M. A. Jauhar, M. Ghaleb, N. Subramani, N. Kassner, N. Khamis, O. Nguyen, O. Espejel, d. Gibert, P. Villegas, P. Henderson, P. Colombo, P. Amuok, Q. Lhoest, R. Harliman, R. Bommasani, R. L. López, R. Ribeiro, S. Osei, S. Pyysalo, S. Nagel, S. Bose, S. H. Muhammad, S. Sharma, S. Longpre, S. Nikpoor, S. Silberberg, S. Pai, S. Zink, T. T. Torrent, T. Schick, T. Thrush, V. Danchev, V. Nikoulina, V. Laippala, V. Lepercq, V. Prabhu, Z. Alyafeai, Z. Talat, A. Raja, B. Heinzerling, C. Si, E. Salesky, S. J. Mielke, W. Y. Lee, A. Sharma, A. Santilli, A. Chaffin, A. Stiegler, D. Datta, E. Szczechla, G. Chhablani, H. Wang, H. Pandey, H. Strobelt, J. A. Fries, J. Rozen, L. Gao, L. Sutawika, B. M. Saiful, M. S. Al-shaibani, M. Manica, N. Nayak, R. Teehan, S. Albanie, S. Shen, S. Ben-David, S. H. Bach, T. Kim, T. Bers, T. Fevry, T. Neeraj, U. Thakker, V. Raunak, X. Tang, Z.-X. Yong,
Z. Sun, S. Brody, Y. Uri, H. Tojarieh, A. Roberts, H. W. Chung, J. Tae, J. Phang, O. Press, C. Li, D. Narayanan, H. Bourfoune, J. Casper, J. Rasley, M. Ryabinin, M. Mishra, M. Zhang, M. Shoeybi, M. Peyrounette, N. Patry, N. Tazi, O. S Sanseviero, v. Platen, P. Cornette, P. F. Lavallée, R. Lacroix, S. Rajbhandari, S. Gandhi, S. Smith, S. Requena, S. Patil, T. Dettmers, A. Baruwa, A. Singh, A. Cheveleva, A.-L. Ligozat, A. Subramonian, A. Névéol, C. Lovering, D. Garrette, D. Tunuguntla, E. Reiter, E. Taktasheva, E. Voloshina, E. Bogdanov, G. I. Winata, H. Schoelkopf, J.-C. Kalo, J. Novikova, J. Z. Forde, J. Clive, J. Kasai, K. Kawamura, L. Hazan, M. Carpuat, M. Clinciu, N. Kim, N. Cheng, O. Serikov, O. Antverg, v. , R. Zhang, R. Zhang, S. Gehrmann, S. Pais, T. Shavrina, T. Scialom, T. Yun, T. Limisiewicz, V. Rieser, V. Protasov, V. Mikhailov, Y. Pruksachatkun, Y. Belinkov, Z. Bamberger, Z. Kasner, A. Rueda, A. Pestana, A. Feizpour, A. Khan, A. Faranak, A. Santos, A. Hevia, A. Unldreaj, A. Aghagol, A. Abdollahi, A. Tammour, A. HajiHosseini, B. Behroozi, B. Ajibade, B. Saxena, C. M. Ferrandis, D. Contractor, D. Lansky, D. David, D. Kiela, D. A. Nguyen, E. Tan, E. Baylor, E. Ozoani, F. Mirza, F. Ononiwu, H. Rezanejad, H. Jones, I. Bhattacharya, I. Solaiman, I. Sedenko, I. Nejadgholi, J. Passmore, J. Seltzer, J. B. Sanz, K. Fort, L. Dutra, M. Samagaio, M. Elbadri, M. Mieskes, M. Gerchick, M. Akinlolu, M. McKenna, M. Qiu, M. Ghauri, M. Burynok, N. Abrar, N. Rajani, N. Elkott, N. Fahmy, O. Samuel, R. An, R. Kromann, R. Hao, S. Alizadeh, S. Shubber, S. Wang, S. Roy, S. Viguier, T. Le, T. Oyebade, T. Le, Y. Yang, Z. Nguyen, A. R. Kashyap, A. Palasciano, A. Callahan, A. Shukla, A. Miranda-Escalada, A. Singh, B. Beilharz, B. Wang, C. Brito, C. Zhou, C. Jain, C. Xu, C. Fourrier, D. L. Periñán, D. Molano, D. Yu, E. Manjavacas, F. Barth, F. Fuhrimann, G. Altay, G. Bayrak, G. Burns, H. U. Vrabec, I. Bello, I. Dash, J. Kang, J. Giorgi, J. Golde, J. D. Posada, K. R. Sivaraman, L. Bulchandani, L. Liu, L. Shinzato, M. Hahn, M. Takeuchi, M. Pàmies, M. A. Castillo, M. Nezhurina, M. Sänger, M. Samwald, M. Cullan, M. Weinberg, D. Wolf, M. Mihaljcic, M. Liu, M. Freidank, M. Kang, N. Seelam, N. Dahlberg, N. M. Broad, N. Muellner, P. Fung, P. Haller, R. Chandrasekhar, R. Eisenberg, R. Martin, R. Canalli, R. Su, R. Su, S. Cahyawijaya, S. Garda, S. S. Deshmukh, S. Mishra, S. Kiblawi, S. Ott, S. Sang-aroonsiri, S. Kumar, S. Schweter, S. Bharati, T. Laud, T. Gigant, T. Kainuma, W. Kusa, Y. Labrak, Y. S. Bajaj, Y. Venkatraman, Y. Xu, Y. Xu, Y. Xu, Z. Tan, Z. Xie, Z. Ye, M. Bras, Y. Belkada, T. Wolf, Bloom: A 176b-parameter open-access multilingual language model, arXiv.org (2022). URL: https://arxiv.org/abs/2211.05100. doi:10.48550/arXiv.2211.05100.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv.org (2017). URL: https://arxiv.org/abs/1706.03762. doi:10.48550/arXiv.1706.03762.
[25] N. L. Hadaway, S. M. Vardell, T. A. Young, Scaffolding oral language development through poetry for students learning english, The Reading Teacher 54 (2001) 796–796. URL: https://go.gale.com/ps/i.do?id=GALE%7CA75085276&sid=googleScholar&v=2.1&it=r&linkaccess=abs&issn=00340561&p=AONE&sw=w&userGroupName=anon%7E20f961a3.
[26] A. Bradley, Book of rhymes: the poetics of hip hop, Basic Civitas, 2017.
[27] I. Alonso, L. Davachi, R. Valabrègue, V. Lambrecq, S. Dupont, S. Samson, Neural correlates of binding lyrics and melodies for the encoding of new songs, NeuroImage 127 (2016) 333–345. URL: https://pubmed.ncbi.nlm.nih.gov/26706449/. doi:10.1016/j.neuroimage.2015.12.018.
[28] H. G. Oliveira, A rest service for poetry generation, 2017. URL: https://www.semanticscholar.org/paper/A-REST-Service-for-Poetry-Generation-Oliveira/5b0039186ddb41ad5d037e5dbacfae837eaa5079.
[29] H. G. Oliveira, Poetryme: a versatile platform for poetry generation, 2012. URL: https://www.semanticscholar.org/paper/PoeTryMe-%3A-a-versatile-platform-for-poetry-Oliveira/0c62affa157a453e01514042b55babff428928fa.
[30] X. Zhang, M. Lapata, Chinese poetry generation with recurrent neural networks, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014). URL: https://aclanthology.org/D14-1074/. doi:10.3115/v1/d14-1074.
[31] T. Van de Cruys, Automatic poetry generation from prosaic text, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020). URL: https://aclanthology.org/2020.acl-main.223/. doi:10.18653/v1/2020.acl-main.223.
[32] J. H. Lau, T. Cohn, T. Baldwin, J. Brooke, A. Hammond, Deep-speare: A joint neural model of poetic language, meter and rhyme, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2018). URL: https://aclanthology.org/P18-1181/. doi:10.18653/v1/p18-1181.
[33] Google, Verse by verse, 2022. URL: https://sites.research.google/versebyverse/.
[34] D. Uthus, M. Voitovich, R. Mical, Augmenting poetry composition with verse by verse, 2022. doi:10.18653/v1/2022.naacl-industry.3.
[35] WriteExpress, Rhymer, 2023. URL: https://www.rhymer.com/.
[36] Datamuse, Rhymezone rhyming dictionary and thesaurus, 2023. URL: https://www.rhymezone.com/.
[37] Rytr, Rytr - best ai writer, content generator writing assistant, 2022. URL: https://rytr.me/.
[38] M. A. Runco, Divergent thinking, creativity, and ideation. (2010).
[39] C. Lewis, P. J. Lovatt, Breaking away from set patterns of thinking: Improvisation and divergent thinking, Thinking Skills and Creativity 9 (2013) 46–58.
[40] M. A. Runco, S. Acar, Divergent thinking as an indicator of creative potential, Creativity research journal 24 (2012) 66–75.
[41] A. Cropley, In praise of convergent thinking, Creativity Research Journal - CREATIVITY RES J 18 (2006) 391–404. doi:10.1207/s15326934crj1803_13.
[42] A. T. Landau, C. J. Limb, The neuroscience of improvisation, Music Educators Journal 103 (2017) 27–33. URL: https://doi.org/10.1177/0027432116687373. doi:10.1177/0027432116687373.
[43] Studying the Impact of AI-based Inspiration on Human Ideation in a Co-Creative Design System, 2021. URL: https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-7.pdf.
[44] B. Shneiderman, Human-Centered AI, Oxford University Press, 2022.
[45] A. Joshi, S. Kale, S. Chandel, D. Pal, Likert scale: Explored and explained, British Journal of Applied Science Technology 7 (2015) 396–403. URL: https://eclass.aspete.gr/modules/document/file.php/EPPAIK269/5a7cc366dd963113c6923ac4a73c3286ab22.pdf. doi:10.9734/bjast/2015/14975.
[46] J. A. Olson, J. Nahas, D. Chmoulevitch, S. J. Cropper, M. E. Webb, Naming unrelated words predicts creativity, Proceedings of the National Academy of Sciences 118 (2021). URL: https://www.pnas.org/content/118/25/e2022340118. doi:10.1073/pnas.2022340118.
[47] J. Ocumpaugh, M. Mercedes, T. Rodrigo, K. Porayska-Pomsta, I. Olatunji, R. Luckin, Becoming better versed: Towards the design of a popular music-based rhyming game for disadvantaged youths, Proceedings of the 26th International Conference on Computers in Education. Philippines: Asia-Pacific Society for Computers in Education (2018). URL: https://apsce.net/icce/icce2018/wp-content/uploads/2018/12/C6-04.pdf.
[48] H. Hirjee, D. Brown, Using automated rhyme detection to characterize rhyming style in rap music, Empirical Musicology Review 5 (2010) 121–145. doi:10.18061/1811/48548.
[49] Z. Hu, R. K.-W. Lee, C. C. Aggarwal, A. Zhang, Text style transfer: A review and experimental evaluation (2020). URL: https://arxiv.org/abs/2010.12742. doi:10.48550/ARXIV.2010.12742.
[50] R. Roberts, Kendrick lamar’s pulitzer prize sparks lively — and at times snobby — conversations on the aesthetics of music, 2018. URL: https://www.latimes.com/entertainment/music/la-et-ms-kendrick-pulitzer-reactions-20180420-story.html.
[51] OpenAI, Openai api, 2021. URL: https://openai.com/api/.
[52] Amazon, Alexatm 20b is now available in amazon sagemaker jumpstart | amazon web services, 2022. URL: https://tinyurl.com/amazonGPT.
[53] HuggingFace, Gpt-neox, 2022. URL: https://huggingface.co/docs/transformers/main/en/model_doc/gpt_neox#overview.
[54] A. Komatsuzaki, Current limitations of language models: What you need is retrieval, 2020. URL: https://www.researchgate.net/publication/344261335_Current_Limitations_of_Language_Models_What_You_Need_is_Retrieval.
[55] F. Hill, K. Yuan, How instagram saved poetry: Social media is turning an art form into an industry, 2018. URL: https://www.theatlantic.com/technology/archive/2018/10/rupi-kaur-instagram-poet-entrepreneur/572746/.
[56] H. Oliver, Instagram is the future of poetry, 2021. URL: https://unherd.com/2021/10/instagram-is-the-future-of-poetry/.
[57] M. Schmidt, Lives of the Poets, Phoenix, 1999.
[58] E. Sheng, D. C. Uthus, Investigating societal biases in a poetry composition system, ACL Anthology (2020) 93–106. URL: https://aclanthology.org/2020.gebnlp-1.9/.
[59] P.-S. Huang, H. Zhang, R. Jiang, R. Stanforth, J. Welbl, J. Rae, V. Maini, D. Yogatama, P. Kohli, Reducing sentiment bias in language models via counterfactual evaluation, 2019. URL: https://arxiv.org/abs/1911.03064. doi:10.48550/ARXIV.1911.03064.
[60] A. K., M. P. Gangan, D. P., L. V. L., Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias, Springer Nature, Singapore, 2022.
[61] J. Lynch, Hip-hop passes rock to become most popular music genre for first time in history: Nielsen, 2018. URL: https://www.businessinsider.com/hip-hop-passes-rock-most-popular-music-genre-nielsen-2018-1?r=US&IR=T.
[62] A. Texas, Hip-hop is the most listened to genre in the world, 2015. URL: https://www.nme.com/news/music/various-artists-1151-1214849.
[63] Wikipedia, Hip hop, 2021. URL: https://en.wikipedia.org/wiki/Hip_hop.
[64] T. Ingham, Nearly a third of all streams in the us last year were of hip-hop and rnb artists as rock beat pop to second, 2021. URL: https://www.musicbusinessworldwide.com/nearly-a-third-of-all-streams-in-the-us-last-year-were-of-hip-hop-and-rb-music/.
[65] E. Malmi, P. Takala, H. Toivonen, R. Tapani, A. Gionis, Dopelearning: A computational approach to rap lyrics generation, KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). doi:10.1145/2939672.2939679.
[66] J. Eastwood, E. Hinton, We wrote an algorithm to unravel the rhymes of hit musical ‘hamilton’, 2016. URL: http://graphics.wsj.com/hamilton/.
[67] C. D, Fight the power: How hip hop changed the world, n.d. URL: https://www.bbc.co.uk/programmes/p0dj70yd.
[68] N. Condit-Schultz, MCFlow: A Digital Corpus of Rap Flow, Ph.D. thesis, 2016. URL: https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=osu1461250949&disposition=inline.
[69] J. Eastwood, E. Hinton, How wsj used an algorithm to analyze ‘hamilton’ the musical, 2016. URL: http://graphics.wsj.com/hamilton-methodology/.
[70] A Small-Data Mindset for Generative AI Creative Work, 2022.
[71] Musixmatch developer api, 2023. URL: https://developer.musixmatch.com/.
[72] S. Presser, Gpt-2 neural network poetry, 2019. URL: https://www.gwern.net/GPT-2.
[73] S. Mcgregor, K. Agres, M. Purver, G. Wiggins, From distributional semantics to conceptual spaces: A novel computational method for concept creation, Journal of Artificial General Intelligence 6 (2015) 55–86. doi:10.1515/jagi-2015-0004.
[74] D. Yang, Y. Zhou, Z. Zhang, T. Jia, J. Li, R. Lc, Ai as an active writer: Interaction strategies with generated text in human-ai collaborative fiction writing, 2019. URL: https://ceur-ws.org/Vol-3124/paper6.pdf.
[75] E. Wassiliwizky, S. Koelsch, V. Wagner, T. Jacobsen, W. Menninghaus, The emotional power of poetry: neural circuitry, psychophysiology and compositional principles, Social Cognitive and Affective Neuroscience 12 (2017) 1229–1240. doi:10.1093/scan/nsx069.