=Paper= {{Paper |id=Vol-3359/paper8 |storemode=property |title=Why try to build try to build a co-creative poetry system that makes people feel that they have "creative superpowers"? |pdfUrl=https://ceur-ws.org/Vol-3359/paper8.pdf |volume=Vol-3359 |authors=Ibukun Olatunji |dblpUrl=https://dblp.org/rec/conf/iui/Olatunji23 }} ==Why try to build try to build a co-creative poetry system that makes people feel that they have "creative superpowers"?== https://ceur-ws.org/Vol-3359/paper8.pdf
Why try to build try to build a co-creative poetry system that
makes people feel that they have “creative superpowers”?⋆
Ibukun Olatunji*
Computational Foundry, Swansea University, Crymlyn Burrows, Skewen, Swansea, United Kingdom, SA1 8EN


                                          Abstract
                                          The paper examines co-creative writing systems, and argues that existing Large Language Models could potentially reduce
                                          human capacity. Furthermore, existing sociocultural inequalities might be exacerbated by the widespread adoption of such
                                          generative systems. The paper instead suggests a custom approach, using co-creative poetry writing as an example. The
                                          system has architectural changes from typical language models to better support poetry. It also uses rap lyrics as part of the
                                          training data in order to help reduce sociocultural bias. A high level system implementation is proposed along with some
                                          evaluation methods. Evaluation is based on expert judgement on final outputs, and user performance on language tasks
                                          associated with human creativity. The final section of the paper explores how and why alternatives to existing co-creative
                                          systems could benefit individual users as well as wider society.

                                          Keywords
                                          Creativity, poetry, co-creativity, natural language processing, language models, writing support tools, data sets



1. Introduction
This paper examines co-creative systems using poetry writing as an example. Within the paper ‘poetry’ includes song lyrics. Section one of the paper explores poetry in terms of human creativity. Poetry is chosen because it is a creative task on which non-expert humans can still outperform machines, in contrast to creative outputs such as image generation. After introducing the case for poetry, there is an exploration of recent work in generative computational systems. As well as being the technical state of the art, these systems provide a conceptual framework to explore sociocultural issues such as bias and inclusion. Section one then explores a range of poetry-specific systems and ends with a more detailed case study. The case study examines a system that combines elements of more powerful general models and custom architectural features specific to poetry writing. Section two details the evaluation issues and methods that might be employed for the proposed co-creative system. The emphasis of this section is on how to evaluate human improvement over time. Section three explores a high level implementation of the system. It builds on the evaluation to propose both an architecture and a method for testing whether the proposed system has, in principle, any benefits over and above those described in section one. Section four is a discussion of the social and cultural limitations of current generative systems. It expands on section one in exploring bias and proposes a mitigation through the use of rap lyrics. Section five describes the theoretical and practical limitations of the paper as well as future work. Section six provides a summary of the paper’s contribution. The section ends with answers to the question: why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”?

1.1. Human Creativity
Human creativity is the ability to come up with ideas or artefacts that are new, surprising, and valuable. Rather than a solitary act, it results from the interaction of social elements: a culture that contains symbolic rules, a person who brings novelty into the symbolic domain, and people who recognise and validate the innovation [1, 2, 3]. Boden makes a further distinction between psychological and historical creativity (P-creativity and H-creativity). P-creativity involves coming up with an idea that’s new to the person who comes up with it. H-creativity means that (so far as we know) no-one else has had it before: it has arisen for the first time in human history [2, 4]. Machine learning models have the potential to support human creativity [5, 6, 7]. However, questions remain on their design and influence in augmenting human capacity as opposed to reducing it [8, 9, 10]. Shneiderman suggests that "researchers’ goals shape the questions they raise, collaborators they choose, methods they use, and outcomes of their work." [11]. This leads to the question: how can designers of programming interfaces, interactive tools, and rich social environments enable more people to be more creative more often? [12]

Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia
⋆ Ibukun Olatunji. 2023. Why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”? In Joint Proceedings of the ACM IUI 2023 Workshops. Sydney, Australia, 13 pages.
* Corresponding author.
i.o.olatunji.2030349@swansea.ac.uk (I. Olatunji)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
                                           Language Model Characteristics
Table 1
Summary of State-of-the-Art Language Models by Size, Model Type and Ownership

                     Model Name             Parameters            Model Type        Owner
                       BERT              110 - 340 million       Transformer        Google
                       GPT-2                1.5 billion          Transformer        OpenAI
                      LaMDA                 137 billion          Transformer        Google
                       GPT-3                175 billion          Transformer        OpenAI
                ChatGPT/InstructGPT         175 billion          Transformer        OpenAI
                      BLOOM                 176 billion          Transformer        BLOOM Project
                Megatron-Turing NLG         530 billion          Transformer        Microsoft and NVIDIA
                       PaLM                 540 billion          Transformer        Google
                       GLaM                  1 trillion        Mixture of Experts   Google




1.2. Computational Systems
In computational terms, automated systems are now capable of writing poetry approaching human levels [13, 14, 15]. Karimi et al consider three main strategies by which the role of humans in creative systems can be characterized: fully autonomous systems, creativity support tools, and co-creative systems [16]. Although the paper is primarily concerned with co-creative systems, it will blend the categories where necessary. The reasoning for this is that human users do not make the same distinctions; also, the features and usage are often blended in the real world, e.g. an autonomous system that is used by a creator as an input and thus becomes a support tool and/or co-creative system [10]. The next section briefly outlines the state of the art in computational writing systems.
   Language models (LMs) refer to systems that are trained on string prediction tasks: predicting the likelihood of a token (character, word or string) given either the preceding context or its surrounding context. Such systems are unsupervised and, when deployed, take text as input and output scores or string predictions [17]. Large Language Models (LLMs) trained on sufficiently large and diverse data sets are able to perform well across domains, and there is a correlation between model performance and size [18]. State-of-the-art models are able to generate text that approaches or surpasses that of some humans [13, 14, 15, 19]. The emphasis on some humans is important with respect to user characteristics; in broad terms, humans co-creating poetry can be considered as either inexperienced or advanced users. Research on creative tasks such as improvisation suggests that users vary in cognitive processing based in part on their experience and skill levels [20, 21]. A well-designed co-creative system should therefore take differences in user support needs into account [8, 9, 22].

1.3. General Purpose Language Generation
LLMs are trained to predict the next word, or series of words, in a text sequence. They model text corpora as probability distributions. Users write short text prompts to tell the system what to generate. Depending on how many examples are provided in the text prompt, the setting is referred to as zero-, one-, or few-shot learning [13, 15, 17]. Pretrained language models have become a cornerstone of modern natural language processing (NLP) pipelines because they often produce better performance from smaller quantities of labeled data [23]. Within general LLMs, the transformer has established itself as best performing on benchmark language processing tests [13, 15, 24]. As well as being able to perform tasks such as text summarising and question answering, LLMs have the potential to support creative writing [6, 8, 9]. Current state-of-the-art LLMs are summarized in table 1. However, despite impressive technical achievements, LLMs have limitations including: (a) models, as they scale, might eventually run into the limits of any pretraining objectives; (b) the models are expensive and difficult to perform inference on; (c) model decisions are not easily interpretable; (d) the majority of the research community, and by extension disadvantaged social groups, have been excluded from the development of LLMs as they are proprietary (see table 1); and (e) most LLMs are primarily trained on English-language text that contains data biases [13].

1.4. Poetry Specific Language Generation
Creating poetry is a creative skill that requires extensive vocabulary, phonemic awareness to produce complex rhyme patterns, and general knowledge of enough subjects about the world to be able to tell interesting stories about a range of topics [20, 25, 26, 27].
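
The description of language models in sections 1.2 and 1.3 (predicting the likelihood of the next token given the preceding context, and modelling a corpus as a probability distribution) can be illustrated with a deliberately tiny sketch. This is not how the LLMs in table 1 are built: it is a word-level bigram counter, and the three-line corpus and the function names train_bigram_model and next_word_distribution are invented here purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(lines):
    """Count how often each word follows another across the given lines."""
    counts = defaultdict(Counter)
    for line in lines:
        # <s> and </s> are assumed start/end markers, not part of any real tokenizer.
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    """Return P(next word | prev) as a dict; empty if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}


# Hypothetical toy corpus, invented for this example only.
corpus = [
    "the rose is red",
    "the rose is sweet",
    "the sky is blue",
]

model = train_bigram_model(corpus)
dist = next_word_distribution(model, "is")

# Show the candidate next words after "is", most likely first.
for word, prob in sorted(dist.items(), key=lambda item: (-item[1], item[0])):
    print(f"P({word!r} | 'is') = {prob:.2f}")
```

Even at this scale the sketch shows why training data matters for the argument in section 1.3: the model can only ever propose continuations that appear in its corpus, so whatever is over- or under-represented there is carried directly into its suggestions.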
                                              Poetry Creation Systems
Table 2
An Overview of Selected Poetry Writing Tools by Type

      Type             Example                         Key Features                Constraints
  Autonomous          ChatGPT                  Natural language input              Plain text output
                                              Generates poems and lyrics           Customisation by text prompt

  Autonomous           co:here                 Natural language input              Plain text output
                                              Generates poems and lyrics           High latency

  Autonomous             Rytr                  UI has song lyric option            Uses GPT-3 models
                                               Extensive text processing           Not trained on song data

    Support          RhymeZone              Rhyming dictionary/thesaurus           Single word only
                                            Generates rhyme suggestions            Cannot be used to write text

    Support            Rhymer                   Rhyming dictionary                 Single word only
                                            Generates range of word types          Cannot be used to write text

    Support       Poetry Foundation         Poetry archives and tutorials          No support for real-time creation
                                           Guides user to external resources       No user customisation options

  Co-creativity    Poem Generator          Customise inputs to create poem         Input variables fixed
                                           Variety of formal poetic outputs        Limited user interaction or feedback

  Co-creativity       DeepBeat             Generates and/or suggests lyrics        Confusing user interface
                                          Displays sources of lyric inspiration    Unoriginal output vs GPT-3 models

  Co-creativity     Verse by Verse     Suggests stanzas in style of known poets    Limited forms of poetry
                                          Language model accounts for bias         Trained on selected U.S poets




1.5. Overview of Poetry Support Tools
Historically, poetry creation systems tended to be built on the model of an AI writing a full poem by itself, thus writing in a closed system [28, 29, 30, 31]. Early systems tended to be rule-based [32]. More recently, some approaches have started to explore human interaction when composing poems [33, 34]. Table 2 provides a broad summary of selected systems including autonomous, support tools and co-creative as defined by Karimi et al [16]. The category distinction helps frame a range of (human) creative processes and (technology) interactions. It is also a useful way to consider ways in which the proposed system is different to those that currently exist; and as importantly, ways in which it is similar. At a high level, the autonomous systems are designed to be able to create finished works (sometimes called ‘products’ or ‘artefacts’). The support tools are used as part of the creative workflow. For instance, RhymeZone or Rhymer help a user find words that sound similar to those they might use in a poem [35, 36]. Co-creative systems enable humans and computational systems to make shared products. That said, the distinction is not fixed. For example, Rytr contains text editing, display and other features that allow it to operate as both a co-creative and autonomous system [37]. Having looked at the computational systems, it is instructive to briefly consider poetry writing from a human perspective. It will help inform the design of a new poetry writing system.
   Writing poetry requires a range of general creative skills that can be framed in terms of divergent and convergent thinking; these are used in varying ways throughout a multi-stage writing process. For simplicity, the stages include (a) exploration, which is characterised by divergent thinking [21, 38, 39, 40]; (b) focused work, which uses convergent thinking [21, 41]; and (c) re-drafting. It is useful in the stages to distinguish between internal and external co-creation system activities. Internal is when the user interacts with the system in real-time, e.g. writing or redrafting text; external is when the user participates in activities such as browsing, reading or other things that do not use the system. The framing of internal and external system activities is based on the following reasoning: (a) skill: inexperienced users are unlikely to possess the improvisational skill required to create full poems in real-time due to cognitive processing constraints [20, 42]; (b) speed: users might choose to write poems over multiple sessions, in which case external system stimuli could support the writing; (c) knowledge: advanced writers are usually familiar with a body of existing work that informs their own [1]; and (d) process: reflecting and redrafting is an important part of writing. The reflecting stage often takes place separately to the creation of the work itself [1, 10].

1.6. Case Study: Verse by Verse

Figure 1: Google’s Verse by Verse: users select from a range of US poets and custom design a poem by choosing from features including the number of syllables per line and the number of stanzas.
   Screenshot from Verse by Verse application by Google

   Google Research’s Verse by Verse is a relevant case study as it is arguably the most technically advanced poetry-specific generative system. As well as using transformer model architecture, it also uses information retrieval, and considers bias within its design. Verse by Verse augments user poetry composition by offering suggestions to a user as they compose a poem. The authors of the system argue that, relative to creating full poems, "this is a much more challenging task, as one needs be able to offer suggestions with minimal latency while meeting constraints of the poem structure and handle the challenges of user input" [34]. Figure 1 shows part of the system’s user interface (for PC). From a user’s point of view, the experience is as follows: (a) the user selects poet(s) to inform the suggestions; (b) the user designs poem structure as illustrated in figure 2; (c) the user writes the first line of text; and (d) the system offers suggestions in the style of the poet(s) the user selected earlier. The user can then work with, modify or have the system create new verses. The Verse by Verse design has an external system context that, in general, LLMs do not. To some extent, the system helps poetry writers become better readers. In his work on creativity it was suggested to Csikszentmihalyi that "the only way you become a poet...is because you’ve read a poem...poetry depends on the whole poetic tradition of the past...you have to decide...out of all that previous poetry, what is most interesting to me?" [1] Verse by Verse, by making users aware of the work of other poets, helps users become readers in order to inform their own poetic development.

2. Experiment Design
Verse by Verse ran comparative evaluations of the system against poems written by classic poets. Although the system was intended to be used as an interactive co-creator for the human writing a poem, the authors stated it was still worth evaluating how the system could perform on its own in writing a poem given a first line of verse [34]. This approach has been adopted within the proposed system’s experimental design, implementation and evaluation. The next subsection explores evaluation prior to looking at implementation. The rationale is that the evaluation is perhaps a harder problem as it involves an intersection of multiple disciplines (e.g. computational sciences, arts, linguistics, and pedagogy). Implementation can mostly be restricted to computational science domains.

2.1. Evaluation Overview
Evaluating co-creative systems is still an open research question and there is no standard metric for measuring computational co-creativity [16, 43]. Karimi et al describe the limited research investigating how co-creative systems can be evaluated. They present four questions as a way to compare how (existing) co-creative systems evaluate creativity: who is evaluating the creativity, what is being evaluated, when does evaluation occur, and how the evaluation is performed [16]. Calderwood et al point out that "writers engaged with co-creative systems are looking for creative insight, something not measured by perplexity or by a language model’s ability to solve the canonical downstream NLP tasks" [5]. For the evaluation of the proposed system to be effective, it is useful to restate its goals in more detail. The co-creative poetry system’s goal is “making people feel that they have ‘creative superpowers’”. To achieve this, the system supports users to create better poetry than they might otherwise have done without the system. The terms supports and better will be further explored as they form the basis of evaluation.
   Augmenting human users is central to HCAI and a contrast to a closed model that creates on behalf of the user [8, 34, 44]. This point is made in recent work that refers to pitfalls when designing human-AI co-creative systems, as well as other work which asserts that generative models can help writers without writing for them [5, 9, 22]. The argument that these and similar works make is that too much automated creation can be at the expense of human users [9, 22]. Adopting this thinking, it is useful to evaluate the system and its users independently, as well as in combination. This in theory allows (system) internal and (human) internal and external measurement.
The end goal here is that human users develop their capacity; this could be external to the system, whereby the system has acted as a creative prompt. A description of how this could work in principle follows. A later section describes system implementation.

2.2. Process and Objectives

The system would run a number of experiments with the purpose of establishing which system components most support users to write “better” poetry; in goal terms, better is evaluated (a) subjectively by users via a Likert scale [45] and (b) by performance on related tasks such as the Divergent Action Task, Bridge-the-Associative-Gap Task, or rhyme creation and identification [46, 47]. The tasks would be completed external to the system. The goals of the evaluation are to measure to what extent users are actually improving their poetry writing abilities, and the degree to which any improvement is a result of internal system features. For a user, improvement is concerned with "the writer’s goals or their desire to have an individual voice" [9]. With this as a basis, the evaluation process takes the form of a number of hypotheses and related experiments, the purpose of which is to explore: (a) how well general vs poetry specific language models can write full poems; (b) if poetry specific language models can better represent individual users’ style than generalised language models; and (c) the extent to which users benefit from system recommendations when writing poems. The hypotheses and experiments are concerned with poetic text style, which describes the ways (an author) uses language, including prosody, word choice, sentence structure and use of figurative language [48, 49].

   A central challenge for the proposed system is that the development and attainment of an individual poetic voice is highly subjective. Beyond subjectivity, poetry is from a societal perspective often a question of cultural value, which over time may well change. In reference to Kendrick Lamar’s 2018 Pulitzer Prize, a first for a rap album, the administrator of the prizes said, "...this is not a genre we’ve seen celebrated before, so that in that sense it’s historical." [50] Furthermore, as Boden states, "...even in science, values are often elusive and sometimes changeable...because values are highly variable, it follows that many arguments about creativity are rooted in disagreements about value. This applies to human activities no less than to computer performance." [2]

    1. Hypothesis-A that poetry specific language generation could outperform general language generation with respect to creating poems. Experiment A: each system-state generates complete poetic texts. The prompts would also be given to users (inexperienced and advanced) with the same constraints as the system in terms of keywords, topics, character limits etc. The evaluation for experiment A is by humans who judge the quality of the poems (which are anonymous) by a Likert scale and free text summary.

    2. Hypothesis-B that poetry specific language generation customised for a given user could outperform vanilla poetry specific generation with respect to creating poems. Experiment B: each system state generates complete poetic text but some states are pretrained to customise characteristics with respect to given users and their poetic styles. The evaluation for experiment B is by humans who judge the quality of the poems by a Likert scale and free text summary. The evaluation is focused on how well the poems represent the given users’ individual style.

    3. Hypothesis-C that external recommendations, full or part poems, based on given user characteristics are supportive with respect to users writing their poems. Experiment C: for a given user’s generated poetic text inputs, the system state generates (external to system) poetic text recommendations that the user reads and reflects on before completing their poem. The evaluation for experiment C is by humans who judge how well the poem recommendations helped them write poems in the theme, topic or style they were attempting to achieve.

   The approach described provides a sense of how user activities (internal and external) with respect to the system can be evaluated. In practice, more fine-grained evaluation criteria would be required based on further research and operational or implementation design; as far as possible, a complete system would have an awareness of all relevant evaluation data including, for instance, external system reading of poems. At this stage, the evaluation proposed is limited to the extent necessary to support the explanation of how and why the system might work. A later section (Limitations and Future Work) will explore the limitations and suggest possible remedies.

3. Proposed Implementation
The system would have a number of states that range from full automation to text prompts acting as a starting point for the user. The support states envisaged are:

    1. State-A: general language system implemented as standard.
    2. State-B: general language system implemented with modified architecture to include user generated content within training set and/or network architecture preferences.
    3. State-C: poetry specific system implemented with standard architecture.
    4. State-D: poetry specific system implemented with modified architecture to include user generated content within training set and/or network architecture preferences.

The LLM component of the system would use publicly available APIs and, where possible, modify network architecture directly [51, 52, 53]. In most cases (table 1) LLMs are closed black box systems, as illustrated in figure 3. In part for this reason, ideally a custom poetry and lyric language model would be implemented; aside from practicalities (which will be discussed) there is a technical challenge in that a poetry and lyric LM would be far smaller than a general LLM. Given the research on LLM size and performance, a custom poetry and lyric LM would therefore, in theory, underperform against state-of-the-art LLMs [18, 54, 15]. In line with a recent study, which experimented with user experiences of language models, the system could be implemented with a combination of JavaScript, React, Python and Flask [8]. The system would then be deployed as a web application for mobile phones. Mobile is preferred to PC on the basis of its greater reach as a device for both reading and creating contemporary poetry [55, 56].

Figure 2: SParse And Dense Network Model Elements. 1. Text input by user is returned as partially completed poetic text and/or poetic and lyrical recommendations for the user to consider. 2. User personalised data submitted as poems or lyrics and/or recommendations of favourite artists and their work. These are used to create a corpus of user text. Prior examples of user generated text uploaded to system; recommender and/or database search to enhance user text with additional poetic texts (e.g. from web crawl). 3. Database of poetic texts (and song lyrics) from web crawl. Clean text is included as well as metadata such as rhyme scheme and Parts of Speech (PoS).

3.1. Sparse And Dense Network Model
The system (figure 2) operates as a Sparse And Dense Network (SPAD). The name refers to the system being sparse with respect to user input tokens as compared to tokens contained in the LM/LLMs. Against this, the system is dense in terms of leveraging transformer models and their associated attention layers (table 1). The intuition is to use a small amount of personalised user text to attempt to customise the output of powerful LMs/LLMs. This differs from existing approaches in the following ways.

     • State-of-the-Art LLMs form part of the SPAD in order to help improve the SPAD's performance; in other words, the LLMs are a source of input training data and as such multiple LLMs could in theory be included in the SPAD architectural design.
     • A poetry specific LLM (GPT-NeoX) forms part of the design; poetry specific refers to adaptations to the underlying model architecture in order that token processing and output is better suited to poetry than prose. An example of this might be applying additional linguistic layers within the network to favour text strings with syllable frequencies found more regularly in poems than in, say, news articles or web pages. Although architecture is referred to, much of any benefit at this stage might come from modifying the training data and associated recipes. The poetry specific LLM would also leverage data from the general LLM (for simplicity any interaction between the two elements is not included in figure 2).
     • Poetry and lyric LM is a custom model whose network architecture and training data are specific to poetry. In practical terms it is not an LLM as the available training data is not likely to be sufficiently extensive vs the current state of the art. As well as providing a data contrast to the LLMs, this part of the network will also act as a style transfer layer in so far as it identifies and tries to modify input text to create poetic styles. These styles will be mapped onto user styles upstream within the system.

The result of the models described above is a system that contains information on generalized poetic style as well as individual style preference(s) unique for each user. This allows the system to support users with specific co-writing tasks (e.g. text generation) as well as offer personalised recommendations for further reading of relevant poems and/or poets. In user experience terms, this might be delivered via an interface that allows the user to switch between (a) writing text; (b) editing generated
text; and (c) reading and reflecting on specific poetic recommendations made by the system.

At this stage, the proposed model is high-level. There are open questions relating to issues such as real world implementation, customisation of user text, acquisition of training data and other areas. The penultimate section will revisit some of the open design questions and attempt to provide answers. The next section explores the social significance of poetry and how a system design could use this to enhance cultural inclusiveness.

4. Discussion
An important goal for poetry is for each writer to discover or develop their own unique style, or artistic voice. Part of a writer's development will be a result of what poetry they have previously been exposed to. Robert Graves stated that, “only a poet of experience...can hope to put himself in the shoes of his predecessors, or contemporaries, and judge their poems by recreating technical or emotional dilemmas which they faced while at work on them.” [57] It can be argued that this statement is, in contemporary terms, biased in gender terms given the assumption of ‘poet’ being male. Graves’s central argument about experience however is echoed in recent studies on language models. A study by Cheng and Uthus made the point that “as creative works are often shaped by the lived experiences and timely issues of the creator’s life, a poetry composition system trained on poems from different authors of different eras may reflect a variety of societal biases.” [58] Within computer science, social bias is a subject gathering more research attention [17, 59, 60]. However, as well as attempting to mitigate negative impacts for disadvantaged groups, considering bias also offers the possibility of designing systems that leverage cultural, poetic and linguistic resources that would otherwise be missed. This can benefit all user groups. The next section provides a more concrete example.

4.1. Bias in Language Models
It has been recognised and accepted in recent years that LLMs used for text generation contain bias [17, 60]. A study by Uthus suggests that “biases in creative language applications are under explored”; it goes on to say it is important to examine biases in these applications because they are intended for contexts such as self-expression, collective social enjoyment, and education [58]. One of the key sources of bias in LLMs is the training data sets. LLMs retain the biases of the data they have been trained on [15]. Typically the models pick up on, or reflect, biases and overtly abusive language patterns in training data. This can lead to harms for some users such as encountering derogatory language or discriminatory language (e.g. racist, sexist or ableist) [17]. Studies have shown that harms can also exist because of (a) exclusionary social norms within language. For example, ‘family’ is often defined as a basic social unit consisting of a married woman, man and their children; language models internalizing such social norms could be highly discriminatory towards people outside this definition [60]; (b) a greater propensity to label the language of marginalized or underrepresented groups as toxic in hate speech detection (e.g. the ‘angry black woman’ stereotype) [60]; and (c) over-representation of certain groups such as white males 18-34 within widely used training data (e.g. Reddit posts) [17]. Bender et al. assert that, “in the case of US and UK English...white supremacist and misogynistic, ageist, etc. views are over represented in the training data, not only exceeding their prevalence in the general population.” [17]. The authors go on to say that the data underpinning LMs stands to “misrepresent social movements and disproportionately align with existing regimes of power.”

There are a number of studies that explore bias mitigation through computational techniques such as (a) augmentation of the training data using style transfer [58] or (b) using counterfactuals to reduce sentiment bias [59]. However, in their study describing GPT-3 the authors caution against an over-reliance on computational solutions. They instead ask for “...more research that engages with the literature outside NLP, better articulates normative statements about harm, and engages with the lived experience of communities affected by NLP systems...mitigation work should not be approached purely with a metric driven objective to ‘remove’ bias...but in a holistic manner” [15]. For the use case of a poetry co-creation system, bias could potentially be mitigated by including rap lyrics as a key part of the training data set.

4.2. Towards Culturally Responsive Models
Emerging from a hobby of African American youth in the 1970s, rap (as an element of hip-hop) has quickly evolved into a mainstream culture and is the most popular music genre in the U.S. and many other territories [61, 62, 63, 64]. Writing rap lyrics requires both creativity to construct a meaningful, interesting story and lyrical skills to produce complex rhyme patterns [26, 48, 65]; within the culture of rap, writers are evaluated by peers on the basis of their wordplay, linguistic complexity and ability to use multiple rhyme types (perfect and imperfect) as well as multi-syllabic rhymes [26, 66]. In many ways, the writer within the hip-hop tradition sets language puzzles for their audience. In a recent BBC documentary, Chuck D, the founder of Public Enemy, remarked that “poets were always...going to give you everything the truth...that’s very important not only in the realm of
hip hop...but in the realm of artistry.” [67] Recent computational studies have explored rap on account of its complexity and cultural significance [65, 68, 69]. Rap has historically been excluded from most mainstream discussions on co-creative systems and poetry writing. There may well be valid reasons for this such as language appropriateness, perception around negative sentiments, offensive content, and difficulties in accessing material under copyright. However, although there are challenges, the benefits of using extensive rap lyrics within LM data sets include:

     • Training data that represents wider audience concerns, thoughts and feelings.
     • Training data will be dynamic and reflect contemporary sociopolitical issues.
     • Opens up the possibility of bringing voices from excluded communities into the NLP community.
     • LMs would be enhanced by a linguistically rich and varied source of data.
     • Allows lyrics to be part of a wider conversation which potentially generates new research insight (for computational, language and social researchers).

Ultimately, as contemporary music’s biggest genre, and the one most concerned with rhyme and wordplay, there are multiple reasons to explore using rap lyrics as training data.

5. Limitations and Future Work
The paper has a number of limitations. Below some of these are described along with suggested directions for future work.

System Design and Implementation: the paper does not fully explore how the proposed system could be built. In particular, there are challenges around the following:

     • Building custom LLMs. One of the design limitations is how to effectively experiment with models of varying degrees of openness (for convenience referred to as black, grey and white box). For black box models (e.g. GPT-3) there is no way at present to modify the architecture. What instead might be possible is to fine-tune the model via custom queries over a period of time, that is, to identify which combinations of prompts generate the most favourable outputs. Grey box models (BLOOM or GPT-NeoX) offer the possibility of powerful models with open-source training and evaluation code plus model weights [53]. However, the costs of running and/or adapting these models could be substantial and are not something the paper has explored.
     • Customizing models for individuals: this is a system objective but has not been tested. Technically, there is a conflict between the scale and performance benefits of LLM/LM and the comparatively small datasets of individual users. However, as Vigliensoni et al. argue, working with small-scale datasets is an overlooked but powerful mechanism for enabling greater human influence over generative AI systems within creative contexts [70]. The authors describe an experimental project, ReRites by Johnston, which involved fine-tuning GPT-2 on the artist’s custom poetry corpus to generate poems. An approach such as this could be taken, although clearly using models such as GPT-2 (for which source code is available) has the limitation of performance vs current state-of-the-art LLMs. The personalizing of LLMs to individual users is an open topic that requires further research.
     • Acquiring training data: training data for poetry and rap lyrics would not be readily available in the way that the Pile or equivalents are used for general LLMs [19]. The solution to this would be to source data from scraping the web for lyrics, or directly from services such as MusixMatch [71]. Poetry training data, much of which will be out of copyright, can be acquired via sites such as Project Gutenberg and Poetry Foundation. This approach to training data was used in a 2019 experiment to create a poetry-specific LLM based on the GPT-2 model [72].

Evaluation: literature on evaluating creativity in co-creative systems considers a wide range of factors, such as who evaluates the creativity (e.g. system itself or human users), what is being evaluated (e.g. user interaction or output), when evaluation occurs (e.g. in real time or at the end of a session) and how the evaluation is performed (e.g. methods and related metrics) [16]. There is a broad set of metrics for developing computational models for evaluating creativity. With respect to the system described, the most relevant include a proposed computational model by Agres et al. The model reflects human conceptualization of musical and poetic creativity [73]. Future work could explore the kind of model described alongside other linguistic-based metrics such as the Divergent Association Task, Bridge-the-Associative-Gap Task, or rhyme creation and identification tasks [46, 47]. Additionally, building on machine learning practices, metrics could be derived for accuracy in terms of the degree to which generated output matches a reference dataset. For example, if the user has a target poetic style, it might be possible to computationally determine the extent to which the completed poem was accurate or not. The
paper has not explored these kinds of evaluation in detail and they would form part of future work. Finally, though the evaluations proposed are limited, they could nevertheless contribute to the wider discussion around the topic. As Karimi et al. assert, "evaluating co-creative systems is still an open research question and there is no standard metric that can be used across specific systems." [16].

6. Conclusion
Artistic creativity is a process in which an initial improvisational phase is followed by a period of focused re-evaluation and revision [20]. Spontaneous improvisation is a complex cognitive process that shares features with what has been characterized as a ‘flow’ state [1, 20]. Much current work on co-creative settings focuses on the role of the system as a generator that augments what people can achieve in creative tasks [9]. There are problems with this, such as aligning the system capabilities and user expectations, language model bias, system interpretability, and user interaction design [8, 22, 74]. Studies have found that users' differing mental expectations affect their strategies and perception of the system role in the co-writing process [9, 74].

This position paper explored the recent background to co-creative writing systems, with poetry as a use case. Poetry was defined as including song lyrics, for which the paper argued that rap was the most relevant genre. The paper then proposed a system that, as far as the author is aware, has novel features relative to the state of the art. The system and how it could be evaluated and implemented were then described. Importantly, the design includes recommendations for user activities external to the system. The rationale for this is that the system's priority is to help the human user to develop an artistic style rather than to create text on the user's behalf. Mitigating some system bias using rap lyrics was also discussed. Future work could include more detailed analysis of evaluation methods as well as how these could be delivered internally to the system. Further work on user interface design is also a topic to develop. Additionally, the implementation proposal is high level and constraints such as latency, database design, and other factors have not been considered. In order to build a viable prototype, software architecture would most likely form the next stage of the research. Finally, to revisit the title of the paper: why build a co-creative poetry system that makes people feel that they have “creative superpowers”? Studies demonstrate that poetry is an emotional stimulus capable of engaging the brain’s areas of primary reward [75]. It is a form of communication that has existed throughout human history and across cultures. In modern society, poetry has become a central part of the most popular music genre. Poetry matters to society. By extension, it is worth building a system that can help people experience it firsthand and connect with its traditions. The aim though should not be to make people feel they have "creative superpowers"; instead a system should support people to actually build "creative superpowers".

7. Acknowledgements
This work was supported by the Engineering and Physical Sciences Research Council. The author would also like to acknowledge the support of Swansea Council.

References
[1] M. Csikszentmihalyi, Creativity: the psychology of discovery and invention, Harper Perennial Modern Classics, 2013.
[2] M. Boden, Creativity in a nutshell, Think 5 (2009) 83–96. doi:10.1017/S147717560000230X.
[3] J. P. Guilford, The nature of human intelligence, 1967.
[4] M. A. Boden, The creative mind: myths and mechanisms, Routledge, 2005.
[5] A. Calderwood, V. Qiu, K. Gero, L. B. Chilton, How novelists use generative language models: An exploratory user study, in: HAI-GEN+user2agent@IUI, 2020.
[6] M. Henderson, R. Al-Rfou, B. Strope, Y.-h. Sung, L. Lukacs, R. Guo, S. Kumar, B. Miklos, R. Kurzweil, Efficient natural language response suggestion for smart reply, arXiv.org (2017). URL: https://arxiv.org/abs/1705.00652. doi:10.48550/arXiv.1705.00652.
[7] H. Gonçalo Oliveira, T. Mendes, A. Boavida, Co-poetryme: a co-creative interface for the composition of poetry, Proceedings of the 10th International Conference on Natural Language Generation (2017). URL: https://aclanthology.org/W17-3508/. doi:10.18653/v1/w17-3508.
[8] F. Lehmann, N. Markert, H. Dang, D. Buschek, Suggestion lists vs. continuous generation: Interaction design for writing with generative models on mobile devices affect text length, wording and perceived authorship, in: Proceedings of Mensch Und Computer 2022, MuC ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 192–208. URL: https://doi.org/10.1145/3543758.3543947. doi:10.1145/3543758.3543947.
[9] K. Arnold, A. Volzer, N. Madrid, Generative models can help writers without writing for them, in: Joint Proceedings of the ACM IUI 2021 Workshops, 2021. URL: https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-1.pdf.
[10] A. Ploin, R. Eynon, I. Hjorth, M. A. Osborne, Ai and the arts: How machine learning is changing artistic work. report from the creative algorithmic intelligence research project, 2022. URL: https://www.oii.ox.ac.uk/news-events/reports/ai-the-arts/.
[11] B. Shneiderman, Design lessons from ai’s two grand goals: Human emulation and useful applications, IEEE Transactions on Technology and Society 1 (2020) 73–82. doi:10.1109/tts.2020.2992669.
[12] B. Shneiderman, Creativity support tools, Communications of the ACM 45 (2002) 116–120.
[13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, 2019. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[14] S. Black, S. Biderman, E. Hallahan, Q. Anthony, L. Gao, L. Golding, H. He, C. Leahy, K. McDonell, J. Phang, M. Pieler, U. S. Prashanth, S. Purohit, L. Reynolds, J. Tow, B. Wang, S. Weinbach, Gpt-neox-20b: An open-source autoregressive language model, arXiv.org (2022). URL: https://arxiv.org/abs/2204.06745. doi:10.48550/arXiv.2204.06745.
[15] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020. URL: https://arxiv.org/abs/2005.14165.
[16] P. Karimi, K. Grace, M. L. Maher, N. Davis, Evaluating creativity in computational co-creative systems, CoRR abs/1807.09886 (2018). URL: http://arxiv.org/abs/1807.09886. arXiv:1807.09886.
[17] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 610–623. URL: https://doi.org/10.1145/3442188.3445922. doi:10.1145/3442188.3445922.
[18] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei, Scaling laws for neural language models, CoRR abs/2001.08361 (2020). URL: https://arxiv.org/abs/2001.08361. arXiv:2001.08361.
[19] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima, S. Presser, C. Leahy, The pile: An 800gb dataset of diverse text for language modeling, arXiv.org (2020). URL: https://arxiv.org/abs/2101.00027. doi:10.48550/arXiv.2101.00027.
[20] S. Liu, H. M. Chow, Y. Xu, M. G. Erkkinen, K. E. Swett, M. W. Eagle, D. A. Rizik-Baer, A. R. Braun, Neural correlates of lyrical improvisation: An fmri study of freestyle rap, Scientific Reports 2 (2012). URL: https://www.nature.com/articles/srep00834. doi:10.1038/srep00834.
[21] W. Zhang, Z. Sjoerds, B. Hommel, Metacontrol of human creativity: The neurocognitive mechanisms of convergent and divergent thinking, NeuroImage 210 (2020).
[22] D. Buschek, L. Mecke, F. Lehmann, H. Dang, Nine potential pitfalls when designing human-ai co-creative systems, 2021. URL: https://arxiv.org/abs/2104.00358. doi:10.48550/ARXIV.2104.00358.
[23] T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ilić, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon, M. Gallé, J. Tow, A. M. Rush, S. Biderman, A. Webson, P. S. Ammanamanchi, T. Wang, B. Sagot, N. Muennighoff, d. Moral, O. Ruwase, R. Bawden, S. Bekman, A. McMillan-Major, I. Beltagy, H. Nguyen, L. Saulnier, S. Tan, P. O. Suarez, V. Sanh, H. Laurençon, Y. Jernite, J. Launay, M. Mitchell, C. Raffel, A. Gokaslan, A. Simhi, A. Soroa, A. F. Aji, A. Alfassy, A. A. Rogers, A. K. Nitzav, C. Xu, C. Mou, C. Emezue, C. Klamm, C. Leong, v. Strien, D. I. Adelani, D. Radev, E. G. Ponferrada, E. Levkovizh, E. Kim, E. B. Natan, D. Toni, G. Dupont, G. Kruszewski, G. Pistilli, H. Elsahar, H. Benyamina, H. Tran, I. Yu, I. Abdulmumin, I. Johnson, I. Gonzalez-Dios, R. Javier, J. Chim, J. Dodge, J. Zhu, J. Chang, J. Frohberg, J. Tobing, J. Bhattacharjee, K. Almubarak, K. Chen, K. Lo, V. Werra, L. Weber, L. Phan, L. B. allal, L. Tanguy, M. Dey, M. R. Muñoz, M. Masoud, M. Grandury, M. Šaško, M. Huang, M. Coavoux, M. Singh, M. T.-J. Jiang, M. C. Vu, M. A. Jauhar, M. Ghaleb, N. Subramani, N. Kassner, N. Khamis, O. Nguyen, O. Espejel, d. Gibert, P. Villegas, P. Henderson, P. Colombo, P. Amuok, Q. Lhoest, R. Harliman, R. Bommasani, R. L. López, R. Ribeiro, S. Osei, S. Pyysalo, S. Nagel, S. Bose, S. H. Muhammad, S. Sharma, S. Longpre, S. Nikpoor, S. Silberberg, S. Pai, S. Zink, T. T. Torrent, T. Schick, T. Thrush, V. Danchev, V. Nikoulina, V. Laippala, V. Lepercq, V. Prabhu, Z. Alyafeai, Z. Talat, A. Raja, B. Heinzerling, C. Si, E. Salesky, S. J. Mielke, W. Y. Lee, A. Sharma, A. Santilli, A. Chaffin, A. Stiegler, D. Datta, E. Szczechla, G. Chhablani, H. Wang, H. Pandey, H. Strobelt, J. A. Fries, J. Rozen, L. Gao, L. Sutawika, B. M. Saiful, M. S. Al-shaibani, M. Manica, N. Nayak, R. Teehan, S. Albanie, S. Shen, S. Ben-David, S. H. Bach, T. Kim, T. Bers, T. Fevry, T. Neeraj, U. Thakker, V. Raunak, X. Tang, Z.-X. Yong,
     Z. Sun, S. Brody, Y. Uri, H. Tojarieh, A. Roberts,
     H. W. Chung, J. Tae, J. Phang, O. Press, C. Li,
     D. Narayanan, H. Bourfoune, J. Casper, J. Rasley,
     M. Ryabinin, M. Mishra, M. Zhang, M. Shoeybi,
     M. Peyrounette, N. Patry, N. Tazi, O. S Sanseviero,
     v. Platen, P. Cornette, P. F. Lavallée, R. Lacroix,
     S. Rajbhandari, S. Gandhi, S. Smith, S. Requena,
     S. Patil, T. Dettmers, A. Baruwa, A. Singh,
     A. Cheveleva, A.-L. Ligozat, A. Subramonian, A. Névéol,
     C. Lovering, D. Garrette, D. Tunuguntla, E. Reiter,
     E. Taktasheva, E. Voloshina, E. Bogdanov, G. I.
     Winata, H. Schoelkopf, J.-C. Kalo, J. Novikova,
     J. Z. Forde, J. Clive, J. Kasai, K. Kawamura,
     L. Hazan, M. Carpuat, M. Clinciu, N. Kim, N. Cheng,
     O. Serikov, O. Antverg, v. , R. Zhang, R. Zhang,
     S. Gehrmann, S. Pais, T. Shavrina, T. Scialom, T. Yun,
     T. Limisiewicz, V. Rieser, V. Protasov, V. Mikhailov,
     Y. Pruksachatkun, Y. Belinkov, Z. Bamberger,
     Z. Kasner, A. Rueda, A. Pestana, A. Feizpour, A. Khan,
     A. Faranak, A. Santos, A. Hevia, A. Unldreaj,
     A. Aghagol, A. Abdollahi, A. Tammour,
     A. HajiHosseini, B. Behroozi, B. Ajibade, B. Saxena,
     C. M. Ferrandis, D. Contractor, D. Lansky, D. David,
     D. Kiela, D. A. Nguyen, E. Tan, E. Baylor, E. Ozoani,
     F. Mirza, F. Ononiwu, H. Rezanejad, H. Jones,
     I. Bhattacharya, I. Solaiman, I. Sedenko,
     I. Nejadgholi, J. Passmore, J. Seltzer, J. B. Sanz,
     K. Fort, L. Dutra, M. Samagaio, M. Elbadri,
     M. Mieskes, M. Gerchick, M. Akinlolu, M. McKenna,
     M. Qiu, M. Ghauri, M. Burynok, N. Abrar, N. Rajani,
     N. Elkott, N. Fahmy, O. Samuel, R. An, R. Kromann,
     R. Hao, S. Alizadeh, S. Shubber, S. Wang, S. Roy,
     S. Viguier, T. Le, T. Oyebade, T. Le, Y. Yang,
     Z. Nguyen, A. R. Kashyap, A. Palasciano, A. Callahan,
     A. Shukla, A. Miranda-Escalada, A. Singh,
     B. Beilharz, B. Wang, C. Brito, C. Zhou, C. Jain,
     C. Xu, C. Fourrier, D. L. Periñán, D. Molano, D. Yu,
     E. Manjavacas, F. Barth, F. Fuhrimann, G. Altay,
     G. Bayrak, G. Burns, H. U. Vrabec, I. Bello, I. Dash,
     J. Kang, J. Giorgi, J. Golde, J. D. Posada,
     K. R. Sivaraman, L. Bulchandani, L. Liu, L. Shinzato,
     M. Hahn, M. Takeuchi, M. Pàmies, M. A. Castillo,
     M. Nezhurina, M. Sänger, M. Samwald, M. Cullan,
     M. Weinberg, D. Wolf, M. Mihaljcic, M. Liu,
     M. Freidank, M. Kang, N. Seelam, N. Dahlberg, N. M.
     Broad, N. Muellner, P. Fung, P. Haller,
     R. Chandrasekhar, R. Eisenberg, R. Martin, R. Canalli,
     R. Su, R. Su, S. Cahyawijaya, S. Garda,
     S. S. Deshmukh, S. Mishra, S. Kiblawi, S. Ott,
     S. Sang-aroonsiri, S. Kumar, S. Schweter, S. Bharati,
     T. Laud, T. Gigant, T. Kainuma, W. Kusa, Y. Labrak,
     Y. S. Bajaj, Y. Venkatraman, Y. Xu, Y. Xu, Y. Xu,
     Z. Tan, Z. Xie, Z. Ye, M. Bras, Y. Belkada, T. Wolf,
     Bloom: A 176b-parameter open-access multilingual
     language model, arXiv.org (2022).
     URL: https://arxiv.org/abs/2211.05100.
     doi:10.48550/arXiv.2211.05100.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit,
     L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin,
     Attention is all you need, arXiv.org (2017).
     URL: https://arxiv.org/abs/1706.03762.
     doi:10.48550/arXiv.1706.03762.
[25] N. L. Hadaway, S. M. Vardell, T. A. Young,
     Scaffolding oral language development through
     poetry for students learning english, The Reading
     Teacher 54 (2001) 796–796.
     URL: https://go.gale.com/ps/i.do?id=GALE%7CA75085276&sid=googleScholar&v=2.1&it=r&linkaccess=abs&issn=00340561&p=AONE&sw=w&userGroupName=anon%7E20f961a3.
[26] A. Bradley, Book of rhymes: the poetics of hip hop,
     Basic Civitas, 2017.
[27] I. Alonso, L. Davachi, R. Valabrègue, V. Lambrecq,
     S. Dupont, S. Samson, Neural correlates of binding
     lyrics and melodies for the encoding of new songs,
     NeuroImage 127 (2016) 333–345.
     URL: https://pubmed.ncbi.nlm.nih.gov/26706449/.
     doi:10.1016/j.neuroimage.2015.12.018.
[28] H. G. Oliveira, A rest service for poetry
     generation, 2017.
     URL: https://www.semanticscholar.org/paper/A-REST-Service-for-Poetry-Generation-Oliveira/5b0039186ddb41ad5d037e5dbacfae837eaa5079.
[29] H. G. Oliveira, Poetryme: a versatile platform
     for poetry generation, 2012.
     URL: https://www.semanticscholar.org/paper/PoeTryMe-%3A-a-versatile-platform-for-poetry-Oliveira/0c62affa157a453e01514042b55babff428928fa.
[30] X. Zhang, M. Lapata, Chinese poetry generation
     with recurrent neural networks, Proceedings of
     the 2014 Conference on Empirical Methods in
     Natural Language Processing (EMNLP) (2014).
     URL: https://aclanthology.org/D14-1074/.
     doi:10.3115/v1/d14-1074.
[31] T. Van de Cruys, Automatic poetry generation
     from prosaic text, Proceedings of the 58th Annual
     Meeting of the Association for Computational
     Linguistics (2020).
     URL: https://aclanthology.org/2020.acl-main.223/.
     doi:10.18653/v1/2020.acl-main.223.
[32] J. H. Lau, T. Cohn, T. Baldwin, J. Brooke,
     A. Hammond, Deep-speare: A joint neural model of
     poetic language, meter and rhyme, Proceedings of
     the 56th Annual Meeting of the Association for
     Computational Linguistics (Volume 1: Long Papers)
     (2018). URL: https://aclanthology.org/P18-1181/.
     doi:10.18653/v1/p18-1181.
[33] Google, Verse by verse, 2022.
     URL: https://sites.research.google/versebyverse/.
[34] D. Uthus, M. Voitovich, R. Mical, Augmenting
     poetry composition with verse by verse, 2022.
     doi:10.18653/v1/2022.naacl-industry.3.
[35] WriteExpress, Rhymer, 2023. URL: https://www.rhymer.com/.
[36] Datamuse, Rhymezone rhyming dictionary and
     thesaurus, 2023. URL: https://www.rhymezone.com/.
[37] Rytr, Rytr - best ai writer, content generator writing
     assistant, 2022. URL: https://rytr.me/.
[38] M. A. Runco, Divergent thinking, creativity, and
     ideation (2010).
[39] C. Lewis, P. J. Lovatt, Breaking away from set
     patterns of thinking: Improvisation and divergent
     thinking, Thinking Skills and Creativity 9 (2013)
     46–58.
[40] M. A. Runco, S. Acar, Divergent thinking as an
     indicator of creative potential, Creativity Research
     Journal 24 (2012) 66–75.
[41] A. Cropley, In praise of convergent thinking,
     Creativity Research Journal 18 (2006) 391–404.
     doi:10.1207/s15326934crj1803_13.
[42] A. T. Landau, C. J. Limb, The neuroscience of
     improvisation, Music Educators Journal 103 (2017)
     27–33. URL: https://doi.org/10.1177/0027432116687373.
     doi:10.1177/0027432116687373.
[43] Studying the Impact of AI-based Inspiration on
     Human Ideation in a Co-Creative Design System, 2021.
     URL: https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-7.pdf.
[44] B. Shneiderman, Human-Centered AI, Oxford
     University Press, 2022.
[45] A. Joshi, S. Kale, S. Chandel, D. Pal, Likert scale:
     Explored and explained, British Journal of Applied
     Science & Technology 7 (2015) 396–403.
     URL: https://eclass.aspete.gr/modules/document/file.php/EPPAIK269/5a7cc366dd963113c6923ac4a73c3286ab22.pdf.
     doi:10.9734/bjast/2015/14975.
[46] J. A. Olson, J. Nahas, D. Chmoulevitch,
     S. J. Cropper, M. E. Webb, Naming unrelated words
     predicts creativity, Proceedings of the National
     Academy of Sciences 118 (2021).
     URL: https://www.pnas.org/content/118/25/e2022340118.
     doi:10.1073/pnas.2022340118.
[47] J. Ocumpaugh, M. Mercedes, T. Rodrigo,
     K. Porayska-Pomsta, I. Olatunji, R. Luckin,
     Becoming better versed: Towards the design
     of a popular music-based rhyming game for
     disadvantaged youths, Proceedings of the 26th
     International Conference on Computers in Education.
     Philippines: Asia-Pacific Society for Computers
     in Education (2018).
     URL: https://apsce.net/icce/icce2018/wp-content/uploads/2018/12/C6-04.pdf.
[48] H. Hirjee, D. Brown, Using automated rhyme
     detection to characterize rhyming style in rap
     music, Empirical Musicology Review 5 (2010) 121–145.
     doi:10.18061/1811/48548.
[49] Z. Hu, R. K.-W. Lee, C. C. Aggarwal, A. Zhang, Text
     style transfer: A review and experimental
     evaluation (2020). URL: https://arxiv.org/abs/2010.12742.
     doi:10.48550/ARXIV.2010.12742.
[50] R. Roberts, Kendrick lamar’s pulitzer prize
     sparks lively — and at times snobby —
     conversations on the aesthetics of music, 2018.
     URL: https://www.latimes.com/entertainment/music/la-et-ms-kendrick-pulitzer-reactions-20180420-story.html.
[51] OpenAI, Openai api, 2021. URL: https://openai.com/api/.
[52] Amazon, Alexatm 20b is now available in amazon
     sagemaker jumpstart | amazon web services, 2022.
     URL: https://tinyurl.com/amazonGPT.
[53] HuggingFace, Gpt-neox, 2022.
     URL: https://huggingface.co/docs/transformers/main/en/model_doc/gpt_neox#overview.
[54] A. Komatsuzaki, Current limitations of language
     models: What you need is retrieval, 2020.
     URL: https://www.researchgate.net/publication/344261335_Current_Limitations_of_Language_Models_What_You_Need_is_Retrieval.
[55] F. Hill, K. Yuan, How instagram saved poetry:
     Social media is turning an art form into an
     industry, 2018.
     URL: https://www.theatlantic.com/technology/archive/2018/10/rupi-kaur-instagram-poet-entrepreneur/572746/.
[56] H. Oliver, Instagram is the future of poetry, 2021.
     URL: https://unherd.com/2021/10/instagram-is-the-future-of-poetry/.
[57] M. Schmidt, Lives of the Poets, Phoenix, 1999.
[58] E. Sheng, D. C. Uthus, Investigating societal biases
     in a poetry composition system, ACL Anthology
     (2020) 93–106.
     URL: https://aclanthology.org/2020.gebnlp-1.9/.
[59] P.-S. Huang, H. Zhang, R. Jiang, R. Stanforth,
     J. Welbl, J. Rae, V. Maini, D. Yogatama, P. Kohli,
     Reducing sentiment bias in language models via
     counterfactual evaluation, 2019.
     URL: https://arxiv.org/abs/1911.03064.
     doi:10.48550/ARXIV.1911.03064.
[60] A. K., M. P. Gangan, D. P., L. V. L., Towards an
     Enhanced Understanding of Bias in Pre-trained Neural
     Language Models: A Survey with Special Emphasis
     on Affective Bias, Springer Nature, Singapore, 2022.
[61] J. Lynch, Hip-hop passes rock to become most
     popular music genre for first time in history:
     Nielsen, 2018.
     URL: https://www.businessinsider.com/hip-hop-passes-rock-most-popular-music-genre-nielsen-2018-1?r=US&IR=T.
[62] A. Texas, Hip-hop is the most listened to genre in
     the world, 2015.
     URL: https://www.nme.com/news/music/various-artists-1151-1214849.
[63] Wikipedia, Hip hop, 2021.
     URL: https://en.wikipedia.org/wiki/Hip_hop.
[64] T. Ingham, Nearly a third of all streams in
     the us last year were of hip-hop and rnb
     artists as rock beat pop to second, 2021.
     URL: https://www.musicbusinessworldwide.com/nearly-a-third-of-all-streams-in-the-us-last-year-were-of-hip-hop-and-rb-music/.
[65] E. Malmi, P. Takala, H. Toivonen, R. Tapani,
     A. Gionis, Dopelearning: A computational approach
     to rap lyrics generation, KDD ’16: Proceedings of
     the 22nd ACM SIGKDD International Conference
     on Knowledge Discovery and Data Mining (2016).
     doi:10.1145/2939672.2939679.
[66] J. Eastwood, E. Hinton, We wrote an algorithm to
     unravel the rhymes of hit musical ‘hamilton’, 2016.
     URL: http://graphics.wsj.com/hamilton/.
[67] C. D, Fight the power: How hip hop changed
     the world, n.d.
     URL: https://www.bbc.co.uk/programmes/p0dj70yd.
[68] N. Condit-Schultz, MCFlow: A Digital Corpus
     of Rap Flow, Ph.D. thesis, 2016.
     URL: https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=osu1461250949&disposition=inline.
[69] J. Eastwood, E. Hinton, How wsj used an algorithm
     to analyze ‘hamilton’ the musical, 2016.
     URL: http://graphics.wsj.com/hamilton-methodology/.
[70] A Small-Data Mindset for Generative AI Creative
     Work, 2022.
[71] Musixmatch developer api, 2023.
     URL: https://developer.musixmatch.com/.
[72] S. Presser, Gpt-2 neural network poetry, 2019. URL:
     https://www.gwern.net/GPT-2.
[73] S. Mcgregor, K. Agres, M. Purver, G. Wiggins, From
     distributional semantics to conceptual spaces: A
     novel computational method for concept creation,
     Journal of Artificial General Intelligence 6 (2015)
     55–86. doi:10.1515/jagi-2015-0004.
[74] D. Yang, Y. Zhou, Z. Zhang, T. Jia, J. Li, R. Lc, Ai as an
     active writer: Interaction strategies with generated
     text in human-ai collaborative fiction writing, 2019.
     URL: https://ceur-ws.org/Vol-3124/paper6.pdf.
[75] E. Wassiliwizky, S. Koelsch, V. Wagner, T. Jacobsen,
     W. Menninghaus, The emotional power of poetry:
     neural circuitry, psychophysiology and
     compositional principles, Social Cognitive and
     Affective Neuroscience 12 (2017) 1229–1240.
     doi:10.1093/scan/nsx069.