Why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”?⋆

Ibukun Olatunji*
Computational Foundry, Swansea University, Crymlyn Burrows, Skewen, Swansea, United Kingdom, SA1 8EN

Abstract
The paper examines co-creative writing systems, and argues that existing Large Language Models could potentially reduce human capacity. Furthermore, existing sociocultural inequalities might be exacerbated by the widespread adoption of such generative systems. The paper instead suggests a custom approach, using co-creative poetry writing as an example. The system has architectural changes from typical language models to better support poetry. It also uses rap lyrics as part of the training data in order to help reduce sociocultural bias. A high level system implementation is proposed along with some evaluation methods. Evaluation is based on expert judgement on final outputs, and user performance on language tasks associated with human creativity. The final section of the paper explores how and why alternatives to existing co-creative systems could benefit individual users as well as wider society.

Keywords
Creativity, poetry, co-creativity, natural language processing, language models, writing support tools, data sets

Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia
⋆ Ibukun Olatunji. 2023. Why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”? In Joint Proceedings of the ACM IUI 2023 Workshops. Sydney, Australia, 13 pages.
* Corresponding author: i.o.olatunji.2030349@swansea.ac.uk (I. Olatunji)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

This paper examines co-creative systems using poetry writing as an example. Within the paper ’poetry’ includes song lyrics. Section one of the paper explores poetry in terms of human creativity. Poetry is chosen because it is a creative task on which non-expert humans can outperform machines, in contrast to creative outputs such as image generation. After introducing the case for poetry, there is an exploration of recent work in generative computational systems. As well as being the technical state of the art, these systems provide a conceptual framework to explore sociocultural issues such as bias and inclusion. Section one then explores a range of poetry-specific systems and ends with a more detailed case study. The case study examines a system that combines elements of more powerful general models and custom architectural features specific to poetry writing. Section two details the evaluation issues and methods that might be employed for the proposed co-creative system. The emphasis of this section is on how to evaluate human improvement over time. Section three explores a high level implementation of the system. It builds on the evaluation to propose both an architecture and a method for testing whether the proposed system has, in principle, any benefits over and above those described in section one. Section four is a discussion of the social and cultural limitations of current generative systems. It expands on section one in exploring bias and proposes a mitigation through the use of rap lyrics. Section five describes the theoretical and practical limitations of the paper as well as future work. Section six provides a summary of the paper’s contribution. The section ends with answers to the question: why try to build a co-creative poetry system that makes people feel that they have “creative superpowers”?

1.1. Human Creativity

Human creativity is the ability to come up with ideas or artefacts that are new, surprising, and valuable. Rather than a solitary act, it results from the interaction of social elements: a culture that contains symbolic rules, a person who brings novelty into the symbolic domain, and people who recognise and validate the innovation [1, 2, 3]. Boden makes a further distinction between psychological and historical creativity (P-creativity and H-creativity). P-creativity involves coming up with an idea that’s new to the person who comes up with it. H-creativity means that (so far as we know) no-one else has had it before: it has arisen for the first time in human history [2, 4]. Machine learning models have the potential to support human creativity [5, 6, 7]. However, questions remain on their design and influence in augmenting human capacity as opposed to reducing it [8, 9, 10]. Shneiderman suggests that "researchers’ goals shape the questions they raise, collaborators they choose, methods they use, and outcomes of their work." [11]. This leads to the question: how can designers of programming interfaces, interactive tools, and rich social environments enable more people to be more creative more often? [12]

1.2. Computational Systems

In computational terms, automated systems are now capable of writing poetry approaching human levels [13, 14, 15]. Karimi et al consider three main strategies by which the role of humans in creative systems can be characterized: fully autonomous systems, creativity support tools, and co-creative systems [16]. Although the paper is primarily concerned with co-creative systems, it will blend the categories where necessary. The reasoning for this is that human users do not make the same distinctions; also, the features and usage are often blended in the real world, e.g. an autonomous system that is used by a creator as an input and thus becomes a support tool and/or co-creative system [10]. The next section briefly outlines the state of the art in computational writing systems.

Language models (LMs) refer to systems that are trained on string prediction tasks: predicting the likelihood of a token (character, word or string) given either the preceding context or its surrounding context. Such systems are unsupervised and, when deployed, take text as input and output scores or string predictions [17]. Large Language Models (LLMs) trained on sufficiently large and diverse data sets are able to perform well across domains, and there is a correlation between model performance and size [18]. State-of-the-art models are able to generate text that approaches or surpasses that of some humans [13, 14, 15, 19]. The emphasis on some humans is important with respect to user characteristics; in broad terms, humans co-creating poetry can be considered as either inexperienced or advanced users. Research on creative tasks such as improvisation suggests that users vary in cognitive processing based in part on their experience and skill levels [20, 21]. A well-designed co-creative system should therefore take differences in user support needs into account [8, 9, 22].

1.3. General Purpose Language Generation

LLMs are trained to predict the next word, or series of words, in a text sequence. They model text corpora as probability distributions. Users write short text prompts to tell the system what to generate. Depending on how many examples are provided in the text prompt, the approach is referred to as zero-, one-, or few-shot learning [13, 15, 17]. Pretrained language models have become a cornerstone of modern natural language processing (NLP) pipelines because they often produce better performance from smaller quantities of labeled data [23]. Within general LLMs, the transformer has established itself as the best performing architecture on benchmark language processing tests [13, 15, 24]. As well as being able to perform tasks such as text summarising and question answering, LLMs have the potential to support creative writing [6, 8, 9]. Current state-of-the-art LLMs are summarized in table 1. However, despite impressive technical achievements, LLMs have limitations including: (a) models, as they scale, might eventually run into the limits of any pretraining objectives; (b) the models are expensive and difficult to perform inference on; (c) model decisions are not easily interpretable; (d) the majority of the research community, and by extension disadvantaged social groups, have been excluded from the development of LLMs as they are proprietary (see table 1); and (e) most LLMs are primarily trained on English-language text that contains data biases [13].

Table 1: Summary of State-of-the-Art Language Models by Size, Model Type and Ownership

| Model Name | Parameters | Model Type | Owner |
|---|---|---|---|
| BERT | 110 - 340 million | Transformer | Google |
| GPT-2 | 1.5 billion | Transformer | OpenAI |
| LaMDA | 137 billion | Transformer | Google |
| GPT-3 | 175 billion | Transformer | OpenAI |
| ChatGPT/InstructGPT | 175 billion | Transformer | OpenAI |
| BLOOM | 176 billion | Transformer | BLOOM Project |
| Megatron-Turing NLG | 530 billion | Transformer | Microsoft and NVIDIA |
| PaLM | 540 billion | Transformer | Google |
| GLaM | 1 trillion | Mixture of Experts | Google |

1.4. Poetry Specific Language Generation

Creating poetry is a creative skill that requires extensive vocabulary, phonemic awareness to produce complex rhyme patterns, and general knowledge of enough subjects about the world to be able to tell interesting stories about a range of topics [20, 25, 26, 27].
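The string-prediction view of language models described in section 1.2 (predicting the likelihood of a token given the preceding context) can be illustrated with a deliberately tiny sketch. This is not how the LLMs in table 1 are implemented: it is a word-level bigram counter, and the function names and toy corpus are assumptions introduced here purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus_lines):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for line in corpus_lines:
        tokens = ["<s>"] + line.lower().split()  # "<s>" marks the line start
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """Return P(next | prev) as a dict; empty if `prev` was never seen."""
    followers = counts.get(prev)
    if not followers:
        return {}
    total = sum(followers.values())
    return {token: count / total for token, count in followers.items()}


# Hypothetical toy corpus (three short lines of verse-like text).
toy_corpus = [
    "the moon is bright",
    "the moon is cold",
    "the sea is cold",
]
model = train_bigram_model(toy_corpus)
dist = next_token_distribution(model, "is")
print(dist)
```

Here the "preceding context" is a single word, whereas transformer-based models condition on far longer contexts; the sketch only shows what it means to model a text corpus as a probability distribution over next tokens.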
Table 2: An Overview of Selected Poetry Writing Tools by Type

| Type | Example | Key Features | Constraints |
|---|---|---|---|
| Autonomous | ChatGPT | Natural language input; generates poems and lyrics | Plain text output; customisation by text prompt |
| Autonomous | co:here | Natural language input; generates poems and lyrics | Plain text output; high latency |
| Autonomous | Rytr | UI has song lyric option; extensive text processing | Uses GPT-3 models; not trained on song data |
| Support | RhymeZone | Rhyming dictionary/thesaurus; generates rhyme suggestions | Single word only; cannot be used to write text |
| Support | Rhymer | Rhyming dictionary; generates range of word types | Single word only; cannot be used to write text |
| Support | Poetry Foundation | Poetry archives and tutorials; guides user to external resources | No support for real-time creation; no user customisation options |
| Co-creativity | Poem Generator | Customise inputs to create poem; variety of formal poetic outputs | Input variables fixed; limited user interaction or feedback |
| Co-creativity | DeepBeat | Generates and/or suggests lyrics; displays sources of lyric inspiration | Confusing user interface; unoriginal output vs GPT-3 models |
| Co-creativity | Verse by Verse | Suggests stanzas in style of known poets; language model accounts for bias | Limited forms of poetry; trained on selected U.S poets |

1.5. Overview of Poetry Support Tools

Historically, poetry creation systems tended to be built on the model of an AI writing a full poem by itself, thus writing in a closed system [28, 29, 30, 31]. Early systems tended to be rule-based [32]. More recently, some approaches have started to explore human interaction when composing poems [33, 34]. Table 2 provides a broad summary of selected systems including autonomous, support tools and co-creative as defined by Karimi et al [16]. The category distinction helps frame a range of (human) creative processes and (technology) interactions. It is also a useful way to consider ways in which the proposed system is different to those that currently exist and, as importantly, ways in which it is similar. At a high level, the autonomous systems are designed to be able to create finished works (sometimes called ’products’ or ’artefacts’). The support tools are used as part of the creative workflow. For instance, RhymeZone or Rhymer help a user find words that sound similar to those they might use in a poem [35, 36]. Co-creative systems facilitate humans and computational systems to make shared products. That said, the distinction is not fixed. For example, Rytr contains text editing, display and other features that allow it to operate as both a co-creative and autonomous system [37]. Having looked at the computational systems, it is instructive to briefly consider poetry writing from a human perspective. It will help inform the design of a new poetry writing system.

Writing poetry requires a range of general creative skills that can be framed in terms of divergent and convergent thinking; these are used in varying ways throughout a multi-stage writing process. For simplicity, the stages include (a) exploration, which is characterised by divergent thinking [21, 38, 39, 40]; (b) focused work, which uses convergent thinking [21, 41]; and (c) re-drafting. It is useful in the stages to distinguish between internal and external co-creation system activities. Internal is when the user interacts with the system in real-time, e.g. writing or redrafting text; external is when the user participates in activities such as browsing, reading or other things that do not use the system. The framing of internal and external system activities is based on the reasoning that: (a) skill: inexperienced users are unlikely to possess the improvisational skill required to create full poems in real-time due to cognitive processing constraints [20, 42]; (b) speed: users might choose to write poems over multiple sessions, in which case external system stimuli could have supported the writing; (c) knowledge: advanced writers are usually familiar with a body of existing work that informs their own [1]; and (d) process: reflecting and redrafting is an important part of writing. The reflecting stage often takes place separately to the creation of the work itself [1, 10].

1.6. Case Study: Verse by Verse

Figure 1: Google’s Verse by Verse: users select from a range of US poets and custom design a poem by choosing from features including the number of syllables per line and the number of stanzas. Screenshot from Verse by Verse application by Google.

Google Research Verse by Verse is a relevant case study as it is arguably the most technically advanced poetry-specific generative system. As well as using transformer model architecture, it also uses information retrieval, and considers bias within its design. Verse by Verse augments user poetry composition by offering suggestions to a user as they compose a poem. The authors of the system argue that, relative to creating full poems, "this is a much more challenging task, as one needs be able to offer suggestions with minimal latency while meeting constraints of the poem structure and handle the challenges of user input" [34]. Figure 1 shows part of the system’s user interface (for PC). From a user’s point of view, the experience is as follows: (a) the user selects poet(s) to inform the suggestions; (b) the user designs poem structure as illustrated in figure 2; (c) the user writes the first line of text; and (d) the system offers suggestions in the style of the poet(s) the user selected earlier. The user can then work with, modify or have the system create new verses. The Verse by Verse design has an external system context that, in general, LLMs do not. To some extent, the system helps poetry writers become better readers. In his work on creativity, it was suggested to Csikszentmihalyi that "the only way you become a poet...is because you’ve read a poem...poetry depends on the whole poetic tradition of the past...you have to decide...out of all that previous poetry, what is most interesting to me?" [1] Verse by Verse, by making users aware of the work of other poets, helps users become readers in order to inform their own poetic development.

2. Experiment Design

Verse by Verse ran comparative evaluations of the system against poems written by classic poets. Although the system was intended to be used as an interactive co-creator for the human writing a poem, the authors stated it was still worth evaluating how the system could perform on its own in writing a poem given a first line of verse [34]. This approach has been adopted within the proposed system's experimental design, implementation and evaluation. The next subsection explores evaluation prior to looking at implementation. The rationale is that the evaluation is perhaps a harder problem as it involves an intersection of multiple disciplines (e.g. computational sciences, arts, linguistics, and pedagogy). Implementation can mostly be restricted to computational science domains.

2.1. Evaluation Overview

Evaluating co-creative systems is still an open research question and there is no standard metric for measuring computational co-creativity [16, 43]. Karimi et al describe the limited research investigating how co-creative systems can be evaluated. They present four questions as a way to compare how (existing) co-creative systems evaluate creativity: who is evaluating the creativity, what is being evaluated, when does evaluation occur, and how the evaluation is performed [16]. Calderwood et al point out that "writers engaged with co-creative systems are looking for creative insight, something not measured by perplexity or by a language model’s ability to solve the canonical downstream NLP tasks" [5]. For the evaluation of the proposed system to be effective, it is useful to restate its goals in more detail. The co-creative poetry system’s goal is “making people feel that they have ‘creative superpowers’”. To achieve this, the system supports users to create better poetry than they might otherwise have done without the system. The terms supports and better will be further explored as they form the basis of evaluation.

Augmenting human users is central to Human-Centered AI (HCAI) and a contrast to a closed model that creates on behalf of the user [8, 34, 44]. This point is made in recent work that refers to pitfalls when designing human-AI co-creative systems, as well as other work which asserts that generative models can help writers without writing for them [5, 9, 22]. The argument these, and similar works, make is that too much automated creation can be at the expense of human users [9, 22]. Adopting this thinking, it is useful to evaluate the system and its users independently, as well as in combination. This in theory allows (system) internal and (human) internal and external measurement. The end goal here is that human users develop their capacity; this could be external to the system, whereby the system has acted as a creative prompt. A description of how this could work in principle follows. A later section describes system implementation.

2.2. Process and Objectives

The system would run a number of experiments with the purpose of establishing which system components most support users to write “better” poetry; in goal terms, better is evaluated (a) subjectively by users via a Likert scale [45] and (b) by performance on related tasks such as the Divergent Action Task, Bridge-the-Associative-Gap Task, or rhyme creation and identification [46, 47]. The tasks would be completed external to the system. The goals of the evaluation are to measure to what extent users are actually improving their poetry writing abilities, and the degree to which any improvement is a result of internal system features. For a user, improvement is concerned with "the writer’s goals or their desire to have an individual voice" [9]. With this as a basis, the evaluation process takes the form of a number of hypotheses and related experiments, the purpose of which is to explore: (a) how well general vs poetry specific language models can write full poems; (b) if poetry specific language models can better represent individual users' style than generalised language models; and (c) the extent to which users benefit from system recommendations when writing poems. The hypotheses and experiments are concerned with poetic text style, which describes the ways (an author) uses language, including prosody, word choice, sentence structure and use of figurative language [48, 49].

A central challenge for the proposed system is that the development and attainment of an individual poetic voice is highly subjective. Beyond subjectivity, poetry is, from a societal perspective, often a question of cultural value which over time may well change. In reference to Kendrick Lamar’s 2018 Pulitzer Prize, a first for a rap album, the administrator of the prizes said, "..this is not a genre we’ve seen celebrated before, so that in that sense it’s historical." [50] Furthermore, as Boden states, "...even in science, values are often elusive and sometimes changeable...because values are highly variable, it follows that many arguments about creativity are rooted in disagreements about value. This applies to human activities no less than to computer performance." [2]

1. Hypothesis-A: that poetry specific language generation could outperform general language generation with respect to creating poems. Experiment A: each system state generates complete poetic texts. The prompts would also be given to users (inexperienced and advanced) with the same constraints as the system in terms of keywords, topics, character limits etc. The evaluation for experiment A is by humans who judge the quality of the poems (which are anonymous) by a Likert scale and free text summary.

2. Hypothesis-B: that poetry specific language generation customised for a given user could outperform vanilla poetry specific generation with respect to creating poems. Experiment B: each system state generates complete poetic text but some states are pretrained to customise characteristics with respect to given users and their poetic styles. The evaluation for experiment B is by humans who judge the quality of the poems by a Likert scale and free text summary. The evaluation is focused on how well the poems represent the given users’ individual style.

3. Hypothesis-C: that external recommendations, full or part poems, based on given user characteristics are supportive with respect to users writing their poems. Experiment C: for a given user's generated text inputs, the system state generates (external to system) poetic text recommendations that the user reads and reflects on before completing their poem. The evaluation for experiment C is by humans who judge how well the poem recommendations helped them write poems in the theme, topic or style they were attempting to achieve.

The approach described provides a sense of how user activities (internal and external) with respect to the system can be evaluated. In practice, more fine-grained evaluation criteria would be required based on further research and operational or implementation design; as far as possible, a complete system would have an awareness of all relevant evaluation data including, for instance, external system reading of poems. At this stage, the evaluation proposed is limited to the extent necessary to support the explanation of how and why the system might work. A later section (Limitations and Future Work) will explore the limitations and suggest possible remedies.

3. Proposed Implementation

The system would have a number of states that range from full automation to text prompts acting as a starting point for the user. The support states envisaged are:

1. State-A: general language system implemented as standard.
2. State-B: general language system implemented with modified architecture to include user generated content within training set and/or network architecture preferences.
3. State-C: poetry specific system implemented with standard architecture.
4. State-D: poetry specific system implemented with modified architecture to include user generated content within training set and/or network architecture preferences.

The LLM component of the system would use publicly available APIs and, where possible, modify network architecture directly [51, 52, 53]. In most cases (table 1) LLMs are closed black-box systems, as illustrated in figure 3. In part for this reason, ideally a custom poetry and lyric language model would be implemented; aside from practicalities (which will be discussed) there is a technical challenge in that a poetry and lyric LM would be far smaller than a general LLM. Given the research on LLM size and performance, a custom poetry and lyric LM would in theory therefore underperform against state-of-the-art LLMs [18, 54, 15]. In line with a recent study, which experimented with user experiences of language models, the system could be implemented with a combination of JavaScript, React, Python and Flask [8]. The system would then be deployed as a web application for mobile phones. Mobile is preferred to PC on the basis of its greater reach as a device for both reading and creating contemporary poetry [55, 56].

Figure 2: SParse And Dense Network Model Elements
1. Text input by user is returned as partially completed text and/or poetic and lyrical recommendations for the user to consider.
2. User personalised data submitted as poems or lyrics and/or recommendations of favourite artists and their work. These are used to create a corpus of user text. Prior examples of user generated text uploaded to system; recommender and/or database search to enhance user text with additional poetic texts (e.g. from web crawl).
3. Database of poetic texts (and song lyrics) from web crawl. Clean text is included as well as metadata such as rhyme scheme and Parts of Speech (PoS).

3.1. Sparse And Dense Network Model

The system (figure 2) operates as a Sparse And Dense Network (SPAD). The name refers to the system being sparse with respect to user input tokens as compared to tokens contained in the LM/LLMs. Against this, the system is dense in terms of leveraging transformer models and their associated attention layers (table 1). The intuition is to use a small amount of personalised user text to attempt to customise the output of powerful LMs/LLMs. This differs from existing approaches in the following ways.

• State-of-the-art LLMs form part of the SPAD in order to help improve the SPAD's performance; in other words, the LLMs are a source of input training data and as such multiple LLMs could in theory be included in the SPAD architectural design.

• A poetry specific LLM (GPT-NeoX) forms part of the design; poetry specific refers to adaptations to the underlying model architecture so that token processing and output are better suited to poetry than prose. An example of this might be applying additional linguistic layers within the network to favour text strings with syllable frequencies found more regularly in poems than, say, news articles or web pages. Although architecture is referred to, much of any benefit at this stage might come from modifying the training data and associated recipes. The poetry specific LLM would also leverage data from the general LLM (for simplicity any interaction between the two elements is not included in figure 2).

• Poetry and lyric LM is a custom model whose network architecture and training data are specific to poetry. In practical terms it is not an LLM as the available training data is not likely to be sufficiently extensive vs the current state of the art. As well as providing a data contrast to the LLMs, this part of the network will also act as a style transfer layer in so far as it identifies and tries to modify input text to create poetic styles. These poetic styles will be mapped onto user styles upstream within the system.

The result of the models described above is a system that contains information on generalized poetic style as well as individual style preference(s) unique for each user. This allows the system to support users with specific co-writing tasks (e.g. text generation) as well as offer personalised recommendations for further reading of relevant poems and/or poets. In user experience terms, this might be delivered via an interface that allows the user to switch between (a) writing text; (b) editing generated text; and (c) reading and reflecting on specific poetic recommendations made by the system.

At this stage, the proposed model is high-level. There are open questions relating to issues such as real world implementation, customisation of user text, acquisition of training data and other areas. The penultimate section will revisit some of the open design questions and attempt to provide answers. The next section explores the social significance of poetry and how a system design could use this to enhance cultural inclusiveness.

4. Discussion

An important goal for poetry is for each writer to discover or develop their own unique style, or artistic voice. Part of a writer's development will be a result of what poetry they have previously been exposed to. Robert Graves stated that, “only a poet of experience...can hope to put himself in the shoes of his predecessors, or contemporaries, and judge their poems by recreating technical or emotional dilemmas which they faced while at work on them." [57] It can be argued that this statement is, in contemporary terms, biased in gender terms given the assumption of ‘poet’ being male. Graves’s central argument about experience, however, is echoed in recent studies on language models. A study by Cheng and Uthus made the point that “as creative works are often shaped by the lived experiences and timely issues of the creator’s life, a poetry composition system trained on poems from different authors of different eras may reflect a variety of societal biases." [58] Within computer science, social bias is a subject gathering more research attention [17, 59, 60]. However, as well as attempting to mitigate negative impacts for disadvantaged groups, considering bias also offers the possibility of designing systems that leverage cultural, poetic and linguistic resources that would otherwise be missed. This can benefit all user groups. The next section provides a more concrete example.

4.1. Bias in Language Models

It has been recognised and accepted in recent years that LLMs used for text generation contain bias [17, 60]. A study by Uthus suggests that “biases in creative language applications are under explored”; it goes on to say it is important to examine biases in these applications because they are intended for contexts such as self-expression, collective social enjoyment, and education [58]. One of the key sources of bias in LLMs is the training data sets. LLMs retain the biases of the data they have been trained on [15]. Typically the models pick up on, or reflect, biases and overtly abusive language patterns in training data. This can lead to harms for some users such as encountering derogatory language or discriminatory language (e.g. racist, sexist or ableist) [17]. Studies have shown that harms can also exist because of (a) exclusionary social norms within language. For example, ‘family’ is often defined as a basic social unit consisting of a married woman, man and their children; language models internalizing such social norms could be highly discriminatory towards people outside this definition [60]; (b) a greater propensity to label the language of marginalized or underrepresented groups as toxic in hate speech detection (e.g. the ‘angry black woman’ stereotype) [60]; and (c) over-representation of certain groups such as white males 18-34 within widely used training data (e.g. Reddit posts) [17]. Bender et al assert that, “in the case of US and UK English...white supremacist and misogynistic, ageist, etc. views are over represented in the training data, not only exceeding their prevalence in the general population.” [17]. The authors go on to say that the data underpinning LMs stands to “misrepresent social movements and disproportionately align with existing regimes of power.”

There are a number of studies that explore bias mitigation through computational techniques such as (a) augmentation of the training data using style transfer [58] or (b) using counterfactuals to reduce sentiment bias [59]. However, in their study describing GPT-3 the authors caution against over-reliance on computational solutions. They instead ask for “...more research that engages with the literature outside NLP, better articulates normative statements about harm, and engages with the lived experience of communities affected by NLP systems...mitigation work should not be approached purely with a metric driven objective to ‘remove’ bias...but in a holistic manner” [15]. For the use case of a poetry co-creation system, bias could potentially be mitigated by including rap lyrics as a key part of the training data set.

4.2. Towards Culturally Responsive Models

Emerging from a hobby of African American youth in the 1970s, rap (as an element of hip-hop) has quickly evolved into a mainstream culture and is the most popular music genre in the U.S and many other territories [61, 62, 63, 64]. Writing rap lyrics requires both creativity to construct a meaningful, interesting story and lyrical skills to produce complex rhyme patterns [26, 48, 65]; within the culture of rap, writers are evaluated by peers on the basis of their wordplay, linguistic complexity and ability to use multiple rhyme types (perfect and imperfect) as well as multi-syllabic rhymes [26, 66]. In many ways, the writer within the hip-hop tradition sets language puzzles for their audience. In a recent BBC documentary, Chuck D, the founder of Public Enemy, remarked that "poets were always...going to give you everything the truth...that’s very important not only in the realm of hip hop...but in the realm of artistry.” [67] Recent computational studies have explored rap on account of its complexity and cultural significance [65, 68, 69]. Rap has historically been excluded from most mainstream discussions on co-creative systems and poetry writing. There may well be valid reasons for this such as language appropriateness, perception around negative sentiments, offensive content, and difficulties in accessing material under copyright. However, although there are challenges, the benefits of using extensive rap lyrics within LM data sets include:

• Training data that represents wider audience concerns, thoughts and feelings.
• Training data will be dynamic and reflect contemporary sociopolitical issues.
• Opens up the possibility of bringing voices from excluded communities into the NLP community.
• LMs would be enhanced by a linguistically rich and varied source of data.
• Allows lyrics to be part of a wider conversation which potentially generates new research insight (for computational, language and social researchers).

• Customizing models for individuals: this is a system objective but has not been tested. Technically, there is a conflict between the scale and performance benefits of LLM/LM and the comparatively small datasets of individual users. However, as Vigliensoni et al argue, working with small-scale datasets is an overlooked but powerful mechanism for enabling greater human influence over generative AI systems within creative contexts [70]. The authors describe an experimental project, ReRites by Johnston, which involved fine-tuning GPT-2 on the artist's custom poetry corpus to generate poems. An approach such as this could be taken, although clearly using models such as GPT-2 (for which source code is available) has the limitation of performance vs current state-of-the-art LLMs. The personalizing of LLMs to individual users is an open topic that requires further research.

• Acquiring training data: training data for poetry and rap lyrics would not be readily available in the way that the Pile or equivalents are used for general LLMs [19].
The solution to this would be to source data from scraping the web for lyrics, or Ultimately, as contemporary music’s biggest genre, directly from services such as MusixMatch [71] and the one most concerned with rhyme and wordplay, Poetry training data, much of which will be out there are multiple reasons to explore using rap lyrics as of copyright, can be acquired via sites such as training data. Project Gutenberg and Poetry Foundation. This approach to training data was used in a 2019 ex- periment to create a poetry-specific LLM based 5. Limitations and Future Work on the GPT-2 model [72]. The paper has a number of limitations. Below some of Evaluation: literature on evaluating the creativity in a these are described along with suggested directions for co-creative systems considers a wide number of factors future work. System Design and Implementation: the such who evaluates the creativity (e.g. system itself or hu- paper does not fully explore how the proposed system man users), what is being evaluated (e.g. user interaction could be built. In particular, there are challenges around or output), when does evaluation occur (e.g in real time the following: or at the end of a session) and how the evaluation is per- • Building custom LLMs. One of the design lim- formed (e.g. methods and related metrics) [16]. There is itations is how to effectively experiment with a broad set of metrics for developing computational mod- models of varying degrees of openness (for con- els for evaluating creativity. With respect to the system venience referred to as black, grey and white box). described, the most relevant include a proposed compu- For black box models (e.g GPT-3) there is no way tational model by Agres et al. The model reflects human at present to modify the architecture. What in- conceptualization of musical and poetic creativity [73]. 
stead might be possible is to fine-tune the model Future work could explore the kind of model described via custom queries over a period of time. So, alongside other linguistic-based metrics such as the Di- what combinations of prompts generate the most vergent Action Task, Bridge-the-Associative-Gap Task, or favourable outputs. Grey box models (BLOOM rhyme creation and identification tasks. [46, 47] Addi- or GPT-NeoX) offer the possibility of powerful tionally, building on machine learning practices, metrics models with open-source training and evaluation could be derived for accuracy in terms of the degree to code plus model weights [53]. However, the costs which generated output matches a reference dataset. For of running and/or adapting these models could example, if the user has a target poetic style, it might be substantial and not something the paper has be possible to computationally determine the extent to explored. which the completed poem was accurate or not. The paper has not explored these kinds of evaluation in de- part of the most popular music genre. Poetry matters to tail and they would form part of future work. Finally, society. By extension, it is worth building system that though the evaluations proposed are limited, they could can help people experience it firsthand and connect with nevertheless contribute to the wider discussion around its traditions. The aim though should not be to make the topic. As Karimi et al assert "evaluating co-creative people feel they have "creative superpowers"; instead a systems is still an open research question and there is no system should support people to actually build "creative standard metric that can be used across specific systems." superpowers". [16]. 7. Acknowledgements 6. Conclusion This work was supported by the Engineering and Phys- Artistic creativity is a process, in which an initial im- ical Sciences Research Council. 
The author would also provisational phase is followed by a period of focused like to acknowledge the support of Swansea Council. re-evaluation and revision [20]. Spontaneous improvisa- tion is a complex cognitive process that shares features with what has been characterized as a ‘flow’ state [1, 20]. References Much current work on co-creative settings focuses on the [1] M. Csikszentmihalyi, Creativity : the psychology of role of the system as a generator that augments what peo- discovery and invention, Harper Perennial Modern ple can achieve in creative tasks [9]. There are problems Classics, 2013. with this such aligning the system capabilities and user [2] M. Boden, Creativity in a nutshell, Think 5 (2009) expectations, language model bias, system interpretabil- 83–96. doi:10.1017/S147717560000230X. ity, and user interaction design [8, 22, 74]. Studies have [3] J. P. Guilford, The nature of human intelligence. found that different mental expectation of users affects (1967). their strategies and perception of the system role in the [4] M. A. Boden, The creative mind : myths and mech- co-writing process [9, 74]. anisms, Routledge, 2005. This position paper explored the recent background [5] A. Calderwood, V. Qiu, K. Gero, L. B. Chilton, to co-creative writing systems, with poetry as a use case. How novelists use generative language mod- Poetry was defined as including song lyrics for which els: An exploratory user study, in: HAI- the paper argued that rap was the most relevant genre. GEN+user2agent@IUI, 2020. The paper then proposed a system that, as far as the [6] M. Henderson, R. Al-Rfou, B. Strope, Y.-h. Sung, author is aware, has novel features relative to the state L. Lukacs, R. Guo, S. Kumar, B. Miklos, R. Kurzweil, of the art. The system and how it could be evaluated Efficient natural language response suggestion for and implemented were then described. Importantly, the smart reply, arXiv.org (2017). URL: https://arxiv. 
design includes recommendations for user activities ex- org/abs/1705.00652. doi:10.48550/arXiv.1705. ternal to the system. The rationale for this is that the 00652. system priority is to help the human user to develop [7] H. Gonçalo Oliveira, T. Mendes, A. Boavida, Co- an artistic style rather than to create text on the users poetryme: a co-creative interface for the compo- behalf. Issues around the mitigating some system bias sition of poetry, Proceedings of the 10th Interna- using rap lyrics was also discussed. Future work could tional Conference on Natural Language Generation include more detailed analysis of evaluation methods as (2017). URL: https://aclanthology.org/W17-3508/. well as how these could be delivered internally to the sys- doi:10.18653/v1/w17-3508. tem. Further work on user interface design is also a topic [8] F. Lehmann, N. Markert, H. Dang, D. Buschek, to develop. Additionally, the implementation proposal Suggestion lists vs. continuous generation: Inter- is high level and constraints such as latency, database action design for writing with generative models design, and other factors have not been considered. In on mobile devices affect text length, wording and order to build a viable prototype, software architecture perceived authorship, in: Proceedings of Men- would most likely form the next stage of the research. sch Und Computer 2022, MuC ’22, Association for Finally, to revisit the title of the paper: why build a co- Computing Machinery, New York, NY, USA, 2022, creative poetry system that makes people feel that they p. 192–208. URL: https://doi.org/10.1145/3543758. have “creative superpowers”? Studies demonstrate that 3543947. doi:10.1145/3543758.3543947. poetry is an emotional capable of engaging the brain’s [9] K. Arnold, A. Volzer, N. Madrid, Generative mod- areas of primary reward [75]. 
It is a form of communi- els can help writers without writing for them, cation that has existed throughout human and across in: Joint Proceedings of the ACM IUI 2021 Work- cultures. In modern society, poetry has become a central shops, 2021. URL: https://ceur-ws.org/Vol-2903/ dataset of diverse text for language modeling, IUI21WS-HAIGEN-1.pdf. arXiv.org (2020). URL: https://arxiv.org/abs/2101. [10] A. Ploin, R. Eynon, I. Hjorth, M. A. Osborne, Ai and 00027. doi:10.48550/arXiv.2101.00027. the arts: How machine learning is changing artistic [20] S. Liu, H. M. Chow, Y. Xu, M. G. Erkkinen, K. E. work. report from the creative algorithmic intelli- Swett, M. W. Eagle, D. A. Rizik-Baer, A. R. Braun, gence research project, 2022. URL: https://www.oii. Neural correlates of lyrical improvisation: An fmri ox.ac.uk/news-events/reports/ai-the-arts/. study of freestyle rap, Scientific Reports 2 (2012). [11] B. Shneiderman, Design lessons from ai’s two grand URL: https://www.nature.com/articles/srep00834. goals: Human emulation and useful applications, doi:10.1038/srep00834. IEEE Transactions on Technology and Society 1 [21] W. Zhang, Z. Sjoerds, B. Hommel, Metacontrol of (2020) 73–82. doi:10.1109/tts.2020.2992669. human creativity: The neurocognitive mechanisms [12] B. Shneiderman, Creativity support tools, Commu- of convergent and divergent thinking, NeuroImage nications of the ACM 45 (2002) 116–120. 210 (2020). [13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, [22] D. Buschek, L. Mecke, F. Lehmann, H. Dang, Nine I. Sutskever, Language models are unsupervised potential pitfalls when designing human-ai co- multitask learners, 2019. URL: https://cdn.openai. creative systems, 2021. URL: https://arxiv.org/abs/ com/better-language-models/language_models_ 2104.00358. doi:10.48550/ARXIV.2104.00358. are_unsupervised_multitask_learners.pdf. [23] T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ilić, [14] S. Black, S. Biderman, E. Hallahan, Q. Anthony, D. Hesslow, R. Castagné, A. S. 
Luccioni, F. Yvon, L. Gao, L. Golding, H. He, C. Leahy, K. McDonell, M. Gallé, J. Tow, A. M. Rush, S. Biderman, A. Web- J. Phang, M. Pieler, U. S. Prashanth, S. Purohit, son, P. S. Ammanamanchi, T. Wang, B. Sagot, L. Reynolds, J. Tow, B. Wang, S. Weinbach, Gpt- N. Muennighoff, d. Moral, O. Ruwase, R. Baw- neox-20b: An open-source autoregressive language den, S. Bekman, A. McMillan-Major, I. Beltagy, model, arXiv.org (2022). URL: https://arxiv.org/abs/ H. Nguyen, L. Saulnier, S. Tan, P. O. Suarez, V. Sanh, 2204.06745. doi:10.48550/arXiv.2204.06745. H. Laurençon, Y. Jernite, J. Launay, M. Mitchell, [15] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, C. Raffel, A. Gokaslan, A. Simhi, A. Soroa, A. F. J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, Aji, A. Alfassy, A. A. Rogers, A. K. Nitzav, C. Xu, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, C. Mou, C. Emezue, C. Klamm, C. Leong, v. Strien, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. I. Adelani, D. Radev, E. G. Ponferrada, E. Lev- D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, kovizh, E. Kim, E. B. Natan, D. Toni, G. Dupont, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, G. Kruszewski, G. Pistilli, H. Elsahar, H. Benyam- C. Berner, S. McCandlish, A. Radford, I. Sutskever, ina, H. Tran, I. Yu, I. Abdulmumin, I. Johnson, D. Amodei, Language models are few-shot learners, I. Gonzalez-Dios, R. Javier, J. Chim, J. Dodge, J. Zhu, 2020. URL: https://arxiv.org/abs/2005.14165. J. Chang, J. Frohberg, J. Tobing, J. Bhattacharjee, [16] P. Karimi, K. Grace, M. L. Maher, N. Davis, Evaluat- K. Almubarak, K. Chen, K. Lo, V. Werra, L. Weber, ing creativity in computational co-creative systems, L. Phan, L. B. allal, L. Tanguy, M. Dey, M. R. Muñoz, CoRR abs/1807.09886 (2018). URL: http://arxiv.org/ M. Masoud, M. Grandury, M. Šaško, M. Huang, abs/1807.09886. arXiv:1807.09886. M. Coavoux, M. Singh, M. T.-J. Jiang, M. C. Vu, [17] E. M. Bender, T. Gebru, A. McMillan-Major, M. A. Jauhar, M. Ghaleb, N. Subramani, N. Kass- S. 
Shmitchell, On the dangers of stochastic par- ner, N. Khamis, O. Nguyen, O. Espejel, d. Gibert, rots: Can language models be too big? , in: Pro- P. Villegas, P. Henderson, P. Colombo, P. Amuok, ceedings of the 2021 ACM Conference on Fairness, Q. Lhoest, R. Harliman, R. Bommasani, R. L. López, Accountability, and Transparency, FAccT ’21, As- R. Ribeiro, S. Osei, S. Pyysalo, S. Nagel, S. Bose, sociation for Computing Machinery, New York, S. H. Muhammad, S. Sharma, S. Longpre, S. Nikpoor, NY, USA, 2021, p. 610–623. URL: https://doi.org/ S. Silberberg, S. Pai, S. Zink, T. T. Torrent, T. Schick, 10.1145/3442188.3445922. doi:10.1145/3442188. T. Thrush, V. Danchev, V. Nikoulina, V. Laippala, 3445922. V. Lepercq, V. Prabhu, Z. Alyafeai, Z. Talat, A. Raja, [18] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Heinzerling, C. Si, E. Salesky, S. J. Mielke, W. Y. B. Chess, R. Child, S. Gray, A. Radford, J. Wu, Lee, A. Sharma, A. Santilli, A. Chaffin, A. Stiegler, D. Amodei, Scaling laws for neural language D. Datta, E. Szczechla, G. Chhablani, H. Wang, models, CoRR abs/2001.08361 (2020). URL: https: H. Pandey, H. Strobelt, J. A. Fries, J. Rozen, L. Gao, //arxiv.org/abs/2001.08361. arXiv:2001.08361. L. Sutawika, B. M. Saiful, M. S. Al-shaibani, M. Man- [19] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, ica, N. Nayak, R. Teehan, S. Albanie, S. Shen, S. Ben- C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima, David, S. H. Bach, T. Kim, T. Bers, T. Fevry, T. Neeraj, S. Presser, C. Leahy, The pile: An 800gb U. Thakker, V. Raunak, X. Tang, Z.-X. Yong, Z. Sun, S. Brody, Y. Uri, H. Tojarieh, A. Roberts, [24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, H. W. Chung, J. Tae, J. Phang, O. Press, C. Li, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, At- D. Narayanan, H. Bourfoune, J. Casper, J. Rasley, tention is all you need, arXiv.org (2017). URL: https: M. Ryabinin, M. Mishra, M. Zhang, M. Shoeybi, //arxiv.org/abs/1706.03762. doi:10.48550/arXiv. M. Peyrounette, N. 
Patry, N. Tazi, O. S Sanseviero, 1706.03762. v. Platen, P. Cornette, P. F. Lavallée, R. Lacroix, [25] N. L. Hadaway, S. M. Vardell, T. A. Young, S. Rajbhandari, S. Gandhi, S. Smith, S. Requena, Scaffolding oral language development S. Patil, T. Dettmers, A. Baruwa, A. Singh, A. Chevel- through poetry for students learning en- eva, A.-L. Ligozat, A. Subramonian, A. Névéol, glish, The Reading Teacher 54 (2001) 796–796. C. Lovering, D. Garrette, D. Tunuguntla, E. Re- URL: https://go.gale.com/ps/i.do?id=GALE% iter, E. Taktasheva, E. Voloshina, E. Bogdanov, G. I. 7CA75085276&sid=googleScholar&v=2.1&it=r& Winata, H. Schoelkopf, J.-C. Kalo, J. Novikova, linkaccess=abs&issn=00340561&p=AONE&sw= J. Z. Forde, J. Clive, J. Kasai, K. Kawamura, w&userGroupName=anon%7E20f961a3. L. Hazan, M. Carpuat, M. Clinciu, N. Kim, N. Cheng, [26] A. Bradley, Book of rhymes : the poetics of hip hop, O. Serikov, O. Antverg, v. , R. Zhang, R. Zhang, Basic Civitas, 2017. S. Gehrmann, S. Pais, T. Shavrina, T. Scialom, T. Yun, [27] I. Alonso, L. Davachi, R. Valabrègue, V. Lam- T. Limisiewicz, V. Rieser, V. Protasov, V. Mikhailov, brecq, S. Dupont, S. Samson, Neural correlates Y. Pruksachatkun, Y. Belinkov, Z. Bamberger, Z. Kas- of binding lyrics and melodies for the encoding ner, A. Rueda, A. Pestana, A. Feizpour, A. Khan, of new songs, NeuroImage 127 (2016) 333–345. A. Faranak, A. Santos, A. Hevia, A. Unldreaj, URL: https://pubmed.ncbi.nlm.nih.gov/26706449/. A. Aghagol, A. Abdollahi, A. Tammour, A. HajiHos- doi:10.1016/j.neuroimage.2015.12.018. seini, B. Behroozi, B. Ajibade, B. Saxena, C. M. Fer- [28] H. G. Oliveira, A rest service for po- randis, D. Contractor, D. Lansky, D. David, D. Kiela, etry generation, 2017. URL: https: D. A. Nguyen, E. Tan, E. Baylor, E. Ozoani, F. Mirza, //www.semanticscholar.org/paper/ F. Ononiwu, H. Rezanejad, H. Jones, I. Bhattacharya, A-REST-Service-for-Poetry-Generation-Oliveira/ I. Solaiman, I. Sedenko, I. Nejadgholi, J. 
Passmore, 5b0039186ddb41ad5d037e5dbacfae837eaa5079. J. Seltzer, J. B. Sanz, K. Fort, L. Dutra, M. Sama- [29] H. G. Oliveira, Poetryme : a versatile plat- gaio, M. Elbadri, M. Mieskes, M. Gerchick, M. Akin- form for poetry generation, 2012. URL: https: lolu, M. McKenna, M. Qiu, M. Ghauri, M. Burynok, //www.semanticscholar.org/paper/PoeTryMe-% N. Abrar, N. Rajani, N. Elkott, N. Fahmy, O. Samuel, 3A-a-versatile-platform-for-poetry-Oliveira/ R. An, R. Kromann, R. Hao, S. Alizadeh, S. Shub- 0c62affa157a453e01514042b55babff428928fa. ber, S. Wang, S. Roy, S. Viguier, T. Le, T. Oyebade, [30] X. Zhang, M. Lapata, Chinese poetry generation T. Le, Y. Yang, Z. Nguyen, A. R. Kashyap, A. Palas- with recurrent neural networks, Proceedings of ciano, A. Callahan, A. Shukla, A. Miranda-Escalada, the 2014 Conference on Empirical Methods in Nat- A. Singh, B. Beilharz, B. Wang, C. Brito, C. Zhou, ural Language Processing (EMNLP) (2014). URL: C. Jain, C. Xu, C. Fourrier, D. L. Periñán, D. Molano, https://aclanthology.org/D14-1074/. doi:10.3115/ D. Yu, E. Manjavacas, F. Barth, F. Fuhrimann, G. Al- v1/d14-1074. tay, G. Bayrak, G. Burns, H. U. Vrabec, I. Bello, [31] T. Van de Cruys, Automatic poetry generation I. Dash, J. Kang, J. Giorgi, J. Golde, J. D. Posada, from prosaic text, Proceedings of the 58th An- K. R. Sivaraman, L. Bulchandani, L. Liu, L. Shinzato, nual Meeting of the Association for Computa- M. Hahn, M. Takeuchi, M. Pàmies, M. A. Castillo, tional Linguistics (2020). URL: https://aclanthology. M. Nezhurina, M. Sänger, M. Samwald, M. Cul- org/2020.acl-main.223/. doi:10.18653/v1/2020. lan, M. Weinberg, D. Wolf, M. Mihaljcic, M. Liu, acl-main.223. M. Freidank, M. Kang, N. Seelam, N. Dahlberg, N. M. [32] J. H. Lau, T. Cohn, T. Baldwin, J. Brooke, A. Ham- Broad, N. Muellner, P. Fung, P. Haller, R. Chan- mond, Deep-speare: A joint neural model of poetic drasekhar, R. Eisenberg, R. Martin, R. Canalli, R. Su, language, meter and rhyme, Proceedings of the 56th R. Su, S. Cahyawijaya, S. 
Garda, S. S. Deshmukh, Annual Meeting of the Association for Computa- S. Mishra, S. Kiblawi, S. Ott, S. Sang-aroonsiri, tional Linguistics (Volume 1: Long Papers) (2018). S. Kumar, S. Schweter, S. Bharati, T. Laud, T. Gi- URL: https://aclanthology.org/P18-1181/. doi:10. gant, T. Kainuma, W. Kusa, Y. Labrak, Y. S. Ba- 18653/v1/p18-1181. jaj, Y. Venkatraman, Y. Xu, Y. Xu, Y. Xu, Z. Tan, [33] Google, Verse by verse, 2022. URL: https://sites. Z. Xie, Z. Ye, M. Bras, Y. Belkada, T. Wolf, Bloom: A research.google/versebyverse/. 176b-parameter open-access multilingual language [34] D. Uthus, M. Voitovich, R. Mical, Augmenting po- model, arXiv.org (2022). URL: https://arxiv.org/abs/ etry composition with verse by verse, 2022. doi:10. 2211.05100. doi:10.48550/arXiv.2211.05100. 18653/v1/2022.naacl-industry.3. [35] WriteExpress, Rhymer, 2023. URL: https://www. [49] Z. Hu, R. K.-W. Lee, C. C. Aggarwal, A. Zhang, Text rhymer.com/. style transfer: A review and experimental evalua- [36] Datamuse, Rhymezone rhyming dictionary and the- tion (2020). URL: https://arxiv.org/abs/2010.12742. saurus, 2023. URL: https://www.rhymezone.com/. doi:10.48550/ARXIV.2010.12742. [37] Rytr, Rytr - best ai writer, content generator writing [50] R. Roberts, Kendrick lamar’s pulitzer prize assistant, 2022. URL: https://rytr.me/. sparks lively — and at times snobby — conver- [38] M. A. Runco, Divergent thinking, creativity, and sations on the aesthetics of music, 2018. URL: ideation. (2010). https://www.latimes.com/entertainment/music/ [39] C. Lewis, P. J. Lovatt, Breaking away from set la-et-ms-kendrick-pulitzer-reactions-20180420-story. patterns of thinking: Improvisation and divergent html. thinking, Thinking Skills and Creativity 9 (2013) [51] OpenAI, Openai api, 2021. URL: https://openai.com/ 46–58. api/. [40] M. A. Runco, S. 
Acar, Divergent thinking as an [52] Amazon, Alexatm 20b is now available in amazon indicator of creative potential, Creativity research sagemaker jumpstart | amazon web services, 2022. journal 24 (2012) 66–75. URL: https://tinyurl.com/amazonGPT. [41] A. Cropley, In praise of convergent think- [53] HuggingFace, Gpt-neox, 2022. URL: ing, Creativity Research Journal - CREATIV- https://huggingface.co/docs/transformers/main/ ITY RES J 18 (2006) 391–404. doi:10.1207/ en/model_doc/gpt_neox#overview. s15326934crj1803_13. [54] A. Komatsuzaki, Current limitations of lan- [42] A. T. Landau, C. J. Limb, The neuroscience of im- guage models: What you need is retrieval, 2020. provisation, Music Educators Journal 103 (2017) 27– URL: https://www.researchgate.net/publication/ 33. URL: https://doi.org/10.1177/0027432116687373. 344261335_Current_Limitations_of_Language_ doi:10.1177/0027432116687373. Models_What_You_Need_is_Retrieval. [43] Studying the Impact of AI-based Inspiration on [55] F. Hill, K. Yuan, How instagram saved po- Human Ideation in a Co-Creative Design Sys- etry: Social media is turning an art form tem, 2021. URL: https://ceur-ws.org/Vol-2903/ into an industry, 2018. URL: https://www. IUI21WS-HAIGEN-7.pdf. theatlantic.com/technology/archive/2018/10/ [44] B. Shneiderman, Human-Centered AI, Oxford Uni- rupi-kaur-instagram-poet-entrepreneur/572746/. versity Press, 2022. [56] H. Oliver, Instagram is the future of po- [45] A. Joshi, S. Kale, S. Chandel, D. Pal, Lik- etry, 2021. URL: https://unherd.com/2021/10/ ert scale: Explored and explained, British instagram-is-the-future-of-poetry/. Journal of Applied Science Technology 7 [57] M. Schmidt, Lives of the Poets, Phoenix, 1999. (2015) 396–403. URL: https://eclass.aspete. [58] E. Sheng, D. C. Uthus, Investigating societal biases gr/modules/document/file.php/EPPAIK269/ in a poetry composition system, ACL Anthology 5a7cc366dd963113c6923ac4a73c3286ab22.pdf. (2020) 93–106. URL: https://aclanthology.org/2020. 
doi:10.9734/bjast/2015/14975. gebnlp-1.9/. [46] J. A. Olson, J. Nahas, D. Chmoulevitch, S. J. Crop- [59] P.-S. Huang, H. Zhang, R. Jiang, R. Stanforth, per, M. E. Webb, Naming unrelated words predicts J. Welbl, J. Rae, V. Maini, D. Yogatama, P. Kohli, creativity, Proceedings of the National Academy Reducing sentiment bias in language models via of Sciences 118 (2021). URL: https://www.pnas.org/ counterfactual evaluation, 2019. URL: https://arxiv. content/118/25/e2022340118. doi:10.1073/pnas. org/abs/1911.03064. doi:10.48550/ARXIV.1911. 2022340118. 03064. [47] J. Ocumpaugh, M. Mercedes, T. Rodrigo, [60] A. K., M. P. Gangan, D. P., L. V. L., Towards an En- K. Porayska-Pomsta, I. Olatunji, R. Luckin, hanced Understanding of Bias in Pre-trained Neural Becoming better versed: Towards the design Language Models: A Survey with Special Emphasis of a popular music-based rhyming game for on Affective Bias, Springer Nature, Singapore, 2022. disadvantaged youths, Proceedings of the 26th In- [61] J. Lynch, Hip-hop passes rock to become most pop- ternational Conference on Computers in Education. ular music genre for first time in history: Nielsen, Philippines: Asia-Pacific Society for Computers 2018. URL: https://www.businessinsider.com/ in Education (2018). URL: https://apsce.net/icce/ hip-hop-passes-rock-most-popular-music-genre-nielsen-2018-1? icce2018/wp-content/uploads/2018/12/C6-04.pdf. r=US&IR=T. [48] H. Hirjee, D. Brown, Using automated rhyme de- [62] A. Texas, Hip-hop is the most listened to genre in tection to characterize rhyming style in rap mu- the world, 2015. URL: https://www.nme.com/news/ sic, Empirical Musicology Review 5 (2010) 121–145. music/various-artists-1151-1214849. doi:10.18061/1811/48548. [63] Wikipedia, Hip hop, 2021. URL: https: //en.wikipedia.org/wiki/Hip_hop. [64] T. Ingham, Nearly a third of all streams in the us last year were of hip-hop and rnb artists as rock beat pop to second, 2021. 
URL: https://www.musicbusinessworldwide.com/ nearly-a-third-of-all-streams-in-the-us-last-year-were-of-hip-hop-and-rb-music/. [65] E. Malmi, P. Takala, H. Toivonen, R. Tapani, A. Gio- nis, Dopelearning: A computational approach to rap lyrics generation *, KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). doi:10.1145/2939672.2939679. [66] J. Eastwood, E. Hinton, We wrote an algorithm to unravel the rhymes of hit musical ‘hamilton’, 2016. URL: http://graphics.wsj.com/hamilton/. [67] C. D, Fight the power: How hip hop changed the world, ???? URL: https://www.bbc.co.uk/ programmes/p0dj70yd. [68] N. Condit-Schultz, MCFlow: A Digital Corpus of Rap Flow, Ph.D. thesis, 2016. URL: https://etd. ohiolink.edu/apexprod/rws_etd/send_file/send? accession=osu1461250949&disposition=inline. [69] J. Eastwood, E. Hinton, How wsj used an algorithm to analyze ‘hamilton’ the musical, 2016. URL: http: //graphics.wsj.com/hamilton-methodology/. [70] A Small-Data Mindset for Generative AI Creative Work, 2022. [71] Musixmatch developer api, 2023. URL: https:// developer.musixmatch.com/. [72] S. Presser, Gpt-2 neural network poetry, 2019. URL: https://www.gwern.net/GPT-2. [73] S. Mcgregor, K. Agres, M. Purver, G. Wiggins, From distributional semantics to conceptual spaces: A novel computational method for concept creation, Journal of Artificial General Intelligence 6 (2015) 55–86. doi:10.1515/jagi-2015-0004. [74] D. Yang, Y. Zhou, Z. Zhang, T. Jia, J. Li, R. Lc, Ai as an activewriter: Interaction strategies with generated text in human-ai collaborative fiction writing, 2019. URL: https://ceur-ws.org/Vol-3124/paper6.pdf. [75] E. Wassiliwizky, S. Koelsch, V. Wagner, T. Jacobsen, W. Menninghaus, The emotional power of poetry: neural circuitry, psychophysiology and composi- tional principles, Social Cognitive and Affective Neuroscience 12 (2017) 1229–1240. doi:10.1093/ scan/nsx069.