<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshops, March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>How Novelists Use Generative Language Models: An Exploratory User Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alex Calderwood</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivian Qiu</string-name>
          <email>vivian.qiu@columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katy Ilonka Gero</string-name>
          <email>katy@cs.columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lydia B. Chilton</string-name>
          <email>chilton@cs.columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Columbia University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>17</volume>
      <issue>2020</issue>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>ABSTRACT</title>
      <p>Generative language models are garnering interest as creative tools.
We present a user study to explore how fiction writers use
generative language models during their writing process. We had four
professional novelists complete various writing tasks while having
access to a generative language model that either finishes their
sentence or generates the next paragraph of text. We report the
primary ways that novelists interact with these models, including:
to generate ideas for describing scenes and characters, to create
antagonistic suggestions that force them to hone their descriptive
language, and as a constraint tool for challenging their writing
practice. We identify six criteria for evaluating creative writing
assistants, and propose design guidelines for future co-writing tools.</p>
      <p>Keywords: co-creativity; natural language processing; user interface; writing
tools; user study.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>1 INTRODUCTION</title>
      <p>
        Spell checkers, auto-correct, and predictive keyboards have changed
how, and what, we write [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ]. Recently, a new wave of language
models (statistical models that are able to “predict” the next word
in a sentence) has been garnering interest as creative generative tools.
Websites that demo the abilities of language models such as GPT-2
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have gained popularity across the computer science landscape,
but it remains unclear how professional writers view such systems.
      </p>
      <p>
        In 2019, two novelists described using similar language models to
help them generate fresh ideas or surprisingly resonant descriptions.
Their self-reported experiences suggest that these language models
could act as creative partners for professional writers, but it remains
unclear how well these anecdotes generalize. In the past, sentence
completion-style tools for story writing have lacked the semantic
coherence necessary to make them useful [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>In this work, we run a formal, albeit exploratory, user study
of four novelists writing in collaboration with a state-of-the-art
language model. Our goal is to understand what professional
writers look for in suggestions, and in what ways these new language
models do or do not meet this challenge. Figure 1 shows screen
captures from our study in which the novelists are using two different
writing interfaces.</p>
      <p>We report the primary ways that novelists interact with
generative language models, including: to generate ideas for describing
scenes and characters, to create antagonistic suggestions that force
them to hone their descriptive language, and as a constraint tool
for challenging their writing practice. We also unpack elements of
their criteria for evaluating creative writing assistants, and propose
design guidelines for future co-writing tools.</p>
    </sec>
    <sec id="sec-3">
      <title>2 BACKGROUND</title>
      <p>
        In 2016, New York Times best-selling novelist Robin Sloan wrote
about training a language model on a corpus of science fiction
short stories [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. He embedded this model in a text editor such
that he could have it complete a sentence when he pressed ‘tab’.
His vision for the tool as helper was “less Clippy, more séance”. He
imagined that the model would push him to write in an unexpected
direction and with fresh language. In 2019, the New York Times
profiled Sloan, who has continued working on this project and is
using the tool to write his third novel [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        More recently, critically acclaimed novelist Sigal Samuel wrote
about using a language model called GPT-2 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to help her write
her next novel [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. She thought that the near-human outputs
of language models were ideal for fiction writers because they
produced text that was close to, but not quite exactly, human writing.
This near-human writing “can startle us into seeing things anew”.
She discusses using GPT-2 to finish paragraphs from her previous
novels; in one case she writes, “Reading this, I felt strangely moved.
The AI had perfectly captured the emotionally and existentially
strained tenor of the family’s home.”
      </p>
      <p>Samuel makes it clear that she didn’t intend to copy-paste
sentences written by a language model, and that the model itself
contained all kinds of ephemera that didn’t advance the plot or belong
in the story. Its use was primarily local, and tended to capture a
certain tone or mood and extend that small conceit further.</p>
      <p>These two writers demonstrate the potential for language models
to act as aids for creative writers, and their anecdotal reports inspire
the work we present here.</p>
    </sec>
    <sec id="sec-4">
      <title>3 RELATED WORK</title>
      <p>
        Common writing interfaces are beginning to include predictive text
suggestions, notably next-word predictions in text messaging on
smartphones and sentence completion in email composition [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Independent work has found that these suggestions skew positive
in sentiment and influence the writer’s composition [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ], but this
work is in its early stages; recently there has been a call to explicitly
study ‘AI-mediated communication’ [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Others have noted the importance of shifting suggestions away
from the most likely phrases, as participants tend to find these
suggestions boring or trite [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Yet more unexpected suggestions
are often incoherent. Roemmele and Gordon study the effect of
model ‘temperature’ on suggestions in a story writing context,
finding that higher temperature suggestions are more original but
less coherent [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Manjavacas et al. fine-tune a language model
on a specific author to improve stylistic coherence [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Gero and
Chilton narrow the use-case to metaphor generation and find the
constrained context dramatically improves coherence [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
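      <p>The trade-off between originality and coherence described above can be illustrated with a minimal sketch of temperature-scaled sampling. This is not the implementation used in any of the cited systems; the vocabulary, the scores, and the function names are made up for illustration. Dividing the model's raw scores by a temperature before normalizing them sharpens the distribution when the temperature is low and flattens it when the temperature is high, so unlikely words are sampled more often.</p>
      <preformat><![CDATA[
```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(vocab, logits, temperature, rng):
    """Draw one word according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Toy vocabulary and scores, invented for this example.
VOCAB = ["blue", "grey", "bruised", "electric"]
LOGITS = [3.0, 2.0, 0.5, 0.1]

low = softmax_with_temperature(LOGITS, 0.5)   # concentrates mass on "blue"
high = softmax_with_temperature(LOGITS, 2.0)  # gives rarer words more mass
```
]]></preformat>
      <p>In this toy setting the most likely word dominates at low temperature, while the least likely word becomes noticeably more probable at high temperature, which mirrors the reported pattern of more original but less coherent suggestions.</p>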
      <p>
        In the general fiction writing case, more often than not systems
still fail to be both semantically coherent and artistically expressive.
Recent breakthroughs in natural language processing such as the
introduction of the ‘transformer’ neural network architecture [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
and BERT embeddings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have led to language models that are
remarkable at understanding the semantics of written language
and generating new text. Transformer models like GPT-2 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] rely
on massive datasets and can seemingly imitate the style of a
reference text, with legible grammar and even some understanding of
conceptual relations between characters and objects.
      </p>
      <p>
        We draw on theoretical work on co-creative artistic tools that
suggests “creativity emerges through the interaction of both the
human and the computer” [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Improved language models such as
GPT-2 may allow a more meaningful interaction to occur between
creative writers and computers. This is what we study here.
      </p>
    </sec>
    <sec id="sec-5">
      <title>4 EXPERIMENT DESIGN</title>
      <p>We recruited four published novelists for our study and observed
them as they completed various tasks that had them interact with generative
writing tools in individual hour-long sessions. Three of the writers
had no previous exposure to these tools; one writer had been
previously exposed but only briefly, and not for his professional writing.
We first introduce the writing tools studied, and then describe the
study procedure.</p>
    </sec>
    <sec id="sec-6">
      <title>4.1 Interfaces</title>
      <p>The adoption of co-creative writing technologies hinges on their
ability to provide appropriate suggestions while being simple to
understand and interact with. Small details in a generative
system’s interface design will have ripple effects on its perceived
utility among writers.</p>
      <p>
        The two interfaces chosen for the study were Talk To
Transformer<sup>1</sup> and Write With Transformer<sup>2</sup>, later referred to in this
paper as ‘Talk to’ and ‘Write with’ respectively. Both user
interfaces rely on GPT-2 to predict the most likely sequence of words
following some input text. Both take into account at most the last
256 sub-word tokens available, though in many cases there is not
that much preceding text. GPT-2 was trained on the WebText
corpus, which contains 40GB of text from over 8 million articles linked
to by Reddit from before 2017 that received at least 3 votes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>‘Talk to’ (Figure 1a) uses a text completion paradigm where the
user writes into a small, centered text box and presses a button
to have the system generate a completion. The completed text is
around the same length as the input, though there is a maximum overall
(input + output) length of 256 sub-word tokens. The completed
text is also not editable, giving a sense of finality to the generated
text, though pressing the button again restarts the text generation,
replacing the previous output.</p>
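      <p>The 256-token limits described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind either website: the whitespace tokenizer is a stand-in for GPT-2's sub-word tokenizer, the function names are hypothetical, and the behavior when the input alone fills the window is an assumption.</p>
      <preformat><![CDATA[
```python
MAX_TOKENS = 256  # window size reported for both interfaces


def toy_tokenize(text):
    """Stand-in tokenizer that splits on whitespace.

    GPT-2 uses byte-pair sub-word tokens, so real token counts would differ.
    """
    return text.split()


def context_window(tokens, limit=MAX_TOKENS):
    """Keep only the most recent `limit` tokens as model input."""
    return tokens[-limit:]


def talk_to_output_budget(tokens, limit=MAX_TOKENS):
    """Tokens left for generation when input and output share one limit.

    Assumption: if the input already fills the window, nothing is left.
    """
    return max(0, limit - len(context_window(tokens, limit)))


short_input = toy_tokenize("Harold sat on the hotel room bed")
long_input = toy_tokenize("word " * 300)
```
]]></preformat>
      <p>With a short prompt nearly the whole window remains available for generated text, while a long document is cut down to its most recent tokens before the model sees it.</p>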
      <p>‘Write with’ (Figure 1b) has the user write into a page-like
document, and requires that the user press the tab key to trigger text
generation. Doing so shows a drop-down menu with three short
suggestions, usually between 1 and 10 words. The length of the
suggestions is a function of the time allotted for the generation,
which in turn is a function of the amount of input text. This means
that toward the end of a longer document, suggestions often get
shorter. The user can select one of the suggestions with a mouse or
with arrow keys (or ignore the suggestions completely and continue
writing). The text that is generated appears directly in line with
their previous writing, highlighted blue, and is itself editable.</p>
      <p><sup>1</sup> https://talktotransformer.com/
<sup>2</sup> https://transformer.huggingface.co/doc/gpt2-large</p>
      <p>Both ‘Write with’ and ‘Talk to’ differ from existing predictive
text interfaces, like next-word suggestions on a mobile keyboard,
in the length of their suggested text and their interaction mode.
Most predictive text keyboards always surface suggestions, rather
than requiring a user trigger, and their suggestions are generally only one word long.</p>
      <p>
        ‘Write with’ is somewhat similar to Gmail’s ‘Smart Compose’
feature [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which shows suggested sentence endings when a user is
composing an email. Unlike ‘Write with’, ‘Smart Compose’ doesn’t
wait for a user trigger, but instead shows suggestions when the
algorithm has high confidence in the suggested text; the ‘tab’ button
allows the user to accept the suggestion.
      </p>
    </sec>
    <sec id="sec-7">
      <title>4.2 Study Procedure</title>
      <p>Each writer was asked to complete a pre-defined set of tasks. During
the course of each task, each writer was periodically asked to
comment on the output of the tool they were using and its impact on
their writing process. After each task, the writer discussed with the
examiner their thoughts about their, and the tool’s, performance in
the task. Additionally, they were allowed to articulate any response
they had to the tools in a discussion with the examiner after the
completion of all tasks.</p>
      <p>The procedure went as follows:</p>
      <list list-type="simple">
        <list-item>
          <label>(1)</label>
          <p>Following a very brief description of the user interfaces, they
were given an initial open-ended experimentation period with the
tools. (2-10 minutes)</p>
        </list-item>
        <list-item>
          <label>(2)</label>
          <p>They were asked to write ‘the most interesting’ or ‘the best’
original piece of fiction that they were able to with the
assistance of the tools. They were allowed to switch between the
tools at will, but were asked to use both. (10-20 minutes)</p>
        </list-item>
        <list-item>
          <label>(3)</label>
          <p>They were asked to work on an in-progress piece of writing
with the assistance of the tools. They were told to try to
solve an ‘issue’ they’d been having with a scene or
description. (10-30 minutes)</p>
        </list-item>
        <list-item>
          <label>(4)</label>
          <p>They were asked to again write ‘the best’ thing they could
with ‘Write with’, with the constraint that they had to use
a suggestion at least once every other sentence. (10-20
minutes)</p>
        </list-item>
      </list>
      <p>We recorded and transcribed each session. Additionally, we
recorded all text written, including text written by the machine,
and for each generated suggestion annotated whether it was ‘accepted’ by
the writer.</p>
    </sec>
    <sec id="sec-8">
      <title>5 RESULTS</title>
      <p>To preserve anonymity, we refer to the four writers in our study as
W1-W4. All four writers chose to use ‘Write with’ when asked to
write ‘the best’ original piece that they could in the allotted time.
To explain the preference, they generally cited the lack of control
and the higher degree of randomness associated with the longer
text generated from ‘Talk to’.</p>
      <p>We first looked at when in a sentence writers were likely to
trigger the system. Figure 2 shows that writers triggered ‘Write
with’ at the beginning of a sentence 24% of the time, with a majority of
triggers taking place less than 10 words into a sentence. As seen in
Figure 3, longer suggestions were more likely to be accepted by the
writers, though short suggestions were generated more frequently.
Table 1 shows examples of generated suggestions; E2 and E4 are
indicative of shorter suggestions.</p>
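      <p>Statistics of the kind reported above could be computed from an annotated trigger log along the following lines. This is a sketch only: the record layout, the field names, and the sample entries are hypothetical and are not the study's data or analysis code.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Trigger:
    words_into_sentence: int  # 0 means the trigger came at the start of a sentence
    suggestion_words: int     # length of the suggestion, in words
    accepted: bool            # whether the writer accepted the suggestion


def share_at_sentence_start(triggers):
    """Fraction of triggers fired before any word of the sentence was written."""
    return sum(t.words_into_sentence == 0 for t in triggers) / len(triggers)


def share_within(triggers, max_words):
    """Fraction of triggers fired fewer than `max_words` words into a sentence."""
    return sum(t.words_into_sentence < max_words for t in triggers) / len(triggers)


def acceptance_rate_by_length(triggers):
    """Map each suggestion length to the fraction of suggestions accepted."""
    totals, accepted = {}, {}
    for t in triggers:
        totals[t.suggestion_words] = totals.get(t.suggestion_words, 0) + 1
        accepted[t.suggestion_words] = accepted.get(t.suggestion_words, 0) + int(t.accepted)
    return {n: accepted[n] / totals[n] for n in sorted(totals)}


# Invented entries for illustration; not data from the study.
LOG = [
    Trigger(0, 2, False),
    Trigger(3, 6, True),
    Trigger(5, 2, False),
    Trigger(12, 6, True),
]
```
]]></preformat>
      <p>Grouping acceptance by suggestion length, as in the last function, is one simple way to check whether longer suggestions are accepted more often than shorter ones.</p>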
      <p>We also noticed that writers often triggered ‘Write with’ multiple
times at a single point in the text if the resulting suggestions were
not what they wanted. We found that 25% of all triggers were a
repeated trigger, suggesting that once a writer triggered the system,
they were invested in finding a useful suggestion.</p>
    </sec>
    <sec id="sec-9">
      <title>5.1 Incoherence and Plot Deviation</title>
      <p>Unanimously, the writers pointed out that the tools appeared to
deviate from the direction they were taking their writing, particularly
referring to the ‘Talk to’ interface. All writers were quick to point
out instances in which the system changed point of view (it seemed to
prefer 1st person even when they were in 2nd or 3rd).</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th />
              <th>Preceding Text</th>
              <th>Gen 1</th>
              <th>Gen 2</th>
              <th>Gen 3</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>E1</td>
              <td>Harold sat on the hotel room bed and in front of him</td>
              <td>was the bedsheet, "which had</td>
              <td>stood the woman who would one day become his</td>
              <td>was a picture of his late son.</td>
            </tr>
            <tr>
              <td>E2</td>
              <td>The storms colored the sky a shade</td>
              <td>of red</td>
              <td>of orange</td>
              <td>&lt;/&gt;</td>
            </tr>
            <tr>
              <td>E3</td>
              <td>The Castle Devocion was six leagues through the forest from the coast, where the fortress lay in disrepair.</td>
              <td>A few days before the storm</td>
              <td>The castle was a large castle</td>
              <td>There were no roads, no</td>
            </tr>
            <tr>
              <td>E4</td>
              <td>He [the man in the photograph] was holding a</td>
              <td>pen.</td>
              <td>baby in his</td>
              <td>small silver</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Echoing novelist Sigal Samuel’s perspective on using tools
to “make the familiar strange” (see Background), all of the writers were
at one point or another struck by just how strange the machine’s
responses were, though often to the point that they were not useful. W3
said, “it’s like improv. You have to ‘yes, and.’ ” In other words, if the
generated text does not incorporate the prior facts of the piece, it
is not constructive.</p>
      <p>W1 and W2 noted that the tools were much better at following
them into ‘genre’ writing than into the more nuanced and stylized
writing they were interested in. This is clear in Table 1, E3, where
the writer set up a fantasy scene and the suggestions were more
coherent than normal. Yet, at multiple points in Tasks 1, 2, and 4, all
four writers allowed themselves to be steered by the tools as they
introduced new characters or new plot devices that seemed unlike
those preceding them. Repeatedly, they found these developments
“interesting” or laughed at the suggestions, and were willing to
adapt their writing to incorporate the change. They were more
likely to take the suggestions during Tasks 2 and 4, when they
weren’t writing something they had preconceived.</p>
    </sec>
    <sec>
      <title>5.2 Observed Use Cases</title>
      <p>5.2.1 Model As Antagonist. Because of its tendency toward randomness,
all participants initially expressed disappointment or resignation
at times where the system’s output was not along the lines they
anticipated. However, W1, W3, and W4 expressed the idea that this
antagonism was in some ways constructive. W4 was very positive
about this trait of the system, comparing triggering the system’s
auto-complete to flipping a coin, where the coin flip makes you
realize how you hope it will land, regardless of where it actually
does. To that end, W4 was the most likely to reject the suggestion
of ‘Write with’, but generally the most positive about its ability to
help him determine what he wanted to write.</p>
      <p>5.2.2 Description Creation. All four participants experimented
with using ‘Write with’ to generate mid-sentence descriptions for
items, scenes, or characters. All four writers learned through the
course of the session that they could get ‘Write with’ to focus on
filling in descriptions such as colors or character details by requesting
suggestions after prepositions, and actions by requesting
suggestions after a noun phrase. They rejected adjective descriptions like
colors more often than any other type of suggestion, often
dismissing them as “boring” and limited, though W1 and W4 noted
that being offered more than three suggestions could be useful at those
moments.</p>
      <p>The writers often didn’t see the usefulness of the tool as a
meaningful generator for plot or for characters. W4 noted that he was
not a “spiritualist” writer, meaning that rather than let the flow of
ideas come to him during the writing process, he usually sat down
with a set of “points to hit”. The majority of writers mentioned
they could see something like this being useful for generating plot
outlines for writing exercises.</p>
      <p>5.2.3 As Constraint. Especially during Task 4, during which the
participants were required to use the suggestions from ‘Write with’
at least every second sentence, the writers most often found the tool
“fun” and “challenging”. During the post-trial discussion, all
four participants returned to the unique challenge of integrating
its responses into their writing.</p>
      <p>They developed a number of strategies to get it to work well,
including allowing it to begin sentences for them, most often
reasoning that if it were to go in a new direction, doing so at the
beginning of a sentence allowed them a chance to “steer back” or to
follow it into a new place. W1 and W2 also frequently got it into
situations where, rather than generating content noun phrases, it
generated only single words like “The” or “She”. Potential causes for
this include the short suggestion length for long preceding text (see
Section 4.1) and the writers’ non-standard literary style, which results
in low source probability under the language model.</p>
      <p>5.2.4 The Unexpected. At one point, W1 set up ‘Write with’ to
describe the color of the sky, and it suggested “dark blue”, “yellow”,
and “a shade of dark”; he accepted the last suggestion. This is an
example of the system steering away from a direction that the writer clearly
wanted to pursue (hue description) toward a related but separate
concept (describing a shade instead), for stylistic effect.</p>
      <p>Both systems frequently introduced characters or dialogue, which
for Tasks 1, 2, and 4 produced comments like “I wasn’t going to go
there, but that’s interesting”, especially when it brought into play
family members (sister, wife, father), such as in Table 1, E1, where
suggestions introduce variously a woman (perhaps wife) and a son.</p>
    </sec>
    <sec id="sec-10">
      <title>6 DISCUSSION</title>
    </sec>
    <sec id="sec-11">
      <title>6.1 Evaluation Criteria for Co-Writing Systems</title>
      <p>These trials indicate that novelists hoping to use co-creative
generative systems in their writing have complex evaluation
criteria that include the system’s ability to extrapolate
reasonably well about character traits, settings, and events. They expect
the systems to match their style, verb tense, and perspective, in
addition to providing a high degree of creative insight; picking a color
from a spectrum they’d already considered is hardly ‘co-creative’.
Measures like predictive accuracy won’t do as evaluation criteria
because writers engaged with co-creative systems are looking for
creative insight, something not measured by perplexity or by a
language model’s ability to solve the canonical downstream NLP
tasks. We propose a series of evaluation questions, which could be
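answered computationally, to guide system design:</p>
      <list list-type="simple">
        <list-item>
          <label>(1)</label>
          <p>Does a suggestion match the tense of the preceding text?</p>
        </list-item>
        <list-item>
          <label>(2)</label>
          <p>Does a suggestion introduce new characters or objects, or
does it reference preceding ones?</p>
        </list-item>
        <list-item>
          <label>(3)</label>
          <p>Are new characters or objects coherent given the context?</p>
        </list-item>
        <list-item>
          <label>(4)</label>
          <p>Does a suggestion include description?</p>
        </list-item>
        <list-item>
          <label>(5)</label>
          <p>Does a suggestion include action?</p>
        </list-item>
        <list-item>
          <label>(6)</label>
          <p>Given a single request, how diverse are the suggestions?</p>
        </list-item>
      </list>
      <p>These questions highlight the kinds of considerations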
professional writers have when evaluating suggestions. Notably they
are not questions that have correct answers; rather they reflect
important considerations we found through our user study.</p>
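      <p>As one illustration of how such a question might be answered computationally, the sketch below scores the diversity of a set of suggestions (question 6) as the mean pairwise Jaccard distance between their word sets. The choice of metric and the function names are assumptions made for this example; the study does not prescribe a particular measure.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the overlap of the two suggestions' lower-cased word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0  # two empty suggestions are treated as identical
    return 1.0 - len(words_a & words_b) / len(words_a | words_b)


def suggestion_diversity(suggestions):
    """Mean pairwise Jaccard distance; 0.0 means all suggestions share the same words."""
    pairs = list(combinations(suggestions, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# The three sky-color suggestions quoted in the results.
sky = suggestion_diversity(["dark blue", "yellow", "a shade of dark"])
# A degenerate case in which every suggestion is the same.
repeated = suggestion_diversity(["of red", "of red", "of red"])
```
]]></preformat>
      <p>A word-overlap score like this captures only surface variety. It would not tell a designer whether three different color words feel equally “boring” to a writer, so it is best read as a rough lower bar rather than a measure of creative insight.</p>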
    </sec>
    <sec id="sec-12">
      <title>6.2 Design Guidelines for Co-Writing Tools</title>
      <p>Future systems should be aware that writers are interested in these
tools not just for immediate injection of inline text, which most
feel they are capable of producing on their own, but for a broad
range of descriptive, antagonistic, or constraining effects on their
writing.</p>
      <p>By triggering the generative model, the user switches from writer
to editor. Future design of these systems should continue to stress
the nature of the generated text as dynamic and alterable, focusing
on the suggestive element of these tools and allowing the writer
to enter an editorial feedback loop. There should be very little
overhead for querying the model.</p>
      <p>The systems should provide many suggestions that may be
swapped out and replaced frequently. Because of the high error rate
of these tools, a small number of suggestions may not be useful.
Similarly, extremely short suggestions are not useful.</p>
      <p>At times, writers are looking for a specific category of suggestion,
and any suggestion that does not fit inside those constraints is
disruptive. That disruption may itself be the goal of triggering the
system, as it forces them to explore a new range of possibilities or
back up and consider the reasons the model ‘thought’ to suggest
what it did. But to increase the odds that writers will use
machine-generated text, future systems need to be more aware of what type
of suggestion the writer is looking for, rather than providing general
suggestions that lack any specific purpose.</p>
      <p>Rather than a triggering event that tells the system “generate!”
with no other context, we imagine an interface that is passively or
actively aware of the type of suggestion that is being requested,
its length, and how much it should adhere to the current scene or
freely decide the trajectory of the writing to come. This awareness
might be thought of as a list of parameters passed to the trigger,
but it should be done without intruding on the ease of the request.
In this way, the notion of co-creativity can be expanded,
pushing the generation process further into the space of dynamic
conversation between human and machine.</p>
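      <p>The idea of a trigger that carries parameters can be made concrete with a small sketch. Everything here is hypothetical: the type names, fields, defaults, and the preposition heuristic are assumptions introduced for illustration, loosely motivated by the observation that writers requested suggestions after prepositions when they wanted description. It does not describe an existing interface.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum


class SuggestionKind(Enum):
    DESCRIPTION = "description"
    ACTION = "action"
    DIALOGUE = "dialogue"
    ANY = "any"


@dataclass(frozen=True)
class SuggestionRequest:
    """Parameters a trigger might pass along with the preceding text."""
    preceding_text: str
    kind: SuggestionKind = SuggestionKind.ANY
    max_words: int = 10
    num_suggestions: int = 3
    adherence: float = 0.5  # 0.0 = free to change direction, 1.0 = stay in the scene

    def __post_init__(self):
        if not 0.0 <= self.adherence <= 1.0:
            raise ValueError("adherence must be between 0.0 and 1.0")
        if self.max_words < 1 or self.num_suggestions < 1:
            raise ValueError("max_words and num_suggestions must be at least 1")


# A deliberately crude, illustrative word list.
PREPOSITIONS = {"of", "in", "on", "with", "at", "by"}


def infer_kind(preceding_text):
    """Passive guess: text ending in a preposition suggests a description is wanted."""
    words = preceding_text.lower().split()
    if words and words[-1] in PREPOSITIONS:
        return SuggestionKind.DESCRIPTION
    return SuggestionKind.ANY


text = "The storms colored the sky a shade of"
request = SuggestionRequest(text, kind=infer_kind(text), num_suggestions=5)
```
]]></preformat>
      <p>Inferring the parameters from the text, as the last function attempts, keeps the request itself a single keystroke, which matters given how strongly the writers valued low overhead when querying the model.</p>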
    </sec>
    <sec id="sec-13">
      <title>7 CONCLUSION</title>
      <p>Through this study, we identified a number of considerations for
designing co-writing systems, concerning both the interaction
dynamics and the nature of the computer suggestions. Writers found
value in being able to edit the systems’ output and quickly replace
the generated output with something they preferred. They enjoyed
using the model as a constraining device for challenging their
writing, or as an antagonist that helped them refocus and refine their
intent. We advise that future systems should provide many
suggestions, do so with a better understanding of the writer’s intent, be
editable, and regenerate with little to no mental overhead.</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGMENTS</title>
      <p>Katy Ilonka Gero is supported by an NSF GRF (DGE - 1644869).
Alex Calderwood is supported by The Brown Institute for Media
Innovation (https://brown.columbia.edu/).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <string-name>
            <surname>Krysta Chauncey</surname>
          </string-name>
          , and Krzysztof Z Gajos.
          <year>2018</year>
          .
          <article-title>Sentiment bias in predictive text recommendations results in biased writing</article-title>
          .
          <source>In Proceedings of Graphics Interface</source>
          .
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Kenneth</surname>
            <given-names>C Arnold</given-names>
          </string-name>
          , Krzysztof Z Gajos, and
          <string-name>
            <surname>Adam T Kalai</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>On suggesting phrases vs. predicting words for mobile text composition</article-title>
          .
          <source>In Proceedings of the 29th Annual Symposium on User Interface Software and Technology</source>
          .
          <fpage>603</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Mia</given-names>
            <surname>Xu</surname>
          </string-name>
          <string-name>
            <given-names>Chen</given-names>
            ,
            <surname>Benjamin N Lee</surname>
          </string-name>
          , Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang,
          <string-name>
            <surname>Andrew M Dai</surname>
            ,
            <given-names>Zhifeng</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Gmail Smart Compose: Real-Time Assisted Writing</article-title>
          . arXiv preprint arXiv:
          <year>1906</year>
          .
          <volume>00080</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Clark</surname>
          </string-name>
          , Anne Spencer Ross, Chenhao Tan,
          <string-name>
            <given-names>Yangfeng</given-names>
            <surname>Ji</surname>
          </string-name>
          , and
          <string-name>
            <surname>Noah</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Creative Writing with a Machine in the Loop: Case Studies on Slogans and Stories</article-title>
          .
          <source>In 23rd International Conference on Intelligent User Interfaces (IUI '18)</source>
          . ACM, New York, NY, USA,
          <fpage>329</fpage>
          -
          <lpage>340</lpage>
          . https://doi.org/10.1145/3172944.3172983
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Nicholas</given-names>
            <surname>Mark Davis</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Human-computer co-creativity: Blending human and computational creativity</article-title>
          .
          <source>In Ninth Artificial Intelligence and Interactive Digital Entertainment Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>CoRR</source>
          abs/1810.04805 (
          <year>2018</year>
          ). arXiv:1810.04805 http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Katy Ilonka</given-names>
            <surname>Gero</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lydia B</given-names>
            <surname>Chilton</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Metaphoria: An Algorithmic Companion for Metaphor Creation</article-title>
          .
          <source>In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems</source>
          . ACM,
          <elocation-id>296</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jeffrey T</given-names>
            <surname>Hancock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mor</given-names>
            <surname>Naaman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Karen</given-names>
            <surname>Levy</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>AI-Mediated Communication: Definition, Research Agenda, and Ethical Considerations</article-title>
          .
          <source>Journal of Computer-Mediated Communication</source>
          (2020).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jess</given-names>
            <surname>Hohenstein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malte</given-names>
            <surname>Jung</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>AI-Supported Messaging: An Investigation of Human-Human Text Conversation with AI Support</article-title>
          .
          <source>In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems</source>
          .
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
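          <string-name>
            <given-names>Enrique</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Folgert</given-names>
            <surname>Karsdorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ben</given-names>
            <surname>Burtenshaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mike</given-names>
            <surname>Kestemont</surname>
          </string-name>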
          .
          <year>2017</year>
          .
          <article-title>Synthetic literature: Writing science fiction in a co-creative process</article-title>
          .
          <source>In Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)</source>
          .
          <fpage>29</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
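          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rewon</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>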
          .
          <year>2019</year>
          .
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
          <source>OpenAI Blog</source>
          <volume>1</volume>
          ,
          <issue>8</issue>
          (2019).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Melissa</given-names>
            <surname>Roemmele</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew S</given-names>
            <surname>Gordon</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automated assistance for creative writing with an RNN language model</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion</source>
          .
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Sigal</given-names>
            <surname>Samuel</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>How I'm using AI to write my next novel</article-title>
          . https://www.vox.com/future-perfect/2019/8/30/20840194/ai-art-fiction-writing-language-gpt-2
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Robin</given-names>
            <surname>Sloan</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Writing with the machine</article-title>
          . https://www.robinsloan.com/notes/writing-with-the-machine/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>David</given-names>
            <surname>Streitfeld</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Computer Stories: A.I. Is Beginning to Assist Novelists</article-title>
          . https://www.nytimes.com/2018/10/18/technology/ai-is-beginning-to-assist-novelists.html
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
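          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Noam</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Niki</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jakob</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Llion</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lukasz</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>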
          .
          <year>2017</year>
          .
          <article-title>Attention Is All You Need</article-title>
          .
          <source>CoRR</source>
          abs/1706.03762 (
          <year>2017</year>
          ). arXiv:1706.03762 http://arxiv.org/abs/1706.03762
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>