<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Tomato Festival: Towards using ChatGPT for Long-Form Discourse Generation of Plan-Based Narratives?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maryam Dueifi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Eger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cal Poly Pomona, Department of Computer Science</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>UC Santa Cruz, Department of Computational Media</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>With Generative AI at the forefront of public conversation, the capability of AI systems to tell stories has seen a surge of interest. OpenAI's ChatGPT provides a user-friendly interface, as well as well-documented API access, and is used widely for generative purposes. In this paper we investigate how well it can actually produce narrative text. We present an approach to take a story plan produced by the Glaive narrative planner and turn it into a novella-length text. We then present a preliminary evaluation of the text output and discuss the challenges and limitations of having it actually be read by humans. Crucially, we show that the text is not comparable to human-written text in terms of grammatical complexity, which we posit to be one possible reason for it not being very enjoyable to read. As part of our work we also encountered several particular challenges that led to misspun tales, which we also discuss in detail.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Narrative generation has been a topic of interest for AI
research for many decades. Meehan’s TaleSpin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is
often cited as the first “story generator”, although other
approaches have preceded it [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Nevertheless, TaleSpin, with
its use of character goals and plans, has served as the
inspiration for a wide variety of plan-based narrative generators.
Often, these generators distinguish between the
story/fabula part of a narrative, consisting of the events as they
happened in the story world, and the discourse, i.e. the way
that story is actually told, like the text that is produced [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
More recently, Large Language Models (LLMs) have seen a
surge in popularity, including through OpenAI’s ChatGPT
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These are neural network model architectures termed
transformers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which can learn correlations between the
occurrences of words in a text corpus. This can then be
used to answer user queries, by letting the model predict
the most likely continuation that follows the question text,
producing a response text. However, LLMs can produce all
kinds of text, including narrative, when prompted to do so.
      </p>
      <p>While LLMs can produce narrative text, the transformer
mechanism that they are based on has one significant
limitation: When predicting the continuation of a text, the
attention head mechanism employed by the model can only
look at a limited context preceding the continuation. This
context window thus limits how much information the LLM
can even see. Essentially, if one tried to generate an entire
novel using an LLM with a limited context window, the
model would not be able to take the contents of the first
few chapters into account when generating the end of the
narrative, which may result in events that contradict
earlier changes to the world state. Plan-based narratives, on
the other hand, use an explicit (logic-based) world model
to represent the state of the story world at each time step,
and can ensure that only actions that are actually possible
to occur are taken by the characters. However, generating
the discourse for such a plan-based story often involves
templates or other short story fragments, often resulting in
repetitive or terse discourse output.</p>
      <p>In this paper we present an approach that utilizes
OpenAI’s ChatGPT to generate the discourse for a story
generated by a narrative planner. In particular, we focus on
generating long-form narratives, that, at present, reach the
length of a novella (about 25 000 words), and follow the
story as produced by the planner. Our contribution is
threefold: First, we present a novel approach to prompt an LLM
using the planner output to expand the narrative to the
desired length. Second, we have generated several narratives
using our approach and show an evaluation of some of their
qualities. Third, and arguably most importantly, during the
course of our work we have discovered several limitations
of this application of LLMs, and we will discuss them in
detail.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Our work builds on previous work in narrative generation,
combining a logic-based story structure generated by a
planner, with text generated by an LLM. We will therefore
discuss prior work in both of these areas.</p>
      <sec id="sec-2-1">
        <title>2.1. Plan-Based Narrative Generation</title>
        <p>
          When a story is viewed as a (partially ordered) sequence of
actions taken by the characters, it has striking similarities
to a plan, in the AI planning sense: A sequence of actions
applied to an initial state to achieve a goal condition [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
Indeed, there is a large body of work that utilized planners
to generate such stories [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The main challenge is how to
define the “goal” of the story: Often, this is described as
a state the story world ought to be in at the conclusion of
the story as desired by an author. However, a centralized
planner assigning actions to characters to reach this global
goal might lead to the characters acting contrary to what
common sense would dictate. In a bank heist story, an
“efficient” plan might be that the bank teller just delivers the
money to the robber’s house, but it would likely not make
for a compelling story. In TaleSpin, the individual characters
have their own goals, preventing them from acting against
their own interests [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], but this may not always lead to a
story with an actual plot. Another approach, originating
with a branch of research by Riedl and Young [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], instead
uses a centralized planner that allows the author to define
an overall goal for the story, while also ascribing intentions
to each character, preventing them from taking actions that
do not further their own character-goals. Ware and Young
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] build on this work and also give characters the capability
to form plans that they do not end up executing due to some
conflict that arises. Other research has also investigated
the use of landmarks to guide the generation process [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
incorporating character beliefs [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ], and failed actions
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Many of these planning-based approaches use the
standardized Planning Domain Definition Language, PDDL,
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and although some use extensions to handle intentions
and beliefs, these can be compiled away if needed [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ].
For our present work, we use the Glaive narrative planner,
which implements intentions and beliefs using an extended
PDDL-variant [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], and which comes with a library of
standard narrative planning problems, which we will discuss in
more detail below.
        </p>
        <p>
          With a story in hand, the next problem is how to convey
this story to an audience, i.e. how to generate the discourse
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The discourse generation problem actually consists
of several parts, including selecting which actions of the
story to tell, which to omit, the ordering of the telling in
order to convey the necessary information to the audience,
and then determining the actual realization of the discourse
in the form of text or other media. Research has focused
significantly on the first parts, by planning which discourse
actions convey the “right” information to the audience [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ],
how to model suspense [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], flashbacks and flashforwards
[
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ] or focalization [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. The actual text generation
is then typically handled through templates, as was e.g.
the case for the CPOCL experiments, or with a simple text
planner [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Another option is to manually translate the
internal representation to text, as was done for TaleSpin
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>
          In a more refined model proposed by Barot et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ],
the authors draw a distinction between the discourse, which
incorporates the decisions for what to tell, and the narration,
which is more concerned with how to tell the narrative. Our
work is best characterized as focusing on the narration, i.e.
the surface text realization, based on full story plans. As we
will discuss below, this is not a strong limitation for the data
set we were working with, but a more sophisticated discourse
model could be incorporated in the future. To generate the
output text, we make use of Large Language Models, which
we will briefly discuss next.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Large Language Models</title>
        <p>
          Large Language Models (LLMs) are based on a neural
network architecture called transformers [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which use an
attention mechanism in place of earlier recurrent architectures,
essentially allowing them to learn a probability distribution
of words conditioned on the context these words occur in,
with varying weights for the context. In recent years, LLMs
have seen a surge in popularity, in part due to the
availability of OpenAI’s ChatGPT, which presents an LLM using a
chat-like interface. The inference-capabilities of the LLM
are used to predict the most likely continuation to a user
query. In practice, if the user enters a question, the most
likely continuation, as learned by the model, is an answer
to that question, whereas if the user enters instructions, the
most likely continuation follows these instructions. While
the basic premise is enticing, at present LLMs suffer from a
variety of issues, including hallucinations, where they make
up people, events, citations, or court cases, and just plain
factual inaccuracies. As text-models, they are ill-equipped to
reason, or perform calculations. For our purposes, though,
these issues are not necessarily problems, as we want the
model to generate novel content. Indeed, this potential for
creativity has been used to generate NPC responses in a
murder mystery [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], control game play in interactive RPGs
[
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], or even to generate entire games using VGDL [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Our Approach</title>
      <p>Our system is able to generate a long-form narrative
consisting of multiple chapters. As input we utilize a plan, as
obtained by the Glaive narrative planner, as well as an
optional genre descriptor. Our discourse generator works in
three steps:</p>
      <sec id="sec-3-1">
        <title>Overview</title>
        <p>1. Convert the Glaive plan into chapter descriptions.</p>
        <p>2. Generate chapters from the descriptions.</p>
        <p>3. Summarize and regenerate each chapter.</p>
        <p>In each step, our approach utilizes the different roles one
can provide when prompting ChatGPT as shown below. To
summarize: The system role provides the model with
high-level guidance, the assistant role gives the model context
(typically previous model output, but can also be used to
demonstrate desirable output to the model), while the user
role contains the actual prompt the model should respond
to.</p>
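        <p>These roles can be sketched as a single chat request as follows. This is an illustrative reconstruction rather than our exact implementation: the helper name build_messages is our own, while the message format (a list of role/content dictionaries) follows OpenAI's chat API.</p>

```python
# Sketch of assembling the three ChatGPT roles into one request.
# `build_messages` is an illustrative helper of our own; the
# {"role", "content"} message format is the one used by OpenAI's
# chat completion API.

def build_messages(system, user, assistants=()):
    """System = high-level guidance, assistant = context/demonstrations,
    user = the actual prompt the model should respond to."""
    messages = [{"role": "system", "content": system}]
    for context in assistants:
        messages.append({"role": "assistant", "content": context})
    messages.append({"role": "user", "content": user})
    return messages

# A request would then be sent with something like:
#   client.chat.completions.create(model="gpt-4o", messages=messages)
```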
        <sec id="sec-3-1-1">
          <title>3.1. Chapter Descriptions</title>
          <p>The first challenge when generating discourse from story
plans is that the planner output is in the form of a plan; in
the case of Glaive this comes in a PDDL-like syntax. The
first step is therefore to convert these formal
representations into descriptions of a story chapter, where each step
in the plan corresponds to one chapter (we will discuss the
implications of this below). Rather than having the domain
author come up with a mapping of plan actions to a
description, we utilize ChatGPT itself to make this mapping. Given
a plan step, e.g. “(hatch-plan robbie six-shooter
brown-horse bank mother-lode)”, we construct the
following prompt:
• System: You are rephrasing a string of words.
• User: Take the following phrase and make it into
a coherent sentence: &lt;plan step&gt;. Provide only the resulting
sentence.
• Assistant: In a statement like (accept talia rory
village), the meaning is: “Talia accepts Rory’s proposal
in the village”
The example mapping provides the model with guidance on
how to interpret the plan steps. For the example step above, the model
produces (variations of) the description “Robbie hatched a
plan to rob the bank for the mother lode with his six-shooter
and brown horse.” In the next step, we use these descriptions to
generate individual chapters of the story.</p>
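          <p>The prompt above can be sketched as follows; the prompt strings are taken from the text, while the surrounding function is our own illustration.</p>

```python
# Builds the plan-step-to-description prompt. The system, user, and
# assistant strings follow the paper; `description_prompt` is our name.

def description_prompt(step):
    return [
        {"role": "system",
         "content": "You are rephrasing a string of words."},
        {"role": "user",
         "content": "Take the following phrase and make it into a "
                    "coherent sentence: " + step +
                    ". Provide only the resulting sentence."},
        {"role": "assistant",
         "content": "In a statement like (accept talia rory village), "
                    "the meaning is: \"Talia accepts Rory's proposal "
                    "in the village\""},
    ]

prompt = description_prompt(
    "(hatch-plan robbie six-shooter brown-horse bank mother-lode)")
```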
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Chapter Generation</title>
          <p>Once we have a textual description of each step of the plan,
we have the model generate a chapter of the story based on
the description. However, when using only the description
of the current step, chapters become disconnected, with
characters changing frequently between them, needless
repetition, and plain inconsistencies. On the other hand, the
context the model can use is also limited, so we cannot
provide it with the entire story so far. Instead, we construct
the following prompt, incorporating only the text of the
immediately preceding chapter:
• System: You are a story teller continuing a story.
• User: Take the following chapter and make the next
chapter, and include dialogue and natural
progression: &lt;previous chapter text&gt;.
• Assistant: The current chapter is: &lt;i&gt;
• Assistant: The current chapter is about: &lt;chapter
description&gt;
• Assistant: The genre is: &lt;genre description&gt;
For the first chapter, the system prompt is changed to
“beginning a story”, and no previous chapter text is included.
For the last chapter, the model is also explicitly told that it
is “ending the story” in the system prompt.</p>
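          <p>The chapter-generation prompt, including the adjustments for the first and last chapters, can be sketched as below. The system and user wordings for continuing chapters follow the text; the exact user prompt for the first chapter is not given above, so that wording is a placeholder, and the function name is our own.</p>

```python
# Sketch of the chapter-generation prompt. The "beginning"/"continuing"/
# "ending" system wording and the continuing-chapter user prompt follow
# the paper; the first-chapter user wording is our placeholder.

def chapter_prompt(index, total, description, genre, prev_text=None):
    if index == 0:
        mode = "beginning"
    elif index == total - 1:
        mode = "ending"
    else:
        mode = "continuing"
    messages = [
        {"role": "system",
         "content": f"You are a story teller {mode} a story."},
        {"role": "assistant",
         "content": f"The current chapter is: {index + 1}"},
        {"role": "assistant",
         "content": f"The current chapter is about: {description}"},
        {"role": "assistant",
         "content": f"The genre is: {genre}"},
    ]
    if prev_text is None:
        # Placeholder wording: the first chapter has no previous text.
        user = ("Write the opening chapter, and include dialogue "
                "and natural progression.")
    else:
        user = ("Take the following chapter and make the next chapter, "
                "and include dialogue and natural progression: " + prev_text)
    messages.append({"role": "user", "content": user})
    return messages
```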
          <p>With these first two steps, the system will already produce
a discourse following the events of the plan produced by
Glaive. However, there will still be noticeable disconnects,
as the generation process does not take into account what
may follow. Chapters often end with “To be continued...”, or
even “The end.”, even though the narrative has not reached
its conclusion. We therefore added another processing step
to refine the flow of the narrative.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.3. Chapter Summarization and</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>Regeneration</title>
          <p>One key limitation with LLMs for our work is their limited
context. However, even with larger context size,
providing the model with more input does not guarantee more
desirable output. Nevertheless, in order to improve the
consistency of the narrative, we take each story chapter and
have the model rewrite it in context. For each chapter, we
first ask the model to provide a summary of the events that
happen in it, with the simple prompt “Create a short
summary of the following chapter: &lt;chapter text&gt;”. We then
ask the model to rewrite each existing chapter with the
following prompt:
• System: You are a story teller remaking chapters.
• User: Rewrite the following chapter: &lt;chapter text&gt;
• Assistant: Keep in mind that the &lt;relative position&gt;
chapter is: &lt;summary of relevant chapter&gt;
Where the assistant-phrase is present up to four times: We
take the context of the current chapter to be the two
immediately preceding and two immediately following chapters,
and include their summaries in the prompt. This causes
the model to take the events of these chapters into account
when rewriting the current one.</p>
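          <p>The selection of the context window can be sketched as follows: up to four summaries are included, drawn from the two chapters before and the two after the current one. The helper name and the relative-position labels are our own phrasing.</p>

```python
# Sketch of the rewrite context window: each chapter is rewritten with
# summaries of up to two preceding and two following chapters. The
# position labels below are illustrative wording, not the exact prompt.

def rewrite_context(summaries, i):
    """Return (relative position, summary) pairs for chapter i."""
    labels = {-2: "second-to-previous", -1: "previous",
              1: "next", 2: "second-to-next"}
    context = []
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(summaries):
            context.append((labels[offset], summaries[j]))
    return context
```

For a chapter in the middle of the story this yields four assistant phrases; for the first or last chapters, fewer.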
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>
        Depending on the input plan, our approach is able to
produce a narrative telling that may reach the length of a short
novella. This makes evaluating the output challenging, as
it requires reading through hundreds of pages of text and
determining their quality. Before we go into more detail about
these challenges and the evaluation we performed, we will
first discuss the input data we used and then show some
sample output to demonstrate what our approach is
capable of. However, we would be remiss to not also discuss
limitations and challenges that remain, which we will do to
conclude this section.</p>
      <p>
We use the Glaive narrative planner to produce story plans
for use with our system [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Glaive comes with 7 standard
narrative planning problems, of which we use four for our
experiments:
• Fantasy: A story world set in a magical kingdom,
with two lovers, Talia and Rory, and a monster,
Gargax, guarding a treasure.
• Heist: A story world set in the American old west,
set in a town with a bank, a saloon, and options for
the characters to rob the bank, cheat at poker, steal
valuables and exchange them for money, and for the
sheriff to arrest criminals.
• Raiders of the Lost Ark: A story world based on
the movie “Raiders of the Lost Ark”,
set in 1936, with a powerful artifact, the Ark of the
Covenant, being chased after by Indiana Jones (at
the behest of the US Army), and the Nazis.
• Western: Another story world set in the American
old west, featuring snakes that can bite characters,
and anti-venom that must be obtained to heal the
snake bite.
• (Not Used) Aladdin: A story world based on the tale
“Aladdin” from 1001 Nights, with a king, a woman
called Jasmine, a genie that can fulfill wishes, and
a hero character. We did not use this domain
because “Jasmine” and “Genie” were specific enough
for ChatGPT to recall the story from its training set.
We will discuss why this did not consistently happen
for Raiders of the Lost Ark below.
• (Not Used) Best Laid Plans: A story world
consisting of a goblin minion that must obtain hair tonic for
their warlock overlord. We did not use this domain,
as the vast majority of actions performed in valid
plans are move-actions, which did not lead to very
interesting narratives.
• (Not Used) Space: A domain ostensibly set in space
with volatile planets and aliens. However, this
domain is underdeveloped, with the sample problem
leading to a solution of only two steps, far shy of an
interesting narrative, and we therefore did not use
this domain either.
      </p>
      <p>While we did not use the last three domains for our main
experiments, we still attempted to use them, and we will
elaborate more on some of the problems we encountered
below.</p>
      <sec id="sec-4-1">
        <title>4.2. Results</title>
        <p>Based on the narrative domains discussed above, we
generated 10 text outputs for every one of the four domains, using
the solution plan included with Glaive (it is, of course,
possible to generate more/different solution plans to produce
more varied stories). For our purposes, we were interested
in how well the translation of plan steps to chapters would
work, what the chapters produced by ChatGPT would look
like, and how consistent they are across the narrative. Due
to the recent release of ChatGPT 4o we performed our
experiments twice, once using ChatGPT 3.5, which has much
faster processing times, and once with ChatGPT 4o, to make
use of the newest version of the model.</p>
        <p>First, despite only being given a single example step, the
model is able to translate plan steps into chapter descriptions
very well, as shown in Table 1. In most instances, the
descriptions produced by the two model versions were almost
identical, with only three steps being described differently.
Notably, the models sometimes struggle with distinguishing
the subject and object of an action, but the resulting descriptions
are still serviceable for turning them into chapters.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Plan steps from the Heist domain and the chapter descriptions generated from them. Rows marked with * show where the description produced by ChatGPT 4o differed from the ChatGPT 3.5 description in the row above.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Plan step</th><th>Chapter description</th></tr>
            </thead>
            <tbody>
              <tr><td>(hatch-plan robbie six-shooter brown-horse bank mother-lode)</td><td>Robbie hatched a plan to rob the bank with his trusty six-shooter and brown horse to strike the mother lode.</td></tr>
              <tr><td>(open jill general-store)</td><td>Jill opened the general store.</td></tr>
              <tr><td>(move-once sally main-street bank)</td><td>Sally moved once on Main Street near the bank.</td></tr>
              <tr><td>(withdraw-money sally bank dress-money)</td><td>Sally withdrew money from the bank to pay for her dress.</td></tr>
              <tr><td>*(withdraw-money sally bank dress-money)</td><td>Sally withdrew money from the bank in her dress.</td></tr>
              <tr><td>(move-once sally bank main-street)</td><td>Sally moves once on Main Street.</td></tr>
              <tr><td>(sell sally anne tomatoes tomato-money main-street)</td><td>Sally Anne sells tomatoes for money on Main Street.</td></tr>
              <tr><td>(move-once robbie main-street dark-alley)</td><td>Robbie moved once from Main Street to a dark alley.</td></tr>
              <tr><td>(hide-in-dark-alley robbie dark-alley)</td><td>Robbie hides in a dark alley.</td></tr>
              <tr><td>(move-once barney barneys-room saloon)</td><td>Barney moves once in Barney’s room at the saloon.</td></tr>
              <tr><td>*(move-once barney barneys-room saloon)</td><td>Barney moved once from Barney’s room to the saloon.</td></tr>
              <tr><td>(pickpocket robbie sally dress-money main-street dark-alley)</td><td>Robbie and Sally pickpocket dress money on Main Street in a dark alley.</td></tr>
              <tr><td>*(pickpocket robbie sally dress-money main-street dark-alley)</td><td>Pickpocket Robbie stole Sally’s money from her dress in a dark alley on Main Street.</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The second step is to take the chapter descriptions and
have the model produce narrative text that expands the
description into an entire chapter. At first glance, this
produces output that reads like an actual story. For example,
one of our produced texts for the heist story starts as follows
(as told by ChatGPT 3.5):</p>
        <p>In the dusty town of Redwood Creek, the sun beat down
mercilessly on the wooden buildings that lined the main street.
Robbie stood outside the saloon, wiping the sweat from his brow
as he watched the townsfolk go about their business. He had been
planning this heist for months, and today was the day he would
finally put his plan into action.</p>
        <p>“Hey Robbie, you lookin’ mighty serious there,” a voice
called out from behind him.</p>
        <p>Robbie turned to see his old friend, Jake, approaching with
a grin on his face.</p>
        <p>“Hey Jake, just thinking about our little job today,” Robbie
replied, trying to keep his voice casual.</p>
        <p>Jake’s eyes lit up with excitement. “You mean the bank job?
I thought you were just jokin’ about that.”</p>
        <p>Robbie shook his head. “No joke, Jake. We’re gonna hit that
bank and make off with enough gold to set us up for life.”</p>
        <p>Jake’s grin widened. “Count me in, partner. When do we make
our move?”</p>
        <p>Robbie glanced around to make sure no one was listening in.
“Tonight, after the sun goes down. We’ll ride in, guns blazing,
and take what’s rightfully ours.”</p>
        <p>The same story, as told by ChatGPT 4o, starts like this:</p>
        <p>The sun dipped low over the dusty town of Red Rock, casting
long shadows across the wooden storefronts and dirt-packed streets.
The air was thick with the scent of sagebrush and the distant
promise of rain that never seemed to come. Robbie McGraw sat on
the creaky porch of the Rusty Spur Saloon, his six-shooter resting
comfortably in its holster and his brown horse, Whiskey, tethered
to the hitching post nearby.</p>
        <p>Of particular note is that while the input only mentioned
a character “Robbie” by name, both versions of the model
fill in additional names for the town, other characters, or
even the horse, as needed. Overall, for the heist story in
particular, the models produce text that is very compelling
in parts, but falls short of actually being interesting, as we
will discuss in the next section.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Evaluation</title>
        <p>One key point of our approach is that while we generate
chapters one by one, we then have the model rewrite them
in the context of the surrounding chapters. The purpose of
this is to ensure a higher level of consistency. Even though
the model is given the previous chapter when writing the
next one, characters are not used continuously, and instead
new characters are introduced, or the role of a character
is changed between chapters. The rewrite attempts to
address this issue by putting the chapter in context of what
happens before and after it. One way to show this effect
is to determine how often individual characters show up,
and how the rewrite affects the output. Table 2 shows some
basic statistics of the generated narratives. We used spaCy
(https://spacy.io/) to perform named entity recognition for each
chapter, and tracked the different characters’ occurrences across
chapters. We call characters that appear in more than 30% of
chapters “main characters”, as plans often include actions
performed by other characters, and characters that only
show up in one or two chapters “incidental” characters. The
table shows several effects of the rewrite, as well as some
differences between the two model versions: In every
instance, the rewritten narrative is shorter than the original,
as the model is able to remove some redundancy.
Characters are also utilized slightly differently, as shown by the
changing number of main and incidental characters, but a
more detailed analysis of this effect is still an open problem.
Also noteworthy is that the newer model is significantly
more loquacious than the older version, producing almost
double the output given the same input story steps.</p>
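        <p>The character bookkeeping behind these statistics can be sketched as below. In our pipeline, spaCy’s named entity recognition supplies the per-chapter name mentions; here they are given directly as sets of names, and the function name is our own. The thresholds follow the text: a “main” character appears in more than 30% of chapters, an “incidental” character in only one or two.</p>

```python
# Sketch of classifying characters by how many chapters they appear in.
# spaCy's NER would supply the per-chapter name sets in practice; the
# 30% and one-or-two-chapter thresholds follow the paper.
from collections import Counter

def classify_characters(chapter_names):
    """chapter_names: one set of character names per chapter."""
    counts = Counter(name for names in chapter_names for name in names)
    n = len(chapter_names)
    main = {c for c, k in counts.items() if k > 0.3 * n}
    incidental = {c for c, k in counts.items() if k <= 2}
    return main, incidental
```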
        <p>
          As our approach is able to generate narratives consisting
of thousands of words, a more detailed evaluation is very
challenging. Perhaps the most desirable form of evaluating
the produced narratives would be by gathering feedback
from human readers, but we encountered two obstacles
to this: First, the sheer quantity of text the readers would
have to go through is beyond any reasonable compensation
we could offer, particularly because, second, upon closer
inspection, the writing is not very good. Rather than
subject volunteer participants to what we do not believe to
be good literature, we set out to quantify why the writing
does not seem to be enjoyable. Below we will detail some
perhaps more anecdotal evidence, but we also attempted
to quantify some properties of the writing itself.
Subjectively, the rhythm of the writing seems artificial (which
it is), and we believe this is due to the repetitive sentence
structure. The Frazier score is a measure of syntactic
complexity [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], and measures, broadly speaking, the depth
of the parse tree of a sentence. Higher scores therefore
indicate higher grammatical complexity, while simpler sentence
structures are scored lower. We used nltk [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and Stanford
CoreNLP [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] to compute the parse tree, and computed the
Frazier score for each sentence of the generated narratives.
For comparison purposes, we also computed the Frazier
scores for books written by human authors: Mary Shelley’s
Frankenstein [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], Jane Austen’s Pride and Prejudice [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ],
Victor Hugo’s Les Miserables (English translation by Isabel
Florence Hapgood) [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] and Sir Arthur Conan Doyle’s The
Adventures of Sherlock Holmes [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. We believe that these
books are a good representation of non-trivial literature,
as they are considered classics and literary achievements,
yet still readable by a dedicated reader. While the style and
length of the books may vary, the average Frazier score of
all four books fell between 7 and 8. For comparison, the
average Frazier score of the narratives generated by ChatGPT
3.5 fell between 14 and 18 for the different narratives, and
while ChatGPT 4o did produce less convoluted sentences,
its average Frazier score still ranged between 10 and 12.
Figure 1 shows the distribution of Frazier scores across each
individual narrative, together with the mean and 95%
confidence intervals. It can be seen that human authors tend to
use a rather even mix of more and less complex sentences,
while the models tend to eschew simpler constructs in favor
of sentences consisting of more nested clauses. Note that
neither higher nor lower Frazier scores are inherently
“better”, and this evaluation only serves to provide a comparison
with some samples of “typical” (good) human writing. In
our main experiments we avoided instructing the model to
imitate human authors due to some ethical concerns, but for
comparison reasons we added the instruction to write in the
style of each of the authors of the four books, and performed
the same analysis. Based purely on grammatical complexity,
the model does not seem to capture the same writing style,
and instead further increases sentence nesting.
        </p>
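        <p>Computing the Frazier score requires a constituency parse (we used nltk with Stanford CoreNLP). As a self-contained illustration of the raw quantity the score builds on, the snippet below computes the nesting depth of a bracketed parse string; this is a simplified proxy for syntactic complexity, not the Frazier metric itself.</p>

```python
# Depth of a bracketed constituency parse, as produced by parsers such
# as Stanford CoreNLP. Illustrative proxy only: the Frazier score weights
# nodes along the path from each word up the tree, rather than taking
# the maximum nesting depth computed here.

def tree_depth(parse):
    """Maximum nesting depth of a bracketed parse string."""
    depth = deepest = 0
    for ch in parse:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest
```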
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Misspun Tales</title>
        <p>While we believe that the results we show above already
constitute a novel contribution, various problem cases we
uncovered may also be of interest to future researchers. First,
while our approach often results in narratives that follow
the given structure, this is not always the case. Since the
model is given the previous chapter input, as well as the
desired next step, it has to perform a trade-off in how much
attention to give to each, and at times the “most likely”
continuation ignores the plan steps entirely. In one particularly
noteworthy instance, ChatGPT 4o took the Heist narrative,
had the bank robbery take place in chapter 3, followed by
an escape by sea onto a pirate ship, which then turned into
a fantasy story to chase after a powerful artifact, concluding
with (as summarized by the model):</p>
        <sec id="sec-4-3-1">
          <p>In Chapter Thirty: The Convergence, the
town of Port Meridian buzzes with
anticipation as the day of the Great Unveiling
approaches. Robbie and Talia, along with the
committee, work tirelessly to decipher the
Heart of the Ancients’ inscriptions. They
uncover a crucial passage about the
"Convergence of Realms," a moment when the
boundaries between their world and the Ancients’
will blur, unlocking immense knowledge but
also posing significant dangers. As the
committee intensifies their efforts, Robbie and
Talia resolve to ensure that this newfound
wisdom is used ethically and for the greater
good, heralding a new era of unity and
potential.</p>
          <p>A minor point this conclusion also demonstrates is
the model’s tendency toward happy endings that
reassure the reader that whatever power or treasure is obtained
will be used ethically. While not “wrong”, this persistent
reassurance is out of place in many story contexts. We
suspect that this is due to some of the “safeguards” OpenAI
has integrated into their system, to prevent (some)
unethical output. We do not disagree with this choice, but it also
shows that controlling LLM output to make it suitable for
all applications is challenging.</p>
          <p>Generally, control is a major issue when using an LLM.
Our approach of rewriting chapters to make characters behave
more consistently is not the only possible intervention. We
attempted to tell the model outright which characters exist
and what their roles are. However, this led to two problems:
First, the characters have to come from somewhere. If the
domain author is tasked with providing a character list, they
need to foresee the larger cast of characters the
model might need, which may also make the
generated stories more repetitive. Our solution was
to let the model generate an (ideally) varied list of
characters to use. However, the model does favor certain names
for different genres (Robbie’s partner in the heist story is
usually called “Hank” or “Jake”, despite neither showing
up in the input). The second problem, though, is that the
model does not seem to have enough context to work with
any character list that is given to it. When instructed to
use specific names, the model may initially comply, but the
text often drifts back to the names the model favors
for each particular narrative.</p>
          <p>Even when the model follows the provided trajectory
and uses characters consistently, though, it struggles with
keeping a cohesive tone. Step 7 in the heist plan is “(sell
sally anne tomatoes tomato-money main-street)”, which
consists of a single transaction. In one instance, the model
took this idea and just “ran with it”, turning one character
selling tomatoes into the three protagonists hosting a tomato
festival where they sell produce together with the local
farmers in order to finance their travels:</p>
        </sec>
        <sec id="sec-4-3-2">
          <p>With renewed purpose, the trio embarked
on a mission to gather more tomatoes and
secure a venue for the tomato festival. They
approached local farmers, explaining their
plan and offering a fair share of the profits.</p>
          <p>To their surprise, the farmers were intrigued
by the idea and agreed to contribute their
tomato harvest.</p>
          <p>By itself, the tomato festival is a reasonable interpretation
of the given story step, but it stood out in context, as the
preceding chapter is titled “The Enigmatic Stranger”, and
the following chapter “The Mysterious Stranger”. Overall,
the model favored a darker, grittier narrative, which made
the tomato festival seem even more out of place. On the
other hand, as these other two chapter titles already
indicate, encounters with strangers, risk, or danger are all
narrative devices the model employs frequently. The
narrative in question contains three consecutive chapters titled “A
Risky Proposition”, which are then followed by “The Perilous
Journey”.</p>
          <p>
            It is already known that ChatGPT often produces
inaccurate information [
            <xref ref-type="bibr" rid="ref36">36</xref>
], but such inaccuracies raise issues even when
it produces fiction. It may, for example, suggest ordering
whiskey to sober up:
          </p>
        </sec>
        <sec id="sec-4-3-3">
          <p>Robbie led Barney to a table in the corner,
away from prying eyes. He signaled the
bartender for two whiskeys, hoping the strong
drink would sober Barney up enough to have
a coherent conversation.</p>
          <p>On the flip side, some of our examples are based on
existing media, which OpenAI may have included in the
data used to train the model; the resulting narrative
may then simply reproduce this existing material rather than
produce a new one. This was particularly challenging with the
Aladdin story, but for the Indiana Jones-based story another
interesting phenomenon occurred: Rather than interpreting
“Indiana” as a name (even in context with the Ark and the
location of Tanis), ChatGPT would take it as the US state of
Indiana, and then either name a town there “Tanis” or turn
“Tanis” into a character, as in this example:</p>
        </sec>
        <sec id="sec-4-3-4">
          <p>The sun was just beginning to rise over the
horizon, casting a golden hue across the
small town of Maplewood, Indiana. The
streets were still quiet, with only the
occasional chirping of birds breaking the silence.</p>
          <p>Tanis stood at the edge of her driveway, her
backpack slung over one shoulder and a map
of Indiana clutched in her hand. She took a
deep breath, feeling the crisp morning air fill
her lungs. Today was the day she had been
waiting for.
“Are you sure about this, Tanis?” Her best
friend, Mia, asked as she approached. Mia’s
eyes were filled with concern, but there was
also a hint of excitement. “It’s a long way
to Indianapolis, and you know how
unpredictable things can get.”</p>
          <p>Overall, using ChatGPT for narrative generation seems
to produce reasonable output on the surface, but once one
looks at the text more closely, problems keep arising
almost fractally: as one digs deeper into one problem, another
shows up. The model has a general notion of what a
“good” narrative would look like, but no understanding of
flow, composition, coherence, common sense, or purpose.
Attempts to rectify this with better prompts are only partially
successful, as providing too much instruction makes the model
more likely to ignore parts of it, while providing
too little guidance results in rambling. We conclude
with the entirety of chapter 1 for one iteration of the Ark
story (using ChatGPT 4o):</p>
        </sec>
        <sec id="sec-4-3-5">
          <p>The genre is action, adventure, and mystery. (sic!)</p>
          <p>This happened exactly twice in all of our experiments; all other attempts produced actual chapter text.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>We have presented an approach to using ChatGPT to
produce long-form text for discourse generation, or, more
precisely, surface text realization. Our approach takes story
steps produced by a narrative planner and tasks the model
with first translating the abstract step output into descriptions
of individual chapters, and then turning these descriptions into
actual chapter text. We perform another pass over the
chapters where we ask the model to summarize each chapter, and
then rewrite it using the summaries of the preceding and
following chapters as additional context. We show several
example outputs of our approach, and discuss the challenges
of evaluating rather long texts that are not
particularly well written. Crucially, we also investigate why the
texts do not appear to read well, and show an analysis of the
grammatical complexity of the generated narratives, which
tend to be more complex than comparable human-written
literature. Finally, we discuss a myriad of other problems
we encountered that led to narratives that were ill-formed,
illogical or incongruous.</p>
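      <p>The rewrite pass summarized above can be sketched as follows. This is a hypothetical illustration, not the authors' exact prompts or code; <monospace>generate</monospace> stands for any prompt-to-text callable, e.g. a thin wrapper around a chat-completion API.</p>

```python
def rewrite_with_context(chapters, generate):
    """Sketch of the paper's rewrite pass: summarize every chapter,
    then rewrite each chapter with its neighbors' summaries as context.

    chapters: list of chapter texts
    generate: callable taking a prompt string and returning generated text
    """
    # First pass: one summary per chapter.
    summaries = [generate("Summarize this chapter:\n" + c) for c in chapters]

    # Second pass: rewrite each chapter, framed by the summaries of the
    # preceding and following chapters (placeholders at the boundaries).
    rewritten = []
    for i, chapter in enumerate(chapters):
        before = summaries[i - 1] if i > 0 else "(this is the first chapter)"
        after = summaries[i + 1] if i + 1 < len(chapters) else "(this is the last chapter)"
        prompt = ("Previous chapter summary: " + before + "\n"
                  "Following chapter summary: " + after + "\n"
                  "Rewrite this chapter so characters and tone stay consistent:\n"
                  + chapter)
        rewritten.append(generate(prompt))
    return rewritten
```

Injecting <monospace>generate</monospace> as a parameter keeps the pipeline logic independent of any particular model or API version, and makes the pass easy to test with a stub.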
      <p>While our work presents a somewhat bleak outlook on
the current state of narrative generation using LLMs, we
believe these insights are crucial to understanding what
makes text seem artificial. The alien-ness of AI text may
have been intuitively understood, but our work attempts
to quantify it, which may in turn lead to improvements in
output, or, perhaps more importantly, highlight the
importance of human authorship. Our approach also focused
on surface text realization without taking larger questions
of discourse generation into account. Some repetitiveness
may also be alleviated by not having the model generate an
entire chapter for rather trivial move actions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Meehan</surname>
          </string-name>
          ,
          <article-title>TALE-SPIN, an interactive program that writes stories</article-title>
          .,
          <source>in: IJCAI</source>
          , volume
          <volume>77</volume>
          ,
          <year>1977</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ryan</surname>
          </string-name>
          ,
          <article-title>Grimes' fairy tales: a 1960s story generator</article-title>
          ,
          <source>in: Interactive Storytelling: 10th International Conference on Interactive Digital Storytelling</source>
          , ICIDS 2017 Funchal, Madeira, Portugal,
          <source>November 14-17</source>
          ,
          <year>2017</year>
          , Proceedings 10, Springer,
          <year>2017</year>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Story and discourse: A bipartite model of narrative generation in virtual worlds</article-title>
          ,
          <source>Interaction Studies</source>
          <volume>8</volume>
          (
          <year>2007</year>
          )
          <fpage>177</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] OpenAI,
          <source>GPT-4 technical report</source>
          ,
          <year>2024</year>
          . arXiv:2303.08774
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lebowitz</surname>
          </string-name>
          ,
          <article-title>Story-telling as planning and learning</article-title>
          ,
          <source>Poetics</source>
          <volume>14</volume>
          (
          <year>1985</year>
          )
          <fpage>483</fpage>
          -
          <lpage>502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Cassell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <article-title>Plans and planning in narrative generation: a review of plan-based approaches to the generation of story, discourse and interactivity in narratives</article-title>
          , Sprache und Datenverarbeitung,
          <source>Special Issue on Formal and Computational Models of Narrative</source>
          <volume>37</volume>
          (
          <year>2013</year>
          )
          <fpage>41</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Narrative planning: Balancing plot and character</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>39</volume>
          (
          <year>2010</year>
          )
          <fpage>217</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>CPOCL: A narrative planner supporting conflict</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>7</volume>
          ,
          <year>2011</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Porteous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cavazza</surname>
          </string-name>
          ,
          <article-title>Controlling narrative generation with planning trajectories: the role of constraints</article-title>
          ,
          <source>in: Interactive Storytelling: Second Joint International Conference on Interactive Digital Storytelling, ICIDS 2009, Guimarães, Portugal, December 9-11, 2009</source>
          . Proceedings 2, Springer,
          <year>2009</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Siler</surname>
          </string-name>
          ,
          <article-title>The sabre narrative planner: multiagent coordination with intentions and beliefs</article-title>
          , in: AAMAS Conference proceedings,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Mohr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <article-title>Eliminating the impossible: A procedurally generated murder mystery</article-title>
          .,
          <source>in: AIIDE Workshops</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sanghrajka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <article-title>Headspace: incorporating action failure and character beliefs into narrative planning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>18</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Aeronautiques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Howe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. D.</given-names>
            <surname>McDermott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Sri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Christianson</surname>
          </string-name>
          , et al.,
          <article-title>PDDL - the planning domain definition language</article-title>
          ,
          <source>Technical Report, Tech. Rep</source>
          . (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Haslum</surname>
          </string-name>
          ,
          <article-title>Narrative planning: Compilations to classical planning</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>44</volume>
          (
          <year>2012</year>
          )
          <fpage>383</fpage>
          -
          <lpage>395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cardona-Rivera</surname>
          </string-name>
          ,
          <article-title>Using domain compilation to add belief to narrative planners</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>16</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Glaive: a state-space narrative planner supporting intentionality and conflict</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>10</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Niehaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>A method for generating narrative discourse to prompt inferences</article-title>
          ,
          <source>in: Proceedings of the Intelligent Narrative Technologies III Workshop</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.-G.</given-names>
            <surname>Cheong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>A computational model of narrative generation for suspense</article-title>
          .,
          <source>in: AAAI</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>1906</fpage>
          -
          <lpage>1907</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.-C.</given-names>
            <surname>Bae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>A computational model of narrative generation for surprise arousal</article-title>
          ,
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>6</volume>
          (
          <year>2013</year>
          )
          <fpage>131</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Christie</surname>
          </string-name>
          ,
          <article-title>A cognitive-based model of flashbacks for computational narratives</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>12</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Merits of a temporal modal logic for narrative discourse generation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>11</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Elson</surname>
          </string-name>
          , Modeling narrative discourse, Columbia University,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wardrip-Fruin</surname>
          </string-name>
          ,
          <article-title>Reading digital literature: Surface, data, interaction, and expressive processing</article-title>
          ,
          <source>A companion to digital literary studies</source>
          (
          <year>2013</year>
          )
          <fpage>161</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>A tripartite plan-based model of narrative for narrative discourse generation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>11</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. T.</given-names>
            <surname>Ooi</surname>
          </string-name>
          ,
          <article-title>Conversational interactions with npcs in llm-driven gaming: Guidelines from a content analysis of player feedback</article-title>
          , in: International Workshop on Chatbot Research and Design, Springer,
          <year>2023</year>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>X.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Quaye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brockett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jojic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>DesGarennes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lobb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leandro</surname>
          </string-name>
          , et al.,
          <article-title>Player-driven emergence in llm-driven game narrative</article-title>
          ,
          <source>arXiv preprint arXiv:2404.17027</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Generating games via llms: An investigation with video game description language</article-title>
          ,
          <source>arXiv preprint arXiv:2404.08706</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Frazier</surname>
          </string-name>
          ,
          <article-title>Syntactic complexity</article-title>
          , in:
          <source>Natural language parsing: Psychological, computational, and theoretical perspectives</source>
          (
          <year>1985</year>
          )
          <fpage>129</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          , E. Loper,
          <article-title>Natural language processing with Python: analyzing text with the natural language toolkit</article-title>
          , O'Reilly Media, Inc.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McClosky</surname>
          </string-name>
          ,
          <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>
          , in: Association for Computational Linguistics (ACL) System Demonstrations
          ,
          <year>2014</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          . URL: http://www.aclweb.org/anthology/P/P14/P14-5010.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shelley</surname>
          </string-name>
          , Frankenstein; Or, The Modern Prometheus, Penguin,
          <year>1818</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Austen</surname>
          </string-name>
          , Pride and Prejudice, T. Egerton, Whitehall,
          <year>1813</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hugo</surname>
          </string-name>
          , Les misérables, Thomas Y. Crowell &amp; Co.,
          <year>1887</year>
          . Translation by Isabel Florence Hapgood.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Doyle</surname>
          </string-name>
          , The Adventures of Sherlock Holmes, George Newnes,
          <year>1892</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Humphries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Slater</surname>
          </string-name>
          ,
          <article-title>ChatGPT is bullshit</article-title>
          ,
          <source>Ethics and Information Technology</source>
          <volume>26</volume>
          (
          <year>2024</year>
          )
          <fpage>38</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>