<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yi-Chun Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnav Jhala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State University</institution>
          ,
          <addr-line>Raleigh, NC 27606</addr-line>
          ,
          <country country="US">US</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Yale University</institution>
          ,
          <addr-line>New Haven, CT 06510</addr-line>
          ,
          <country country="US">US</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes, reflecting the temporal structure of stories. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of ”Object-Relation-Object” triples, and retrieves visual assets using afordance-aware semantic embeddings from the GameTileNet dataset [ 1]. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluated our system in ten diverse stories, analyzing tile-object matching, afordance-layer alignment, and spatial constraint satisfaction across frames. This prototype ofers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multiagent coordination in story-centered PCG.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Narrative-to-scene generation</kwd>
        <kwd>Semantic visual grounding</kwd>
        <kwd>Procedural content generation (PCG)</kwd>
        <kwd>Afordanceaware design</kwd>
        <kwd>Temporal scene structuring</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Explainable content generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Large language models (LLMs) can generate rich, coherent narratives from minimal input, creating new
opportunities for procedural content generation (PCG) in games that emphasize narrative and visual
storytelling. Yet while narrative generation has advanced rapidly, translating stories into structured,
spatially grounded game scenes remains an open challenge. Most PCG systems still produce isolated
levels or static layouts without modeling temporal structure and continuity—core aspects of narrative
experiences that connect space, time, and character action.</p>
      <p>
        Narratives unfold through sequences of events that shape spatial progression and thematic
coherence [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Building on this idea, we present a method for generating sequences of visual game scenes
from short narratives. Each story is segmented into three temporal frames (beginning, middle, end)
following narrative-structure conventions such as Freytag’s pyramid, capturing key transitions while
keeping generation tractable. This segmentation supports partial modeling of story progression and
provides a foundation for studying temporal consistency and dynamic storytelling.
      </p>
      <p>
        Our approach introduces a symbolic-to-visual pipeline that extracts spatial predicates from each
frame, maps them to assets in the GameTileNet dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and arranges them on procedurally
generated terrains. Object–relation–object triples are derived through LLM prompting and grounded in
visual tiles using semantic and afordance-based filtering. Terrain layers are produced with Cellular
Automata to ensure connectivity, while objects are placed through rule-based spatial refinement that
satisfies adjacency and containment constraints. The resulting layered 2D scenes reflect the structure
of the source narrative while maintaining spatial logic relevant to gameplay.
      </p>
      <p>Our contribution lies not in proposing new generative algorithms but in integrating established
PCG components, such as Cellular Automata and semantic matching, into a unified narrative-to-scene
pipeline emphasizing temporal segmentation and afordance grounding. This coordination supports
cross-frame continuity and balances semantic fidelity with afordance validity, a combination rarely
explored in prior PCG work. LLM-generated stories serve only as a reproducible testbed for
openvocabulary narrative input; the pipeline itself is agnostic to text source and can process any authored
story.</p>
      <p>We evaluate the system on ten narrative examples, each divided into three frames. Evaluation
considers semantic-tile matching, afordance alignment, and spatial-relation satisfaction in the generated
scenes. Even with lightweight symbolic rules and open-ended input, the results exhibit coherent spatial
organization that reflects narrative intent.</p>
      <p>Contributions.</p>
      <p>• A unified pipeline that decomposes short narratives into temporally segmented, layered 2D
scenes.
• A predicate-extraction and semantic-matching process that grounds narrative objects in
tilebased assets using afordance-aware filtering.
• A multi-layer scene synthesis method combining Cellular Automata terrain generation with
rulebased spatial placement.
• An empirical analysis of ten narrative examples demonstrating semantic alignment, afordance
coherence, and spatial-relation satisfaction across frames.</p>
      <p>
        The proposed system ofers a modular foundation for narrative-to-scene generation, emphasizing
temporal segmentation, afordance grounding, and semantic alignment. Although the current
prototype does not model agent behavior or dynamic state transitions, it establishes a framework extendable
to multi-agent coordination and gameplay-aware narrative PCG. Recent work shows that LLMs can
segment narrative events in ways that align with human perception [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], supporting our choice to use
structured time frames as the basis for visual scene construction. Table 1 illustrates one such example,
where spatial constraints and tile afordances jointly produce semantically coherent game scenes from
narrative descriptions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Narrative Theory and Structure in Games</title>
        <p>
          Narrative structure in games has been studied through formalist, experiential, and emergent
perspectives. Early work examined the tension between linear storytelling and player agency [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ]. Later
models addressed experiential modes [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and story architectures that accommodate agency [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Other
work connected narrative to spatial layout and symbolic environments [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ], while surveys
summarize recurring patterns in interactive media [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. These theories provide foundations for incorporating
narrative intent into generative pipelines.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Narrative-Guided Procedural Content Generation</title>
        <p>
          Procedural content generation (PCG) increasingly leverages narrative structure. Quest and story
grammars guide goal-oriented generation [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and language models now produce worlds and events [
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ].
Emotional arcs and planning constraints improve coherence and engagement [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. PCG methods
span grammars, search-based strategies, and PCGML [
          <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
          ], with reinforcement learning
extending control to narrative form [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Layered generation has also appeared in visual storytelling and
comics [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
          ], demonstrating ways to link symbolic story input with generative models. Prior
text-toscene and PCG systems generate single scenes or levels [
          <xref ref-type="bibr" rid="ref16 ref21">21, 22, 16</xref>
          ] but rarely model temporal
continuity or explicit narrative structure. Our work addresses this gap by connecting narrative segmentation
with spatial and afordance reasoning, bridging story progression and playable spatial layouts.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Tile-Based Generation and Semantic Afordances</title>
        <p>
          Tile-based abstractions remain central to scalable generation. Representative methods include
constrained layout via SMT solvers [23], evolutionary search [24], and grammar-based dungeon layouts
[25]. Neural approaches introduce embeddings for generalized level generation [26] and segmentation
for context-aware tilemaps [27]. Afordance modeling distinguishes environmental, interactive, and
collectible roles [28], while corpora enable afordance-aware generation [ 29]. GameTileNet contributes
a dataset of low-resolution tiles annotated with layered afordances [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Semantic Matching and LLM-Guided Grounding</title>
        <p>
          LLMs are increasingly used for scene synthesis and grounding. Prompting and multimodal alignment
allow them to act as planners for layouts [30, 31]. Studies show that structured input such as scene
semantics improves alignment [32]. Visual-semantic reasoning frameworks for matching and
embedding inform techniques for aligning text predicates with tiles [33, 34]. Cognitive studies suggest LLMs
approximate human-like segmentation of narrative events, underscoring their potential for
symbolicneural integration [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Symbolic Structures in Visual Storytelling</title>
        <p>Symbolic and graph-based representations provide structure for storytelling. Knowledge-enhanced
generation has been modeled with narrative graphs or multimodal scene graphs [35, 36, 37]. Other
frameworks explore relational encodings to ensure continuity and causality in stories and games
[38, 39]. Recent work highlights hierarchical narrative graphs as interpretable intermediaries for
multimodal alignment [40]. These approaches underscore the importance of symbolic scafolds for
controllable and interpretable storytelling pipelines.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method: Narrative-to-Scene Generation Pipeline</title>
      <p>We propose a structured pipeline that transforms narrative text into visually grounded, tile-based game
scenes. As illustrated in Figure 1, the pipeline integrates large language model (LLM)–based story
generation with a sequence of symbolic and visual reasoning modules. The process begins with narrative
prompting and temporal abstraction, where the story is segmented into key time frames and
represented as predicate-style triples. These structured descriptions serve as input for semantic reasoning
and symbolic grounding. Although the LLM provides the narrative input, this step serves only to
ensure reproducible open-vocabulary test cases; the pipeline itself is model-agnostic and can process any
authored story with minimal preprocessing.</p>
      <p>Each scene is synthesized through a layered generation process: (1) terrain generation using Cellular
Automata ensures navigable base regions; (2) semantic object matching retrieves tile assets aligned
with narrative entities based on name, category, and afordance embeddings; and (3) spatial placement
with rule-based refinement positions entities according to annotated spatial relations. The final output
consists of rendered tile-based scenes and underlying semantic maps, enabling both human-readable
visualization and downstream interactive applications.</p>
      <sec id="sec-3-1">
        <title>3.1. Narrative Prompting and Frame Extraction</title>
        <p>
          Inspired by Word2World [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], we use a large language model to generate a short adventure story and
extract three key narrative moments. Each moment is structured as symbolic predicates describing
relationships between story elements, represented as [Object] [Relation] [Object] triples. These
structured predicates form the basis for downstream spatial reasoning and visual scene generation.
        </p>
        <p>The process uses two prompts:
• Prompt_1: “Generate a short adventure story (about 100 words).”
• Prompt_2: “Extract three key time frames and describe each with [Object] [Relation] [Object]
triplets.”
Generated Story Example:</p>
        <p>In the heart of the Enchanted Forest, young Elara discovered an ancient map hidden within a hollow oak.
It led her to the legendary Crystal Cavern, rumored to grant the finder a single wish. Braving treacherous
paths and wild creatures, Elara reached the cavern’s shimmering entrance. Inside, she faced the Guardian, a
majestic dragon. With courage and wit, she solved the Guardian’s riddle, earning her the wish. Elara wished
for peace in her war-torn village. As she exited the cavern, the skies cleared, and harmony was restored,
proving that bravery and hope could transform the world.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Symbolic Spatial Relation Mapping</title>
        <p>To transform symbolic narrative structure into spatial configurations, we begin by parsing each scene
into a set of predicate triples in the form [Object] [Relation] [Object]. These relations encode spatial
intent, such as adjacency or containment, and are mapped to a controlled ontology of canonical spatial
actions suitable for tile-based rendering.</p>
        <p>
          This mapping is informed by prior work on relational scene representation [
          <xref ref-type="bibr" rid="ref22">41, 42</xref>
          ], which
demonstrates that spatial relations such as above, next to, and on top of align with how people describe and
interpret visual layouts. For our system, we adopt the following spatial relation types:
• above / below: vertical adjacency (Y-axis ofset)
• at left of / at right of: horizontal adjacency (X-axis ofset)
• on top of: overlapping placement with layer prioritization
        </p>
        <p>Because natural language varies widely, we use a large language model (LLM) to normalize
openended expressions to this spatial ontology. For example, contains is mapped to on top of, while stands
near may correspond to at left of or at right of depending on context. Human verification ensures
the consistency and interpretability of the mappings. The resulting spatial relations serve as symbolic
scafolds that guide object placement during scene generation.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Semantic Asset Retrieval with GameTileNet</title>
        <p>
          To align narrative objects with appropriate visual tiles, we adopt a semantic embedding–based retrieval
strategy grounded in the GameTileNet dataset [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Each tile in the dataset is annotated with structured
metadata, including object name, group label, supercategory, and afordance type. These attributes
are embedded using the all-MiniLM-L6-v2 Sentence Transformer to construct a searchable index.
Narrative objects are encoded using the same model, and tile matches are retrieved via cosine similarity.
Afordance Types. GameTileNet classifies tiles into five afordance types adapted from the Video
Game Description Language (VGDL) [
          <xref ref-type="bibr" rid="ref23">43</xref>
          ]:
• Terrain: Walkable ground surfaces (e.g., grass, stone).
• Environmental Object: Static scene elements (e.g., trees, fences).
• Interactive Object: Triggerable or functional elements (e.g., doors, levers).
• Item/Collectible: Usable or acquirable items (e.g., potions, scrolls).
        </p>
        <p>• Character/Creature: Playable or non-playable agents (e.g., goblins, shopkeepers).</p>
        <p>Afordance labels serve as soft constraints to improve retrieval robustness, especially for ambiguous
cases. For example, the term ”guardian” could refer to a statue or a creature, and the afordance context
helps disambiguate the intended match. This retrieval process enables semantic alignment between
narrative elements and visual assets while respecting scene composition constraints.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Terrain and Scene Layout</title>
        <p>To render each scene with appropriate environmental context, we infer base terrain types and
subregion patches from narrative content. Each scene is structured as a layered grid, with the base layer
representing walkable terrain and additional layers corresponding to object afordance types.</p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Terrain Suggestion via LLM Classification</title>
          <p>We use an LLM-based classification step to extract environmental cues from narrative objects.
Following story decomposition into predicate triples, each object is assigned:
• Afordance type: One of terrain, environmental object, interactive object, item/collectible, or
character/creature.</p>
          <p>• Suggested terrain: A free-text label describing the implied environment (e.g., ”forest”, ”desert”).</p>
          <p>These predictions are aggregated to determine dominant terrain types for each scene.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Base and Patch Selection with Continuity Propagation</title>
          <p>To ensure continuity across time frames, we assume that scenes with no explicit location change remain
in the same environment. Scenes are grouped based on inferred location continuity using temporal
adjacency and terrain similarity. For each group:
• The most frequent terrain type is assigned as the base terrain.
• Objects with terrain-related labels (e.g., ”path”, ”alley”) are extracted as patch candidates.
• Patch terrain decisions are propagated across all scenes within the group.</p>
          <p>This process maintains visual and narrative coherence across sequential scenes. The terrain selection
process is summarized in Algorithm 1.</p>
          <p>Algorithm 1 InferBaseAndPatchTerrain
Require: story_scenes: list of scenes with narrative objects
Ensure: base_terrains: map from scene to base terrain label
1: patch_terrains: map from scene to list of patch labels
2: Initialize empty maps: base_terrains, patch_terrains
3: Group scenes by inferred location continuity
4: for all scene_group in grouped scenes do
5: Initialize frequency_counter
6: for all scene in scene_group do
7: for all object in scene.objects do
8: Predict afordance and suggested terrain using LLM
9: if object.afordance == Terrain then
10: Increment frequency_counter[object.suggested_terrain]
11: else if object.name contains terrain keywords then
12: Append object.suggested_terrain to patch_terrains[scene]
13: end if
14: end for
15: end for
16: base_terrain ← terrain with max frequency in frequency_counter
17: for all scene in scene_group do
18: base_terrains[scene] ← base_terrain
19: end for
20: end for</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>3.4.3. Scene Initialization with Cellular Automata</title>
          <p>We generate the layout of the base terrain using a Cellular Automata (CA)–based synthesis process.
For each scene:
• A connected walkable region is created using CA and verified for reachability.
• Terrain patches are inserted as subregions constrained within the generated base mask.
• The final output is a layered map suitable for narrative-aligned object placement.</p>
          <p>This multi-layered scene layout ensures compatibility between narrative framing and spatial
structure.</p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Spatial Constraint–Driven Object Placement</title>
        <p>After terrain generation and semantic matching, each scene is populated by placing narrative-aligned
objects within a multi-layer tile grid. This stage is divided into two parts: random initialization and
symbolic refinement.</p>
        <sec id="sec-3-5-1">
          <title>3.5.1. Initial Placement on Walkable Terrain</title>
          <p>Objects are first placed randomly on the walkable base terrain using a greedy assignment process. For
each object:
• A walkable coordinate is selected from the base terrain mask.
• The object is placed into the corresponding layer based on its afordance (character, item,
interactive, or environment).</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>3.5.2. Spatial Relation–Based Refinement</title>
          <p>Once initial placements are made, we apply symbolic spatial constraints derived from narrative
predicates. For each predicate of the form [Object A] [Relation] [Object B], we apply the associated
spatial transformation to reposition Object A relative to Object B. We define a rule-based adjustment
engine to implement these spatial relations. Each relation is translated into a spatial ofset and applied
iteratively.</p>
          <p>Algorithm 2 ApplySpatialRelations
Require: scene: object placement layers and predicate relations
Ensure: updated object positions satisfying spatial constraints
1: for all relation in scene.spatial_relations do
2: (, , ) ← relation.source, relation.relation, relation.target
3: Normalize names of A and B using alias dictionary
4: Get current position of B as (  ,   )
5: Compute new position (  ,   ) ← ApplyOfset (  ,   , )
6: if (  ,   ) within bounds and not overlapping then
7: Update A’s position in its assigned layer
8: end if
9: end for
The ApplyOfset function implements fixed spatial transformations:
• at the left of : ofset (−3, 0)
• at the right of: ofset (+3, 0)
• above: ofset (0, −3)
• below: ofset (0, +3)
• on top of: overlapping placement (same coordinates but layer shift)</p>
          <p>This symbolic-to-spatial grounding process supports interpretable scene composition and lays the
foundation for future rule learning or agent-driven placement strategies.</p>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Knowledge Graph Construction and Narrative Linking</title>
        <p>To support symbolic reasoning, we construct optional scene-level knowledge graphs (KGs) derived
from parsed narrative predicates. Each KG encodes the symbolic structure of a single story frame,
while temporal relations across scenes are captured using a merged KG with ‘precedes‘ edges. These
structures facilitate interpretable alignment between visual scenes and underlying narrative logic.</p>
        <sec id="sec-3-6-1">
          <title>3.6.1. Narrative Knowledge Graphs for Scene Composition</title>
          <p>Each story scene is parsed into a set of symbolic predicates, typically in the form of [Subject] [Relation]
[Object]. These predicates are transformed into triplets and rendered as a directed symbolic graph.
Nodes represent objects and agents, while edges encode spatial or interactional relationships derived
from language. Figure 2 shows three examples of such graphs aligned with key scenes.</p>
          <p>The knowledge graph for each frame includes:
• Entities: Matched visual objects and characters.
• Relations: Symbolic spatial predicates (e.g., above, contains) derived from narrative text.
• Semantic Roles: Directional links such as agent-action-object when applicable.</p>
          <p>These scene-level KGs enable localized symbolic reasoning and enhance traceability in narrative
visualization.
(a) Elara discovers the ancient map (b) Elara faces the treacherous paths (c) Elara meets the Guardian dragon</p>
        </sec>
        <sec id="sec-3-6-2">
          <title>3.6.2. Cross-Frame Temporal Integration</title>
          <p>To preserve the overarching story flow, we link each scene-level KG through a unified structure. This
merged knowledge graph introduces precedes edges to connect scenes according to narrative
chronology. These temporal links enable:
• Global queries across multiple scenes.
• Timeline reconstruction.</p>
          <p>• Coherence analysis across disconnected predicates.</p>
          <p>When no scene boundary is detected in the narrative (i.e., no setting or goal shift), the system
assumes the location is continuous. This continuity influences both terrain rendering and the KG linkage.</p>
        </sec>
        <sec id="sec-3-6-3">
          <title>3.6.3. Relation to Hierarchical Narrative Models</title>
          <p>Our construction is inspired by prior work on hierarchical symbolic representation for visual
narratives [? ]. In that model, events are represented at multiple levels, from panel to event to macro-event,
and integrated using graph structures that encode semantic, temporal, and multimodal relations.</p>
          <p>Although our current system focuses on symbolic graphs from text and image grounding, its
scenelevel KG resembles the event segment layer in [? ]. Both:
• Represent discrete narrative units grounded in key scenes or moments.
• Encode semantic roles and inter-entity relations symbolically.</p>
          <p>• Support integration into larger temporal and narrative structures.</p>
          <p>While our current scene graphs are simpler, they lay the groundwork for symbolic extensions and
integration with event-segment and macro-event abstractions. This future work could enable a unified
narrative reasoning framework for both scene generation and story understanding.</p>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Visual Rendering and Output</title>
        <p>After semantic matching and spatial placement, each tile-based scene is rendered using matched 2D
sprite assets. Scenes are organized into multi-layer matrices representing diferent object types
(terrain, environment, interactive objects, items, characters), which are composited in semantic order to
preserve depth and spatial logic. Object images are resized (e.g., 1.5×), centered within their tiles, and
pasted layer by layer to construct the final image. Figure layers are rendered from background to
foreground based on their afordance class. We also retain each layer’s numerical matrix for
downstream tasks such as symbolic reasoning or gameplay simulation. The rendering procedure is outlined
in Algorithm 3.</p>
        <p>Algorithm 3 RenderSceneImage
Require: scene_layers, matched_objects, scene_summaries
Ensure: Saved visual rendering of each scene
1: for all scene in scene_summaries do
2: Initialize canvas with white background
3: Draw base terrain using binary mask
4: for all layer_type in {environment, interactive, item, character} do
5: Get object list and placement matrix
6: for all (x, y), object in matrix positions do
7: Lookup matched object image
8: Resize and center image on tile
9: Paste image onto canvas
10: end for
11: end for
12: Save canvas as PNG to output folder
13: end for</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>We evaluated our narrative-to-scene generation system using 10 LLM-generated stories. Each story is
segmented into three key time frames, resulting in a total of 30 scene visualizations and their associated
symbolic representations. We assess the quality of the generated outputs from multiple perspectives:
tile–object semantic alignment, afordance-layer placement correctness, spatial predicate satisfaction,
and qualitative renderings.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>Each narrative (approximately 100 words) is decomposed into three time frames via prompting. For
each frame, we extract three predicate triples, which are matched to GameTileNet assets using semantic
embeddings and placed within procedurally generated terrains. Table 3 summarizes the evaluation
dataset.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Tile Matching Accuracy</title>
        <p>We first examine whether the tiles selected by semantic matching correspond well to the narrative
objects. Evaluation considers (1) whether the top-1 matched tile is semantically appropriate, (2) whether
the assigned afordance matches the expected gameplay role, and (3) whether the system produces a
diverse set of tiles across each story. Per-story results are shown in Table 4, and aggregate results
across all 30 scenes are given in Table 5.</p>
        <p>Analysis. The results in Table 5 show that semantic alignment between narrative objects and
candidate tiles is reliable across stories: cosine similarity values are consistently around 0.40–0.44 with low
variance (Table 4). This suggests that narrative embeddings provide a stable signal for selecting visually
coherent tiles. By contrast, afordance match rates fluctuate more strongly (0.27–0.55, mean 0.42),
indicating that while visual semantics are captured, gameplay functions (e.g., terrain vs. item vs. obstacle)
are more dificult to preserve. Tile diversity remains high across all stories (mean 0.92), showing that
the system avoids reusing the same assets excessively and maintains scene variety. Error inspection
revealed three recurring challenges: inconsistent naming conventions (e.g., “decrepit library” vs.
“decrepit_library”), coverage gaps where no suitable tile existed, and afordance misclassifications when
a visually similar tile had the wrong role. Overall, these findings highlight the usefulness of semantic
matching but also point to the need for stronger afordance-aware retrieval.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Spatial Predicate Satisfaction</title>
        <p>We next examine whether spatial relations (e.g., “Tree to the left of House”) are satisfied in the rendered
layout. A rule-based checker validates each predicate based on scene matrices, and we compute the
percentage of predicates satisfied per scene. Results are shown in Table 4.</p>
        <p>Analysis. Predicate satisfaction averaged 72% across all stories (Table 4), with the majority of scenes
achieving two-thirds or more of their relational constraints. Stories 6 and 1 achieved the highest
consistency (89% and 78%), while Story 10 lagged (56%), typically due to conflicting placement constraints
or lack of suficient map space. These results suggest that the procedural layout is capable of
enforcing basic spatial relations but can be brittle when multiple constraints interact. Improvements such as
constraint-solving or afordance-informed placement could further enhance satisfaction rates.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Strengths and Generalization</title>
        <p>The evaluation highlights several promising aspects of our approach. First, semantic alignment
between narrative descriptions and visual tiles is stable across all ten stories (Table 5), indicating that
embedding-based retrieval provides a reliable foundation for mapping open-ended narrative text into
game assets. Second, the system maintains high tile diversity (mean 0.92), suggesting that it can
produce varied outputs without excessive repetition, an important property for replayability and player
engagement. Third, spatial predicate satisfaction averaged 72% (Table 4), demonstrating that even a
lightweight rule-based layout generator can enforce a substantial fraction of narrative constraints.
Together, these findings suggest that the pipeline generalizes across diferent story contexts, making it
adaptable for varied game scenarios.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Limitations</title>
        <p>Despite these strengths, several limitations remain. Afordance matching showed high variance (0.27–
0.55), revealing a gap between visual similarity and gameplay semantics. This partly stems from limited
coverage in the GameTileNet dataset: some objects (e.g., ”lantern,” ”archway”) lack representative tiles,
forcing approximate matches. Semantic embeddings also capture descriptive but not functional
similarity (e.g., terrain vs. collectible), leading to occasional misplacements. Symbolic layers were not fully
leveraged for spatial reasoning, the layout engine checks constraints but does not resolve conflicts,
causing brittle performance when multiple predicates interact. Finally, narrative-to-scene generation
is inherently open-ended, complicating evaluation. Metrics such as diversity and cosine similarity
depend not only on correct object interpretation but also on tile coverage and the ambiguity of natural
language descriptions. The current three-frame segmentation follows a fixed narrative template; future
versions will explore data-driven segmentation that adapts frame count to story complexity. Ablation
studies on individual modules (terrain generation, semantic matching, spatial refinement) were not
conducted; future work will quantify their contributions to overall scene coherence.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Use Cases and Integration in Game Tools</title>
        <p>Despite these challenges, the framework has several promising use cases. For game developers, the
system can serve as a prototyping tool, quickly transforming narrative prompts into playable scene
sketches that can be refined by designers. For procedural content generation (PCG) research, it ofers a
testbed that integrates symbolic reasoning, semantic matching, and spatial layout, enabling controlled
experiments on hybrid generation pipelines. Integration into existing game engines such as Unity
or Godot could extend the system into interactive editors, where designers specify narrative beats
and receive automatically generated candidate scenes. Beyond development, the pipeline may also
support applications in game-based storytelling, educational games, or automated testing of narrative
scenarios.</p>
        <p>Overall, the results suggest that narrative-driven PCG is feasible, but requires a deeper integration
of afordance-aware retrieval and constraint-solving methods to bridge the gap between narrative
semantics and functional game design.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We presented a pipeline for generating game scenes from narrative text by aligning LLM-derived
predicates with the GameTileNet dataset and rendering layered maps using procedural terrain generation.
Our evaluation across ten stories demonstrated that semantic matching provides stable visual
alignment with narrative objects, while afordance alignment and spatial relation enforcement remain
challenging. These findings highlight both the promise of semantic embeddings for bridging text and assets
and the need for deeper afordance-aware reasoning to ensure gameplay consistency. This work
provides an early step toward narrative-driven procedural content generation. Future directions include
integrating symbolic reasoning for more reliable spatial and temporal coordination, expanding the
coverage of tile datasets to reduce gaps in representation, and supporting interactive or co-creative
workflows where designers and players can iteratively refine generated scenes. We see these
developments as important next steps toward practical tools that blend narrative expression with playable
game environments.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>Generative AI tools were used only to assist with language refinement and LaTeX formatting under the
authors’ direction. All conceptual contributions, design, implementation, and analysis were produced
by the authors.
[22] M. Guzdial, M. O. Riedl, Combinatorial creativity for procedural content generation via machine
learning., in: AAAI Workshops, 2018, pp. 557–564.
[23] J. Whitehead, Spatial layout of procedural dungeons using linear constraints and smt solvers, in:
Proceedings of the 15th International Conference on the Foundations of Digital Games, 2020, pp.
1–9.
[24] A. Petrovas, R. Bausys, Procedural video game scene generation by genetic and neutrosophic
waspas algorithms, Applied Sciences 12 (2022) 772.
[25] H. Jiang, S. Wang, H. Bi, X. Lv, B. Zhao, Z. Wang, Z. Wang, Synthesizing indoor scene layouts
in complicated architecture using dynamic convolution networks, Proceedings of the ACM on
Computer Graphics and Interactive Techniques 4 (2021) 1–16.
[26] M. Jadhav, M. Guzdial, Tile embedding: a general representation for level generation, in:
Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment,
volume 17, 2021, pp. 34–41.
[27] L. Gabriel, E. W. Clua, A semantic segmentation system for generating context-based tile-maps,
in: Proceedings of the 22nd Brazilian Symposium on Games and Digital Entertainment, 2023, pp.
124–133.
[28] R. Hunicke, M. LeBlanc, R. Zubek, et al., Mda: A formal approach to game design and game
research, in: Proceedings of the AAAI Workshop on Challenges in Game AI, volume 4, San Jose,
CA, 2004, p. 1722.
[29] G. R. Bentley, J. C. Osborn, The videogame afordances corpus, in: 2019 Experimental AI in</p>
      <p>Games Workshop, 2019.
[30] R. Volum, S. Rao, M. Xu, G. DesGarennes, C. Brockett, B. Van Durme, O. Deng, A. Malhotra, W. B.</p>
      <p>Dolan, Craft an iron sword: Dynamically generating interactive game characters by prompting
large language models tuned on code, in: Proceedings of the 3rd Wordplay: When Language
Meets Games Workshop (Wordplay 2022), 2022, pp. 25–43.
[31] M. Zhou, Y. Wang, J. Hou, C. Luo, Z. Zhang, J. Peng, Scenex: Procedural controllable large-scale
scene generation via large-language models, arXiv preprint arXiv:2403.15698 (2024).
[32] Y. Cao, S. Li, Y. Liu, Z. Yan, Y. Dai, P. S. Yu, L. Sun, A comprehensive survey of ai-generated
content: A history of generative ai from gan to chatgpt, arXiv preprint arXiv:2303.04226 (2023).
[33] K. Li, Y. Zhang, K. Li, Y. Li, Y. Fu, Visual semantic reasoning for image-text matching, in:
Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 4654–4662.
[34] K. Li, Y. Zhang, K. Li, Y. Li, Y. Fu, Image-text embedding learning via visual and textual semantic
reasoning, IEEE transactions on pattern analysis and machine intelligence 45 (2022) 641–656.
[35] C.-C. Hsu, Z.-Y. Chen, C.-Y. Hsu, C.-C. Li, T.-Y. Lin, T.-H. Huang, L.-W. Ku, Knowledge-enriched
visual storytelling, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34,
2020, pp. 7952–7960.
[36] C. Xu, M. Yang, C. Li, Y. Shen, X. Ao, R. Xu, Imagine, reason and write: Visual storytelling with
graph knowledge and relational reasoning, in: Proceedings of the AAAI Conference on Artificial
Intelligence, volume 35, 2021, pp. 3022–3029.
[37] A. Mishra, A. Laha, K. Sankaranarayanan, P. Jain, S. Krishnan, Storytelling from structured data
and knowledge graphs: An nlg perspective, in: Proceedings of the 57th annual meeting of the
association for computational linguistics: Tutorial Abstracts, 2019, pp. 43–48.
[38] R. E. C. Rivera, A. Jhala, J. Porteous, R. M. Young, The story so far on narrative planning, in:
Proceedings of the International Conference on Automated Planning and Scheduling, volume 34,
2024, pp. 489–499.
[39] T. Akimoto, Computational modeling of narrative structure: A hierarchical graph model for
multidimensional narrative structure, International Journal of Computational Linguistics Research 8
(2017) 92–108.
[40] Y.-C. Chen, Structured graph representations for visual narrative reasoning: A hierarchical
framework for comics, arXiv preprint arXiv:2506.10008 (2025).
[41] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A.</p>
      <p>Shamma, et al., Visual genome: Connecting language and vision using crowdsourced dense image</p>
    </sec>
    <sec id="sec-8">
      <title>A. Online Resources</title>
    </sec>
    <sec id="sec-9">
      <title>B. Appendix: Sample Stories</title>
      <p>The Code and examples are released publicly via https://github.com/RimiChen/2025_NarrativeScene.</p>
      <p>Amid the neon glow of New York’s restless night, journalist Jake stumbled upon a cryptic note tucked inside
a trash can under a flickering streetlight. The message hinted at a hidden treasure buried deep within the
city’s shadows. Pursued by ruthless gangsters, Jake raced through winding alleys bathed in moonlight, his
every step echoing with danger. Clue after clue led him skyward—up spiral stairs and secret elevators—until
he reached the crown of the Statue of Liberty. There, hidden beneath the cold iron floor, he uncovered the
treasure. As dawn broke over the skyline, Jake realized he’d rewritten the city’s secrets.</p>
      <p>In the neon-lit sprawl of Neo-Tokyo, young hacker Kenji uncovered a cache of encrypted files hidden in a
derelict mainframe. Dust swirled in the glow of failing lights as lines of forgotten code blinked to life. Back
in his cramped apartment, Kenji hunched over his terminal, fingers flying across the keyboard. The screen
lfooded with cascading symbols, then froze—decoding complete. What emerged was more than data; it was
evidence of a vast corporate conspiracy. As city skyscrapers loomed outside his window, Kenji stared at the
truth pulsing on his screen, knowing his next move could shake the world’s digital core.</p>
      <p>In a crumbling library beneath a sagging ceiling, Iris unearthed a fragile message sealed in dust and time.
The note whispered of a hidden oasis—a refuge in the desolate ruins of the world. With resolve burning in
her chest, she ventured into the scorched wastelands, where mutant creatures prowled and the earth cracked
beneath her feet. Days blurred into nights, but Iris pressed on. At the edge of collapse, she found it: a
shimmering oasis blooming defiantly in the dead soil. As its waters sparkled with life, humanity’s hope
rekindled. Iris hadn’t just survived—she had rediscovered a future.</p>
      <p>Inside the public library, Alex stood near a dusty bookshelf, where an old book rested untouched. Opening
it, he found it contained an encrypted note tucked between yellowed pages. That night, at home, Alex held
the encrypted note over his desk, where moonlight illuminated the surface. The desk supported scattered
papers, maps, and scribbled codes. After cracking the message, he followed its coordinates to an abandoned
warehouse. There, Alex stood in the dim space, heart pounding. Streetlights outside cast shadows on the
warehouse exterior. Suddenly, figures emerged—the secret society gathered around Alex, their eyes fixed on
the note he still held.</p>
      <p>In the rain-slicked alleys of Zephyr, streetwise Jax knelt by a loose cobblestone glowing faintly beneath the
streetlight’s shimmer. Beneath it, he uncovered an emerald amulet pulsing with forgotten energy. Clutching
it in his hand, a surge of ancient power surged through him, surrounding him in a halo of green fire. The
city trembled. Above the skyline, a sinister sorcerer descended from the clouds. Jax stood his ground, the
amulet blazing with newfound strength. Magic clashed in the sky, old and new. When the light faded, only
Jax remained—victorious and forever changed by the artifact he had unearthed from the street.
Aboard the cursed ship The Sea Serpent, Captain Redbeard peered into the abyss, where the ocean floor
cradled a glowing artifact of unknown origin. As the ship rocked atop the waves, he hauled it aboard, sensing
the tide of fate shift. From the deep, monstrous shapes surged—leviathans with fangs like anchors. Redbeard
stood firm, the artifact blazing in his grip as he battled the beasts. When the sea fell still, the relic rose, casting
a vision across the sky: an ancient prophecy long forgotten. As its light danced across the waves, Redbeard
knew the sea had chosen its next legend.</p>
      <p>Amid a raging storm, Captain Jack stood firm on the deck, waves crashing around him. The sea surrounded
his ship, The Stormcaller, as he gripped a cryptic compass unearthed from a sailor’s tale. Its needle trembled,
pointing unerringly toward the fabled El Dorado. Navigating through the Bermuda Triangle, Jack’s vessel
braved merciless waves while mystical sea creatures lunged from the depths. His crew fought with steel
and fear. At last, the storm broke. Under radiant moonlight, the horizon cleared—revealing golden spires
glistening in the distance. Standing on the drenched deck, Jack watched El Dorado rise from myth into
reality.</p>
      <p>In the blistering Sahara Desert, where the sun beats down on endless dunes, Dr. Samuel Cross knelt near
an unearthed ancient amulet glinting in the sand. The Sahara Desert contained more than secrets—it held
the path to legend. As he journeyed onward, a violent sandstorm engulfed Cross, tearing visibility to shreds.
Through the storm, he read cryptic hieroglyphs etched in stone, while tomb raiders pursued him relentlessly.
Deeper inside the pyramid, Cross stood in a hidden chamber—an elaborate pyramid trap that contained
deadly mechanisms. Clutching the fabled treasure, Cross narrowly escaped, leaving behind danger but
carrying history in his hands.</p>
      <p>Deep within the abyss of the Forgotten City, intrepid archaeologist Dr. Alexander unearthed a mystical
artifact concealed in an ancient tomb. Torchlight flickered across the ruins as he knelt beside the stone
sarcophagus, uncovering secrets long buried. With the artifact in hand, he pressed forward, navigating a
labyrinth of treacherous traps and walls that whispered forgotten chants. Guided by the artifact’s glow,
he reached the heart of the tomb. There, the Guardian emerged, its form towering in silence. As chamber
doors slammed shut, the artifact shimmered and activated a hidden mechanism. The Guardian bowed. Dr.
Alexander had passed the test—and earned the city’s truth.</p>
      <p>In the blazing Sahara, golden sunlight beamed on the golden amulet buried just beneath the sand. Amelia
stood over the buried amulet, brushing away grains until it shimmered in full. She picked it up. As she
followed the amulet’s glow across the desert, sand dunes surrounded her. A venomous creature lurked behind
one of the dunes, but she pressed on. Soon, she arrived at a hidden pyramid. Amelia stood before the ancient
pyramid, awed by its size. With steady hands, she raised the amulet. It fit perfectly into the pyramid’s lock.
Inside, the hidden pyramid held civilization’s long-lost secrets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jhala</surname>
          </string-name>
          ,
          <article-title>Gametilenet: A semantic dataset for low-resolution game art in procedural content generation</article-title>
          ,
          <source>arXiv preprint arXiv:2507.02941</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>S. McCloud</surname>
          </string-name>
          ,
          <article-title>Understanding Comics: The Invisible Art</article-title>
          , Tundra Publishing,
          <year>1993</year>
          . Reprinted by HarperCollins in
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Cohn</surname>
          </string-name>
          , Visual narrative structure,
          <source>Cognitive science 37</source>
          (
          <year>2013</year>
          )
          <fpage>413</fpage>
          -
          <lpage>452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Michelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Toneva</surname>
          </string-name>
          ,
          <article-title>Large language models can segment narrative events similarly to humans</article-title>
          ,
          <source>Behavior Research Methods</source>
          <volume>57</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Aarseth</surname>
          </string-name>
          ,
          <article-title>A narrative theory of games</article-title>
          ,
          <source>in: Proceedings of the international conference on the foundations of digital games</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Juul</surname>
          </string-name>
          ,
          <article-title>Games telling stories, Handbook of computer game studies (</article-title>
          <year>2005</year>
          )
          <fpage>219</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Calleja</surname>
          </string-name>
          ,
          <article-title>Experiential narrative in game environments (</article-title>
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Domsch</surname>
          </string-name>
          ,
          <article-title>Storyplaying: Agency and narrative in video games</article-title>
          , De Gruyter,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Domsch</surname>
          </string-name>
          ,
          <article-title>Space and narrative in computer games, Ludotopia: Spaces, places and territories in computer games (</article-title>
          <year>2019</year>
          )
          <fpage>103</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>A. Lindley, Story and narrative structures in computer games</article-title>
          , Bushof, Brunhild. ed (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Koenitz</surname>
          </string-name>
          ,
          <article-title>Narrative in video games</article-title>
          ,
          <source>in: Encyclopedia of computer graphics and games</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>1230</fpage>
          -
          <lpage>1238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <article-title>Quests: Design, theory, and history in games and narratives</article-title>
          , AK Peters/CRC Press,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Nasir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Togelius,</surname>
          </string-name>
          <article-title>Word2world: Generating stories and worlds through large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2405.06686</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Todd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Padula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stephenson</surname>
          </string-name>
          , É. Piette,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Soemers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          , Gavel:
          <article-title>Generating games via evolution and language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>37</volume>
          (
          <year>2024</year>
          )
          <fpage>110723</fpage>
          -
          <lpage>110745</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dighe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jhala</surname>
          </string-name>
          ,
          <article-title>Stories of the town: balancing character autonomy and coherent narrative in procedurally generated worlds</article-title>
          ,
          <source>in: Proceedings of the 14th International Conference on the Foundations of Digital Games</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Summerville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Snodgrass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guzdial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Holmgård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Hoover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaksen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nealen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Procedural content generation via machine learning (pcgml)</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          <volume>10</volume>
          (
          <year>2018</year>
          )
          <fpage>257</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Procedural content generation by content type</article-title>
          ,
          <source>in: Artificial Intelligence and Games</source>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>312</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. O.</given-names>
            <surname>Stanley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Browne</surname>
          </string-name>
          ,
          <article-title>Search-based procedural content generation</article-title>
          , in: Applications of Evolutionary Computation:
          <article-title>EvoApplicatons 2010: EvoCOMPLEX, EvoGAMES</article-title>
          , EvoIASP, EvoINTELLIGENCE, EvoNUM, and EvoSTOC, Istanbul, Turkey, April 7-
          <issue>9</issue>
          ,
          <year>2010</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>I</given-names>
          </string-name>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Y.-C. Chen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Jhala</surname>
          </string-name>
          ,
          <article-title>Collaborative comic generation: Integrating visual narrative theories with AI models for enhanced creativity</article-title>
          ,
          <source>in: Proceedings of the 3rd Workshop on Artificial Intelligence and Creativity</source>
          , volume
          <volume>3810</volume>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Y.-C. Chen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Jhala</surname>
          </string-name>
          ,
          <article-title>A customizable generator for comic-style visual narrative</article-title>
          ,
          <source>arXiv preprint arXiv:2401.02863</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Narrative planning: Balancing plot and character</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>39</volume>
          (
          <year>2010</year>
          )
          <fpage>217</fpage>
          -
          <lpage>268</lpage>
          . annotations,
          <source>International journal of computer vision 123</source>
          (
          <year>2017</year>
          )
          <fpage>32</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hariharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Van Der</given-names>
            <surname>Maaten</surname>
          </string-name>
          , L.
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Lawrence Zitnick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          ,
          <article-title>Clevr: A diagnostic dataset for compositional language and elementary visual reasoning</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2901</fpage>
          -
          <lpage>2910</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schaul</surname>
          </string-name>
          ,
          <article-title>A video game description language for model-based or interactive learning</article-title>
          ,
          <source>in: 2013 IEEE Conference on Computational Inteligence in Games (CIG)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>