<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>for Procedurally Generating Aesthetically Complex Environments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Timothy Merino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sam Earle</string-name>
          <email>sam.earle@nyu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franklin Yiu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohan Lu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nina Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Joseph</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tianxu Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julian Togelius</string-name>
          <email>julian@togelius.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
<!-- Keywords: WaveFunctionCollapse, Markov Decision Process, Evolution, PCG -->
        <aff id="aff0">
          <label>0</label>
          <institution>New York University</institution>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Procedural content generation often requires satisfying both designer-specified objectives and adjacency constraints implicitly imposed by the underlying tile set. To address the challenges of jointly optimizing both constraints and objectives, we reformulate WaveFunctionCollapse (WFC) as a Markov Decision Process (MDP), enabling external optimization algorithms to focus exclusively on objective maximization while leveraging WFC's propagation mechanism to enforce constraint satisfaction. We empirically compare optimizing this MDP to traditional evolutionary approaches that jointly optimize global metrics and local tile placement. Across multiple domains with various difficulties, we find that joint optimization not only struggles as task complexity increases, but consistently underperforms relative to optimization over the WFC-MDP, underscoring the advantages of decoupling local constraint satisfaction from global objective optimization.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Procedural Content Generation (PCG) automatically creates game content through algorithmic methods
often leveraging search or machine learning-based methods (PCGML). Learning-based methods in
particular have enabled the automated creation of game content that optimizes designer-specified
high-level objectives [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3, 4, 5</xref>
        ] that would be difficult to express or satisfy through traditional
algorithms such as WaveFunctionCollapse (WFC) [6]. However, many existing learning-based PCG
approaches are primarily applied to visually simplistic domains where the tile set imposes limited
adjacency constraints [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3, 4</xref>
        ]. We suspect that even methods which already produce satisfactory
results in a domain with more aesthetic adjacency constraints, like Mario [7, 8, 9], would achieve
higher robustness and efficiency from offloading the learning of the adjacency rules. We hypothesize
that optimization-based PCG methods struggle with complex aesthetic adjacency rules due to the
combinatorial explosion of valid tile configurations. To test this hypothesis, we compare traditional
approaches that must learn these constraints implicitly against our novel formulation that leverages
WFC’s constraint propagation mechanism.
      </p>
      <p>While WFC is inherently limited in its ability to optimize global functional properties such as
playability, it excels at enforcing fine-grained aesthetic coherence by propagating preferences over
possible tile placements via constraint solving [10, 11]. By reformulating WFC as a Markov Decision
Process (MDP) and embedding an explicit objective function within this framework, we enable a
more generalizable approach to guiding content generation toward designer-specified goals while
simultaneously maintaining adherence to aesthetic adjacency constraints.</p>
      <p>We optimize this MDP by evolving an action sequence using (μ + λ) evolution. To understand the
implications of the MDP’s explicit adjacency constraint guarantees, we compare against evolving the
tiles of the final artifact directly, via both naive (μ + λ) evolution and Feasible-Infeasible 2-Population
(FI-2Pop) [12], where the optimization algorithm must implicitly learn to resolve adjacency violations.</p>
      <p>We evaluate these optimization strategies across 3 domains. The binary domain, adapted from Khalifa
et al., imposes desired path length constraints as a simplified proxy for functional playability. Notably,
varying the target path length allows us to dynamically adjust problem difficulty, enabling a systematic
evaluation of how each optimization algorithm scales with increasing task complexity. On the other
hand, the biome domains define objective functions that capture global topographic features of distinct
biomes, guiding optimization toward semantically coherent and visually pleasing patterns. Finally, we
integrate both binary and biome objectives, requiring solutions that jointly optimize metrics pertaining
to form and function, thereby reflecting the multi-objective demands typical of real-world game design.</p>
      <p>Across all domains and desired path lengths, non-MDP methods (which incorporate WFC-style
constraints into objective functions instead of leveraging them via an MDP) perform noticeably worse;
evolving the MDP action sequence leads to faster and more consistent convergence. However, on
the most difficult objectives, evolving the action sequence converges very inconsistently, hinting at
exploration limitations of (μ + λ) evolution.</p>
      <p>Concretely, we offer the following contributions:
• We demonstrate that forcing learning algorithms to learn local adjacency constraints leads to
degraded performance in highly constrained domains.
• We present a novel formulation of WFC as an MDP, along with a corresponding gym [13]
environment to facilitate the evaluation of alternative optimization algorithms.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. WaveFunctionCollapse Algorithm and Modifications</title>
        <sec id="sec-2-1-1">
          <title>2.1.1. WaveFunctionCollapse</title>
          <p>WFC is an algorithm for procedural content generation, particularly for tile-based environments. It can
create coherent and visually consistent outputs based on simple input examples or constraints [14]. The
algorithm’s core functionality resembles constraint satisfaction with a quantum mechanics-inspired
approach: maintaining a superposition of possible states for each tile that gradually “collapses” to
definite states through observation and propagation steps [15].</p>
          <p>The algorithm operates in two main modes: the simple tiled model, which uses explicit adjacency
rules, and the overlapping model, which automatically extracts patterns from example inputs [16].
In both models, WFC can be seen as a form of self-supervised learning that learns a distribution
from minimal examples—often just one—and generates similar content by maintaining the learned
adjacency patterns. While WFC excels at maintaining local adjacency constraints, it struggles with
global optimization objectives that are critical for gameplay, such as ensuring level solvability or
balanced resource distribution [10].</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Enhancements to WaveFunctionCollapse</title>
          <p>Researchers have developed numerous modifications to improve WFC’s capabilities beyond its original
formulation. Karth and Smith provide a comprehensive analysis of WFC as a constraint satisfaction
problem, establishing its theoretical foundations and positioning it within the broader context of PCG
approaches.</p>
          <p>To improve scalability to large environments, Nie et al. proposed Nested Wave Function Collapse
(NWFC), which decomposes the generation process into nested subproblems. This approach significantly
reduces time complexity from exponential to polynomial while maintaining consistency across the
generated content. For non-grid environments, researchers have developed graph-based extensions
of WFC that enable content generation for arbitrary topologies, enabling applications to 3D worlds
and non-uniform structures [10]. LNU introduce additional extensions to WFC including the path
constraint, which enforces global connectivity between specified tiles—addressing one of WFC’s key
limitations regarding global structure control. This path constraint is distinct from the path-length
objective optimized in the present work in that it focuses on paths between designer-specified tiles
as opposed to any two most distant tiles on the map, and only ensures the existence of such a path,
rather than making guarantees about its length. This distinction is significant because optimizing
path length requires global reasoning about the entire map structure, which traditional WFC cannot
perform. While the path constraint ensures connectivity exists, our work specifically optimizes the
longest shortest path between any two points, creating a more challenging optimization problem
that must balance local tile placement decisions with their global impact on path topology. Cheng
et al. incorporate designer-driven heuristics to steer the generation process toward specific aesthetic
or functional outcomes by introducing an automatic rule system along with global, multi‑layer, and
distance constraints to provide designers with more control over level layouts [6]. Yet the approach
hinges on enumerated, non-local constraints—tile caps and anchored cells, hard distance windows
among entities, and layer-locking of assets—that must be re-engineered for each domain, limiting
generality compared with an objective-based formulation that subsumes these preferences in a unified
reward.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Evolutionary Approaches to PCG</title>
        <sec id="sec-2-2-1">
          <title>2.2.1. Evolutionary Algorithms</title>
          <p>Evolutionary algorithms, widely applied to PCG problems, are a flexible approach to searching the space
of possible game content while optimizing for designer-specified objectives [20]. These approaches
typically encode content as genomes that evolve through operations such as crossover and mutation,
with fitness functions guiding the search toward desired properties.</p>
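As an illustrative sketch (not the exact operators used in any cited system), tile-grid genomes under such an algorithm can be evolved with single-point crossover and per-gene mutation; all names here are ours:

```python
import random

def crossover(a, b):
    """Single-point crossover on two equal-length tile-index genomes."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, n_tiles, rate=0.05):
    """Resample each tile index independently with probability `rate`."""
    return [random.randrange(n_tiles) if random.random() < rate else g
            for g in genome]

def step_generation(pop, fitness, n_tiles, mu=4):
    """One generation: keep the `mu` fittest genomes, refill with offspring."""
    elites = sorted(pop, key=fitness, reverse=True)[:mu]
    offspring = [mutate(crossover(random.choice(elites), random.choice(elites)),
                        n_tiles)
                 for _ in range(len(pop) - mu)]
    return elites + offspring
```

Because the elites survive unchanged, the best fitness in the population is monotonically non-decreasing across generations.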
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. FI-2Pop for Constrained Optimization</title>
          <p>The Feasible-Infeasible Two-Population (FI-2Pop) genetic algorithm [12] is a solution to constrained
optimization problems that maintains two separate populations: one of feasible solutions that optimize
the objective function, and another of infeasible solutions that minimize constraint violations. This
approach has proven particularly effective for PCG applications where content must simultaneously
satisfy hard constraints (e.g., playability) while optimizing soft objectives (e.g., challenge or balance).</p>
          <p>Sorenson and Pasquier applied FI-2Pop to procedural level generation, creating game maps that
satisfy playability constraints while optimizing for designer preferences. Later, Liapis et al. extended
FI-2Pop for constrained novelty search in game level design. This approach uses novelty metrics rather
than objective functions to drive search, resulting in a broader exploration of the feasible design space.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Procedural Content Generation via Reinforcement Learning (PCGRL)</title>
        <p>Khalifa et al. introduced PCGRL as a novel approach to procedural content generation that frames
level design as a sequential decision-making process optimized through reinforcement learning. By
formulating level generation as a Markov Decision Process (MDP), PCGRL enables an agent to learn a
policy that maximizes expected level quality through incremental modifications. This approach can
operate without human-authored examples and generates content extremely quickly once trained.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Domain</title>
      <p>All maps, and thus objective functions, were constructed based on a small subset of Biome Tileset Pack
B - Grassland, Savannah, and Scrubland [24], which offers combinations of different grass, path, and
water tiles. We used a subset of the tiles shown in Figure 1. We generate the adjacency rules via manual
human labeling, though they could also be extracted from an input image. We note that future work
could attempt to optimize path-lengths between specific tiles with a simple modification to our
path-length objective function.</p>
      <sec id="sec-3-1">
        <title>3.1. Binary</title>
        <p>
          The binary domain (Figure 2), originally introduced in PCGRL [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], tasks the generator with modifying
an existing map composed of solid and empty tiles to increase the longest shortest path between any
two empty tiles by at least 20 tiles, while ensuring full connectivity of the empty space. In contrast,
our formulation requires generating valid maps entirely from scratch that satisfy prescribed path
length constraints, rather than modifying preexisting layouts. We also impose a stricter requirement,
penalizing deviations from the target by requiring the generated map to achieve a path length of exactly
the target value t. Formally, for a generated map with longest shortest path length p, the objective
function awards a score of −|p − t|. While the binary domain in PCGRL comprises a simple “binary”
tile-set of wall and empty tiles, we use a more sophisticated tileset in which adjacent tiles—e.g. path
and grass tiles—can share straight or rounded edges, creating visually smooth transitions between tile-types.
        </p>
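The longest-shortest-path measure underlying this objective can be computed by breadth-first search from every empty cell. The following sketch uses our own helper names and assumes a 4-connected grid encoded with 0 = empty and 1 = solid:

```python
from collections import deque

def longest_shortest_path(grid):
    """Diameter (in steps) of the 4-connected graph of empty (0) cells."""
    h, w = len(grid), len(grid[0])
    empties = [(i, j) for i in range(h) for j in range(w) if grid[i][j] == 0]

    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            i, j = q.popleft()
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ni < h and 0 <= nj < w and grid[ni][nj] == 0 \
                        and (ni, nj) not in dist:
                    dist[(ni, nj)] = dist[(i, j)] + 1
                    q.append((ni, nj))
        return dist

    # Longest shortest path over all pairs of reachable empty cells.
    return max((d for s in empties for d in bfs(s).values()), default=0)

def binary_objective(grid, target):
    """Score is 0 exactly when the map's path length matches the target."""
    return -abs(longest_shortest_path(grid) - target)
```

For a 3×3 map with a solid middle row except one gap, the empty cells form a snake whose diameter is 6, so `binary_objective(grid, 6)` attains the maximum score of 0.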
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Biome/Binary Hybrid</title>
        <p>The hybrid domain adds complexity by dictating the biome surrounding the binary path to better
simulate the needs of real game maps. We consider 2 biomes which specify different distributions
and arrangements of tiles. All percentage calculations are taken from the tiles of the final artifact, not
including those used in binary path calculations. For example, if the objective specifies 50% water tiles,
this means 50% of the tiles that are not path tiles must be water. We specify the biome objectives below.</p>
        <p>• River (R). Let r be the number of connected river regions, ℓ the length of the current river path,
c the number of water “center” tiles, and k the number of connected land regions. We then define:</p>
        <p>R = (1 − r) + min(0, ℓ − 35) − c + min(0, 3 − k). (1)</p>
        <p>The objective attains its maximum value of 0 exactly when all of the following hold:
– r = 1 (exactly one contiguous river region),
– ℓ ≥ 35 (river path length of at least 35 tiles),
– c = 0 (no fully surrounded water tiles),
– k ≤ 3 (no more than three separate land regions).
Here, the (1 − r) term enforces a single connected channel, the min(0, ℓ − 35) term rewards
reaching the target length of 35 tiles, the −c penalty encourages a thin, winding river (few
interior water tiles), and the cap on k discourages excessive fragmentation of the land to prevent
it from interrupting the river (Figure 3a).</p>
        <p>• Field (F). Let w be the number of water tiles, h the number of hill tiles, g the percent of grass
tiles, and f the percent of flower tiles. We then define:</p>
        <p>F = −w − h + min(0, g − 20) + min(0, f − 20). (2)</p>
        <p>The objective attains its maximum value of 0 exactly when all of the following hold:
– w = 0 (no water or shore tiles),
– h = 0 (no hill tiles),
– g ≥ 20 (at least 20% of tiles are grass),
– f ≥ 20 (at least 20% of tiles are flowers).
Here, the −w and −h penalties enforce a clear, unbroken grassy field, while the min(0, g − 20) and
min(0, f − 20) terms ensure minimum coverage of grass and flowers. Together, these components
encourage the generation of open grasslands with sufficient floral detail (Figure 3b).</p>
        <p>(Figure 3: (a) River Biome; (b) Field Biome.)</p>
        <p>Since all objective functions have a max reward of 0, we can enforce both binary and biome features
by optimizing over the sum of the two objective functions. As such, we get the following objective
functions:
• hybrid river/binary: −|p − t| + R
• hybrid field/binary: −|p − t| + F.</p>
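A minimal sketch of the field objective, assuming tile counts have already been extracted from the artifact; the tile-type names here are illustrative, not the tileset's actual identifiers:

```python
def field_objective(tile_counts):
    """Field biome objective: 0 iff there are no water or hill tiles and
    grass and flower tiles each cover at least 20% of the (non-path) map."""
    total = sum(tile_counts.values())
    pct = lambda kind: 100.0 * tile_counts.get(kind, 0) / total
    return (-tile_counts.get("water", 0)      # -w: no water/shore tiles
            - tile_counts.get("hill", 0)      # -h: no hill tiles
            + min(0, pct("grass") - 20)       # grass coverage >= 20%
            + min(0, pct("flower") - 20))     # flower coverage >= 20%

def hybrid_score(binary_score, biome_score):
    """Both objectives max out at 0, so their sum enforces both at once."""
    return binary_score + biome_score
```

Because every component is clamped at 0 from above, any shortfall in one criterion cannot be traded off against a surplus in another.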
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methods</title>
      <sec id="sec-4-1">
        <title>4.1. Direct Map Evolution</title>
        <p>All optimization methods have various hyperparameters detailed in Section 9 of the Appendix.
These methods operate directly on the final artifact and do not leverage WFC. Instead, the optimization
process must learn to satisfy the adjacency rules. For a target map of length ℓ and width w, the genotype
is represented as a 2D array of size ℓ × w, where each entry contains an integer corresponding to a
tile index in the tileset. For example, if position (i, j) contains value v, then the v-th tile is placed at
coordinate (i, j) in the artifact, irrespective of whether this placement violates adjacency constraints.
Baseline Evolution. The baseline evolutionary algorithm treats each map genotype as an individual
and applies standard genetic operators with a penalized fitness that subtracts adjacency violations from
the raw objective function (Algorithm 1). Given objective score s and v adjacency violations, an individual
receives a fitness of s − v. Over G generations it maintains a population of size N, selecting the
top individuals by fitness each round according to the survival rate.</p>
        <p>FI-2Pop. FI-2Pop [12] attempts to leverage adjacency violations as an exploration medium by
maintaining two equal-sized subpopulations, feasible (F) and infeasible (I), and applies tailored selection
criteria to each: objective maximization in F and violation minimization in I (Algorithm 2). Since I is
not constrained by the objective function, it is free to explore the boundaries of infeasibility where
optimality may lie. The offspring are generated separately to replenish each subpopulation to size N/2.</p>
        <p>Algorithm 1 Baseline Evolution
Input: generations G, population size N, survival rate σ
Output: Best genome found
Initialize population P with N random genomes
for G generations: evaluate each genome; keep the top genomes in P by fitness f = s − v; refill P with offspring of the survivors
return best feasible genome in P</p>
        <p>Algorithm 2 FI-2Pop Evolution
Input: generations G, population size N, survival rate σ
Output: Best genome found
Initialize population P with N random genomes
Each genome x has an objective score s(x), adjacency-constraint violation penalty v(x), and fitness f(x).
for G generations:
Initialize empty sets F and I
for each genome x: (s(x), v(x)) ← Evaluate(x); if v(x) = 0 then F ← F ∪ {x} else I ← I ∪ {x}
keep the top genomes in F by objective score s(x) and the top genomes in I by lowest violation v(x);
generate offspring separately to replenish each subpopulation to size N/2
return best genome in F</p>
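The feasible/infeasible split at the heart of this scheme can be sketched as follows; this is a simplified illustration, and `evaluate` returning an (objective, violations) pair is our assumed interface:

```python
def fi2pop_select(population, evaluate, capacity):
    """Split genomes into feasible/infeasible pools and rank each by its
    own criterion: objective score for feasible genomes, violation count
    (ascending) for infeasible ones."""
    feasible, infeasible = [], []
    for genome in population:
        score, violations = evaluate(genome)
        (feasible if violations == 0 else infeasible).append(
            (genome, score, violations))
    feasible.sort(key=lambda g: g[1], reverse=True)   # maximize objective
    infeasible.sort(key=lambda g: g[2])               # minimize violations
    return ([g for g, *_ in feasible[:capacity]],
            [g for g, *_ in infeasible[:capacity]])
```

Infeasible genomes are thus never discarded outright; the least-violating ones survive and may cross back into the feasible pool in later generations.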
      </sec>
      <sec id="sec-4-2">
        <title>4.2. MDP Representation.</title>
        <p>By formalizing WFC as a Markov Decision Process (MDP), we leverage its guarantees to offload
the burden of learning adjacency constraints from the optimizer. This reformulation transforms the
generation problem into a sequential decision process where every action results in a valid intermediate
configuration.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. State Representation</title>
          <p>We define each state s_t as the current configuration of the WFC grid at timestep t. For a target map of
size ℓ × w with T tile types, s_t is represented by an ℓ × w × T binary tensor B_t, where each depth slice
encodes the feasibility of a tile at a cell:
• B_t[i, j, k] = 1 indicates that tile type k is currently feasible at cell (i, j);
• B_t[i, j, k] = 0 indicates that tile type k is infeasible at cell (i, j).
A cell is collapsed if and only if it has a one-hot feasibility vector (∑_k B_t[i, j, k] = 1); otherwise it is
uncollapsed (∑_k B_t[i, j, k] &gt; 1). The MDP terminates when every cell is collapsed (equivalently, when
all ℓ × w cells have one-hot feasibility vectors), yielding the final artifact.</p>
          <p>Algorithm 3 WFC-MDP: One Environment Step
Input: belief grid B ∈ {0, 1}^(ℓ×w×T), adjacency rules A, agent action (tile logits) a
Output: updated grid B′, reward r
(i⋆, j⋆) ← FindLowestEntropyCell(B)
m ← B[i⋆, j⋆, :]
if (i⋆, j⋆) = ∅ or ∑_k m_k = 0 then truncate with the contradiction penalty
else mask a with m, collapse cell (i⋆, j⋆) to the arg max feasible tile, and propagate the adjacency rules A to obtain B′</p>
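A simplified NumPy sketch of one such environment step follows. Note that real WFC propagates constraints transitively to a fixed point, whereas this illustration propagates only to the four immediate neighbors, and all function names are ours:

```python
import numpy as np

def lowest_entropy_cell(belief):
    """Pick the uncollapsed cell with the fewest remaining tile options."""
    options = belief.sum(axis=2)
    options = np.where(options > 1, options, np.iinfo(np.int64).max)
    if options.min() == np.iinfo(np.int64).max:
        return None                      # every cell is already collapsed
    return np.unravel_index(options.argmin(), options.shape)

def step(belief, adjacency, logits):
    """One WFC-MDP step: collapse the lowest-entropy cell to the highest-
    scoring feasible tile, then prune its neighbors' feasibility vectors.
    `adjacency[t]` is the 0/1 vector of tiles allowed next to tile t."""
    cell = lowest_entropy_cell(belief)
    if cell is None:
        return belief, 0.0, True         # terminal: evaluate objective here
    i, j = cell
    feasible = belief[i, j].astype(bool)
    if not feasible.any():
        return belief, -1000.0, True     # contradiction: truncate
    masked = np.where(feasible, logits, -np.inf)
    tile = int(masked.argmax())          # constraint-respecting collapse
    belief[i, j] = 0
    belief[i, j, tile] = 1
    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
        if 0 <= ni < belief.shape[0] and 0 <= nj < belief.shape[1]:
            belief[ni, nj] &= adjacency[tile]   # one-ring propagation only
    return belief, 0.0, False
```

Stepping repeatedly with a fixed logit vector drives the grid to a fully collapsed, adjacency-consistent artifact or a truncating contradiction.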
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Action Representation</title>
          <p>At each timestep t, the action a_t specifies which tile to collapse in a single uncollapsed cell (Algorithm 3).
It is parameterized as a T-dimensional logit vector, with each entry corresponding to a possible
tile; a softmax converts these logits to per-tile probabilities. To ensure compliance with
adjacency constraints, we mask out invalid tiles by setting their probabilities to zero. The selected
action is then the arg max over the masked probabilities, ensuring that each collapse step is constraint-respecting.
Since exactly one tile is collapsed per action, a complete map requires a sequence of ℓ × w actions.</p>
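The masked selection can be sketched as follows (our helper name; assumes a NumPy logit vector and a 0/1 feasibility mask):

```python
import numpy as np

def select_tile(logits, feasible_mask):
    """Softmax the agent's logits, zero out infeasible tiles, and take the
    arg max, guaranteeing a constraint-respecting collapse."""
    probs = np.exp(logits - logits.max())  # stable softmax numerator
    probs /= probs.sum()
    probs *= feasible_mask                 # invalid tiles get probability 0
    if probs.sum() == 0:
        return None                        # contradiction: no feasible tile
    return int(probs.argmax())
```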
        </sec>
        <sec id="sec-4-2-3">
          <title>4.2.3. Objective Structure</title>
          <p>We adopt a sparse objective model: intermediate states contribute no score, and only the terminal state
is evaluated using the objectives defined in Section 3. If WFC enters a contradiction where a cell has no
valid tiles remaining, the process truncates immediately and incurs a large negative objective of −1000,
thus discouraging invalid map constructions.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evolving an Action Sequence</title>
        <p>We use a standard (μ + λ) evolutionary algorithm to optimize the full sequence of WFC collapse actions
(Algorithm 4). Each individual in the population encodes a fixed-length sequence of collapse decisions,
represented as logits over the tile set at each of the ℓ × w positions. Two genotype shapes are supported:
• 1D representation: a flat sequence of length ℓ × w such that the element at position t is the action
applied at time step t.
• 2D representation: a grid-aligned sequence of dimensions ℓ × w, where each action’s position is
inferred from WFC’s next-collapse coordinates. The action at genotype coordinate (i, j) corresponds
to collapsing the tile at (i, j).</p>
        <p>Algorithm 4 Evolving an Action Sequence
Input: generations G, population size N, survival rate σ
Output: Best genome found
Initialize population P with N action sequences
for G generations:
for all genome x ∈ P:
s ← s_0; R(x) ← 0
for t = 1 to ℓ × w:
a_t ← x[t]; (s′, r_t) ← env.step(a_t); s ← s′; R(x) ← R(x) + r_t
end for
end for
elites ← top ⌈σN⌉ genomes in P by R(x)
offspring ← (N − |elites|) new action sequences mutated from the elites
P ← elites ∪ offspring
end for
return best feasible genome in P</p>
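The loop above can be sketched in Python as follows; `env_factory` and its `step(logits) -> (reward, done)` interface are illustrative assumptions rather than the paper's released environment:

```python
import random

def evolve_action_sequence(env_factory, seq_len, n_tiles,
                           generations=50, pop_size=48, survival=0.25):
    """(mu + lambda) evolution over fixed-length WFC collapse sequences.
    Each genome is a list of per-step logit vectors."""
    def random_genome():
        return [[random.gauss(0, 1) for _ in range(n_tiles)]
                for _ in range(seq_len)]

    def mutate(genome, rate=0.1):
        return [[g + random.gauss(0, 1) if random.random() < rate else g
                 for g in step_logits] for step_logits in genome]

    def evaluate(genome):
        env, total = env_factory(), 0.0     # fresh episode per rollout
        for logits in genome:
            reward, done = env.step(logits)
            total += reward
            if done:
                break
        return total

    pop = [random_genome() for _ in range(pop_size)]
    mu = max(1, int(survival * pop_size))
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        elites = pop[:mu]
        pop = elites + [mutate(random.choice(elites))
                        for _ in range(pop_size - mu)]
    return max(pop, key=evaluate)
```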
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We evaluate each optimization method across desired path lengths 10-100 in intervals of 10 for both
binary and hybrid domains. Convergence robustness is defined as the proportion of runs that achieve
the maximal reward of 0, while sample efficiency is measured by the number of generations it takes to
evolve at least one population member with reward of 0. All methods were run with a fixed sample
budget (population size = 48) to enable fair comparisons. Longer path lengths correspond to increased
difficulty, as all methods show reduced convergence rates and increased generations to convergence
with rising desired path length t.</p>
      <p>However, even at low difficulty (t ≤ 40), non-MDP baselines occasionally fail to converge. In contrast,
both MDP-based evolution strategies exhibit perfect or near-perfect convergence and require fewer
generations. This early divergence suggests that directly modeling constraint satisfaction via WFC
confers immediate robustness advantages, even in seemingly simple settings.</p>
      <p>As t increases, the performance gap widens. At t = 60, non-MDP methods converge in less than
20% of runs and exhibit inefficiency when they do, with baseline requiring over 170 generations on
average. Evolution 1D and 2D remain below 60 generations and achieve convergence in 84% and 72% of
runs respectively, underscoring their superior scalability.</p>
      <p>For t ≥ 70, all non-MDP methods fail almost entirely. MDP methods, while less consistent, still
achieve convergence in up to 35% of runs (1D) and 15% (2D). Interestingly, the baseline was able to
converge once, in relatively few generations. This outlier suggests a rare, favorable initialization or
fortuitous stochasticity early on in the evolutionary process rather than systematic optimization success.</p>
      <p>In the hybrid river/binary domain (Table 2), MDP-based methods again demonstrate clear robustness
advantages. For moderate difficulty (20 ≤ t ≤ 40), 1D and 2D evolution achieve convergence in at
least 36% and 45% of runs respectively, requiring roughly 50–100 generations on average. In contrast,
baseline and FI-2Pop converge in at most 14% and 4% of runs, and when they do succeed, require over
80–120 generations. At t = 50, both MDP methods maintain non-trivial success rates (≥19%) with
convergence typically within 80–100 generations, whereas baseline converges only 4% of the time and
FI-2Pop fails entirely. Beyond this threshold, only MDP methods achieve any convergence at all: both 1D
and 2D occasionally succeed for t ≥ 60, though with growing variance in required generations, while
non-MDP methods collapse to zero convergence. The performance gap is more stark in Table 3 and the
hybrid field/binary domain, where non-MDP methods completely fail to converge across almost all target
path lengths.</p>
      <sec id="sec-5-1">
        <title>5.1. Implications and Cross-Domain Insights</title>
        <p>Two key patterns emerge:
1. MDP Encapsulation of Constraints is Crucial: Across all desired path lengths, methods that
offload constraint enforcement to WFC consistently outperform those that must learn it implicitly.
This discrepancy is especially pronounced given more difficult objectives (i.e. higher target path
lengths, hybrid biome/binary domains), where the feasibility space is severely constrained.
2. Feasible Region Shrinkage Limits Optimization: At high path lengths, even MDP methods
fail to converge reliably. This likely stems from the exponentially shrinking volume of the feasible
space and limited tendency toward exploration in vanilla  +  evolution—as compared to e.g.
Quality Diversity [25]. Despite valid intermediate states, the reward landscape remains highly
sparse and multi-modal.</p>
        <p>These findings highlight a fundamental insight: procedural generation under complex constraints
benefits most when constraint satisfaction is externalized and search is guided through structurally
aligned representations. The clear failure of joint optimization approaches, particularly in aesthetically
constrained domains, emphasizes the importance of architectural modularity in generative design
systems.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <sec id="sec-6-1">
        <title>6.1. Other Biomes</title>
        <p>WFC-MDP is compatible with various domains without necessitating the bespoke engineering required
by more traditional extensions to WFC. By changing the objective function, WFC-MDP is able to
optimize for other gameplay artifacts like pond and hill biomes (Figure 4). These biomes were excluded
from the main results as they were not adapted to work in conjunction with binary either because they
proved excessively difficult to optimize or were not structured to allow long continuous paths; instead
they illustrate the flexibility of WFC-MDP. We leave the optimization of these more challenging hybrid
objectives to future work that builds on our WFC-MDP formulation.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Representation Matters</title>
        <p>We evaluated two encoding schemes for collapse sequences: a 2D representation that maps actions to
fixed spatial locations, and a 1D representation that encodes a strict sequential order. The 2D encoding
reduces the risk of early mutations cascading across the map by localizing genetic variation, resulting
in smoother and more stable optimization dynamics. Conversely, the 1D encoding allows mutations
in earlier steps to significantly influence subsequent tile collapses, which introduces instability but
encourages broader exploration. Empirically, we observe that 1D encodings tend to converge more
frequently, possibly due to this exploratory behavior. In contrast, 2D encodings exhibit better sample
efficiency, aligning with their more localized impact. These trade-offs suggest promising directions for
future research, such as hybrid encoding schemes that balance exploration and stability.</p>
        <p>(Figure 4: (a) Pond Biome; (b) Pond and Hill Biome; (c) Hill Biome.)</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Toward Scalable and Interactive Generation</title>
        <p>Our method operates on consumer hardware, making it viable for map generation during the game
development process. However, as problems get more complex, live evolution becomes time consuming.
Further research can explore learning generalized policies that are capable of generating multiple
artifacts from a single training instance. For example, one could train a Reinforcement Learning policy to
output/edit the tile-type logits in our 1D or 2D evolutionary genomes given varying target path-lengths
as in [26], with reward equal to the score of a map collapsed via WFC over these logits. Alternatively,
one could apply Imitation Learning to the data generated by the evolutionary processes in this paper,
similar to [27], and/or use such Imitation Learning to jump-start the RL process outlined above. Future
work could explore additional heuristics beyond path length optimization to test the generality of the
WFC-MDP approach. Additionally, RL extensions could leverage the gym environment to test alternative
optimization methods on the WFC-MDP formulation.</p>
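        <p>The episode loop such RL extensions would train against can be sketched in the classic gym interface of [13]. This is a toy stand-in, not the paper's environment: the class name, the map statistic used as a path-length placeholder, and all sizes are assumptions, and a real implementation would mask the logits by the tiles WFC's propagator still allows before collapsing.</p>

```python
import random

class WFCEnv:
    """Toy stand-in for a WFC-MDP gym environment (names assumed)."""

    def __init__(self, n_cells=16, n_tiles=4, target_path_length=6):
        self.n_cells, self.n_tiles = n_cells, n_tiles
        self.target = target_path_length

    def reset(self):
        self.map = [None] * self.n_cells
        self.t = 0
        return self._obs()

    def _obs(self):
        # Observation: the partially collapsed map and the next collapse position.
        return {"map": list(self.map), "next_cell": self.t}

    def step(self, tile_logits):
        # Collapse the next cell to the highest-logit tile. Real WFC would
        # first mask the logits by the tiles its propagator still allows.
        tile = max(range(self.n_tiles), key=lambda k: tile_logits[k])
        self.map[self.t] = tile
        self.t += 1
        done = self.t == self.n_cells
        # Sparse terminal reward: distance of a stand-in map statistic from
        # the controllable target (a placeholder for true path length).
        reward = -abs(sum(self.map) % 10 - self.target) if done else 0.0
        return self._obs(), reward, done, {}

env = WFCEnv()
obs, done = env.reset(), False
while not done:
    action = [random.random() for _ in range(env.n_tiles)]
    obs, reward, done, info = env.step(action)
```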
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Limitations</title>
      <sec id="sec-7-1">
        <title>7.1. Optimization Parameters</title>
        <p>We used a fixed population size of 48 across all algorithms due to hardware constraints and to maintain
a consistent measure of sample efficiency. This choice may not reflect the optimal configuration for
each method. In particular, FI‑2Pop’s dual‑population architecture may be disproportionately affected
by smaller population sizes.</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Observation Utilization</title>
        <p>In an MDP formulation, the agent’s observation can inform optimal action selection. Agents could make
better tile selections if given the partially collapsed map and the next collapse position. However,
our evolution-based implementations operate over the full action sequence in advance and do not
leverage intermediate observations. By instead evolving, e.g., a neural network controller to output tile
logits at each step of the WFC-MDP, these observations could be leveraged to improve performance.</p>
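        <p>One minimal form such an observation-conditioned controller could take is a linear policy whose weights, rather than a per-step action list, are the evolved genome. The feature choices and all names below are illustrative assumptions, not the paper's design.</p>

```python
import random

random.seed(0)
NUM_TILES, N_FEATURES = 4, 3

def features(partial_map, next_cell, n_cells):
    """Tiny hand-rolled observation: progress ratio, position, bias term."""
    collapsed = sum(1 for t in partial_map if t is not None)
    return [collapsed / n_cells, next_cell / n_cells, 1.0]

def controller(weights, obs_features):
    """Linear policy: one logit per tile type computed from the observation."""
    return [sum(w * f for w, f in zip(row, obs_features)) for row in weights]

# An evolutionary algorithm would mutate `weights` instead of a fixed action
# sequence, letting the policy react to intermediate WFC states.
weights = [[random.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
           for _ in range(NUM_TILES)]
logits = controller(weights, features([None] * 16, 0, 16))
```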
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Optimization Limitations</title>
        <p>Although MDP-based methods demonstrate improved consistency and sample efficiency on challenging
tasks, the standard (μ + λ) evolutionary strategy often exhibits inadequate exploration, leading to poor
convergence on the hardest problems. We hypothesize that the large PCG state space and highly
multimodal fitness landscape demand algorithms with stronger exploration capabilities, such as Quality
Diversity evolutionary algorithms [25] or Novelty Search [28].</p>
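        <p>For concreteness, a bare-bones (μ + λ) evolution strategy of the kind discussed above is sketched below; the toy sphere objective stands in for a map-scoring heuristic, and all parameter values are assumptions. Because parents always compete with offspring for survival, selection is strongly elitist, which is one source of the under-exploration noted here.</p>

```python
import random

def es_mu_plus_lambda(fitness, dim=8, mu=4, lam=12, gens=50, sigma=0.3):
    """Minimal (mu + lambda) ES: elitist survival of parents plus offspring."""
    random.seed(0)
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(mu)]
    for _ in range(gens):
        offspring = []
        for _ in range(lam):
            parent = random.choice(pop)
            offspring.append([x + random.gauss(0.0, sigma) for x in parent])
        # (mu + lambda): parents and offspring compete jointly for the mu slots.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:mu]
    return pop[0]

# Toy multimodal-free sphere objective standing in for a map score.
best = es_mu_plus_lambda(lambda g: -sum(x * x for x in g))
```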
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>This work recasts WaveFunctionCollapse (WFC) as a Markov Decision Process (MDP), enabling
optimizers to sidestep the combinatorial burden of learning tile adjacency rules by ofloading constraint
enforcement to WFC’s propagation mechanism. By evaluating this formulation across various scalable
and progressively constrained domains, we uncover a central insight: explicit algorithmic decoupling of
constraint satisfaction from objective optimization dramatically improves both convergence reliability
and sample efficiency.</p>
      <p>Our WFC-MDP framework not only succeeds where traditional methods collapse, but also reveals
how reframing generation as a sequence of valid state transitions exposes a richer interface for guidance
and learning. The domain’s scalability allowed us to observe how even strong inductive biases (like
constraint propagation) falter without sufficient exploration capacity, suggesting fertile ground for
hybrid learning approaches.</p>
      <p>Looking ahead, our formulation opens the door to more controllable and expressive uses of WFC.
Because objective functions are more generalizable than hard-coded global constraints, they provide a
natural bridge to machine learning methods, particularly those that rely on flexible reward signals or
policy learning. By enabling WFC to support objective-driven control over layout generation, this work
establishes a blueprint for scalable, constraint-aware, and ML-compatible PCG systems.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Appendix</title>
      <sec id="sec-9-1">
        <title>9.1. Final Hyperparameter Settings</title>
        <p>9.1.1. Hyperparameter Definitions</p>
        <p>number_of_actions_mutated_mean (int): Expected number of genome actions to mutate each generation.</p>
        <p>number_of_actions_mutated_standard_deviation (float): Standard deviation for truncated-normal sampling of the mutation count.</p>
        <p>action_noise_standard_deviation (float): Standard deviation of the Gaussian noise added to each mutated action’s value.</p>
        <p>survival_rate (float): Fraction of the population preserved into the next generation.</p>
        <p>cross_over_method (enum): Crossover strategy. 0 = UNIFORM: gene-wise mixing, each child gene chosen from one parent with 50% probability. 1 = ONE_POINT: single cut-point crossover, parents swap tails at one random index.</p>
        <p>cross_or_mutate_proportion (float): Fraction of offspring produced via crossover.</p>
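        <p>The hyperparameters above can be gathered into a single configuration object; the class name and the default values below are illustrative placeholders, not the tuned settings reported in the tables.</p>

```python
from dataclasses import dataclass
from enum import IntEnum

class CrossOverMethod(IntEnum):
    UNIFORM = 0    # gene-wise mixing: each child gene from either parent at 50%
    ONE_POINT = 1  # single cut-point crossover: parents swap tails at one index

@dataclass
class EvolutionConfig:
    # Defaults here are placeholders, not the tuned values from the paper.
    number_of_actions_mutated_mean: int = 10
    number_of_actions_mutated_standard_deviation: float = 5.0
    action_noise_standard_deviation: float = 0.1
    survival_rate: float = 0.25              # fraction kept each generation
    cross_over_method: CrossOverMethod = CrossOverMethod.UNIFORM
    cross_or_mutate_proportion: float = 0.7  # offspring made via crossover

cfg = EvolutionConfig()
```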
      </sec>
      <sec id="sec-9-2">
        <title>9.2. Convergence Behavior Plots</title>
        <p>Plots corresponding to the tables in Section 5 are shown in Figure 5.</p>
        <p>(a) Plot of Table 1; (b) Plot of Table 2; (c) Plot of Table 3.</p>
        <p>10. Declaration on Generative AI: There was no use of generative AI in this paper.</p>
        <p>References:
[4] S. Earle, F. Kokkinos, Y. Nie, J. Togelius, R. Raileanu, Dreamcraft: Text-guided generation of
functional 3d environments in minecraft, in: Proceedings of the 19th International Conference on
the Foundations of Digital Games, 2024, pp. 1–15.
[5] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen,
J. Togelius, Procedural content generation via machine learning (pcgml), IEEE Transactions on
Games 10 (2018) 257–270.
[6] D. Cheng, H. Han, G. Fei, Automatic generation of game levels based on controllable wave function
collapse algorithm, in: Entertainment Computing–ICEC 2020: 19th IFIP TC 14 International
Conference, ICEC 2020, Xi’an, China, November 10–13, 2020, Proceedings 19, Springer, 2020, pp.
37–50.
[7] H. J. Lee, E. Simo-Serra, Using unconditional diffusion models in level generation for super mario
bros, in: 2023 18th International Conference on Machine Vision and Applications (MVA), IEEE,
2023, pp. 1–5.
[8] Z. Wang, J. Liu, G. N. Yannakakis, The fun facets of mario: Multifaceted experience-driven pcg via
reinforcement learning, in: Proceedings of the 17th International Conference on the Foundations
of Digital Games, 2022, pp. 1–8.
[9] T. Shu, J. Liu, G. N. Yannakakis, Experience-driven pcg via reinforcement learning: A super mario
bros study, in: 2021 IEEE Conference on Games (CoG), IEEE, 2021, pp. 1–9.
[10] H. Kim, S. Lee, H. Lee, T. Hahn, S. Kang, Automatic generation of game content using a graph-based
wave function collapse algorithm, in: 2019 IEEE conference on games (CoG), IEEE, 2019, pp. 1–4.
[11] I. Karth, A. M. Smith, Wavefunctioncollapse: Content generation via constraint solving and
machine learning, IEEE Transactions on Games 14 (2021) 364–376.
[12] S. O. Kimbrough, G. J. Koehler, M. Lu, D. H. Wood, On a feasible–infeasible two-population (fi-2pop)
genetic algorithm for constrained optimization: Distance tracing and no free lunch, European
Journal of Operational Research 190 (2008) 310–327. URL: https://www.sciencedirect.com/science/
article/pii/S0377221707005668. doi:https://doi.org/10.1016/j.ejor.2007.06.028.
[13] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, Openai
gym, arXiv preprint arXiv:1606.01540 (2016).
[14] M. Gumin, Wave Function Collapse Algorithm, 2016. URL: https://github.com/mxgmn/WaveFunctionCollapse.</p>
        <p>
[15] I. Karth, A. M. Smith, Wavefunctioncollapse is constraint solving in the wild, in: Proceedings of
the 12th International Conference on the Foundations of Digital Games, FDG ’17, Association for
Computing Machinery, New York, NY, USA, 2017. URL: https://doi.org/10.1145/3102071.3110566.
doi:10.1145/3102071.3110566.
[16] R. Heaton, The wavefunction collapse algorithm explained very clearly, 2018. URL: https:
//robertheaton.com/2018/12/17/wavefunction-collapse-algorithm/.
[17] I. Karth, A. M. Smith, Wavefunctioncollapse: Content generation via constraint solving and
machine learning, IEEE Transactions on Games 14 (2022) 364–376. doi:10.1109/TG.2021.3076368.
[18] Y. Nie, S. Zheng, Z. Zhan, X. Song, Extend wave function collapse algorithm to large-scale content
generation, 2023 IEEE Conference on Games (CoG) (2023) 1–8. URL: https://api.semanticscholar.
org/CorpusID:260886968.
[19] B. LNU, Wave function collapse tips and tricks, 2020. URL: https://www.boristhebrave.com/2020/
02/08/wave-function-collapse-tips-and-tricks/.
[20] J. Togelius, G. N. Yannakakis, K. O. Stanley, C. Browne, Search-based procedural content generation:
A taxonomy and survey, IEEE Transactions on Computational Intelligence and AI in Games 3
(2011) 172–186. doi:10.1109/TCIAIG.2011.2148116.
[21] N. Sorenson, P. Pasquier, Towards a generic framework for automated video game level creation, in:
C. Di Chio, S. Cagnoni, C. Cotta, M. Ebner, A. Ekárt, A. I. Esparcia-Alcazar, C.-K. Goh, J. J. Merelo,
F. Neri, M. Preuß, J. Togelius, G. N. Yannakakis (Eds.), Applications of Evolutionary Computation,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 131–140.
[22] A. Liapis, G. N. Yannakakis, J. Togelius, Constrained novelty search: A study on game
content generation, Evolutionary Computation 23 (2015) 101–129. URL: https://doi.org/10.1162/
EVCO_a_00123. doi:10.1162/EVCO_a_00123.
arXiv:https://direct.mit.edu/evco/articlepdf/23/1/101/1514786/evco_a_00123.pdf.
[23] A. Khalifa, P. Bontrager, S. Earle, J. Togelius, Pcgrl: procedural content generation via
reinforcement learning, in: Proceedings of the Sixteenth AAAI Conference on Artificial Intelligence and
Interactive Digital Entertainment, AIIDE’20, AAAI Press, 2020.
[24] VectoRaith, Biome tileset pack b - grassland, savannah, and scrubland, https://vectoraith.itch.io/
biome-tileset-pack-b, 2025.
[25] J. K. Pugh, L. B. Soros, K. O. Stanley, Quality diversity: A new frontier for evolutionary computation,
Frontiers in Robotics and AI 3 (2016) 40.</p>
        <p>
[26] S. Earle, M. Edwards, A. Khalifa, P. Bontrager, J. Togelius, Learning controllable content generators,
in: 2021 IEEE Conference on Games (CoG), IEEE, 2021, pp. 1–9.
[27] A. Khalifa, J. Togelius, M. C. Green, Mutation models: Learning to generate levels by imitating
evolution, in: Proceedings of the 17th International Conference on the Foundations of Digital
Games, 2022, pp. 1–9.
[28] J. Lehman, K. O. Stanley, Evolving a diversity of virtual creatures through novelty search and
local competition, in: Proceedings of the 13th annual conference on Genetic and evolutionary
computation, 2011, pp. 211–218.
[29] T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter
optimization framework, in: Proceedings of the 25th ACM SIGKDD international conference on
knowledge discovery &amp; data mining, 2019, pp. 2623–2631.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Todd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Earle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Nasir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Level generation through large language models</article-title>
          ,
          <source>in: Proceedings of the 18th International Conference on the Foundations of Digital Games</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Middleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Merino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanagaraja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          , Moonshine:
          <article-title>Distilling game content generators into steerable generative models</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>39</volume>
          ,
          <year>2025</year>
          , pp.
          <fpage>14344</fpage>
          -
          <lpage>14351</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bontrager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Earle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          , Pcgrl:
          <article-title>Procedural content generation via reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>16</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>95</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>