<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Tool to Explore and Evaluate Large Spaces of Playtrace Metrics Through User-Defined Curves</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samuel Shields</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noah Wardrip-Fruin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward F. Melcer</string-name>
          <email>edwardmelcer@cunet.carleton.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carleton University</institution>
          ,
          <addr-line>1125 Colonel By Dr, Ottawa, ON K1S 5B6</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California</institution>
          ,
          <addr-line>Santa Cruz, 1156 High St, Santa Cruz, CA 95064</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Playtraces are artifacts produced during playtesting that tell a story about how game systems operate and what actions players take at runtime. The playtrace contains relevant metrics in a game alongside the metrics' rising and falling throughout player progression. These curves inform designers about their players' experiences and open opportunities to implement player-adaptive design strategies. To help improve the iterative design process around playtrace analysis, we introduce the Playtrace Arc Search (PAS) tool. PAS allows designers to search through a large corpus of playtrace data to find the system metric curves that match their design intent by drawing a desired progression arc on a canvas and then using that to perform a curve-similarity search over all playtraces. PAS enables designers to see a set of playtraces as a summarized whole, then search against that whole to find specific, meaningful gameplay data that can confirm or reject their hypotheses about game and player performance. PAS showed success on an initial evaluation of 1,000 playtraces; this, combined with its game- and metric-agnostic capabilities, indicates that PAS might be a useful tool for designers to rapidly discover whether their design hypotheses are accurately played out in metric progression curves with desired levels of consistency.</p>
      </abstract>
      <kwd-group>
        <kwd>Game Balancing</kwd>
        <kwd>Playtrace Analysis</kwd>
        <kwd>Player Modeling</kwd>
        <kwd>Dramatic Arc</kwd>
        <kwd>Design Tools</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A playtrace from a video game is defined by Osborn et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as “a sequence of player actions
corresponding to one play of the game.” As players perform actions in a game, observable metrics are
produced as a direct or indirect impact of changing the system. Such metrics might include parameters
relating to a game goal (e.g., how many items have been collected), bug prevention (e.g., has a player
reached a checkpoint in a race too early), or updating a player model (e.g., what level of difficulty the
player is experiencing right now). The role of such metrics in design strategies such as Dynamic Game
Balancing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as well as scoring games for quality in A/B testing [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] helps underscore the utility of
metric production at both design time and runtime. These metrics are often trivially available because
of their critical role in modifying game systems, and they also provide observability into a
game system over a play session. Designers have the responsibility not only to tune input parameters
to a game system to ensure a desired play experience, but also to use these observable metrics to
iterate on their design strategy and confirm their hypotheses about the runtime operation of games.
Producing understandable and meaningful playtraces is, as such, a valuable step in iterative
game design, especially when it comes to human or automated playtesting.
      </p>
      <p>
        That being said, playtrace collection and analysis rarely involve examining a single, predetermined
session of gameplay data. Tools surrounding playtraces usually deal in aggregates of thousands to
millions of traces compiled together and visualized in some format. For instance, heatmaps [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], player
modeling [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], and balancing patches [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are just a few examples of how playtrace composites are
currently used to augment design strategies. While such tooling is valuable for pattern identification in
playtrace datasets, it can put the designer’s understanding of their desired, emergent play experience in
the back seat to a global summary of data. Furthermore, such approaches might make it challenging
to identify interesting examples of playtraces that could become meaningful case studies for further
design investigations. As such, a tool that allows a designer to show what their intended experience is
for a set of playtraces, and to quickly understand how consistent (or, if desired, diverse) that set of traces is,
could provide a valuable strategy for rapidly understanding system characteristics during gameplay.
Supplementing this with the identification of important examples of the desired experience then
gives the designer the opportunity to investigate important play sessions more deeply
and even to perform parameter tuning based on a given playtrace match.
      </p>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
      <p>
        To address this gap in designer tooling, we introduce the Playtrace Arc Search (PAS) tool, which
enables the searching of a large corpus of playtraces according to a user-defined curve or “arc”. PAS
reads a set of structured playtrace data, has the user select game and/or system metrics to analyze, then
allows the user to “draw” a curve to search for playtrace arcs that fit their arc. It also allows a designer
to quickly understand if a set of playtraces exhibit similar or diverse behaviors. In both of these modes,
PAS provides an option for rapid iterative design cycles, where the identification of some systematic
arc (such as drama, emotional intensity, difficulty) allows a designer to home in on important qualities
of a specific gameplay session. By quickly showing a designer where to focus, PAS can tighten the
playtesting loop, which is cited as one of the most expensive and time-consuming parts of the game
development lifecycle [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this work, we describe the system architecture of PAS, perform a brief set
of evaluations on a corpus of 1,000 playtraces, and discuss the design implications of applying this tool
in both manual and procedural design contexts. A screenshot of the tool can be seen in Fig. 2, and the
tool is available for use on GitHub.<sup>1</sup>
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Co-Creative Tooling</title>
        <p>
          Co-creative systems and research tooling involve the alteration of a designer’s workflow to include
computational systems, which helps produce work with greater speed, quality, or expressive range
[
          <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10, 11, 12, 13</xref>
          ]. Such systems often either provide active assistance in creating an output (such as in
Liapis et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]’s Sentient Sketchbook, where a user-drawn map is used to generate a level) or score
and evaluate some user-defined artifact (as in Migkotzidis and Liapis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]’s SuSketch, which provides
predictive evaluation of a level design). Our system falls into the latter category, and lands somewhere
between a creative tool and a data analytics platform. By allowing a designer to specify what types
of emergent outcomes they desire, we afford them the ability to determine whether their current game
system is capable of producing those experiences. Whether it is or is not, a designer can then use specific
playtraces to understand what parameters, tuning, and gameplay actions were critical to producing the
playtrace they observed.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Parameter Tuning in Games and PCG</title>
        <p>
          Video games are inherently parameterized systems. How high a character jumps or when an AI should
perform an action (among many, many other scenarios) are all determined by a designer’s decision
on how the corresponding mechanic and parameter is tuned. As such, both static and procedurally
generated content (PCG) are dependent on how a system is parameterized and tuned for a given
interaction. The pursuit of automated parameter tuning and parameter tuning analysis is thus well
researched and prominent in academic literature. Summerville et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] identify parameter tuning as
one of the greatest unsolved challenges in PCG, and good approaches to parameter tuning provide the
opportunity to greatly improve artifact quality. There are many approaches to performing automated
parameter tuning — an algorithmic example is the usage of evolutionary search algorithms, which score
and iterate a swath of parameters until some desired fitness is reached [
          <xref ref-type="bibr" rid="ref17 ref18 ref19 ref20">17, 18, 19, 20</xref>
          ]. An analytics-oriented
example is Withington and Tokarchuk [21]’s approach to identifying which parameters and
metrics might be most valuable when performing Expressive Range Analysis [22]. Understanding
which parameters matter and how a designer should tune them is thus a critical facet of an
iterative game design loop, especially when it comes to systematic tuning for emergent or experiential
game properties such as game balance. This identification and modification of parameters in turn
depends on the quality and observability of system metrics that tell the designer how a game progressed,
metrics which are commonly reported through playtrace data.
          <fn><label>1</label><p><uri>https://github.com/smshields/ArcSearch</uri></p></fn>
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Playtrace Analysis</title>
        <p>Playtrace collection and analysis have been a critical area of game design and research for well over a
decade. Drachen and Canossa [23] make the case that gameplay metrics provide critical information
about player behavior, and that any action a player can take can potentially be measured. Wallner
[24] discusses the importance of large playtrace and player metric corpora, especially when it comes to
the production of visualization tooling such as heatmaps. Such large sets of playtraces, however, are
inherently difficult to parse and nigh impossible to investigate on a case-by-case basis [25]. This does
not mean that investigating an individual playtrace is unimportant: the study of individual games
between players has historically been critical to the study of games such as Chess [26] and Go [27]
and is a common attribute of eSports (such as in Starcraft [28]) and professional gaming commentary
[29, 30]. The ability to both understand trends at large in a set of playtrace data and identify critical
examples to investigate provides a solid foundation for thorough analysis of playtesting data.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Player Modeling</title>
        <p>
          Player modeling refers to the practice of making a system-defined model of how a player might behave,
feel, or think before, during, and after gameplay [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Player modeling is key to understanding how a
system should respond to the input provided by a given audience, and is a common practice during
game design [31] and the construction of procedural systems [32]. Modeling is particularly relevant
in adaptive game systems such as dynamic difficulty adjustment (DDA), where modeling perceived
difficulty allows a designer to appropriately modify a game’s properties to meet a player at their skill
level [33]. This is exemplified by Valve’s Left 4 Dead [34] AI director (AID), which uses a calculated
metric for “emotional intensity” as a means to define gameplay curves. In pursuit of roughly sinusoidal
cycles of rising tension, climax, and falling tension, system perception of emotional intensity drives
enemy spawn volume and timing [35]. It is important to note that a player model is often a designed
composite of many gameplay metrics. In the case of Left 4 Dead [34], there is no raw value measurement
of “emotional intensity” broadcast from the player’s limbic system; it is calculated from a combination
of in-game metrics. Thus, we can easily visualize and operate on different dimensions of player modeling
through these composite metrics that are created by a game’s designer. The ability to understand whether a
player model 1) is correctly tuned to player experience and 2) allows for the correct manipulation of
system interaction is in turn critical for a designer to meet their intended experiential goals. A generic
version of how a modeled metric changes throughout a playtrace can be seen in Fig. 1.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System Description</title>
      <p>The Playtrace Arc Search (PAS) tool (Fig. 2) aims to help designers accomplish three goals:</p>
      <list list-type="order">
        <list-item><p>Understand overall distribution and consistency of playtrace data</p></list-item>
        <list-item><p>Search large sets of playtrace data for desired system arcs</p></list-item>
        <list-item><p>Inspect individual playtraces of interest, with references to their source data</p></list-item>
      </list>
      <p>PAS aims to satisfy these goals for a given drawn arc within seconds, allowing for multiple
searches to be performed in quick succession. It accomplishes this by parsing a large set of structured
JSON data and providing a point cloud representation of user-defined properties of said JSON data.
The user can enable or disable this cloud on top of a canvas, on which the user draws the
curve that they’d like to search for within the playtrace corpus. After the user selects a
curve-matching search strategy and runs the search, a list of similarity-scored results (Fig. 3) is presented for
inspection. The system is web-native and uses graphing (Chart.js<sup>2</sup>), canvas (Fabric.js<sup>3</sup>), Fréchet Distance
(Curve-Matcher<sup>4</sup>), and Dynamic Time Warping (Dynamic-Time-Warping<sup>5</sup>) libraries for user input,
visualization, and search strategy. The full code and demo for PAS are available online.<sup>1</sup></p>
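      <p>As a rough illustration of the point cloud idea described above, the following Python sketch flattens a handful of playtraces into (step, value) points and summarizes their spread at each progression step. It is not the PAS implementation, which is web-native; the function names build_point_cloud and summarize_by_step, the trace names, and the values are all hypothetical, and each playtrace is assumed to have already been reduced to a list of numbers for one metric.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from statistics import mean, pstdev


def build_point_cloud(traces):
    """Flatten {name: [y0, y1, ...]} into a list of (step, value) points."""
    return [
        (step, value)
        for values in traces.values()
        for step, value in enumerate(values)
    ]


def summarize_by_step(points):
    """Group the cloud by progression step and report (mean, spread)."""
    buckets = defaultdict(list)
    for step, value in points:
        buckets[step].append(value)
    return {
        step: (mean(vals), pstdev(vals))
        for step, vals in sorted(buckets.items())
    }


# Hypothetical traces of unequal length (illustrative values only).
traces = {
    "seed_001": [100.0, 80.0, 60.0],
    "seed_002": [100.0, 90.0, 40.0, 10.0],
}
cloud = build_point_cloud(traces)
summary = summarize_by_step(cloud)
```
]]></preformat>
      <p>A small spread at a given step suggests that the loaded playtraces behave consistently there, while a large spread suggests diversity.</p>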
      <sec id="sec-3-1">
        <title>3.1. Metric Selection, Logging, Formatting</title>
        <p>PAS focuses on the analysis of how important gameplay metrics change throughout playtraces. These
metrics are sometimes raw data (e.g., how much HP does a character have), but can also be productively
calculated and composited into player-modeling metrics tied to player actions or progression. The
metrics recorded at each step can be thought of as dependent variables, which PAS visualizes on the
y-axis of its arc drawing. On the x-axis, a discrete progression measurement is used to map each
consecutive data entry (e.g., gameplay tick). While a common progression measurement used for
playtraces is time, other metrics can be used: player action frequency, level/goal completions, and so
on. What is important is that these metrics are identified as meaningful by a designer and implemented
into a game logging system that allows for post-hoc analysis.</p>
        <p>PAS reads a folder from the file system containing a set of JSON files, one per playtrace. Each file name
should be a meaningful identifier for its playtrace. Examples of good identifiers include
the time of the playtrace or the seed of the generated artifact: something displayed in the interface that will
help the user quickly understand which playtrace they are looking at. The format of the JSON is shown
in Listing 1. Of note is that a file can contain multiple discrete progression measurements and multiple
dependent metrics for each progression measurement, allowing the user to switch between different
playtrace inspections without reloading data. Discrete progression measurements are represented by
their index in their array (i.e., discreteProgressionMeasurements[0] corresponds to the first recorded
metrics in a playtrace). Dependent metrics must be numbers for visualization and analysis purposes.
This data structure makes PAS game-agnostic: so long as designers define metrics for input (which is
likely done as a byproduct of development anyway), PAS can be applied to a dataset.
          <fn><label>2</label><p><uri>https://github.com/chartjs/Chart.js</uri></p></fn>
          <fn><label>3</label><p><uri>https://github.com/fabricjs/fabric.js/</uri></p></fn>
          <fn><label>4</label><p><uri>https://github.com/chanind/curve-matcher</uri></p></fn>
          <fn><label>5</label><p><uri>https://github.com/GordonLesti/dynamic-time-warping</uri></p></fn>
        </p>
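        <p>To make the input structure more concrete, the Python sketch below reads one playtrace and extracts a single metric as a series indexed by progression step. The layout is an assumption made for illustration: a top-level key (here “timeStepLogs”) holding an array of objects, each with numeric fields (here “totalCurrentHP”). The “actionsTaken” field, the values, and the helper extract_series are hypothetical and are not taken from the PAS code.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import json


def extract_series(playtrace, progression_key, metric_key):
    """Return the metric's values in progression order.

    The x-coordinate of each value is its index in the progression array.
    """
    series = []
    for index, entry in enumerate(playtrace[progression_key]):
        value = entry[metric_key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(
                f"{metric_key} at step {index} is not a number: {value!r}"
            )
        series.append(float(value))
    return series


# Hypothetical file contents; key names and values are illustrative.
raw = """
{
  "timeStepLogs": [
    {"totalCurrentHP": 800, "actionsTaken": 0},
    {"totalCurrentHP": 742, "actionsTaken": 8},
    {"totalCurrentHP": 655, "actionsTaken": 16}
  ]
}
"""
playtrace = json.loads(raw)
hp_curve = extract_series(playtrace, "timeStepLogs", "totalCurrentHP")
```
]]></preformat>
        <p>Rejecting non-numeric values at load time mirrors the requirement that dependent metrics be numbers.</p>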
        <p>Once data is correctly formatted, it can be loaded into the system using the PAS interface, and the
x-axis and y-axis can be selected using dropdowns. After the playtraces are loaded and metrics are
specified, a user can select a toggle to display a point cloud representation of the playtraces they have
loaded.</p>
        <p>Listing 1: An example input JSON schema to be used with PAS. Each object inside of
“discreteProgressionMeasurements” should have the same structure. Only numbers can be used for playtrace
analysis.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Search Strategy and Output</title>
        <p>On the right-hand side of the tool, a canvas exists for a user to draw an arc on. This arc can be any
shape, but must pass the vertical line test (no duplicate y values for the same x value), and must only
consist of one continuous segment. The user can use the generated point cloud as a guide if desired.
The line does not need to reach both ends of the canvas, as the line drawn will be transformed to fit
within the vertical and horizontal bounds of any given playtrace to which it is being compared. With
data loaded and the desired arc drawn, a user can select which search strategy they’d like to use: Fréchet
Distance, Dynamic Time Warping (DTW), or a weighted combination of both. The Fréchet Distance
strategy favors matching based on overall curve trend on a point-by-point basis [36], while DTW cares
more about the shape of curves and is more forgiving when those shapes (e.g., rising/falling trends)
are shifted along the progression axis [37]. Both are useful for different use cases: Fréchet Distance might
help confirm an overall trend across all playtraces (all metrics are rising/falling in similar places), while
DTW can help confirm that all playtraces experienced a similar pattern regardless of the progression index
at which it occurred. The weighted combination allows for a
nuanced mixture of both search strategies. Each strategy scores every playtrace for similarity against
the user-drawn curve on a scale from 0 to 1. Because the user-drawn curve is transformed to match the
relative ranges of each playtrace, a resampling is performed for both algorithms to standardize scoring. The
density of this resampling can be specified using a sensitivity slider below the search method options.
The search can be initiated by clicking the button below the search method options.</p>
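        <p>The scoring pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PAS implementation, which relies on the Curve-Matcher and Dynamic-Time-Warping libraries. Here both curves are resampled to a fixed number of points, rescaled to a unit range, and compared with a textbook discrete Fréchet distance and a textbook DTW cost. The mapping from distance to a 0 to 1 similarity (dividing by the largest possible value in the unit square) and all function names are assumptions made for this sketch.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def resample(values, n):
    """Linearly interpolate a series to exactly n evenly spaced samples."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if len(values) == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def to_unit_range(values):
    """Rescale a series so its minimum maps to 0 and its maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5] * len(values)
    return [(v - low) / (high - low) for v in values]


def frechet_distance(a, b):
    """Discrete Frechet distance over points (i / (n - 1), value)."""
    n = len(a)

    def dist(i, j):
        return math.hypot((i - j) / (n - 1), a[i] - b[j])

    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = dist(i, j)
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_prev = min(
                    table[i - 1][j], table[i - 1][j - 1], table[i][j - 1]
                )
                table[i][j] = max(best_prev, d)
    return table[n - 1][n - 1]


def dtw_cost(a, b):
    """Classic dynamic time warping cost using absolute differences."""
    n = len(a)
    inf = float("inf")
    table = [[inf] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            step = abs(a[i - 1] - b[j - 1])
            table[i][j] = step + min(
                table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
            )
    return table[n][n]


def similarity(drawn, trace, samples=50, frechet_weight=0.5):
    """Weighted blend of Frechet-based and DTW-based similarity in [0, 1]."""
    a = to_unit_range(resample(drawn, samples))
    b = to_unit_range(resample(trace, samples))
    # Assumed normalizations: sqrt(2) bounds any distance in the unit
    # square, and the DTW cost never exceeds the diagonal path cost.
    frechet_score = 1.0 - frechet_distance(a, b) / math.sqrt(2.0)
    dtw_score = 1.0 - dtw_cost(a, b) / samples
    return frechet_weight * frechet_score + (1.0 - frechet_weight) * dtw_score


rising = [0, 1, 2, 3]
same_shape = similarity(rising, [10, 20, 30, 40])
inverted = similarity(rising, [40, 30, 20, 10])
```
]]></preformat>
        <p>In this sketch the samples argument plays the role of the sensitivity setting, and frechet_weight the role of the strategy weighting; a curve with the same shape at a different scale scores near 1, while an inverted curve scores much lower.</p>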
        <p>After performing the search, the user is presented with a ranked list of playtraces, sorted by their
similarity to the drawn arc (see Fig. 3). Each playtrace is labeled by its filename, and its score is
shown in its row. Playtraces are colored on a gradient from green to red, with high scores being green
and low scores being red. Clicking on an individual playtrace entry shows a graph that overlays the
user-drawn arc with the playtrace, allowing for direct comparison in context. The user can hover over
the playtrace, investigating all metrics included in the JSON file for the given discrete progression step.</p>
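        <p>The ranked, color-coded result list might be produced along the following lines. This Python sketch assumes a simple linear interpolation between red and green; the exact gradient used by PAS is not specified here, and the file names, scores, and function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def score_to_rgb(score):
    """Interpolate linearly from red (score 0) to green (score 1)."""
    clamped = max(0.0, min(1.0, score))
    red = round(255 * (1.0 - clamped))
    green = round(255 * clamped)
    return (red, green, 0)


def rank_results(scores):
    """Sort {filename: score} into (filename, score, rgb) rows, best first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score_to_rgb(score)) for name, score in ordered]


# Hypothetical search output keyed by playtrace file name.
rows = rank_results(
    {"seed_17.json": 0.35, "seed_42.json": 0.93, "seed_08.json": 0.74}
)
```
]]></preformat>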
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>To evaluate the efficacy of PAS, we ran an evaluation that sought to confirm three critical design use
cases:</p>
      <list list-type="order">
        <list-item><p>Can PAS correctly assign a high score to a drawn arc that corresponds with the point cloud?</p></list-item>
        <list-item><p>Can PAS correctly score a drawn arc that does not follow the point cloud?</p></list-item>
        <list-item><p>Can PAS correctly identify a graph with certain features from a large set?</p></list-item>
      </list>
      <p>To do this, we used a game testbed with automated playtester functionality to generate 1,000 playtraces
[38]. From these playtraces, we generated a point cloud, drew arcs to match each condition, selected
appropriate search strategies, and reviewed the results.</p>
      <sec id="sec-4-1">
        <title>4.1. Methodology</title>
        <p>We used a headless turn-based role-playing game simulator named FighterDDA to produce
a corpus of 1,000 playtraces [38]. We selected this system because it represents a popular game
genre and can rapidly generate a large set of playtraces for analysis; collecting a similar volume
of human playtesting data would require much more time and is unnecessary to confirm basic system
operation. This system uses utility agents combined with an AI Director to simulate two teams of four
characters fighting one another, with the game ending when all characters from a given player have no
remaining health points. Characters are capable of attacking opponents, defending themselves, and
healing teammates. The AI director applies environment changes in an attempt to modulate difficulty
throughout gameplay. For the purposes of this evaluation, we chose to use the metric of the health of
all characters summed together at each game tick. This metric provides us a view into the general
pacing of the game and could help a designer understand whether games are ending at appropriate times and
whether games have some form of back-and-forth style of gameplay. A screenshot of the console output of
a run of the data generation testbed is shown in Fig. 4.</p>
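        <p>The summed-health metric and the end-of-game condition can be illustrated with a short Python sketch. The record layout (a “teams” list of character objects with a “currentHP” field), the helper names, and the two-character teams are assumptions chosen for brevity; they do not reflect FighterDDA’s actual log format.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def make_tick(team_a, team_b):
    """Build a hypothetical tick record from two lists of HP values."""
    return {
        "teams": [
            [{"currentHP": hp} for hp in team_a],
            [{"currentHP": hp} for hp in team_b],
        ]
    }


def total_current_hp(tick):
    """Sum the current HP of every character on both teams for one tick."""
    return sum(
        character["currentHP"]
        for team in tick["teams"]
        for character in team
    )


def battle_over(tick):
    """True once every character on at least one team has no HP left."""
    return any(
        all(character["currentHP"] <= 0 for character in team)
        for team in tick["teams"]
    )


# Illustrative values only: the second team is eliminated on the last tick.
ticks = [
    make_tick([100, 90], [80, 70]),
    make_tick([60, 90], [40, 60]),
    make_tick([30, 60], [0, 0]),
]
curve = [total_current_hp(tick) for tick in ticks]
```
]]></preformat>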
        <p>To test the three cases listed above, we uploaded our corpus of playtraces to the system, selecting
game ticks (“timeStepLogs”) and total health (“totalCurrentHP”) as our x-axis and y-axis, respectively.
For case 1, we drew an arc that follows the densest areas of the point cloud, and ran analysis using
the Fréchet strategy, as we were looking for matches to the overall trend of our drawn arc.
For case 2, we drew an arc that inverts the densest areas of the point cloud, running the same Fréchet
search strategy as in case 1. For case 3, we drew a curve that oscillated while matching the overall trend
of the point cloud, searching for playtraces that held some pattern of rising and falling current health
throughout the game. We used a combined approach (both weights set at 0.5) for this case, as we
wanted both the overall trend of the curve and specific curve features to be matched.
We set system sensitivity at 300 for all cases. All drawn curves can be seen in Fig. 5.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>PAS worked as intended in all three cases listed at the start of Section 4. For use case 1 (Q1), we found high scoring
of curves with near-universal consistency, with scores ranging from 0.7367 to 0.9354. This shows that
PAS is capable of matching the overall curve trends that are common within a large set of playtraces.
For use case 2 (Q2), we found low scoring of curves with universal consistency, with scores
ranging from 0.0000 to 0.3542. This result indicates that PAS is able to identify curves
that are highly unlikely to appear within a data set. For use case 3 (Q3), we were able to identify a
curve with both overall trend and prominent features close to our drawn curve, as well as see curves
that matched poorly, with scores ranging from 0.6247 to 0.8603. This feature matching demonstrates
PAS’s ability to identify playtraces that match both an overall trend and specific features, helping
identify exemplary playthroughs of a game. A visual showing the drawn arc and the highest- and lowest-scoring
playtrace for each case is shown in Fig. 5.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Mapping Game States to Curves</title>
        <p>PAS highlights a common facet of evaluating the quality of game systems and emergent experience —
parameters and metrics of the system form curves over a progression, and these curves represent how
situations and gameplay unfold. These curves can be viewed as game-system metaphors for dramatic
arcs — the rising and falling of these curves over time can help a designer identify the “rising tension”
or other systematic drama over a gameplay session [39].</p>
        <p>It’s important to note that progression does not just happen on a moment-to-moment, temporal
basis. Such approaches might be unnecessarily fine-grained for understanding a game environment. A
designer for a game that contains a large volume of small levels, such as a mobile puzzle game, might
be more interested in metrics for each level completed over hundreds of levels instead of the minutiae
of each individual level itself. Because PAS scales with the granularity of the discrete increment
provided, the tool can be used across a variety of design use cases throughout the game lifecycle:
you might investigate the moment-to-moment difficulty of a section of a game, or view how a player
changes in skill level over thousands of multiplayer matches.</p>
        <p>The ability to see a collection of curves and find important curves also highlights versatility in seeing
the macro- and micro-trends of playtrace datasets. In our evaluation, we quickly saw that all games
were ending within a similar curve space. This is a useful insight for the platform designer, as it shows
that all games are ending appropriately and have a contained space (i.e., an important goal for this
style of battle is that battles eventually end and have some level of consistency in how they play out).
Meanwhile, it was reassuring to know that the individual character of curves with a tug-of-war pattern
(health oscillating over time) was present, meaning that the games simulated were able to provide
experiences that were not overly linear.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Confirming Designer Goals</title>
        <p>Designers often use an “experience goal” (i.e., what experience should the player have) to drive their
game development process [40]. These experience goals are impossible to know explicitly (as there
is no direct access to player experience) and instead must be expressed through qualitative research
or system-observable metrics defined through designer-specified player models. Understanding whether a
game metric accurately predicts a player experience (and, in some implementations, correctly impacts
game state) is key to meeting these player experience goals. Combining PAS analysis with
qualitative data can quickly shed light on whether or not the correct system-metrics have been defined
for an audience. For example, a designer might look at PAS and find confirmation for a system curve,
but find that players do not qualitatively agree with the data. In such a case, the designer might need to
go back to the drawing board to find a better system-derived method of quantifying player experience.</p>
        <p>When it comes to PCG, Automatic Game Design, or games aimed at having high replayability, it
may not be a positive thing to see a consistent set of playtraces. On the contrary, the designer of such
systems or games might actually be looking for system arc diversity, hoping to see a wide range of
curves across playtrace sessions [41]. When combined with an Expressive Range Analysis approach
(such as in Shields et al. [42]’s tooling on FighterDDA), understanding the potential design space of
system playtraces represents a novel and meaningful strategy for understanding system quality. PAS
allows the visualization of such diversity at a glance with its point cloud, but also allows the designer
to see if varied individual curves exist in the gameplay and what specific actions occurred during a given
playtrace to produce the curve in question. Consistency and diversity can be valid goals for different
designers in different use cases, and PAS allows for the confirmation of either goal.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Tuning Through Parameter “Sweeping”</title>
        <p>
          PAS is not only useful for looking at a static configuration of game systems — it can also be used for
parameter tuning. This can be done by implementing an A/B testing strategy, where different parameter
tunings are provided for each playtesting session [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. With this collection of data, a designer can then
specify the output curve they were looking for, identify the playtrace(s) with the highest similarity, and
use the parameter settings from those playtraces for future tuning refinement. The scoring from this
system could even be implemented as part of a fitness function of an evolutionary loop, selecting games
based on the quality of their system arc in addition to other metrics.
        </p>
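        <p>A parameter sweep of this kind could be post-processed as in the Python sketch below. It assumes that each playtrace has been stored alongside the parameter settings that produced it and the similarity score it received; the parameter names (healPower, directorRate), the scores, and the blending weight in the fitness function are hypothetical and are given only to show the shape of the idea.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def best_parameters(runs, top_k=1):
    """Return the parameter settings of the top_k runs by arc similarity."""
    ranked = sorted(runs, key=lambda run: run["similarity"], reverse=True)
    return [run["parameters"] for run in ranked[:top_k]]


def fitness(similarity, other_score, arc_weight=0.7):
    """Blend arc similarity with another quality metric.

    The 0.7 default weight is an arbitrary illustrative choice.
    """
    return arc_weight * similarity + (1.0 - arc_weight) * other_score


# Hypothetical sweep results: one entry per playtested parameter setting.
runs = [
    {"parameters": {"healPower": 8, "directorRate": 0.25}, "similarity": 0.62},
    {"parameters": {"healPower": 12, "directorRate": 0.5}, "similarity": 0.91},
    {"parameters": {"healPower": 16, "directorRate": 0.75}, "similarity": 0.48},
]
winner = best_parameters(runs)
blended = fitness(0.91, 0.5)
```
]]></preformat>
        <p>In an evolutionary loop, a function like fitness would be evaluated for each candidate, with the arc similarity supplied by the search and the other score by whatever additional metric the designer cares about.</p>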
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations and Future Work</title>
      <p>We believe that PAS has the potential to be a valuable tool for designers to rapidly iterate on and evaluate
their games based on systematic playtrace curves that might be difficult to parse or observe without it.
That said, there are clear limitations that could be addressed in future work. First, a trial with
game designers across a varied selection of games would provide evidence of the tool’s real-world usefulness
and help ensure that its usability and visualization approaches actually aid game design. Second, our
evaluation confirms basic system functionality, but PAS should be applied to a wider range
of playtrace corpora, especially human-generated playtraces. Applying this form of evaluation to more
complicated and more diverse system measurements would also be useful. For example, it would
be interesting to see whether PAS could be applied directly to narrative spaces and
capture the dramatic arcs that it uses as a metaphor for system analysis. Integrating the arc-matching
process into generative search strategies (i.e., into fitness functions) is another promising direction
for assessing system quality when generating levels or games. Implementing a database
adapter (so that data can be queried from a database rather than saved as JSON files on disk) would
reduce the need to alter existing data logging systems in games and improve PAS
compatibility overall. Finally, common arc patterns (such as the narrative arcs described in Reagan et al.
[43]) could be provided to users as an alternative to drawing, building on the approach of Leong
et al. [44] to mapping storysifting onto the definitions of Reagan et al. [43].</p>
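      <p>As a rough illustration of what preset arcs might look like, the sketch below encodes the six shapes named by Reagan et al. [43] as idealized turning points and samples them into curves that could be offered in place of a hand-drawn arc. The turning-point values, the <monospace>PRESET_ARCS</monospace> table, and the <monospace>sample_arc</monospace> function are assumptions made for this example; they are not taken from that paper's data or from PAS.</p>
      <preformat>
```python
from typing import Dict, List

# Idealized turning points (values in [0, 1]) for the six arc shapes named by
# Reagan et al. The numbers are illustrative assumptions, not published data.
PRESET_ARCS: Dict[str, List[float]] = {
    "rags_to_riches": [0.0, 1.0],            # rise
    "tragedy":        [1.0, 0.0],            # fall
    "man_in_a_hole":  [1.0, 0.0, 1.0],       # fall, then rise
    "icarus":         [0.0, 1.0, 0.0],       # rise, then fall
    "cinderella":     [0.0, 1.0, 0.0, 1.0],  # rise, fall, rise
    "oedipus":        [1.0, 0.0, 1.0, 0.0],  # fall, rise, fall
}


def sample_arc(name: str, n: int = 50) -> List[float]:
    """Sample a preset onto n evenly spaced points (n of 2 or more)."""
    points = PRESET_ARCS[name]
    curve = []
    for i in range(n):
        pos = i * (len(points) - 1) / (n - 1)
        lo = min(int(pos), len(points) - 2)  # index of the segment's left end
        frac = pos - lo
        curve.append(points[lo] * (1 - frac) + points[lo + 1] * frac)
    return curve


icarus = sample_arc("icarus", 5)
print(icarus)
```
      </preformat>
      <p>Piecewise-linear presets are only one option; smoother templates could be substituted without changing how the sampled curve is passed to a search strategy.</p>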
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>Playtrace Arc Search is a visualization tool that allows designers to evaluate a large corpus of playtraces
and search for specific systematic “arcs” within their dataset. It provides visuals
for the entire dataset, inspection of individual playtraces and data points, and a scoring
system that shows the similarity of a given dataset to a user-drawn arc. The tool offers multiple
search strategies and, in our evaluation, succeeded in searching a corpus for examples of relevant
curves as drawn by a user. The combination of generic input, the ability to iterate rapidly through arc
searches and strategies, and the range of visualization output indicates that PAS can be a useful tool for
designers who wish to understand the quality of their game systems over time through playtesting data.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Special thanks to Robert Zubek, Jasmine Otto, and Oliver Withington for their feedback and advice on
the tools described in this paper.</p>
      <p>This material is also based upon work supported by the National Science Foundation under Grant
No. 2202521. Any opinions, findings, and conclusions or recommendations expressed in this material
are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>The author(s) used ChatGPT 4.5 and 5 for grammar and spelling checks and for paraphrasing and rewording.
After using these tools, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
      <p>[20] F. Rupp, K. Eckert, Geevo: Game economy generation and balancing with evolutionary algorithms,
arXiv preprint arXiv:2404.18574 (2024).
[21] O. Withington, L. Tokarchuk, The Right Variety: Improving Expressive Range Analysis with Metric
Selection Methods, in: Proceedings of the 18th International Conference on the Foundations of
Digital Games, ACM, Lisbon, Portugal, 2023, pp. 1–11. URL: https://dl.acm.org/doi/10.1145/3582437.3582453.
doi:10.1145/3582437.3582453.
[22] G. Smith, J. Whitehead, Analyzing the expressive range of a level generator, in: Proceedings
of the 2010 Workshop on Procedural Content Generation in Games - PCGames ’10, ACM Press,
Monterey, California, 2010, pp. 1–7. URL: http://portal.acm.org/citation.cfm?doid=1814256.1814260.
doi:10.1145/1814256.1814260.
[23] A. Drachen, A. Canossa, Towards gameplay analysis via gameplay metrics, in: Proceedings of the
13th international MindTrek conference: Everyday life in the ubiquitous era, 2009, pp. 202–209.
[24] G. Wallner, Play-graph: A methodology and visualization approach for the analysis of gameplay
data, in: 8th International conference on the Foundations of digital games (FDG2013), Foundations
of Digital Games, 2013, pp. 253–260.
[25] Y.-E. Liu, E. Andersen, R. Snider, S. Cooper, Z. Popović, Feature-based projections for effective
playtrace analysis, in: Proceedings of the 6th international conference on foundations of digital
games, 2011, pp. 69–76.
[26] P. W. Frey, P. Adesman, Recall memory for visually presented chess positions, Memory &amp; Cognition
4 (1976) 541–547.
[27] J. S. Reitman, Skilled perception in go: Deducing memory structures from inter-response times,
Cognitive psychology 8 (1976) 336–356.
[28] Blizzard Entertainment, Starcraft 2, [DIGITAL], 2010.
[29] A. Białecki, N. Jakubowska, P. Dobrowolski, P. Białecki, L. Krupiński, A. Szczap, R. Białecki,
J. Gajewski, Sc2egset: Starcraft ii esport replay and game-state dataset, Scientific Data 10 (2023)
600.
[30] Z. Lin, J. Gehring, V. Khalidov, G. Synnaeve, Stardata: A starcraft ai research dataset, in:
Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment,
volume 13, 2017, pp. 50–56.
[31] A. Drachen, M. Seif El-Nasr, A. Canossa, Game analytics–the basics, in: Game analytics:
Maximizing the value of player data, Springer, 2013, pp. 13–40.
[32] C. Pedersen, J. Togelius, G. N. Yannakakis, Modeling player experience for content creation, IEEE
Transactions on Computational Intelligence and AI in Games 2 (2010) 54–67.
[33] A. Baldwin, D. Johnson, P. Wyeth, P. Sweetser, A framework of dynamic difficulty adjustment in
competitive multiplayer video games, in: 2013 IEEE international games innovation conference
(IGIC), IEEE, 2013, pp. 16–19.
[34] Valve South, Left 4 dead, [Windows, Xbox 360, macOS], 2008.
[35] M. Booth, The ai systems of left 4 dead, in: Artificial Intelligence and Interactive Digital
Entertainment Conference at Stanford, 2009, 2009.
[36] H. Alt, M. Godau, Computing the Fréchet distance between two polygonal curves, International
Journal of Computational Geometry &amp; Applications 5 (1995) 75–91.
[37] E. Keogh, C. A. Ratanamahatana, Exact indexing of dynamic time warping, Knowledge and
information systems 7 (2005) 358–386.
[38] S. Shields, E. F. Melcer, FighterDDA: A simulation testbed for evaluating director-based dynamic
balancing, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive
Digital Entertainment (AIIDE), AAAI Press, 2025. In press.
[39] M. Mateas, A. Stern, Structuring content in the façade interactive drama architecture, in:
Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment,
volume 1, 2005, pp. 93–98.
[40] T. Fullerton, C. Swain, S. Hoffman, Game design workshop: Designing, prototyping, &amp; playtesting
games, CRC Press, 2004.
[41] T. X. Short, T. Adams, Procedural storytelling in game design, CRC Press, 2019.
[42] S. Shields, O. Withington, E. F. Melcer, Designer difficulties: Visualizing the possibility spaces
of dynamic difficulty adjustment systems, in: Proceedings of the AAAI Conference on Artificial
Intelligence and Interactive Digital Entertainment (AIIDE), AAAI Press, 2025. In press.
[43] A. J. Reagan, L. Mitchell, D. Kiley, C. M. Danforth, P. S. Dodds, The emotional arcs of stories are
dominated by six basic shapes, EPJ data science 5 (2016) 1–12.
[44] W. Leong, J. Porteous, J. Thangarajah, Automated sifting of stories from simulated storyworlds,
in: IJCAI, 2022, pp. 4950–4956.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Osborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Samuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McCoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <article-title>Evaluating play trace (dis) similarity metrics</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>10</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Andrade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Corruble</surname>
          </string-name>
          ,
          <article-title>Dynamic game balancing: An evaluation of user satisfaction</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>2</volume>
          ,
          <year>2006</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hyrynsalmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klotins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Unterkalmsteiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gorschek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tripathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Pompermaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prikladnicki</surname>
          </string-name>
          ,
          <article-title>What is a minimum viable (video) game? towards a research agenda</article-title>
          , in: Conference on e-Business, e-Services and e-Society, Springer,
          <year>2018</year>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Xenopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rulf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>Ggviz: Accelerating large-scale esports game analysis</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Drachen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Canossa</surname>
          </string-name>
          ,
          <article-title>Analyzing spatial user behavior in computer games using geographic information systems</article-title>
          ,
          <source>in: Proceedings of the 13th international MindTrek conference: Everyday life in the ubiquitous era</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>182</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Player modeling</article-title>
          ,
          <source>in: Artificial Intelligence and Games</source>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>315</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guzdial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Deep convolutional player modeling on log and level data</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on the Foundations of Digital Games</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maldeniya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <article-title>The wide, the deep, and the maverick: Types of players in team-based online games</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>5</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fruchter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Automatic playtesting for game parameter tuning via active learning</article-title>
          ,
          <source>CoRR</source>
          abs/1908.01417 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1908.01417. arXiv:1908.01417.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Liapis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shaker</surname>
          </string-name>
          ,
          <article-title>Mixed-initiative content creation</article-title>
          ,
          <source>Procedural content generation in games</source>
          (
          <year>2016</year>
          )
          <fpage>195</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Margarido</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Roque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <article-title>Boosting mixed-initiative co-creativity in game design: A tutorial</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Leymarie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Latham</surname>
          </string-name>
          ,
          <article-title>On mixed-initiative content creation for video games</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>543</fpage>
          -
          <lpage>557</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kreminski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Karth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wardrip-Fruin</surname>
          </string-name>
          ,
          <article-title>Evaluating mixed-initiative creative interfaces via expressive range coverage analysis</article-title>
          ,
          <source>in: IUI Workshops</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Liapis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Sentient sketchbook: computer-assisted game level authoring</article-title>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Migkotzidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liapis</surname>
          </string-name>
          ,
          <article-title>Susketch: Surrogate models of gameplay as a design assistant</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>273</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Summerville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Snodgrass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guzdial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Holmgård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Hoover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaksen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nealen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Procedural content generation via machine learning (pcgml)</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          <volume>10</volume>
          (
          <year>2018</year>
          )
          <fpage>257</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shields</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mawhorter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Melcer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <article-title>Searching for balanced 2d brawler games: successes and failures of automated evaluation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>18</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Morosan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Poli</surname>
          </string-name>
          ,
          <article-title>Automated game balancing in ms pacman and starcraft using evolutionary algorithms</article-title>
          ,
          <source>in: Applications of Evolutionary Computation: 20th European Conference, EvoApplications 2017, Amsterdam, The Netherlands, April 19-21, 2017, Proceedings, Part I 20</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>377</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Leigh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schonfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Louis</surname>
          </string-name>
          ,
          <article-title>Using coevolution to understand and validate game balance in continuous games</article-title>
          ,
          <source>in: Proceedings of the 10th annual conference on Genetic and evolutionary computation</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1563</fpage>
          -
          <lpage>1570</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rupp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eckert</surname>
          </string-name>
          ,
          <article-title>Geevo: Game economy generation and balancing with evolutionary algorithms</article-title>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>