<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicholas Fluty</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryan D. Flores</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Judy Goldsmith</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brent Harrison</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Kentucky</institution>
          ,
          <addr-line>Lexington, KY 40506-0633</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In competitive Pokémon battling, players have adopted a set of extra rules that are meant to encourage fair play. They are used to constrain team formation so that no one team has an overwhelming advantage over all others. These rule sets are often derived based on trial and error, intuition, or post-hoc evaluations of team performance, which means that the rules may not be ideal solutions to the problem they are supposed to address, or the problem may not have been worth addressing. In this paper, we explore how artificial intelligence and machine learning techniques can be used to potentially evaluate the quality of a rule set. This is meant to be a preliminary study that will ultimately lead to the automatic formulation of such rule sets. Our case study investigates how the inclusion or exclusion of one-hit-knock-out (OHKO) moves affects the outcomes and player behaviors in games between two teams battling under Generation 1 rules.</p>
      </abstract>
      <kwd-group>
        <kwd>Competitive Rules</kwd>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Pokémon is a game in which players construct teams of
combatants, the titular Pokémon, to battle against other
players’ teams. A great deal of thought is often put into
how these teams are constructed, as one wants to utilize
powerful Pokémon while promoting good synergy as a team.
In order to ensure that a healthy competitive atmosphere is
maintained, there are often rules put in place on how a team
can be constructed. This is meant to ensure that strategies
that are potentially too strong don’t become prevalent as a
part of the competitive metagame.</p>
      <p>The rulesets Smogon uses to govern their competitive
battles are good examples of this. Smogon is a competitive
battling community that organizes tournaments, provides
competitive battling resources, etc. In service of this, they
also define rulesets that are used when these tournaments
are held. These rules govern how players construct and use
their teams and are meant to guard against overpowered or
degenerate strategies. In addition to a set of rules common
to all battles, Smogon defines various battle formats that
restrict which Pokémon can be used to allow for diverse
usage of both strong and weak Pokémon. The most commonly
used format is referred to as OverUsed (OU), which allows all
but some of the strongest "legendary" Pokémon that were
intentionally given this advantage for purposes outside of
competitive play. Again, these rulesets are meant to ensure
that no one strategy for constructing teams or battling is
strictly dominant over all others.</p>
      <p>While these rules are often necessary for healthy
competitive play, constructing these rulesets can be quite difficult.
Often, these rules are based on speculation, anecdotal
evidence, or post-hoc analysis. As such, the formation of
effective rulesets is an imperfect science that can be
time-consuming and prone to errors (adding a rule where
none is needed or missing a rule that should be
present).</p>
      <p>11th Experimental Artificial Intelligence in Games Workshop, November
19, 2024, Lexington, Kentucky, USA.
Email: ndfl222@uky.edu (N. Fluty); rhma226@uky.edu (R. D. Flores);
goldsmit@uky.edu (J. Goldsmith); bha286@g.uky.edu (B. Harrison).
ORCID: 0009-0007-3250-4915 (N. Fluty); 0009-0000-2790-5908 (R. D. Flores);
0000-0002-8383-5390 (J. Goldsmith); 0000-0002-1301-5928 (B. Harrison).
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).</p>
      <p>In this paper, we investigate how machine learning (ML)
techniques can be used to support the evaluation of
competitive rule sets for the game of Pokémon. We are using
the Pokémon domain to test this concept because of the
existence of community tournaments that contain rules that
exist outside of the game environment. Specifically, we
present a case study in which we examine the Smogon rules
associated with the first generation of the game,
demonstrate how we can test changes, and present a data-based
discussion of the effects of the change. We chose this ruleset
because Smogon rules are largely community-driven and
not necessarily subject to rigorous empirical analysis. The
primary contribution of this work is to explore how AI and
machine learning techniques can be used to perform
vulnerability tests on these types of rulesets. This case study
serves as preliminary evidence of the feasibility of such an
approach, and we hope it will encourage further work in
the area.</p>
      <p>The remainder of the paper is organized as follows. In the
next section, we review relevant related work on evaluating
rule sets and metagame in Pokémon. We will then introduce
the Smogon generation 1 tournament rule set. Finally, we
will detail our case study and present the results of said
study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        The primary contribution of this work is in evaluating the
rulesets associated with competitive play in games.
Specifically, in this paper we evaluate how the rulesets
associated with competitive Pokémon affect dominant teams.
There has been past work that has examined the Pokémon
metagame [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], but that work either examines which
teams of Pokémon are particularly strong in a metagame
as defined by the rules associated with competitive play or
investigates countering the metagame [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this paper, we
examine whether these rules are justified and how one
might demonstrate this.
      </p>
      <p>
        To do this, we take inspiration from automated
playtesting literature and propose that machine learning techniques
can be used to identify problems with rule sets or rules that
are not justified. Typically, video game playtesting is
performed by humans to determine whether a game contains
errors. This process is time consuming and prone to
human error. Thus, there has been an increased interest in
automating this process using artificial agents. In the past,
researchers have explored several AI and machine
learning methods for automating this process [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>
        ]. Still,
these approaches are typically done to evaluate level design
or game mechanics with respect to designer goals. Other
work has focused on better creating agents that can mimic
playtesters of different personas [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or skillsets [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], but still
the primary focus is on evaluating game mechanics or level
design. In this work, we focus on evaluating rules that are
not inherent to the game itself, but that are designed after
the fact to encourage competitive play.
      </p>
      <p>
        When determining the effect that rule sets have on a
team’s win probability, we use machine learning to learn
battle strategies for each team. Machine learning has been
readily explored in Pokémon [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10, 11, 12, 13, 14</xref>
        ] and found
to be an effective tool for teaching agents how to battle. The
main difference between our work and this past work is not
in the method, but in the motivation behind the method.
These past works primarily focused on developing
techniques to be more competent in battle. We are using these
techniques in service of evaluating player-made rule sets.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Smogon Rule Sets</title>
      <p>Smogon uses a modified rule set compared to the official
tournaments hosted by Nintendo. The rules vary across the
different Pokémon generations. These modified rules affect
the formation of the teams as well as the actions a player
can take during a battle. Every few years a vote is held on
the Smogon forums to see whether or not any rules need to
be updated or replaced.</p>
      <p>The first set of rules dictates the team formation. The
Species Clause prevents a player from having multiple of
the same Pokémon on their team. This is to encourage more
diversity and to prevent players from running teams of the
same Pokémon. Next is the Evasion Clause, which prevents
players from using the moves Double Team or Minimize.
These moves make a Pokémon harder to hit, and can lead
to stalled games where neither player can win. The final
clause is the one-hit-knock-out (OHKO) Clause. Pokémon
are not allowed to have the moves Horn Drill, Guillotine,
Sheer Cold, or Fissure. These moves, referred to as OHKO
moves, have a low hit rate but will cause the opponent’s
Pokémon to faint if they do hit. The general opinion on the
forums is that this rule prevents strong players from losing
due to randomness.</p>
      <p>The second set of clauses affects player behavior during
the battles. The Sleep Clause prevents a player from directly
causing more than one of their opponent’s Pokémon to fall
asleep at a time. If a move attempts to break this clause, the
game will automatically prevent the sleep from occurring.
The Freeze Clause is the same as the Sleep Clause but it
refers to the freeze status effect. The Endless Battle Clause
prevents players from intentionally preventing their
opponent from winning without forfeiting. The Timer Clause
causes a player to automatically lose the battle if their player
timer is exhausted.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Design</title>
      <p>To show how ML can be used as a tool for testing rules for
Pokémon battles, we decided to designate one rule for
experimentation. Due to the seemingly independent reasoning
for each of Smogon’s rules, it would make sense that rules
are tested individually as one sees fit.</p>
      <p>We decided the rule of greatest interest to us was the one
prohibiting the use of OHKO moves. Not all Pokémon can
use the moves, so we would expect this rule to restrict a
small set of Pokémon.</p>
      <p>In their best use cases, OHKO moves will work ∼ 30% of
the time, dealing a fixed amount of damage that can knock
out any opponent. Smogon may have set this rule to prevent
its users from being too strong, but it might alternatively
exist to keep the game more interesting. For example,
relying on luck to this extent may limit players’ ability to win
through better decision-making. As a counterargument, ice
moves have a ∼ 10% chance of permanently freezing the
opponent, a large reason to use the move, yet there is no
rule against it.</p>
      <p>While it would be very interesting to know exactly how
the utility of every Pokémon changes when the rule is removed
and to produce a new ranking of some sort, that would be
either too difficult to compute or require many assumptions
that may not be agreeable. We instead perform a single
experiment where we hope to see some notable effects of
the rule.</p>
      <p>To perform our experiment, we use ML to control the
actions of players in four repeated team battle scenarios. We
configure two teams with three Pokémon each, determine
control movesets for each that make sense for competitive
play, and create alternate movesets that make use of OHKO
moves. In the four scenarios, we test the likelihood of Team
1 winning when one, both, or neither of the teams use the
alternate movesets. All other factors are assumed to be
constant.</p>
      <p>We conduct these battles using the most popular of
Smogon’s battle formats, OU, which only prohibits using
Mew and Mewtwo. There are 14 Pokémon in the OU tier.
Pokémon in lower tiers are allowed but not recommended
in the format because they are seen as not worth using over
the other 14.</p>
      <p>By observing both the outcome of games and the
specific learned behavior of the player agents, we should gain
insight into how competitive play may be affected by the
removal of this rule. If a team’s change in moveset appears
to increase its likelihood of winning, implying that players
would want to use the moves, then we can be confident that
the rule impacts play. If not, then our results are not
conclusive but suggest that the rule may not have much
impact.</p>
      <sec id="sec-4-1">
        <title>4.1. The Teams</title>
        <p>Reasonably designing the two teams and their movesets is
an important part of testing the rule, as we want the
results of the experiments to have implications for how skilled
players would optimally play. For example, if the
introduction of OHKO moves to Team 1 yields large benefits against
Team 2, this would be irrelevant if the control movesets
of Team 1 were already ineffective, such that we could see
the same benefits by using other currently legal moves. In
another case, if Team 2 is not a good representation of a
normal competitive team, then one’s ability to beat it is not
meaningful.</p>
        <p>First, we acknowledge that 3v3 battles are uncommon
for competitive games. Community rule sets, tier lists, and
recommended movesets are all generally made under the
assumption that battles are 6v6. Our primary reasons for this
change are to limit the number of variables affecting battles
and improve the speed and performance of our applied ML.
Not every Pokémon is capable of using OHKO moves in the
games, so a reduced team size helps raise the concentration
of OHKO move usage while allowing us to focus on just a few
of its users.</p>
        <p>Table 1 details all relevant information about the two
teams and their movesets. Following is a justification of this
configuration.</p>
        <p>In designing the teams, we noted that the three Pokémon,
Snorlax, Tauros, and Chansey, can be seen on almost every
6v6 team in OU. The Type or Types of a Pokémon usually
cause it to have particular advantages or disadvantages
against different Pokémon, but these three being of the
Normal Type means that there are fewer moves that have
type advantage against them. Similarly, their moves are less
likely to be highly effective or ineffective against other types.
It happens that these three also have some of the highest
stat totals, making them some of the strongest Pokémon in
isolation.</p>
        <p>Another reason for their high usage is their synergy in
performing different roles for a team. Snorlax is one of
the bulkiest Pokémon in the game and has the damage to
threaten any opponent. Tauros has both damage and speed,
making it able to quickly finish off weakened opponents.
Used effectively, it can force enemies to switch out, allowing
for a free hit on the new Pokémon. Chansey specializes in
inflicting negative status conditions and can recover health
faster than a lot of Pokémon can deal damage. Snorlax
and Chansey can also be seen using a variety of movesets
depending on what best complements the rest of the team.</p>
        <p>For these reasons, we believed it would be interesting to
see these three together as a complete team, where Snorlax
and Tauros can both be given OHKO moves. The only other
OHKO move users in OU are Rhydon and Slowbro, so we
put them on the other team. Rhydon is advantageous for its
ability to absorb Normal moves, while Slowbro can inflict
paralysis with Thunder Wave. Paralysis is critical to the
team because OHKO moves will not work if the user has less
speed than the target. It also seemed interesting to make
use of Slowbro, since it is one of the least used and lowest
rated Pokémon categorized into OU.</p>
        <p>Because these two are particularly slow, their team would
benefit from having a fast Pokémon that can also use
Thunder Wave. There were several candidates for this role,
but we selected Alakazam for its slightly better offense,
which would help compensate for the absence of Tauros
from that team.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The Moves</title>
        <p>When determining the moves for the control teams, we
wanted to balance using the most common moves with
making all Pokémon reasonably useful for the battle. All
three Pokémon on the Normal team are usually seen with
Ice moves, which is redundant and counters Rhydon too
well. Snorlax is the one most commonly seen without its
Ice move, so we gave it Self-Destruct to account for other
difficult situations.</p>
        <p>A Pokémon can only know four moves, so a move would
have to be replaced from each Pokémon that would be given
an OHKO move. The fixed damage nature of these moves
makes it seem reasonable to replace a damaging move that
is usually only used situationally. For Snorlax, we replaced
Self-Destruct. For Tauros, we replaced Earthquake, which
is mostly just used against Gengar in OU.</p>
        <p>We anticipated that Rhydon would have no reason to use
Rock Slide or Body Slam when it has Earthquake, so we just
replaced Rock Slide. Slowbro usually only has one damaging
move, so ideally we would replace Amnesia or Rest. Each
of these moves in the absence of the other is less useful, as
they are effective in combination. In the end, we decided
to sacrifice Rest because it leaves the user vulnerable and
more predictable.</p>
        <p>It may be important to note that neither team was given
moves to inflict the sleep condition on enemies. Smogon
has always recognized sleep to be one of the most powerful
mechanics given to some Pokémon, which is why they have
a rule preventing a team from putting two of their opponent’s
Pokémon to sleep at the same time. Our primary reason for not using
these moves was that the Pokémon we were considering
selecting could not use them. We intentionally avoided
giving the teams access to these moves because we feared
that they simply could not be balanced; there are not enough
Pokémon on each team for players to not be devastated by
the reduction in options. We could also argue that both
teams have enough ability to inflict other status conditions,
helping fill the void made by sleep’s exclusion.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Software Used</title>
        <p>To simulate Pokémon battles, we used the pkmn engine,
a Pokémon battle simulation engine optimized for
performance in larger scale projects [15]. This open source tool
accurately implements battles as done in the original game
code and the popular simulator Pokémon Showdown.
Pokémon Showdown is sponsored and endorsed by Smogon for
competitive battles, as it provides practical implementations
of battles for all Pokémon generations and a practical
interface for playing online. The pkmn engine currently only
fully supports the first generation of Pokémon, but it is able
to run faster than Showdown while having less unneeded
overhead.</p>
        <p>We also used and credit another open source project,
Wrapsire, which provides a C++ interface for the compiled
library produced by the pkmn engine [16]. We opted to use
C++ for this project because of its advantages for memory
management, multi-threading, and compiler optimizations.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Monte Carlo Tree Search</title>
        <p>Pokémon is a stochastic turn-based game, where a turn is
initiated by both players selecting an action independently
but concurrently. In order to observe battles that are
consistent with the skill expected of competitive players, action
selection must be mindful of what gives the highest chances
of winning. To accomplish this, we decided to use Monte
Carlo Tree Search (MCTS).</p>
        <p>
          We chose to use MCTS primarily because of its ability
to handle uncertainty and large state spaces. MCTS has
also been shown to be effective at simulating Pokémon
battles [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Pokémon uses a random seed as a factor in many
different calculations, most notably in standard damage
calculation and secondary move effects. Because of this, a turn
initiated from a given game state by given actions may have
hundreds of possible unique resulting game states
depending on the seed. This is not problematic for MCTS because
it naturally weighs the value of an action based on the
frequency that different states result from it, and these states
similarly develop better heuristics for move selection as the
search continues.
        </p>
        <p>The primary problem we faced with implementing MCTS
for Pokémon battles was with determining how to handle
concurrent action selection. Minimax trees are commonly
used for turn-based games like chess, but these assume that
players alternate turns and know what the enemy
previously selected. This assumption contradicts the nature of
competitive Pokémon battles, as it almost always involves
players intentionally being unpredictable. This is due to a
natural rock-paper-scissors-like relationship among
strategies. For example, recovery may counter gradual damage,
while applying buffs or statuses may counter recovery, and
the right damaging move may render an enemy’s buffs
ineffective or make them lose before they gain the longer-term
benefits.</p>
        <p>To address this issue, we decided to randomly determine
at every simulation of a turn who will select their action first,
and who will pick based on that selected action. While this
does not perfectly represent the distribution of move
combinations used in competitive games, it effectively balances
the scenarios where a player either picks their safest action
or correctly predicts that their opponent is picking their
safest action. This gives an equal advantage to each player,
while also producing more consistency in game outcomes,
which is helpful for assuring the integrity of our results. It is
also significantly less computationally expensive than
developing stochastic policies, which makes it easily applicable
at all frequently visited states during a search.</p>
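        <p>The randomized ordering described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name, the value callback, and the greedy choice without an exploration term are all assumptions made for this example. The player chosen to commit first takes its safest action (the one with the best worst-case value), and the other player best-responds to that choice.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def select_joint_action(actions_1, actions_2, value, rng=random):
    """Pick one action per player for a single simulated turn.

    `value(a1, a2)` is a hypothetical estimate of player 1's chance of
    winning after the joint action; player 2 is assumed to want it low.
    """
    if rng.random() < 0.5:
        # Player 1 commits first to its safest action (best worst case),
        # then player 2 best-responds to that known choice.
        a1 = max(actions_1, key=lambda a: min(value(a, b) for b in actions_2))
        a2 = min(actions_2, key=lambda b: value(a1, b))
    else:
        # Player 2 commits first to its safest action,
        # then player 1 best-responds.
        a2 = min(actions_2, key=lambda b: max(value(a, b) for a in actions_1))
        a1 = max(actions_1, key=lambda a: value(a, a2))
    return a1, a2
```
]]></preformat>
        <p>Because the first mover is drawn with equal probability at every simulated turn, neither player is systematically favored by the ordering.</p>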
        <p>There is an additional concern caused by simulating
forward in a game: doing so gives each agent complete
knowledge of the opponent’s team. Normally Pokémon
is a partial information game, where one does not know
anything about the opponent’s team until they switch to
each Pokémon and use their moves. If the battles in this
experiment were done by people instead of ML agents, Team
2 not knowing that Alakazam could switch to Rhydon could
end up disastrous for them if they told Snorlax to use
Self-Destruct.</p>
        <p>Representing partial observability would not only make the
problem too complicated to get reasonable results but also
might not change the results very much. Most Pokémon have
a single set of moves that is used identically in over half
of games. Additionally, teams in 6v6 battles tend to
intentionally cover all of their weaknesses by having a Pokémon
resistant to each commonly used move Type, basically
following a recipe for team building. Essentially, players can
expect their opponents to have some kind of counter until
they have already defeated the counter, and it is not as
important to know what exactly the counter is. All this leads
to teams being rather predictable.</p>
        <p>Other design choices had a lesser effect on the results.
Primarily, since total health points and amounts of damage
dealt are rather inflated, we bucketed states together when
counting visits and related information, ignoring the bottom
five bits of both Pokémon’s health points when identifying
a state during a search. This allows for much quicker
convergence and is easily modifiable in our code.</p>
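        <p>Dropping the bottom five bits of a health value groups it into buckets of 32 points. The sketch below shows one way such a state key could be built; the function names and the shape of the remaining state are hypothetical and chosen only for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
HP_BUCKET_BITS = 5  # dropping five bits gives buckets 32 health points wide


def bucket_hp(hp, bits=HP_BUCKET_BITS):
    """Discard the lowest `bits` bits of a health value."""
    return hp >> bits


def state_key(active_hp_1, active_hp_2, rest_of_state):
    """Build a hashable key for the search's visit table.

    `rest_of_state` is a hypothetical tuple holding everything else that
    identifies the state (statuses, remaining team members, and so on).
    """
    return (bucket_hp(active_hp_1), bucket_hp(active_hp_2), rest_of_state)
```
]]></preformat>
        <p>Under this sketch, two states whose active Pokémon have 523 and 512 health points share a key, so their visit counts accumulate together. Changing the constant adjusts how coarse the grouping is.</p>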
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Simulating Each Scenario</title>
        <p>To observe the likelihood of Team 1 winning with
reasonable precision, we decided to simulate 500 battles for each
scenario, using MCTS at each turn. Each search consisted
of 100,000 iterations, simulating from the active game state.</p>
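        <p>As a rough guide to the precision that 500 battles can give, the sketch below computes the standard error of a win rate under a simple binomial model. It assumes the battles are independent, and the win rate of 0.4 is an illustrative value rather than a result from this study.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def win_rate_standard_error(p, n):
    """Standard error of a win rate p estimated from n independent battles."""
    return math.sqrt(p * (1.0 - p) / n)


def approx_interval(p, n, z=1.96):
    """Normal-approximation interval around p (z=1.96 gives roughly 95%)."""
    se = win_rate_standard_error(p, n)
    return (p - z * se, p + z * se)


# Illustrative value only: a win rate near 0.4 over 500 battles.
se = win_rate_standard_error(0.4, 500)
print(f"standard error: {se:.3f}")
```
]]></preformat>
        <p>For win rates in this range, the standard error is about two percentage points.</p>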
        <p>Because of the somewhat large depth and stochastic
nature of Pokémon battles, we chose to limit the search depth
to 25 turns from the active game state. Simulating past
25 turns of depth would likely be unnecessary since states
would not get enough visits to do anything other than
random rollouts, which only slowly progress to the end state
and may favor certain movesets. After 25 turns, we
terminated the simulated game and considered the team with the
larger sum of remaining health percentages the winner. When this
happened, we gave the winner a modified reward as if it
were an average of 9 wins and 1 loss, which was to help
encourage actual wins over this method. This cutoff
also reduced the time needed to complete each search.</p>
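        <p>The depth-limit scoring can be sketched as below. The sketch assumes rewards on a scale where a win is 1 and a loss is 0, so "an average of 9 wins and 1 loss" becomes 0.9. The function names, the team representation, and the handling of exactly equal totals are assumptions for this example, since the text does not specify them.</p>
        <preformat preformat-type="code"><![CDATA[
```python
CUTOFF_REWARD = 0.9  # "an average of 9 wins and 1 loss" on an assumed 0-to-1 scale


def health_fraction_sum(team):
    """Sum of current/max HP over a team given as (current_hp, max_hp) pairs."""
    return sum(current / maximum for current, maximum in team)


def cutoff_reward_for_team_1(team_1, team_2):
    """Reward credited to Team 1 when a simulation is cut off at the depth limit."""
    h1 = health_fraction_sum(team_1)
    h2 = health_fraction_sum(team_2)
    if h1 > h2:
        return CUTOFF_REWARD
    if h2 > h1:
        return 1.0 - CUTOFF_REWARD
    return 0.5  # assumption: the text does not say how equal totals are scored
```
]]></preformat>
        <p>Keeping the cutoff reward below 1 means a simulation that reaches a real win is still worth more than one that only leads on health at the depth limit.</p>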
        <p>We also encountered a problem with battles not ending
when all Pokémon on a team were frozen. This was because
the algorithm recognized that it could win on any future
turn for equal reward. Rather than reducing the reward for
winning after more turns, something that could theoretically
negatively influence the decisions made, we decided that
it was safe to assume that a team with all their Pokémon
frozen will not win and let games be terminated at that
point. None of these Pokémon had moves to deal damage
on turns they do not attack, such as Toxic and Leech Seed.</p>
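        <p>The early-termination check for fully frozen teams could look like the following sketch. The dictionary keys are hypothetical stand-ins for whatever the battle state exposes, and the check is only reasonable under the stated assumption that nothing in these movesets can thaw a frozen Pokémon or deal damage on later turns.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def team_is_stuck(team):
    """Return True when every Pokémon still standing on `team` is frozen.

    `team` is a list of dicts with hypothetical boolean keys "fainted"
    and "frozen". A team with nobody left standing has already lost and
    is not reported as stuck.
    """
    standing = [p for p in team if not p["fainted"]]
    return bool(standing) and all(p["frozen"] for p in standing)
```
]]></preformat>
        <p>A battle loop using this sketch would end the game and credit the opposing team as soon as the check returns True.</p>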
        <p>When running the battles, we collected a detailed log of
the actions selected at every turn and the values of relevant
variables, such as health and status conditions. We used
these to extract interesting information that may inform
our discussion of how gameplay behavior changes when
the external rules for team compositions change.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>Observing the summary of the battles in Table 2,
particularly the Team 1 Win % column, we can see notable
changes in the outcome of games based on the change of
movesets. In the control battles where neither team was
given OHKO moves, we saw that the three popular Normal
types, Team 2, won 5/8 of their games. We expected that
this team would have an advantage, since they have high
stats, few weaknesses, and seemingly good synergy.</p>
      <p>When we tried replacing moves from the already
advantaged Team 2 with OHKO moves, they lost some of their
advantage. This suggests that the moves we replaced were
more useful in this battle than the OHKO moves were. In
fact, they collectively managed only 1 successful OHKO per
10 battles. Taking away Tauros’s Earthquake likely was not
a big deal since it still had Body Slam, but we suspect that the
main reason for the change in performance is Self-Destruct,
which knocks out the user to deal massive damage.</p>
      <p>As a quick side note, Self-Destruct caused ties in 4 of the
2,000 battles by knocking out the final two Pokémon. Each
of these was treated as half of a win when calculating Team
1’s win ratio.</p>
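      <p>Counting each tie as half of a win can be written as a one-line calculation. The function name and the counts in the usage line are hypothetical and are not taken from Table 2.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def team_1_win_ratio(wins, ties, total):
    """Win ratio for Team 1 with each tie counted as half of a win."""
    return (wins + 0.5 * ties) / total


# Hypothetical counts: 10 wins and 2 ties in 20 battles.
print(team_1_win_ratio(10, 2, 20))
```
]]></preformat>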
      <p>Snorlax’s Self-Destruct can knock out Alakazam in one
hit or take out 3/4 of Slowbro’s health. 220 of the 482 times
Snorlax fainted in the control scenario were from using this
move. This indicates that the move was worth having and
using against this team. On the other hand, Self-Destruct
is riskier in a 6v6 battle because the opponent is more
likely to be able to switch to a Pokémon like Gengar that
can absorb the blow and leave Snorlax knocked out. Fissure
did not help Snorlax here, but it might find more use with
the right moveset and team.</p>
      <p>When we tried putting OHKO moves into the movesets
of just Team 1, we saw their win ratio go up from 37.5% to
41.9%. This suggests that we successfully found a use case
for OHKO moves in competitive battles. Slowbro averaged
about 1 successful Fissure per 2 battles, while Rhydon
landed about 1 Horn Drill per 10 battles. Rhydon’s success rate
was not very impressive, but this Pokémon’s popularity
within OU would probably be enough to cause Horn Drill
to show up as an alternative to catch opponents off guard.</p>
      <p>Slowbro had the most interesting results with OHKO
moves. We replaced Rest, which Smogon quotes as being
important for use with Amnesia. Slowbro probably would
have used Amnesia and Rest a lot more if it had been starting
against a special attacker instead of the physical attacker
Snorlax. Against a different team with more special
attackers, Slowbro might have done better with the recommended
moveset. On the other hand, Slowbro’s increased ability to
use alternate movesets would make it more useful in
different scenarios, which could raise its usage rate. Opponents
would not know whether to expect the standard Slowbro or
one with Fissure, causing them to have trouble switching
appropriately.</p>
      <p>The scenario where both teams had OHKO moves is not as
interesting because we already showed a disadvantage with
the movesets for Team 2. However, it offers a little insight into
how some things change with different circumstances. Most
notably, the change in win rate was significantly greater
when both changes were present together. Without going
into too much detail, a strong hypothesis could be that the
absence of Self-Destruct was more devastating for Team
2 when it had a Fissure-using Slowbro to deal with.
Slowbro used over twice as many total moves in battles where
Snorlax did not have Self-Destruct.</p>
      <p>Aside from the four Pokémon already in OU that can
use OHKO moves, this rule affects some Pokémon that are
just below the cutoff for this tier. Dragonite may be the
most notable of these. In OU, Dragonite is the holder of
the highest base stat total but lacks the moves to back it
up. Dragonite is relatively fast and can use Thunder Wave.
Perhaps the only thing stopping it from becoming one of
the strongest Pokémon with Horn Drill is its weakness to
Ice. Ice moves are very useful and can be known by most
Pokémon in OU.</p>
      <p>Lapras is also a strong candidate for OU viability with
Horn Drill, as it can put Pokémon to sleep with Sing, is
very bulky, and outspeeds other bulky Pokémon. The rule
limiting a team to putting just one opponent to sleep at once
currently stops Lapras from being viable in OU, as usually
having one sleeper is seen as ideal. With the addition of
Horn Drill, Lapras might be a better replacement for another
sleeper, especially with Blizzard to take down Rhydon pretty
consistently. Smogon recommends using 3 damaging moves
for Lapras, even though it is not very offensive; one of these
is likely easily replaceable by Horn Drill.</p>
      <p>Win rates aside, the battles also change in terms
of the experience for players. Earlier we mentioned that the
OHKO rule may exist to keep games interesting. One thing
that can be noted from Table 2 is that, as the number of teams
with OHKO moves increased, the turn counts decreased.
When Team 1 had the moves and performed better, the
games were about 4/5 as long, even though the win ratio
indicated a more even match.</p>
      <p>Game length does not necessarily translate to more or
less fun for the players, but the shorter games suggest that play
tilted toward randomly rewarding one team with
a large advantage, which would cause battles to end more decisively.
A 30-70 gamble with a high reward likely has this effect. It makes
sense that skilled players would want their win ratio
to correlate more strongly with the skill they have put a lot of time
into mastering, so this seems like a strong potential motive
for the rule.</p>
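      <p>To make the 30-70 gamble concrete, the following minimal Python sketch computes how likely at least one OHKO move is to land over several attempts. It assumes that each attempt is an independent 30% chance, as the 30-70 figure suggests, and it ignores speed checks, switching, and every other battle mechanic. The function names are illustrative and are not taken from the experiments reported here.</p>
      <preformat>
```python
def prob_at_least_one_hit(attempts, hit_chance=0.30):
    """Probability that at least one of `attempts` independent tries lands."""
    # Complement of "every attempt misses".
    return 1.0 - (1.0 - hit_chance) ** attempts


def expected_attempts_until_hit(hit_chance=0.30):
    """Mean number of tries up to and including the first hit (geometric)."""
    return 1.0 / hit_chance


if __name__ == "__main__":
    # Assumed 30% hit chance per attempt, taken from the 30-70 figure.
    for attempts in (1, 2, 3, 5):
        chance = prob_at_least_one_hit(attempts)
        print(f"{attempts} attempt(s): {chance:.3f}")
    print(f"expected attempts until first hit: {expected_attempts_until_hit():.2f}")
```
      </preformat>
      <p>Under these simplifying assumptions, the chance of at least one hit grows quickly with repeated attempts, which is consistent with a small number of lucky selections deciding a battle early.</p>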
      <p>We also show in Table 2 the average number of OHKO
moves used by any Pokémon per battle in the column OHKO
Sel/Bat. This helps us gain a sense of whether Pokémon are
overusing OHKO moves relative to other moves. If
a Pokémon uses OHKO moves disproportionately, the
opposing player could become annoyed or
frustrated with the battle, since OHKO moves generally depend on
luck to succeed. Three OHKO moves being selected in a
battle lasting 41 turns is unlikely to be seen as overuse
or as annoying. It is worth noting that this
battle used all four Pokémon in the current OU tier that
are able to use OHKO moves, so this can be seen as an
estimated upper bound on OHKO move usage. Given this, the
luck-based nature of these moves seems less concerning.</p>
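      <p>The usage figure above can be restated as a rate. The short Python sketch below uses the rounded figures quoted in the text (three OHKO selections in a 41-turn battle) and assumes that both players choose exactly one action per turn. That assumption and the function name are illustrative only.</p>
      <preformat>
```python
def ohko_selection_rates(ohko_selections, turns, selections_per_turn=2):
    """Return (OHKO selections per turn, share of all action selections)."""
    per_turn = ohko_selections / turns
    # Assumption: each turn both players choose one action, so a battle of
    # `turns` turns contains `selections_per_turn * turns` selections.
    share = ohko_selections / (selections_per_turn * turns)
    return per_turn, share


if __name__ == "__main__":
    # Rounded figures quoted in the text: three selections in 41 turns.
    per_turn, share = ohko_selection_rates(3, 41)
    print(f"per turn: {per_turn:.3f}, share of selections: {share:.3f}")
```
      </preformat>
      <p>With the quoted figures this is roughly one OHKO selection every 14 turns.</p>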
      <p>Under these almost ideal conditions for OHKO move
usage, we did not observe this strategy posing a notable problem
to player experience. OHKO moves
usually required setup with Thunder Wave or Body Slam,
which shows that the strategy still requires a degree of skill
and team-building decisions to be effective. Furthermore,
allowing these moves may give more benefit
to less commonly used Pokémon, and we can anticipate that
some Pokémon that are not considered strong enough for
this battle format may become more relevant.</p>
      <p>The results of this experiment do not show a strong
justification for the rule banning the use of OHKO moves, but
that is not to say that an alternate problematic case does
not exist. Of course, it must be acknowledged that this is
a limited case study designed to show how we can collect
and analyze data to test the concerns associated with a rule.
Should the rule be seriously called into question by the Smogon
community, it would be wise to observe more cases than
this and in a less anecdotal way.</p>
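      <p>One way to make such observations less anecdotal is to report an interval around each simulated win rate. The Python sketch below computes a Wilson score interval. It is offered as an illustration under the assumption that battles are independent trials, not as the analysis performed in this study, and the counts in the example are hypothetical.</p>
      <preformat>
```python
import math


def wilson_interval(wins, battles, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives about 95% coverage."""
    if battles == 0:
        raise ValueError("battles must be positive")
    p_hat = wins / battles
    denom = 1.0 + z * z / battles
    center = (p_hat + z * z / (2.0 * battles)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / battles + z * z / (4.0 * battles * battles)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from this study.
    low, high = wilson_interval(wins=60, battles=100)
    print(f"win rate 0.60, interval: ({low:.3f}, {high:.3f})")
```
      </preformat>
      <p>A wide interval indicates that more simulated battles are needed before a difference in win rate between rule sets can be treated as meaningful.</p>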
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Additional rule sets are often used in competitive games
to create a healthy competitive metagame. The goal is to ensure
that all participants are playing on a level field. These rule
sets, however, are often designed using intuition and
post-hoc usage statistics and analysis. In this paper, we explore how
artificial intelligence techniques can be used to empirically
evaluate competitive rule sets in the game Pokémon.</p>
      <p>To do this, we use Monte-Carlo tree search to simulate
3v3 battles using a relaxed version of the Smogon
Generation 1 rule set that allows Pokémon to use OHKO moves.
Results show that OHKO moves can have a visible effect
on how the battles play out, but that effect is nuanced in
many ways. Our results indicate that while OHKO moves are used
in battle when they are available, they likely are not used
enough to cause annoyance. This, however, is not definitive,
and more work must be done before any firm conclusions can be
drawn. Nevertheless, this approach gives designers critical
context on how rules affect the potential metagame surrounding
competitive play.</p>
      <p>Overall, we feel that these preliminary results are
promising and provide evidence that further work on utilizing
automated playtesting techniques to evaluate competitive
rule sets has merit. In addition, the approach we use in this
paper has applications to games outside of Pokémon. Any
competitive game that can be simulated could readily
make use of the approaches that we use in this paper. We
hope this encourages further research into how artificial
intelligence and machine learning can be applied in
competitive games and metagames.</p>
      <p>[13] H. Ihara, S. Imai, S. Oyama, M. Kurihara, Implementation and evaluation of information set monte carlo tree search for pokémon, in: 2018 IEEE international conference on systems, man, and cybernetics (SMC), IEEE, 2018, pp. 2182–2187.</p>
      <p>[14] J. Wang, Winning at Pokémon Random Battles Using Reinforcement Learning, Ph.D. thesis, Massachusetts Institute of Technology, 2024.</p>
      <p>[15] K. Scheibelhut, libpkmn, 2021-24. URL: https://github.com/pkmn/engine.</p>
      <p>[16] pasyg, wrapsire, 2023-24. URL: https://github.com/pasyg/wrapsire.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Novais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <article-title>An adversarial approach for automated pokémon team building and meta-game balance</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. P. R. A.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence methods for automated difficulty and power balance in games</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Crane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Kosiara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spradling</surname>
          </string-name>
          ,
          <article-title>Team counter-selection games</article-title>
          ,
          <source>in: 2021 IEEE Conference on Games (CoG)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ferdous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kifetew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Prandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Prasetya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shirzadehhajimahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Susi</surname>
          </string-name>
          ,
          <article-title>Search-based automated play testing of computer games: A model-based approach</article-title>
          ,
          <source>in: International Symposium on Search Based Software Engineering</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P. L. P.</given-names>
            <surname>de Woillemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Labory</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Corruble</surname>
          </string-name>
          ,
          <article-title>Automated play-testing through rl based human-like playstyles generation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>18</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Stahlke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mirza-Babaei</surname>
          </string-name>
          ,
          <article-title>Artificial players in the design process: Developing an automated testing tool for game level and world design</article-title>
          ,
          <source>in: Proceedings of the Annual Symposium on Computer-Human Interaction in Play</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Stahlke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mirza-Babaei</surname>
          </string-name>
          ,
          <article-title>Artificial playfulness: A tool for automated agent-based playtesting</article-title>
          ,
          <source>in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Holmgård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liapis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Automated playtesting with procedural personas through mcts with evolved heuristics</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>352</fpage>
          -
          <lpage>362</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <article-title>A monte carlo approach to skill-based automated playtesting</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>14</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>166</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Simoes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <article-title>Competitive deep reinforcement learning over a Pokémon battling simulator</article-title>
          ,
          <source>in: 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Villanueva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baldeón</surname>
          </string-name>
          ,
          <article-title>Enhancing pokémon vgc player performance: Intelligent agents through deep reinforcement learning and neuroevolution</article-title>
          ,
          <source>in: International Conference on Human-Computer Interaction</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A self-play policy optimization approach to battling pokémon</article-title>
          ,
          <source>in: 2019 IEEE conference on games (CoG)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ihara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Imai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurihara</surname>
          </string-name>
          ,
          <article-title>Implementation and evaluation of information set monte carlo tree search for pokémon</article-title>
          , in: 2018 IEEE international
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>