=Paper= {{Paper |id=Vol-3926/paper3 |storemode=property |title=An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets |pdfUrl=https://ceur-ws.org/Vol-3926/paper3.pdf |volume=Vol-3926 |authors=Nicholas Fluty,Ryan D. Flores,Judy Goldsmith,Brent Harrison |dblpUrl=https://dblp.org/rec/conf/exag/FlutyFGH24 }} ==An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets== https://ceur-ws.org/Vol-3926/paper3.pdf
An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets

Nicholas Fluty, Ryan D. Flores, Judy Goldsmith and Brent Harrison
Department of Computer Science, University of Kentucky, Lexington, KY 40506-0633 USA


Abstract
In competitive Pokémon battling, players have adopted a set of extra rules that are meant to encourage fair play. They are used to
constrain team formation so that no one team has an overwhelming advantage over all others. These rule sets are often derived from
trial and error, intuition, or post-hoc evaluations of team performance, which means that the rules may not be ideal solutions to the
problem they are supposed to address, or the problem may not have been worth addressing.
In this paper, we explore how artificial intelligence and machine learning techniques can be used to evaluate the quality of a rule
set. This is meant to be a preliminary study that will ultimately lead to the automatic formulation of such rule sets. Our case
study investigates how the inclusion or exclusion of one-hit-knock-out (OHKO) moves affects the outcomes and player behaviors in
games between two teams battling under Generation 1 rules.

Keywords
Competitive Rules, Artificial Intelligence, Machine Learning



11th Experimental Artificial Intelligence in Games Workshop, November 19, 2024, Lexington, Kentucky, USA.
Emails: ndfl222@uky.edu (N. Fluty); rhma226@uky.edu (R. D. Flores); goldsmit@uky.edu (J. Goldsmith); bha286@g.uky.edu (B. Harrison)
ORCID: 0009-0007-3250-4915 (N. Fluty); 0009-0000-2790-5908 (R. D. Flores); 0000-0002-8383-5390 (J. Goldsmith); 0000-0002-1301-5928 (B. Harrison)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Pokémon is a game in which players construct teams of combatants, the titular Pokémon, to battle against other
players’ teams. A great deal of thought is often put into how these teams are constructed, as one wants to utilize
powerful Pokémon while promoting good synergy as a team. In order to ensure that a healthy competitive atmosphere is
maintained, there are often rules put in place on how a team can be constructed. This is meant to ensure that
strategies that are potentially too strong don’t become prevalent as a part of the competitive metagame.

The rulesets Smogon uses to govern their competitive battles are good examples of this. Smogon is a competitive
battling community that organizes tournaments, provides competitive battling resources, etc. In service of this, they
also define rulesets that are used when these tournaments are held. These rules govern how players construct and use
their teams and are meant to guard against overpowered or degenerate strategies. In addition to a set of rules common
to all battles, Smogon defines various battle formats that restrict which Pokémon can be used to allow for diverse
usage of both strong and weak Pokémon. The most commonly used format is referred to as OverUsed (OU), which allows all
but some of the strongest "legendary" Pokémon that were intentionally given this advantage for purposes outside of
competitive play. Again, these rulesets are meant to ensure that no one strategy for constructing teams or battling is
strictly dominant over all others.

While these rules are often necessary for healthy competitive play, constructing these rulesets can be quite difficult.
Often, these rules are based on speculation, anecdotal evidence, or post-hoc analysis. As such, the formation of
effective rulesets is an imperfect science that can be time-consuming and prone to errors (adding a rule where there
shouldn’t be one or missing a rule that should be present).

In this paper, we investigate how machine learning (ML) techniques can be used to support the evaluation of competitive
rule sets for the game of Pokémon. We are using the Pokémon domain to test this concept because of the existence of
community tournaments that contain rules that exist outside of the game environment. Specifically, we present a case
study in which we examine the Smogon rules associated with the first generation of the game, demonstrate how we can
test changes, and present a data-based discussion of the effects of the change. We chose this ruleset because Smogon
rules are largely community-driven and not necessarily subject to rigorous empirical analysis. The primary contribution
of this work is to explore how AI and machine learning techniques can be used to perform vulnerability tests on these
types of rulesets. This case study serves as preliminary evidence of the feasibility of such an approach, and we hope
it will encourage further work in the area.

The remainder of the paper is organized as follows. In the next section, we review relevant related work on evaluating
rule sets and metagame in Pokémon. We will then introduce the Smogon generation 1 tournament rule set. Finally, we
will detail our case study and present the results of said study.

2. Related Works

The primary contribution of this work is in evaluating the rulesets associated with competitive play in games.
Specifically, in this paper we evaluate how the rulesets associated with competitive Pokémon affect dominant teams.
There has been past work that has examined the Pokémon metagame [1, 2], but that previous work examines what teams of
Pokémon are particularly strong in a metagame as defined by the rules associated with competitive play, or investigates
how to counter the metagame [3]. In this paper, we examine whether these rules are justified and how one might show
that they are.

To do this, we take inspiration from automated playtesting literature and propose that machine learning techniques
can be used to identify problems with rule sets or rules that are not justified. Typically, video game playtesting is
performed by humans to determine whether a game contains errors. This process is time consuming and prone to human
error. Thus, there has been an increased interest in
automating this process using artificial agents. In the past, researchers have explored several AI and machine
learning methods for automating this process [4, 5, 6, 7]. Still, these approaches are typically used to evaluate level
design or game mechanics with respect to designer goals. Other work has focused on creating agents that better mimic
playtesters of different personas [8] or skillsets [9], but still the primary focus is on evaluating game mechanics or
level design. In this work, we focus on evaluating rules that are not inherent to the game itself, but that are designed
after the fact to encourage competitive play.

When determining the effect that rule sets have on a team’s win probability, we use machine learning to learn battle
strategies for each team. Machine learning has been readily explored in Pokémon [10, 11, 12, 13, 14] and found to be an
effective tool for teaching agents how to battle. The main difference between our work and this past work is not in
the method, but in the motivation behind the method. These past works primarily focused on developing techniques to be
more competent in battle. We are using these techniques in service of evaluating player-made rule sets.

3. Smogon Rule Sets

Smogon uses a modified rule set compared to the official tournaments hosted by Nintendo. The rules vary across the
different Pokémon generations. These modified rules affect the formation of the teams as well as the actions a player
can take during a battle. Every few years a vote is held on the Smogon forums to see whether or not any rules need to
be updated or replaced.

The first set of rules dictates the team formation. The Species Clause prevents a player from having multiple of
the same Pokémon on their team. This is to encourage more diversity and to prevent players from running teams of the
same Pokémon. Next is the Evasion Clause, which prevents players from using the moves Double Team or Minimize.
These moves make a Pokémon harder to hit, and can lead to stalled games where neither player can win. The final
clause is the one-hit-knock-out (OHKO) Clause. Pokémon are not allowed to have the moves Horn Drill, Guillotine,
Sheer Cold, or Fissure. These moves, referred to as OHKO moves, have a low hit rate but will cause the opponent’s
Pokémon to faint if they do hit. The general opinion on the forums is that this rule prevents strong players from
losing due to randomness.

The second set of clauses affects player behavior during the battles. The Sleep Clause prevents a player from directly
causing more than one of their opponent’s Pokémon to fall asleep at a time. If a move attempts to break this clause, the
game will automatically prevent the sleep from occurring. The Freeze Clause is the same as the Sleep Clause but it
refers to the freeze status effect. The Endless Battle Clause prevents players from intentionally preventing their
opponent from winning without forfeiting. The Timer Clause causes a player to automatically lose the battle if their
player timer is exhausted.

4. Experimental Design

To show how ML can be used as a tool for testing rules for Pokémon battles, we decided to designate one rule for
experimentation. Due to the seemingly independent reasoning for each of Smogon’s rules, it would make sense that rules
are tested individually as one sees fit.

We decided the rule of greatest interest to us was the one prohibiting the use of OHKO moves. Not all Pokémon can
use the moves, so we would expect this rule to restrict a small set of Pokémon.

In their best use cases, OHKO moves will work ∼30% of the time, dealing a fixed amount of damage that can knock
out any opponent. Smogon may have set this rule to prevent users of these moves from being too strong, but it might
alternatively exist to keep the game more interesting. For example, relying on luck to this extent may limit players’
ability to win through better decision-making. As a counterargument, ice moves have a ∼10% chance of permanently
freezing the opponent, a large reason to use the move, yet there is no rule against it.

While it would be very interesting to know exactly how the utility of every Pokémon would change with the removal of
the rule, and to produce a new ranking of some sort, that would be either too difficult to compute or require many
assumptions that may not be agreeable. We instead perform a single experiment where we hope to see some notable
effects of the rule.

To perform our experiment, we use ML to control the actions of players in four repeated team battle scenarios. We
configure two teams with three Pokémon each, determine control movesets for each that make sense for competitive
play, and create alternate movesets that make use of OHKO moves. In the four scenarios, we test the likelihood of Team
1 winning when one, both, or neither of the teams use the alternate movesets. All other factors are assumed to be
constant.

We conduct these battles using the most popular of Smogon’s battle formats, OU, which only prohibits using
Mew and Mewtwo. There are 14 Pokémon in the OU tier. Pokémon in lower tiers are allowed but not recommended
in the format because they are seen as not worth using over the other 14.

By observing both the outcome of games and the specific learned behavior of the player agents, we should have
insight on how competitive play may be affected by the removal of this rule. If a team’s change in moveset appears
to increase its likelihood of winning, implying that players would want to use the moves, then we can be confident that
the rule impacts play. If not, then our results are not conclusive but suggest that the rule may not have much
impact.

4.1. The Teams

Reasonably designing the two teams and their movesets is an important part of testing the rule, as we want the
results of the experiments to have implications on how skilled players would optimally play. For example, if the
introduction of OHKO moves to Team 1 yields large benefits against Team 2, this would be irrelevant if the control
movesets of Team 1 were already ineffective, such that we could see the same benefits by using other currently legal
moves. In another case, if Team 2 is not a good representation of a normal competitive team, then one’s ability to beat
it is not meaningful.

First, we acknowledge that 3v3 battles are uncommon for competitive games. Community rule sets, tier lists, and
recommended movesets are all generally made under the assumption that battles are 6v6. Our primary reasons for this
change are to limit the number of variables affecting battle and improve the speed and performance of our applied ML.
Not every Pokémon is capable of using OHKO moves in the games, so a reduced team size helps raise the concentration
of OHKO move usage while allowing us to focus on just a few of its users.

Table 1 details all relevant information about the two teams and their movesets. Following is a justification of this
configuration.

Table 1
Configurations of Pokémon on both teams. An entry "A / B" lists the control move A and the OHKO move B that
replaces it in the alternate moveset.

                 Team 1                      Team 2
  Slot 1         Slowbro                     Snorlax
  Slot 1 moves   Amnesia                     Body Slam
                 Surf                        Reflect
                 Thunder Wave                Rest
                 Rest / Fissure              Self-Destruct / Fissure
  Slot 2         Rhydon                      Tauros
  Slot 2 moves   Earthquake                  Body Slam
                 Rock Slide / Horn Drill     Hyper Beam
                 Body Slam                   Blizzard
                 Substitute                  Earthquake / Horn Drill
  Slot 3         Alakazam                    Chansey
  Slot 3 moves   Psychic                     Thunderbolt
                 Seismic Toss                Ice Beam
                 Thunder Wave                Soft-Boiled
                 Recover                     Thunder Wave

In designing the teams, we noted that the three Pokémon, Snorlax, Tauros, and Chansey, can be seen on almost every
6v6 team in OU. The Type or Types of a Pokémon usually cause them to have particular advantages or disadvantages
against different Pokémon, but these three being of the Normal Type means that there are fewer moves that have
type advantage against them. Similarly, their moves are less likely to be highly effective or ineffective against other
types. It happens that these three also have some of the highest stat totals, making them some of the strongest Pokémon
in isolation.

Another reason for their high usage is their synergy in performing different roles for a team. Snorlax is one of
the bulkiest Pokémon in the game and has the damage to threaten any opponent. Tauros has both damage and speed,
making it able to quickly finish off weakened opponents. Used effectively, it can force enemies to switch out, allowing
for a free hit on the new Pokémon. Chansey specializes in inflicting negative status conditions and can recover health
faster than a lot of Pokémon can deal damage. Snorlax and Chansey can also be seen using a variety of movesets
depending on what best complements the rest of the team.

For these reasons, we believed it would be interesting to see these three together as a complete team, where Snorlax
and Tauros can both be given OHKO moves. The only other OHKO move users in OU are Rhydon and Slowbro, so we
put them on the other team. Rhydon is advantageous for its ability to absorb Normal moves, while Slowbro can inflict
paralysis with Thunder Wave. Paralysis is critical to the team because OHKO moves will not work if the user has less
speed than the target. It also seemed interesting to make use of Slowbro, since it is one of the least used and lowest
rated Pokémon categorized into OU.

Because these two are particularly slow, their team would benefit from having a fast Pokémon that can also use
Thunder Wave. There were several nominees for this role, but we selected Alakazam for its slightly better offense,
which would help compensate for the absence of Tauros from that team.

4.2. The Moves

When determining the moves for the control teams, we wanted to balance using the most common moves with
making all Pokémon reasonably useful for the battle. All three Pokémon on the Normal team are usually seen with
Ice moves, which is redundant and counters Rhydon too well. Snorlax is the one most commonly seen without its
Ice move, so we gave it Self-Destruct to account for other difficult situations.

A Pokémon can only know four moves, so a move would have to be replaced on each Pokémon that would be given
an OHKO move. The fixed damage nature of these moves makes it seem reasonable to replace a damaging move that
is usually only used situationally. For Snorlax, we replaced Self-Destruct. For Tauros, we replaced Earthquake, which
is mostly just used against Gengar in OU.

We anticipated that Rhydon would have no reason to use Rock Slide or Body Slam when it has Earthquake, so we just
replaced Rock Slide. Slowbro usually only has one damaging move, so ideally we would replace Amnesia or Rest. Each
of these moves in the absence of the other is less useful, as they are effective in combination. In the end, we decided
to sacrifice Rest because it leaves the user vulnerable and more predictable.

It may be important to note that neither team was given moves to inflict the sleep condition on enemies. Smogon
has always recognized sleep to be one of the most powerful mechanics given to some Pokémon, which is why they have
a rule preventing a team from putting two of their opponent’s Pokémon to sleep at the same time. Our primary reason
for not using these moves was that the Pokémon we were considering selecting could not use them. We intentionally
avoided giving the teams access to these moves because we feared that they simply could not be balanced; there are not
enough Pokémon on each team for players to not be devastated by the reduction in options. We could also argue that both
teams have enough ability to inflict other status conditions, helping fill the void made by sleep’s exclusion.

4.3. Software Used

To simulate Pokémon battles, we used the pkmn engine, a Pokémon battle simulation engine optimized for
performance in larger scale projects [15]. This open source tool accurately implements battles as done in the original
game code and the popular simulator Pokémon Showdown. Pokémon Showdown is sponsored and endorsed by Smogon for
competitive battles, as it provides practical implementations of battles for all Pokémon generations and a practical
interface for playing online. The pkmn engine currently only fully supports the first generation of Pokémon, but it is
able to run faster than Showdown while having less unneeded overhead.

We also used and credit another open source project, Wrapsire, which provides a C++ interface for the compiled
library produced by the pkmn engine [16]. We opted to use C++ for this project because of its advantages for memory
management, multi-threading, and compiler optimizations.
4.4. Monte Carlo Tree Search                                        resistant to each commonly used move Type, basically fol-
                                                                    lowing a recipe for team building. Essentially, players can
Pokémon is a stochastic turn-based game, where a turn is
                                                                    expect their opponents to have some kind of counter until
initiated by both players selecting an action independently
                                                                    they have already defeated the counter, and it is not as im-
but concurrently. In order to observe battles that are consis-
                                                                    portant to know what exactly the counter is. All this leads
tent with the skill expected of competitive players, action
                                                                    to teams being rather predictable.
selection must be mindful of what gives the highest chances
                                                                       Other design notes had a lesser effect on the results. Pri-
of winning. To accomplish this, we decided to use Monte
                                                                    marily, since total health points and amounts of damage
Carlo Tree Search (MCTS).
                                                                    dealt are rather inflated, we bucketed states together when
   We chose to use MCTS primarily because of its ability
                                                                    counting visits and related information, ignoring the bottom
to handle uncertainty and large state spaces. MCTS has
                                                                    five bits of both Pokémon’s health points when identifying
also been shown to be effective at simulating Pokémon bat-
                                                                    a state during a search. This allows for much quicker con-
tles [13]. Pokémon uses a random seed as a factor in many
                                                                    vergence and is easily modifiable in our code.
different calculations, most notably in standard damage cal-
culation and secondary move effects. Because of this, a turn
initiated from a given game state by given actions may have         4.5. Simulating Each Scenario
hundreds of possible unique resulting game states depend-           To observe the likelihood of Team 1 winning with reason-
ing on the seed. This is not problematic for MCTS because           able precision, we decided to simulate 500 battles for each
it naturally weighs the value of an action based on the frequency with which different states result from it, and these states similarly develop better heuristics for move selection as the search continues.
   The primary problem we faced with implementing MCTS for Pokémon battles was determining how to handle concurrent action selection. Minimax trees are commonly used for turn-based games like chess, but these assume that players alternate turns and know what the enemy previously selected. This assumption contradicts the nature of competitive Pokémon battles, which almost always involve players intentionally being unpredictable. This is due to a natural rock-paper-scissors-like relationship among strategies. For example, recovery may counter gradual damage, while applying buffs or statuses may counter recovery, and the right damaging move may render an enemy’s buffs ineffective or make them lose before they gain the longer-term benefits.
   To address this issue, we decided to randomly determine, at every simulation of a turn, which player selects their action first and which player picks based on that selected action. While this does not perfectly represent the distribution of move combinations used in competitive games, it effectively balances the scenarios where a player either picks their safest action or correctly predicts that their opponent is picking their safest action. This gives an equal advantage to each player, while also producing more consistency in game outcomes, which is helpful for assuring the integrity of our results. It is also significantly less computationally expensive than developing stochastic policies, which makes it easily applicable at all frequently visited states during a search.

scenario, using MCTS at each turn. Each search consisted of 100,000 iterations, simulating from the active game state.
   Because of the somewhat large depth and stochastic nature of Pokémon battles, we chose to limit the search depth to 25 turns from the active game state. Simulating past 25 turns of depth would likely be unnecessary, since states would not get enough visits to do anything other than random rollouts, which only slowly progress to the end state and may favor certain movesets. After 25 turns, we terminated the simulated game and considered the team with the larger sum of remaining health percentages the winner. When this happened, we gave the winner a modified reward, as if the outcome were an average of 9 wins and 1 loss, to encourage actual wins over wins by this cutoff. Implementing this cutoff also reduced the time needed to complete each search.
   We also encountered a problem with battles not ending when all Pokémon on a team were frozen. This was because the algorithm recognized that it could win on any future turn for equal reward. Rather than reducing the reward for winning after more turns, something that could theoretically negatively influence the decisions made, we decided that it was safe to assume that a team with all of its Pokémon frozen will not win and let games be terminated at that point. None of these Pokémon had moves that deal damage on turns they do not attack, such as Toxic and Leech Seed.
   When running the battles, we collected a detailed log of the actions selected at every turn and the values of relevant variables, such as health and status conditions. We used these to extract interesting information that may inform our discussion of how gameplay behavior changes when the external rules for team compositions change.
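   The three simulation rules described in this section (random ordering of each simultaneous turn, the 25-turn cutoff with a reduced reward, and early termination when a whole team is frozen) can be illustrated with a minimal Python sketch. This is not the implementation used in the experiments. All function and field names are hypothetical, health is assumed to be stored as a fraction between 0 and 1, and the reward encoding (win = 1, loss = 0, so that "9 wins and 1 loss" averages to 0.9) is an assumption, as is scoring equal outcomes as 0.5.

```python
import random

DEPTH_LIMIT = 25         # simulated turns allowed past the search root (from the text)
CUTOFF_WIN_REWARD = 0.9  # assumption: win = 1, loss = 0, so "9 wins, 1 loss" averages 0.9


def select_joint_action(rng, choose_blind, choose_reply):
    """Turn one simultaneous move choice into a randomly ordered sequential one.

    choose_blind(player) picks without seeing the opponent's action.
    choose_reply(player, opponent_action) picks in response to it.
    """
    first, second = ("p1", "p2") if rng.random() < 0.5 else ("p2", "p1")
    first_action = choose_blind(first)
    second_action = choose_reply(second, first_action)
    return {first: first_action, second: second_action}


def team_cannot_win(team):
    """True if every Pokémon has fainted or every survivor is frozen."""
    alive = [p for p in team if p["hp"] > 0]
    # all() over an empty list is True, which covers the fully fainted team.
    return all(p["status"] == "frozen" for p in alive)


def rollout_reward(p1_team, p2_team, depth):
    """Reward for player 1, or None if the simulation should continue."""
    p1_out, p2_out = team_cannot_win(p1_team), team_cannot_win(p2_team)
    if p1_out or p2_out:
        if p1_out and p2_out:
            return 0.5  # assumption: a mutual knockout is scored as half a win
        return 0.0 if p1_out else 1.0
    if depth >= DEPTH_LIMIT:
        h1 = sum(p["hp"] for p in p1_team)
        h2 = sum(p["hp"] for p in p2_team)
        if h1 == h2:
            return 0.5  # assumption: equal health totals are scored as a draw
        return CUTOFF_WIN_REWARD if h1 > h2 else 1.0 - CUTOFF_WIN_REWARD
    return None


if __name__ == "__main__":
    # Toy rock-paper-scissors turn: the first mover plays "rock",
    # the second mover answers with the counter to whatever it saw.
    counter = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
    rng = random.Random(0)
    print(select_joint_action(rng, lambda p: "rock", lambda p, a: counter[a]))
```

   Because the order is redrawn on every simulated turn, neither player is systematically the one who gets to respond, which is the balancing effect described above.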
   There is an additional concern caused by simulating forward in a game: it gives the searching agent full knowledge of the opponent’s team. Normally Pokémon is a partial information game, where one does not know anything about the opponent’s team until they switch to each Pokémon and use their moves. If the battles in this experiment were played by people instead of ML agents, Team 2 not knowing that Alakazam could switch to Rhydon could end disastrously for them if they told Snorlax to use Self-Destruct.
   Representing partial observability not only makes the problem too complicated to get reasonable results but also may not change the results very much. Most Pokémon have a single set of moves that is used identically in over half of games. Additionally, teams in 6v6 battles tend to intentionally cover all of their weaknesses by having a Pokémon

5. Results and Discussion

Table 2
Summary of battles in each scenario
  Who Has OHKO Moves    Team 1 Win %    Avg. Turns    OHKO Sel/Bat    KO’s/Bat
  Neither Team             37.5%          52.754         0              0
  Team 1                   41.9%          42.448         2.666          0.588
  Team 2                   49.6%          49.426         1.15           0.100
  Both Teams               60.6%          41.002         3.67           0.652

   Observing the summary of the battles in Table 2, particularly the Team 1 Win % column, we can see notable
changes in the outcome of games based on the change of movesets. In the control battles where neither team was given OHKO moves, we saw that Team 2, the three popular Normal types, won 5/8 of their games. We expected that this team would have an advantage, since they have high stats, few weaknesses, and seemingly good synergy.

Table 3
Pokémon summary when NEITHER has OHKO moves
  Pokémon        Move Given       Rate Used     KO’s/Battle
  T1-Slowbro     Rest               2.29%          N/A
  T1-Rhydon      Rock Slide         9.10%          N/A
  T2-Snorlax     Self-Destruct     12.36%          N/A
  T2-Tauros      Earthquake        10.90%          N/A

Table 4
Pokémon summary when Team 1 has OHKO moves
  Pokémon        Move Given       Rate Used     KO’s/Battle
  T1-Slowbro     Fissure           21.61%         0.482
  T1-Rhydon      Horn Drill        13.56%         0.106
  T2-Snorlax     Self-Destruct     16.11%          N/A
  T2-Tauros      Earthquake        10.34%          N/A

Table 5
Pokémon summary when Team 2 has OHKO moves
  Pokémon        Move Given       Rate Used     KO’s/Battle
  T1-Slowbro     Rest               1.86%          N/A
  T1-Rhydon      Rock Slide         6.89%          N/A
  T2-Snorlax     Fissure            4.00%         0.056
  T2-Tauros      Horn Drill        15.22%         0.044

Table 6
Pokémon summary when BOTH have OHKO moves
  Pokémon        Move Given       Rate Used     KO’s/Battle
  T1-Slowbro     Fissure           11.98%         0.424
  T1-Rhydon      Horn Drill         8.43%         0.096
  T2-Snorlax     Fissure            4.98%         0.100
  T2-Tauros      Horn Drill        14.14%         0.032

   When we tried replacing moves from the already advantaged Team 2 with OHKO moves, they lost some of their advantage. This suggests that the moves we replaced were more useful in this battle than the OHKO moves were. In fact, they collectively managed only 1 successful OHKO per 10 battles. Taking away Tauros’s Earthquake likely was not a big deal since it still had Body Slam, but we suspect that the main reason for the change in performance is the loss of Self-Destruct, which knocks out the user to deal massive damage.
   As a quick side note, Self-Destruct caused ties in 4 of the 2,000 battles by knocking out the final two Pokémon. Each of these was treated as half of a win when calculating Team 1’s win ratio.
   Snorlax’s Self-Destruct can knock out Alakazam in one hit or take out 3/4 of Slowbro’s health. Of the 482 times Snorlax fainted in the control scenario, 220 were from using this move. This indicates that the move was worth having and using against this team. On the other hand, Self-Destruct is riskier in a 6v6 battle because the opponent is more likely to be able to switch to a Pokémon like Gengar that can absorb the blow and leave Snorlax knocked out. Fissure did not help Snorlax here, but it might find more use with the right moveset and team.
   When we tried putting OHKO moves into the movesets of just Team 1, we saw their win ratio go up from 37.5% to 41.9%. This suggests that we successfully found a use case for OHKO moves in competitive battles. Slowbro averaged about 1 successful Fissure per 2 battles, while Rhydon landed about 1 Horn Drill per 10 battles. Rhydon’s success rate was not very impressive, but this Pokémon’s popularity within OU would probably be enough to cause Horn Drill to show up as an alternative to catch opponents off guard.
   Slowbro had the most interesting results with OHKO moves. We replaced Rest, which Smogon describes as being important for use with Amnesia. Slowbro probably would have used Amnesia and Rest a lot more if it had been starting against a special attacker instead of the physical attacker Snorlax. Against a different team with more special attackers, Slowbro might have done better with the recommended moveset. On the other hand, Slowbro’s increased ability to use alternate movesets would make it more useful in different scenarios, which could raise its usage rate. Opponents would not know whether to expect the standard Slowbro or one with Fissure, causing them to have trouble switching appropriately.
   The scenario where both teams had OHKO moves is not as interesting because we already showed a disadvantage with the movesets for Team 2. However, it offers a little insight into how some things change with different circumstances. Most notably, the change in win rate was significantly greater when both changes were present together. Without going into too much detail, a strong hypothesis could be that the absence of Self-Destruct was more devastating for Team 2 when it had a Fissure-using Slowbro to deal with. Slowbro used over twice as many total moves in battles where Snorlax did not have Self-Destruct.
   Aside from the four Pokémon already in OU that can use OHKO moves, this rule affects some Pokémon that are just below the cutoff for this tier. Dragonite may be the most notable of these. In OU, Dragonite is the holder of the highest base stat total but lacks the moves to back it up. Dragonite is relatively fast and can use Thunder Wave. Perhaps the only thing stopping it from becoming one of the strongest Pokémon with Horn Drill is its weakness to Ice. Ice moves are very useful and can be learned by most Pokémon in OU.
   Lapras is also a strong candidate for OU viability with Horn Drill, as it can put Pokémon to sleep with Sing, is very bulky, and outspeeds other bulky Pokémon. The rule limiting a team to putting just one opponent to sleep at once currently stops Lapras from being viable in OU, as having one sleeper is usually seen as ideal. With the addition of Horn Drill, Lapras might be a better replacement for another sleeper, especially with Blizzard to take down Rhydon pretty consistently. Smogon recommends using 3 damaging moves for Lapras, even though it is not very offensive; one of these is likely easily replaceable by Horn Drill.
   Win rates aside, the battles can be seen changing in terms of the experience for players. Earlier we mentioned that the OHKO rule may exist to keep games interesting. One thing that can be noted from Table 2 is that, as the number of teams with OHKO moves increased, the turn counts decreased. When Team 1 had the moves and performed better, the games were about 4/5 as long, even though the win ratio reflected that it was a more even match.
   Game length does not necessarily translate to more or less fun for the players, but this helps show how the game shifted toward randomly rewarding a team with a large advantage that would cause it to end more decisively.
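   The ratios quoted from Table 2 in this discussion can be checked with a few lines of arithmetic. The sketch below copies the values from Table 2; the dictionary layout and function names are illustrative only, and the win, tie, and battle counts passed to `win_ratio` in the last line are made-up numbers used to show the half-win treatment of ties, not data from the experiment.

```python
# Values copied from Table 2; the data layout itself is an assumption.
TABLE2 = {
    "Neither Team": {"team1_win": 0.375, "avg_turns": 52.754, "ohko_sel": 0.0},
    "Team 1":       {"team1_win": 0.419, "avg_turns": 42.448, "ohko_sel": 2.666},
    "Team 2":       {"team1_win": 0.496, "avg_turns": 49.426, "ohko_sel": 1.15},
    "Both Teams":   {"team1_win": 0.606, "avg_turns": 41.002, "ohko_sel": 3.67},
}


def win_ratio(wins, ties, battles):
    """Win ratio with each tie counted as half of a win."""
    return (wins + 0.5 * ties) / battles


def turn_ratio(scenario, baseline="Neither Team"):
    """Average game length relative to the control scenario."""
    return TABLE2[scenario]["avg_turns"] / TABLE2[baseline]["avg_turns"]


def ohko_selections_per_turn(scenario):
    """Average OHKO moves selected per turn of battle."""
    row = TABLE2[scenario]
    return row["ohko_sel"] / row["avg_turns"]


if __name__ == "__main__":
    # Team 2's share of the control battles (the "5/8" in the text).
    print(1 - TABLE2["Neither Team"]["team1_win"])
    # Game length with OHKO moves on Team 1, relative to control ("about 4/5").
    print(round(turn_ratio("Team 1"), 3))
    # OHKO selections per turn when both teams have the moves.
    print(round(ohko_selections_per_turn("Both Teams"), 3))
    # Hypothetical counts: 10 wins and 2 ties in 20 battles.
    print(win_ratio(10, 2, 20))
```

   Under these numbers, fewer than one in ten turns involves an OHKO move even when both teams carry them.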
A 30-70 gamble with a high reward likely does this. It makes sense that skilled players would appreciate their win ratio having a higher correlation with the skill they put a lot of time into mastering, so this seems like a strong potential motive for the rule.
   We also show in Table 2 the average number of OHKO moves used by any Pokémon per battle in the column OHKO Sel/Bat. This helps us gain a sense of whether Pokémon are overutilizing OHKO moves with respect to other moves. If a Pokémon is disproportionately using OHKO moves, this could lead to the opposing player becoming annoyed or frustrated with the battle, since OHKO moves generally involve luck to succeed. Three OHKO moves being selected in a battle lasting 41 turns is not likely to be seen as overuse or as annoying. It may be important to acknowledge that this battle used all four Pokémon in the current OU tier that are able to use OHKO moves, so this can be seen as an estimated upper bound on OHKO move usage. Given this, the luck-based nature of these moves seems less concerning.
   Under these almost ideal conditions for OHKO move usage, we observed that this strategy did not pose a notable problem to player experience. OHKO moves usually required setup with Thunder Wave or Body Slam, which shows that the strategy still requires a degree of skill and team-building decisions to be effective. Furthermore, we can see that allowing these moves may give more benefit to less commonly used Pokémon, and we can anticipate that some Pokémon that are not considered strong enough for this battle format may become more relevant.
   The results of this experiment do not show a strong justification for the rule banning the use of OHKO moves, but that is not to say that an alternate problematic case does not exist. Of course, it must be acknowledged that this is a limited case study designed to show how we can collect and analyze data to test the concerns associated with a rule. Should the rule be seriously put into question by the Smogon community, it would be wise to observe more cases than this and in a less anecdotal way.

6. Conclusion

Additional rule sets are often used in competitive games to create a healthy competitive metagame. This ensures that all participants are playing on a level field. These rule sets, however, are often designed using intuition and post-hoc usage statistics/analysis. In this paper, we explore how artificial intelligence techniques can be used to empirically evaluate competitive rule sets in the game Pokémon.
   To do this, we use Monte-Carlo tree search to simulate 3v3 battles using a relaxed version of the Smogon generation 1 rule set, allowing Pokémon to use OHKO moves. Results show that OHKO moves can have a visible effect on how the battles play out, but that effect is nuanced in many ways. Our results indicate that while OHKO moves are used in battle when they are available, they likely are not used enough to cause annoyance. This, however, is not definitive, and more work must be done before any conclusions can be drawn. Overall, this approach gives designers critical context on how rules affect the potential metagame surrounding competitive play.
   We feel that these preliminary results are promising and provide evidence that further work on utilizing automated playtesting techniques to evaluate competitive rulesets has merit. In addition, the approach we use in this paper has applications to games outside of Pokémon. Any competitive game that can be simulated could very easily make use of the approaches that we use in this paper. We hope this encourages further research into how artificial intelligence and machine learning can be applied in competitive games and metagames.