=Paper=
{{Paper
|id=Vol-3926/paper3
|storemode=property
|title=An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets
|pdfUrl=https://ceur-ws.org/Vol-3926/paper3.pdf
|volume=Vol-3926
|authors=Nicholas Fluty,Ryan D. Flores,Judy Goldsmith,Brent Harrison
|dblpUrl=https://dblp.org/rec/conf/exag/FlutyFGH24
}}
==An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets==
Nicholas Fluty, Ryan D. Flores, Judy Goldsmith and Brent Harrison

Department of Computer Science, University of Kentucky, Lexington, KY 40506-0633 USA

Abstract
In competitive Pokémon battling, players have adopted a set of extra rules that are meant to encourage fair play. They are used to
constrain team formation so that no one team has an overwhelming advantage over all others. These rule sets are often derived based
on trial and error, intuition, or post-hoc evaluations of team performance, which means that the rules may not be ideal solutions to the
problem they are supposed to address, or the problem may not have been worth addressing.
In this paper, we explore how artificial intelligence and machine learning techniques can be used to potentially evaluate the quality
of a rule set. This is meant to be a preliminary study that will ultimately lead to the automatic formulation of such rule sets. Our case
study investigates how the inclusion or exclusion of one-hit-knock-out (OHKO) moves affects the outcomes and player behaviors in
games between two teams battling under Generation 1 rules.
Keywords
Competitive Rules, Artificial Intelligence, Machine Learning

11th Experimental Artificial Intelligence in Games Workshop, November 19, 2024, Lexington, Kentucky, USA.

Email: ndfl222@uky.edu (N. Fluty); rhma226@uky.edu (R. D. Flores); goldsmit@uky.edu (J. Goldsmith); bha286@g.uky.edu (B. Harrison)

ORCID: 0009-0007-3250-4915 (N. Fluty); 0009-0000-2790-5908 (R. D. Flores); 0000-0002-8383-5390 (J. Goldsmith); 0000-0002-1301-5928 (B. Harrison)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Pokémon is a game in which players construct teams of combatants, the titular Pokémon, to battle against other players' teams. A great deal of thought is often put into how these teams are constructed, as one wants to utilize powerful Pokémon while promoting good synergy as a team. To maintain a healthy competitive atmosphere, there are often rules on how a team can be constructed. These rules are meant to keep strategies that are potentially too strong from becoming prevalent in the competitive metagame.

The rulesets Smogon uses to govern its competitive battles are good examples of this. Smogon is a competitive battling community that organizes tournaments and provides competitive battling resources. In service of this, it also defines the rulesets used when these tournaments are held. These rules govern how players construct and use their teams and are meant to guard against overpowered or degenerate strategies. In addition to a set of rules common to all battles, Smogon defines various battle formats that restrict which Pokémon can be used, to allow for diverse usage of both strong and weak Pokémon. The most commonly used format is referred to as OverUsed (OU), which allows all but some of the strongest "legendary" Pokémon that were intentionally given this advantage for purposes outside of competitive play. Again, these rulesets are meant to ensure that no one strategy for constructing teams or battling is strictly dominant over all others.

While these rules are often necessary for healthy competitive play, constructing these rulesets can be quite difficult. Often, these rules are based on speculation, anecdotal evidence, or post-hoc analysis. As such, the formation of effective rulesets is an imperfect science that can be time-consuming and prone to errors (constructing a rule where there shouldn't be one or missing a rule that should be present).

In this paper, we investigate how machine learning (ML) techniques can be used to support the evaluation of competitive rule sets for the game of Pokémon. We are using the Pokémon domain to test this concept because of the existence of community tournaments whose rules exist outside of the game environment. Specifically, we present a case study in which we examine the Smogon rules associated with the first generation of the game, demonstrate how we can test changes, and present a data-based discussion of the effects of the change. We chose this ruleset because Smogon rules are largely community-driven and not necessarily subject to rigorous empirical analysis. The primary contribution of this work is to explore how AI and machine learning techniques can be used to perform vulnerability tests on these types of rulesets. This case study serves as preliminary evidence of the feasibility of such an approach, and we hope it will encourage further work in the area.

The remainder of the paper is organized as follows. In the next section, we review relevant related work on evaluating rule sets and the metagame in Pokémon. We then introduce the Smogon generation 1 tournament rule set. Finally, we detail our case study and present its results.

2. Related Works

The primary contribution of this work is in evaluating the rulesets associated with competitive play in games. Specifically, in this paper we evaluate how the rulesets associated with competitive Pokémon affect dominant teams. Past work has examined the Pokémon metagame [1, 2], but it examines which teams of Pokémon are particularly strong in a metagame as defined by the rules associated with competitive play, or investigates countering the metagame [3]. In this paper, we examine whether these rules are justified and how one might prove them.

To do this, we take inspiration from the automated playtesting literature and propose that machine learning techniques can be used to identify problems with rule sets or rules that are not justified. Typically, video game playtesting is performed by humans to determine whether a game contains errors. This process is time consuming and prone to human error. Thus, there has been an increased interest in
automating this process using artificial agents. In the past, researchers have explored several AI and machine learning methods for automating this process [4, 5, 6, 7]. Still, these approaches are typically used to evaluate level design or game mechanics with respect to designer goals. Other work has focused on creating agents that better mimic playtesters of different personas [8] or skillsets [9], but the primary focus is still on evaluating game mechanics or level design. In this work, we focus on evaluating rules that are not inherent to the game itself, but that are designed after the fact to encourage competitive play.

When determining the effect that rule sets have on a team's win probability, we use machine learning to learn battle strategies for each team. Machine learning has been readily explored in Pokémon [10, 11, 12, 13, 14] and found to be an effective tool for teaching agents how to battle. The main difference between our work and this past work is not in the method, but in the motivation behind the method. These past works primarily focused on developing techniques to be more competent in battle. We are using these techniques in service of evaluating player-made rule sets.

3. Smogon Rule Sets

Smogon uses a modified rule set compared to the official tournaments hosted by Nintendo. The rules vary across the different Pokémon generations. These modified rules affect the formation of the teams as well as the actions a player can take during a battle. Every few years a vote is held on the Smogon forums to see whether any rules need to be updated or replaced.

The first set of rules dictates team formation. The Species Clause prevents a player from having multiple of the same Pokémon on their team. This is to encourage more diversity and to prevent players from running teams of the same Pokémon. Next is the Evasion Clause, which prevents players from using the moves Double Team or Minimize. These moves make a Pokémon harder to hit, and can lead to stalled games where neither player can win. The final clause is the one-hit-knock-out (OHKO) Clause. Pokémon are not allowed to have the moves Horn Drill, Guillotine, Sheer Cold, or Fissure. These moves, referred to as OHKO moves, have a low hit rate but will cause the opponent's Pokémon to faint if they do hit. The general opinion on the forums is that this rule prevents strong players from losing due to randomness.

The second set of clauses affects player behavior during the battles. The Sleep Clause prevents a player from directly causing more than one of their opponent's Pokémon to fall asleep at a time. If a move attempts to break this clause, the game will automatically prevent the sleep from occurring. The Freeze Clause is the same as the Sleep Clause but refers to the freeze status effect. The Endless Battle Clause prevents players from intentionally preventing their opponent from winning without forfeiting. The Timer Clause causes a player to automatically lose the battle if their player timer is exhausted.

4. Experimental Design

To show how ML can be used as a tool for testing rules for Pokémon battles, we decided to designate one rule for experimentation. Due to the seemingly independent reasoning for each of Smogon's rules, it would make sense that rules are tested individually as one sees fit.

We decided the rule of greatest interest to us was the one prohibiting the use of OHKO moves. Not all Pokémon can use the moves, so we would expect this rule to restrict a small set of Pokémon.

In their best use cases, OHKO moves will work about 30% of the time, dealing a fixed amount of damage that can knock out any opponent. Smogon may have set this rule to prevent its users from being too strong, but it might alternatively exist to keep the game more interesting. For example, relying on luck to this extent may limit players' ability to win through better decision-making. As a counterargument, ice moves have a roughly 10% chance of permanently freezing the opponent, a large reason to use the move, yet there is no rule against it.

While it would be very interesting to know exactly how the utility of all Pokémon changes with the removal of the rule, and to produce a new ranking of some sort, that would be either too difficult to compute or require many assumptions that may not be agreeable. We instead perform a single experiment where we hope to see some notable effects of the rule.

To perform our experiment, we use ML to control the actions of players in four repeated team battle scenarios. We configure two teams with three Pokémon each, determine control movesets for each that make sense for competitive play, and create alternate movesets that make use of OHKO moves. In the four scenarios, we test the likelihood of Team 1 winning when one, both, or neither of the teams use the alternate movesets. All other factors are assumed to be constant.

We conduct these battles using the most popular of Smogon's battle formats, OU, which only prohibits using Mew and Mewtwo. There are 14 Pokémon in the OU tier. Pokémon in lower tiers are allowed but not recommended in the format because they are seen as not worth using over the other 14.

By observing both the outcome of games and the specific learned behavior of the player agents, we should gain insight into how competitive play may be affected by the removal of this rule. If a team's change in moveset appears to increase its likelihood of winning, implying that players would want to use the moves, then we can be confident that the rule impacts play. If not, then our results are not conclusive but suggest that the rule may not have much impact.

4.1. The Teams

Reasonably designing the two teams and their movesets is an important part of testing the rule, as we want the results of the experiments to have implications for how skilled players would optimally play. For example, if the introduction of OHKO moves to Team 1 yields large benefits against Team 2, this would be irrelevant if the control movesets of Team 1 were already ineffective, such that we could see the same benefits by using other currently legal moves. In another case, if Team 2 is not a good representation of a normal competitive team, then one's ability to beat it is not meaningful.

First, we acknowledge that 3v3 battles are uncommon for competitive games. Community rule sets, tier lists, and recommended movesets are all generally made under the assumption that battles are 6v6. Our primary reasons for this
change are to limit the number of variables affecting battle and improve the speed and performance of our applied ML. Not every Pokémon is capable of using OHKO moves in the games, so a reduced team size helps raise the concentration of OHKO move usage while allowing us to focus on just a few of its users.

Table 1 details all relevant information about the two teams and their movesets. Following is a justification of this configuration.

{| class="wikitable"
|+ Table 1: Configurations of Pokémon on both teams (where two moves are separated by a slash, the first is the control move and the second is its OHKO replacement)
! Slot !! Team 1 !! Team 1 moves !! Team 2 !! Team 2 moves
|-
| 1 || Slowbro || Amnesia, Surf, Thunder Wave, Rest / Fissure || Snorlax || Body Slam, Reflect, Rest, Self-Destruct / Fissure
|-
| 2 || Rhydon || Earthquake, Rock Slide / Horn Drill, Body Slam, Substitute || Tauros || Body Slam, Hyper Beam, Blizzard, Earthquake / Horn Drill
|-
| 3 || Alakazam || Psychic, Seismic Toss, Thunder Wave, Recover || Chansey || Thunderbolt, Ice Beam, Soft-Boiled, Thunder Wave
|}

In designing the teams, we noted that the three Pokémon Snorlax, Tauros, and Chansey can be seen on almost every 6v6 team in OU. The Type or Types of a Pokémon usually cause them to have particular advantages or disadvantages against different Pokémon, but these three being of the Normal Type means that there are fewer moves that have type advantage against them. Similarly, their moves are less likely to be highly effective or ineffective against other types. It happens that these three also have some of the highest stat totals, making them some of the strongest Pokémon in isolation.

Another reason for their high usage is their synergy in performing different roles for a team. Snorlax is one of the bulkiest Pokémon in the game and has the damage to threaten any opponent. Tauros has both damage and speed, making it able to quickly finish off weakened opponents. Used effectively, it can force enemies to switch out, allowing for a free hit on the new Pokémon. Chansey specializes in inflicting negative status conditions and can recover health faster than a lot of Pokémon can deal damage. Snorlax and Chansey can also be seen using a variety of movesets depending on what best complements the rest of the team.

For these reasons, we believed it would be interesting to see these three together as a complete team, where Snorlax and Tauros can both be given OHKO moves. The only other OHKO move users in OU are Rhydon and Slowbro, so we put them on the other team. Rhydon is advantageous for its ability to absorb Normal moves, while Slowbro can inflict paralysis with Thunder Wave. Paralysis is critical to the team because OHKO moves will not work if the user has less speed than the target. It also seemed interesting to make use of Slowbro, since it is one of the least used and lowest rated Pokémon categorized into OU.

Because these two are particularly slow, their team would benefit from having a fast Pokémon that can also use Thunder Wave. There were several nominees for this role, but we selected Alakazam for its slightly better offense, which would help compensate for the absence of Tauros from that team.

4.2. The Moves

When determining the moves for the control teams, we wanted to balance using the most common moves with making all Pokémon reasonably useful for the battle. All three Pokémon on the Normal team are usually seen with Ice moves, which is redundant and counters Rhydon too well. Snorlax is the one most commonly seen without its Ice move, so we gave it Self-Destruct to account for other difficult situations.

A Pokémon can only know four moves, so a move would have to be replaced on each Pokémon that would be given an OHKO move. The fixed damage nature of these moves makes it seem reasonable to replace a damaging move that is usually only used situationally. For Snorlax, we replaced Self-Destruct. For Tauros, we replaced Earthquake, which is mostly just used against Gengar in OU.

We anticipated that Rhydon would have no reason to use Rock Slide or Body Slam when it has Earthquake, so we just replaced Rock Slide. Slowbro usually only has one damaging move, so ideally we would replace Amnesia or Rest. Each of these moves is less useful in the absence of the other, as they are effective in combination. In the end, we decided to sacrifice Rest because it leaves the user vulnerable and more predictable.

It may be important to note that neither team was given moves to inflict the sleep condition on enemies. Smogon has always recognized sleep to be one of the most powerful mechanics given to some Pokémon, which is why it has a rule preventing a team from putting two of their opponent's Pokémon to sleep at the same time. Our primary reason for not using these moves was that the Pokémon we were considering selecting could not use them. We intentionally avoided giving the teams access to these moves because we feared that they simply could not be balanced; there are not enough Pokémon on each team for players to not be devastated by the reduction in options. We could also argue that both teams have enough ability to inflict other status conditions, helping fill the void made by sleep's exclusion.

4.3. Software Used

To simulate Pokémon battles, we used the pkmn engine, a Pokémon battle simulation engine optimized for performance in larger scale projects [15]. This open source tool accurately implements battles as done in the original game code and the popular simulator Pokémon Showdown. Pokémon Showdown is sponsored and endorsed by Smogon for competitive battles, as it provides practical implementations of battles for all Pokémon generations and a practical interface for playing online. The pkmn engine currently only fully supports the first generation of Pokémon, but it is able to run faster than Showdown while having less unneeded overhead.

We also used and credit another open source project, Wrapsire, which provides a C++ interface for the compiled library produced by the pkmn engine [16]. We opted to use C++ for this project because of its advantages for memory management, multi-threading, and compiler optimizations.

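The team configuration in Table 1 and the four scenarios described in Section 4 can be written down compactly. The following Python sketch is illustrative only: the dictionary layout and the helper name `build_team` are assumptions made for this example and are not part of the pkmn engine or Wrapsire. Only the species names, move names, and the swaps described in Section 4.2 come from the paper.

```python
from itertools import product

# Control movesets, copied from Table 1.
TEAMS = {
    "Team 1": {
        "Slowbro": ["Amnesia", "Surf", "Thunder Wave", "Rest"],
        "Rhydon": ["Earthquake", "Rock Slide", "Body Slam", "Substitute"],
        "Alakazam": ["Psychic", "Seismic Toss", "Thunder Wave", "Recover"],
    },
    "Team 2": {
        "Snorlax": ["Body Slam", "Reflect", "Rest", "Self-Destruct"],
        "Tauros": ["Body Slam", "Hyper Beam", "Blizzard", "Earthquake"],
        "Chansey": ["Thunderbolt", "Ice Beam", "Soft-Boiled", "Thunder Wave"],
    },
}

# (replaced control move, OHKO move) per Pokémon, as described in Section 4.2.
OHKO_SWAPS = {
    "Slowbro": ("Rest", "Fissure"),
    "Rhydon": ("Rock Slide", "Horn Drill"),
    "Snorlax": ("Self-Destruct", "Fissure"),
    "Tauros": ("Earthquake", "Horn Drill"),
}


def build_team(name, use_ohko):
    """Return a copy of a team's movesets, optionally with the OHKO swaps applied."""
    team = {}
    for species, moves in TEAMS[name].items():
        moves = list(moves)  # copy so the control moveset is never mutated
        if use_ohko and species in OHKO_SWAPS:
            old, new = OHKO_SWAPS[species]
            moves[moves.index(old)] = new
        team[species] = moves
    return team


# The four scenarios: neither, only Team 2, only Team 1, or both teams use OHKO moves.
SCENARIOS = [
    {
        "team1_ohko": t1,
        "team2_ohko": t2,
        "team1": build_team("Team 1", t1),
        "team2": build_team("Team 2", t2),
    }
    for t1, t2 in product([False, True], repeat=2)
]
```

Keeping the swaps in a separate table makes the control and alternate movesets differ in exactly one move per affected Pokémon, which mirrors the assumption that all other factors are held constant across scenarios.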
4.4. Monte Carlo Tree Search

Pokémon is a stochastic turn-based game, where a turn is initiated by both players selecting an action independently but concurrently. In order to observe battles that are consistent with the skill expected of competitive players, action selection must be mindful of what gives the highest chances of winning. To accomplish this, we decided to use Monte Carlo Tree Search (MCTS).

We chose to use MCTS primarily because of its ability to handle uncertainty and large state spaces. MCTS has also been shown to be effective at simulating Pokémon battles [13]. Pokémon uses a random seed as a factor in many different calculations, most notably in standard damage calculation and secondary move effects. Because of this, a turn initiated from a given game state by given actions may have hundreds of possible unique resulting game states depending on the seed. This is not problematic for MCTS because it naturally weighs the value of an action based on the frequency with which different states result from it, and these states similarly develop better heuristics for move selection as the search continues.

The primary problem we faced with implementing MCTS for Pokémon battles was determining how to handle concurrent action selection. Minimax trees are commonly used for turn-based games like chess, but these assume that players alternate turns and know what the enemy previously selected. This assumption contradicts the nature of competitive Pokémon battles, which almost always involve players intentionally being unpredictable. This is due to a natural rock-paper-scissors-like relationship among strategies. For example, recovery may counter gradual damage, while applying buffs or statuses may counter recovery, and the right damaging move may render an enemy's buffs ineffective or make them lose before they gain the longer-term benefits.

To address this issue, we decided to randomly determine at every simulation of a turn who will select their action first, and who will pick based on that selected action. While this does not perfectly represent the distribution of move combinations used in competitive games, it effectively balances the scenarios where a player either picks their safest action or correctly predicts that their opponent is picking their safest action. This gives an equal advantage to each player, while also producing more consistency in game outcomes, which is helpful for assuring the integrity of our results. It is also significantly less computationally expensive than developing stochastic policies, which makes it easily applicable at all frequently visited states during a search.

There is an additional concern caused by simulating forward in a game: it causes each player to know everything about the opponent's team. Normally Pokémon is a partial information game, where one does not know anything about the opponent's team until they switch to each Pokémon and use their moves. If the battles in this experiment were played by people instead of ML agents, Team 2 not knowing that Alakazam could switch to Rhydon could end up disastrous for them if they told Snorlax to use Self-Destruct.

Representing partial observability not only makes the problem too complicated to get reasonable results but also may not change the results very much. Most Pokémon have a single set of moves that is used identically in over half of games. Additionally, teams in 6v6 battles tend to intentionally cover all of their weaknesses by having a Pokémon resistant to each commonly used move Type, basically following a recipe for team building. Essentially, players can expect their opponents to have some kind of counter until they have already defeated the counter, and it is not as important to know what exactly the counter is. All this leads to teams being rather predictable.

Other design notes had a lesser effect on the results. Primarily, since total health points and amounts of damage dealt are rather inflated, we bucketed states together when counting visits and related information, ignoring the bottom five bits of both Pokémon's health points when identifying a state during a search. This allows for much quicker convergence and is easily modifiable in our code.

4.5. Simulating Each Scenario

To observe the likelihood of Team 1 winning with reasonable precision, we decided to simulate 500 battles for each scenario, using MCTS at each turn. Each search consisted of 100,000 iterations, simulating from the active game state.

Because of the somewhat large depth and stochastic nature of Pokémon battles, we chose to limit the search depth to 25 turns from the active game state. Simulating past 25 turns of depth would likely be unnecessary since states would not get enough visits to do anything other than random rollouts, which only slowly progress to the end state and may favor certain movesets. After 25 turns, we terminated the simulated game and considered the team with the larger sum of percentage healths the winner. When this happened, we gave the winner a modified reward as if it were an average of 9 wins and 1 loss, to help encourage actual wins over wins by this cutoff. Implementing this reduced the time needed to complete each search.

We also encountered a problem with battles not ending when all Pokémon on a team were frozen. This was because the algorithm recognized that it could win on any future turn for equal reward. Rather than reducing the reward for winning after more turns, something that could theoretically negatively influence the decisions made, we decided that it was safe to assume that a team with all their Pokémon frozen will not win and let games be terminated at that point. None of these Pokémon had moves that deal damage on turns they do not attack, such as Toxic and Leech Seed.

When running the battles, we collected a detailed log of the actions selected at every turn and the values of relevant variables, such as health and status conditions. We used these to extract interesting information that may inform our discussion of how gameplay behavior changes when the external rules for team compositions change.

5. Results and Discussion

{| class="wikitable"
|+ Table 2: Summary of battles in each scenario
! Who Has OHKO Moves !! Team 1 Win % !! Avg. Turns !! OHKO Sel/Bat !! KO's/Bat
|-
| Neither Team || 37.5% || 52.754 || 0 || 0
|-
| Team 1 || 41.9% || 42.448 || 2.666 || 0.588
|-
| Team 2 || 49.6% || 49.426 || 1.15 || 0.100
|-
| Both Teams || 60.6% || 41.002 || 3.67 || 0.652
|}

Observing the summary of the battles in Table 2, particularly the Team 1 Win % column, we can see notable
changes in the outcome of games based on the change of movesets. In the control battles where neither team was given OHKO moves, we saw that the three popular Normal types, Team 2, won 5/8 of their games. We expected that this team would have an advantage, since they have high stats, few weaknesses, and seemingly good synergy.

{| class="wikitable"
|+ Table 3: Pokémon summary when NEITHER has OHKO moves
! Pokémon !! Move Given !! Rate Used !! KO's/Battle
|-
| T1-Slowbro || Rest || 2.29% || N/A
|-
| T1-Rhydon || Rock Slide || 9.10% || N/A
|-
| T2-Snorlax || Self-Destruct || 12.36% || N/A
|-
| T2-Tauros || Earthquake || 10.90% || N/A
|}

{| class="wikitable"
|+ Table 4: Pokémon summary when Team 1 has OHKO moves
! Pokémon !! Move Given !! Rate Used !! KO's/Battle
|-
| T1-Slowbro || Fissure || 21.61% || 0.482
|-
| T1-Rhydon || Horn Drill || 13.56% || 0.106
|-
| T2-Snorlax || Self-Destruct || 16.11% || N/A
|-
| T2-Tauros || Earthquake || 10.34% || N/A
|}

{| class="wikitable"
|+ Table 5: Pokémon summary when Team 2 has OHKO moves
! Pokémon !! Move Given !! Rate Used !! KO's/Battle
|-
| T1-Slowbro || Rest || 1.86% || N/A
|-
| T1-Rhydon || Rock Slide || 6.89% || N/A
|-
| T2-Snorlax || Fissure || 4.00% || 0.056
|-
| T2-Tauros || Horn Drill || 15.22% || 0.044
|}

{| class="wikitable"
|+ Table 6: Pokémon summary when BOTH have OHKO moves
! Pokémon !! Move Given !! Rate Used !! KO's/Battle
|-
| T1-Slowbro || Fissure || 11.98% || 0.424
|-
| T1-Rhydon || Horn Drill || 8.43% || 0.096
|-
| T2-Snorlax || Fissure || 4.98% || 0.100
|-
| T2-Tauros || Horn Drill || 14.14% || 0.032
|}

When we tried replacing moves from the already advantaged Team 2 with OHKO moves, they lost some of their advantage. This suggests that the moves we replaced were more useful in this battle than the OHKO moves were. In fact, they collectively managed only 1 successful OHKO per 10 battles. Taking away Tauros's Earthquake likely was not a big deal since it still had Body Slam, but we suspect that the main reason for the change in performance is the loss of Self-Destruct, which knocks out the user to deal massive damage.

As a quick side note, Self-Destruct caused ties in 4 of the 2,000 battles by knocking out the final two Pokémon. Each of these was treated as half of a win when calculating Team 1's win ratio.

Snorlax's Self-Destruct can knock out Alakazam in one hit or take out 3/4 of Slowbro's health. Of the 482 times Snorlax fainted in the control scenario, 220 were from using this move. This indicates that the move was worth having and using against this team. On the other hand, Self-Destruct is more risky in a 6v6 battle because the opponent is more likely to be able to switch to a Pokémon like Gengar that can absorb the blow and leave Snorlax knocked out. Fissure did not help Snorlax here, but it might find more use with the right moveset and team.

When we tried putting OHKO moves into the movesets of just Team 1, we saw their win ratio go up from 37.5% to 41.9%. This suggests that we successfully found a use case for OHKO moves in competitive battles. Slowbro averaged about 1 successful Fissure per 2 battles, while Rhydon hit about 1 Horn Drill per 10 battles. Rhydon's success rate was not very impressive, but this Pokémon's popularity within OU would probably be enough to cause Horn Drill to show up as an alternative to catch opponents off guard.

Slowbro had the most interesting results with OHKO moves. We replaced Rest, which Smogon describes as important for use with Amnesia. Slowbro probably would have used Amnesia and Rest a lot more if it had been starting against a special attacker instead of the physical attacker Snorlax. Against a different team with more special attackers, Slowbro might have done better with the recommended moveset. On the other hand, Slowbro's increased ability to use alternate movesets would make it more useful in different scenarios, which could raise its usage rate. Opponents would not know whether to expect the standard Slowbro or one with Fissure, causing them to have trouble switching appropriately.

The scenario where both teams had OHKO moves is not as interesting because we already showed a disadvantage with the alternate movesets for Team 2. However, it offers a little insight into how some things change with different circumstances. Most notably, the change in win rate was significantly greater when both changes were present together. Without going into too much detail, a strong hypothesis could be that the absence of Self-Destruct was more devastating for Team 2 when it had a Fissure-using Slowbro to deal with. Slowbro used over twice as many total moves in battles where Snorlax did not have Self-Destruct.

Aside from the four Pokémon already in OU that can use OHKO moves, this rule affects some Pokémon that are just below the cutoff for this tier. Dragonite may be the most notable of these. In OU, Dragonite is the holder of the highest base stat total but lacks the moves to back it up. Dragonite is relatively fast and can use Thunder Wave. Perhaps the only thing stopping it from becoming one of the strongest Pokémon with Horn Drill is its weakness to Ice. Ice moves are very useful and can be known by most Pokémon in OU.

Lapras is also a strong candidate for OU viability with Horn Drill, as it can put Pokémon to sleep with Sing, is very bulky, and outspeeds other bulky Pokémon. The rule limiting a team to putting just one opponent to sleep at once currently stops Lapras from being viable in OU, as having one sleeper is usually seen as ideal. With the addition of Horn Drill, Lapras might be a better replacement for another sleeper, especially with Blizzard to take down Rhydon pretty consistently. Smogon recommends using 3 damaging moves for Lapras, even though it is not very offensive; one of these is likely easily replaceable by Horn Drill.

Win rates aside, the battles can be seen changing in terms of the experience for players. Earlier we mentioned that the OHKO rule may exist to keep games interesting. One thing that can be noted from Table 2 is that turn counts were lower in every scenario with OHKO moves than in the control. When Team 1 had the moves and performed better, the games were about 4/5 as long, even though the win ratio reflected that it was a more even match.

Game length does not necessarily translate to more or less fun for the players, but this helps show how the game had more of a twist toward randomly rewarding a team with a large advantage that would cause it to end more decisively.
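Several figures quoted in this section follow directly from the tables and can be re-derived with a few lines of arithmetic. The Python sketch below is only a sanity check: the variable and function names are chosen for this example, and `win_pct` is an assumed formulation of the stated rule that each tie counts as half a win. No raw win or tie counts from the experiment are used, since the paper reports only percentages.

```python
# Team 1 win percentage and average turns per scenario, copied from Table 2.
TABLE2 = {
    "Neither": {"win_pct": 37.5, "turns": 52.754},
    "Team 1": {"win_pct": 41.9, "turns": 42.448},
    "Team 2": {"win_pct": 49.6, "turns": 49.426},
    "Both": {"win_pct": 60.6, "turns": 41.002},
}


def win_pct(wins, ties, battles):
    """Team 1 win percentage with each tie counted as half a win."""
    return 100.0 * (wins + 0.5 * ties) / battles


# Team 2 winning 5/8 of the control games leaves Team 1 with 3/8 of them.
team1_control = 100.0 * (1 - 5 / 8)

# Game length with Team 1 OHKO moves, relative to the control scenario.
turn_ratio = TABLE2["Team 1"]["turns"] / TABLE2["Neither"]["turns"]

# Per-Pokémon KO's/battle from Table 4, summed for comparison with Table 2.
team1_kos = 0.482 + 0.106

# Share of Snorlax's control-scenario faints caused by its own Self-Destruct.
self_destruct_share = 220 / 482
```

Here `team1_control` matches the 37.5% control entry in Table 2, `turn_ratio` comes out at roughly 0.80, consistent with games being "about 4/5 as long", and `team1_kos` matches the 0.588 KO's per battle listed for the Team 1 scenario.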
A 30-70 gamble with a high reward likely does this. It makes sense that skilled players would want their win ratio
to correlate more strongly with the skill they have spent a lot of time mastering, so this seems like a strong
potential motive for the rule.
We also show in Table 2 the average number of OHKO moves used by any Pokémon per battle in the column OHKO
Sel/Bat. This helps us gain a sense of whether Pokémon are overutilizing OHKO moves with respect to other moves. If
a Pokémon is disproportionately using OHKO moves, this could lead to the opposing player becoming annoyed or
frustrated with the battle, since OHKO moves generally involve luck to succeed. Three OHKO moves being selected in a
battle lasting 41 turns is not likely to be seen as overusage or annoying. It may be important to acknowledge that this
battle used all four Pokémon in the current OU tier that are able to use OHKO moves, so this can be seen as an
estimated upper bound on OHKO move usage. Given this, the luck-based nature of these moves seems less concerning.
Under these almost ideal conditions for OHKO move usage, we observed that this strategy did not pose a notable problem
to player experience. OHKO moves usually required setup with Thunder Wave or Body Slam, which shows that the
strategy still requires a degree of skill and team-building decisions to be effective. Furthermore,
we can see that allowing these moves may give more benefit to less commonly used Pokémon, and we can anticipate that
some Pokémon that are not considered strong enough for this battle format may become more relevant.
The results of this experiment do not show a strong justification for the rule banning the use of OHKO moves, but
that is not to say that an alternate problematic case does not exist. Of course, it must be acknowledged that this is
a limited case study designed to show how we can collect and analyze data to test the concerns associated with a rule.
Should the rule be seriously put into question by the Smogon community, it would be wise to observe more cases than
this, and in a less anecdotal way.
6. Conclusion
Additional rule sets are often used in competitive games to create a healthy competitive metagame. This ensures
that all participants are playing on a level field. These rule sets, however, are often designed using intuition and
post-hoc usage statistics/analysis. In this paper, we explore how artificial intelligence techniques can be used to
empirically evaluate competitive rule sets in the game Pokémon.
To do this, we use Monte Carlo tree search to simulate 3v3 battles using a relaxed version of the Smogon
Generation 1 rule set, allowing Pokémon to use OHKO moves. Results show that OHKO moves can have a visible effect
on how the battles play out, but that effect is nuanced in many ways. Our results indicate that while OHKO moves are
used in battle when they are available, they likely are not used enough to cause annoyance. This, however, is not
definitive, and more work must be done before any conclusions can be drawn. This approach gives designers critical
context on how rules affect the potential metagame surrounding competitive play.
Overall, we feel that these preliminary results are promising and provide evidence that further work on utilizing
automated playtesting techniques to evaluate competitive rulesets has merit. In addition, the approach we use in this
paper has applications to games outside of Pokémon. Any competitive game that can be simulated could readily
make use of the approaches that we use in this paper. We hope this encourages further research into how artificial
intelligence and machine learning can be applied in competitive games and metagames.
References
[1] S. Reis, R. Novais, L. P. Reis, N. Lau, An adversarial approach for automated Pokémon team building and meta-game balance, IEEE Transactions on Games (2023).
[2] S. P. R. A. Reis, Artificial intelligence methods for automated difficulty and power balance in games (2024).
[3] D. Crane, Z. Holmes, T. T. Kosiara, M. Nickels, M. Spradling, Team counter-selection games, in: 2021 IEEE Conference on Games (CoG), IEEE, 2021, pp. 1–8.
[4] R. Ferdous, F. Kifetew, D. Prandi, I. Prasetya, S. Shirzadehhajimahmood, A. Susi, Search-based automated play testing of computer games: A model-based approach, in: International Symposium on Search Based Software Engineering, Springer, 2021, pp. 56–71.
[5] P. L. P. de Woillemont, R. Labory, V. Corruble, Automated play-testing through RL based human-like play-styles generation, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 18, 2022, pp. 146–154.
[6] S. Stahlke, A. Nova, P. Mirza-Babaei, Artificial players in the design process: Developing an automated testing tool for game level and world design, in: Proceedings of the Annual Symposium on Computer-Human Interaction in Play, 2020, pp. 267–280.
[7] S. Stahlke, A. Nova, P. Mirza-Babaei, Artificial playfulness: A tool for automated agent-based playtesting, in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–6.
[8] C. Holmgård, M. C. Green, A. Liapis, J. Togelius, Automated playtesting with procedural personas through MCTS with evolved heuristics, IEEE Transactions on Games 11 (2018) 352–362.
[9] B. Horn, J. Miller, G. Smith, S. Cooper, A Monte Carlo approach to skill-based automated playtesting, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 14, 2018, pp. 166–172.
[10] D. Simoes, S. Reis, N. Lau, L. P. Reis, Competitive deep reinforcement learning over a Pokémon battling simulator, in: 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), IEEE, 2020, pp. 40–45.
[11] G. Rodriguez, E. Villanueva, J. Baldeón, Enhancing Pokémon VGC player performance: Intelligent agents through deep reinforcement learning and neuroevolution, in: International Conference on Human-Computer Interaction, Springer, 2024, pp. 275–294.
[12] D. Huang, S. Lee, A self-play policy optimization approach to battling Pokémon, in: 2019 IEEE Conference on Games (CoG), IEEE, 2019, pp. 1–4.
[13] H. Ihara, S. Imai, S. Oyama, M. Kurihara, Implementation and evaluation of information set Monte Carlo tree search for Pokémon, in: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2018, pp. 2182–2187.
[14] J. Wang, Winning at Pokémon Random Battles Using Reinforcement Learning, Ph.D. thesis, Massachusetts Institute of Technology, 2024.
[15] K. Scheibelhut, libpkmn, 2021-24. URL: https://github.com/pkmn/engine.
[16] pasyg, wrapsire, 2023-24. URL: https://github.com/pasyg/wrapsire.
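
As a supplementary illustration of the arithmetic used in the discussion of results, the following Python sketch shows one way to compute a win ratio in which each tie counts as half a win, along with the simple proportions quoted in the text (220 of 482 Snorlax faints, and three OHKO selections in a 41-turn battle). The function names `win_ratio` and `share` are hypothetical and are not taken from the authors' code. The win count in the example is a placeholder; only the 4 ties and 2,000 battles come from the paper.

```python
from fractions import Fraction


def win_ratio(wins: int, ties: int, battles: int) -> Fraction:
    """Win ratio for one team, counting each tie as half a win."""
    if battles <= 0:
        raise ValueError("battles must be positive")
    if wins < 0 or ties < 0 or wins + ties > battles:
        raise ValueError("wins and ties must fit within battles")
    # (wins + ties / 2) / battles, kept exact by doubling both sides.
    return Fraction(2 * wins + ties, 2 * battles)


def share(part: int, whole: int) -> float:
    """Proportion of `whole` accounted for by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


if __name__ == "__main__":
    # Placeholder win count (assumption); 4 ties in 2,000 battles is from the text.
    example_wins = 1000
    print("win ratio:", float(win_ratio(example_wins, ties=4, battles=2000)))

    # Share of Snorlax's faints in the control scenario caused by Self-Destruct.
    print("Self-Destruct share of faints:", round(share(220, 482), 3))

    # OHKO selections per turn, using the figures quoted from Table 2.
    print("OHKO selections per turn:", round(share(3, 41), 3))
```

Using exact fractions for the win ratio avoids rounding drift when the half-win convention is applied across many battles; the other two figures are plain proportions and are only meant to make the quoted magnitudes easier to compare.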