An Empirical Analysis of the Validity of Competitive Pokémon Rule Sets

Nicholas Fluty, Ryan D. Flores, Judy Goldsmith and Brent Harrison
Department of Computer Science, University of Kentucky, Lexington, KY 40506-0633 USA

11th Experimental Artificial Intelligence in Games Workshop, November 19, 2024, Lexington, Kentucky, USA.
Contact: ndfl222@uky.edu (N. Fluty); rhma226@uky.edu (R. D. Flores); goldsmit@uky.edu (J. Goldsmith); bha286@g.uky.edu (B. Harrison)
ORCID: 0009-0007-3250-4915 (N. Fluty); 0009-0000-2790-5908 (R. D. Flores); 0000-0002-8383-5390 (J. Goldsmith); 0000-0002-1301-5928 (B. Harrison)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract

In competitive Pokémon battling, players have adopted a set of extra rules that are meant to encourage fair play. They are used to constrain team formation so that no one team has an overwhelming advantage over all others. These rule sets are often derived based on trial and error, intuition, or post-hoc evaluations of team performance, which means that the rules may not be ideal solutions to the problem they are supposed to address, or the problem may not have been worth addressing. In this paper, we explore how artificial intelligence and machine learning techniques can be used to potentially evaluate the quality of a rule set. This is meant to be a preliminary study that will ultimately lead to the automatic formulation of such rule sets. Our case study investigates how the inclusion or exclusion of one-hit-knock-out (OHKO) moves affects the outcomes and player behaviors in games between two teams battling under Generation 1 rules.

Keywords

Competitive Rules, Artificial Intelligence, Machine Learning

1. Introduction

Pokémon is a game in which players construct teams of combatants, the titular Pokémon, to battle against other players' teams. A great deal of thought is often put into how these teams are constructed, as one wants to utilize powerful Pokémon while promoting good synergy as a team. In order to ensure that a healthy competitive atmosphere is maintained, there are often rules put in place on how a team can be constructed. This is meant to ensure that strategies that are potentially too strong don't become prevalent as a part of the competitive metagame.

The rulesets Smogon uses to govern their competitive battles are good examples of this. Smogon is a competitive battling community that organizes tournaments, provides competitive battling resources, etc. In service of this, they also define rulesets that are used when these tournaments are held. These rules govern how players construct and use their teams and are meant to guard against overpowered or degenerate strategies. In addition to a set of rules common to all battles, Smogon defines various battle formats that restrict which Pokémon can be used to allow for diverse usage of both strong and weak Pokémon. The most commonly used format is referred to as OverUsed (OU), which allows all but some of the strongest "legendary" Pokémon that were intentionally given this advantage for purposes outside of competitive play. Again, these rulesets are meant to ensure that no one strategy for constructing teams or battling is strictly dominant over all others.

While these rules are often necessary for healthy competitive play, constructing these rulesets can be quite difficult. Often, these rules are based on speculation, anecdotal evidence, or post-hoc analysis. As such, the formation of effective rulesets is an imperfect science that can be time-consuming and prone to errors (constructing a rule where there shouldn't be one, or missing a rule that should be present).

In this paper, we investigate how machine learning (ML) techniques can be used to support the evaluation of competitive rule sets for the game of Pokémon. We are using the Pokémon domain to test this concept because of the existence of community tournaments that contain rules that exist outside of the game environment. Specifically, we present a case study in which we examine the Smogon rules associated with the first generation of the game, demonstrate how we can test changes, and present a data-based discussion of the effects of the change. We chose this ruleset because Smogon rules are largely community-driven and not necessarily subject to rigorous empirical analysis. The primary contribution of this work is to explore how AI and machine learning techniques can be used to perform vulnerability tests on these types of rulesets. This case study serves as preliminary evidence of the feasibility of such an approach, and we hope it will encourage further work in the area.

The remainder of the paper is organized as follows. In the next section, we review relevant related work on evaluating rule sets and the metagame in Pokémon. We will then introduce the Smogon generation 1 tournament rule set. Finally, we will detail our case study and present the results of said study.

2. Related Works

The primary contribution of this work is in evaluating the rulesets associated with competitive play in games. Specifically, in this paper we evaluate how the rulesets associated with competitive Pokémon affect dominant teams. There has been past work that has examined the Pokémon metagame [1, 2], but that previous work examines what teams of Pokémon are particularly strong in a metagame as defined by the rules associated with competitive play, or investigates countering the metagame [3]. In this paper, we examine whether these rules are justified and how one might prove them.

To do this, we take inspiration from automated playtesting literature and propose that machine learning techniques can be used to identify problems with rule sets or rules that are not justified. Typically, video game playtesting is performed by humans to determine whether a game contains errors. This process is time consuming and prone to human error. Thus, there has been an increased interest in automating this process using artificial agents. In the past, researchers have explored several AI and machine learning methods for automating this process [4, 5, 6, 7]. Still, these approaches are typically done to evaluate level design or game mechanics with respect to designer goals. Other work has focused on better creating agents that can mimic playtesters of different personas [8] or skillsets [9], but still the primary focus is on evaluating game mechanics or level design. In this work, we focus on evaluating rules that are not inherent to the game itself, but that are designed after the fact to encourage competitive play.

When determining the effect that rule sets have on a team's win probability, we use machine learning to learn battle strategies for each team. Machine learning has been readily explored in Pokémon [10, 11, 12, 13, 14] and found to be an effective tool for teaching agents how to battle. The main difference between our work and this past work is not in the method, but in the motivation behind the method. These past works primarily focused on developing techniques to be more competent in battle. We are using these techniques in service of evaluating player-made rule sets.

3. Smogon Rule Sets

Smogon uses a modified rule set compared to the official tournaments hosted by Nintendo. The rules vary across the different Pokémon generations. These modified rules affect the formation of the teams as well as the actions a player can take during a battle. Every few years a vote is held on the Smogon forums to see whether or not any rules need to be updated or replaced.

The first set of rules dictates the team formation. The Species Clause prevents a player from having multiple of the same Pokémon on their team. This is to encourage more diversity and to prevent players from running teams of the same Pokémon. Next is the Evasion Clause, which prevents a player from using the moves Double Team or Minimize. These moves make a Pokémon harder to hit, and can lead to stalled games where neither player can win. The final clause is the one-hit-knock-out (OHKO) Clause. Pokémon are not allowed to have the moves Horn Drill, Guillotine, Sheer Cold, or Fissure. These moves, referred to as OHKO moves, have a low hit rate but will cause the opponent's Pokémon to faint if they do hit. The general opinion on the forums is that this rule prevents strong players from losing due to randomness.

The second set of clauses affects player behavior during the battles. The Sleep Clause prevents a player from directly causing more than one of their opponent's Pokémon to fall asleep at a time. If a move attempts to break this clause, the game will automatically prevent the sleep from occurring. The Freeze Clause is the same as the Sleep Clause but it refers to the freeze status effect. The Endless Battle Clause prevents players from intentionally preventing their opponent from winning without forfeiting. The Timer Clause causes a player to automatically lose the battle if their player timer is exhausted.

4. Experimental Design

To show how ML can be used as a tool for testing rules for Pokémon battles, we decided to designate one rule for experimentation. Due to the seemingly independent reasoning for each of Smogon's rules, it would make sense that rules are tested individually as one sees fit.

We decided the rule of greatest interest to us was the one prohibiting the use of OHKO moves. Not all Pokémon can use the moves, so we would expect this rule to restrict a small set of Pokémon.

In their best use cases, OHKO moves will work ~30% of the time, dealing a fixed amount of damage that can knock out any opponent. Smogon may have set this rule to prevent its users from being too strong, but it might alternatively exist to keep the game more interesting. For example, relying on luck to this extent may limit players' ability to win through better decision-making. As a counterargument, ice moves have a ~10% chance of permanently freezing the opponent, a large reason to use the move, yet there is no rule against it.

While it would be very interesting to know exactly how the utility of every Pokémon changes with the removal of the rule, and to produce a new ranking of some sort, that would be either too difficult to compute or require many assumptions that may not be agreeable. We instead perform a single experiment where we hope to see some notable effects of the rule.

To perform our experiment, we use ML to control the actions of players in four repeated team battle scenarios. We configure two teams with three Pokémon each, determine control movesets for each that make sense for competitive play, and create alternate movesets that make use of OHKO moves. In the four scenarios, we test the likelihood of Team 1 winning when one, both, or neither of the teams use the alternate movesets. All other factors are assumed to be constant.

We conduct these battles using the most popular of Smogon's battle formats, OU, which only prohibits using Mew and Mewtwo. There are 14 Pokémon in the OU tier. Pokémon in lower tiers are allowed but not recommended in the format because they are seen as not worth using over the other 14.

By observing both the outcome of games and the specific learned behavior of the player agents, we should gain insight into how competitive play may be affected by the removal of this rule. If a team's change in moveset appears to increase its likelihood of winning, implying that players would want to use the moves, then we can be confident that the rule impacts play. If not, then our results are not conclusive but suggest that the rule may not have much impact.

4.1. The Teams

Reasonably designing the two teams and their movesets is an important part of testing the rule, as we want the results of the experiments to have implications on how skilled players would optimally play. For example, if the introduction of OHKO moves to Team 1 yields large benefits against Team 2, this would be irrelevant if the control movesets of Team 1 were already ineffective, such that we could see the same benefits by using other currently legal moves. In another case, if Team 2 is not a good representation of a normal competitive team, then one's ability to beat it is not meaningful.

First, we acknowledge that 3v3 battles are uncommon for competitive games. Community rule sets, tier lists, and recommended movesets are all generally made under the assumption that battles are 6v6. Our primary reasons for this change are to limit the number of variables affecting battle and improve the speed and performance of our applied ML. Not every Pokémon is capable of using OHKO moves in the games, so a reduced team size helps raise the concentration of OHKO move usage while allowing us to focus on just a few of its users.

Table 1 details all relevant information about the two teams and their movesets. Following is a justification of this configuration.

Table 1
Configurations of Pokémon on both teams (entries written "A / B" give the control move A and its OHKO alternate B)

                Team 1                      Team 2
Slot 1          Slowbro                     Snorlax
Slot 1 Moves    Amnesia                     Body Slam
                Surf                        Reflect
                Thunder Wave                Rest
                Rest / Fissure              Self-Destruct / Fissure
Slot 2          Rhydon                      Tauros
Slot 2 Moves    Earthquake                  Body Slam
                Rock Slide / Horn Drill     Hyper Beam
                Body Slam                   Blizzard
                Substitute                  Earthquake / Horn Drill
Slot 3          Alakazam                    Chansey
Slot 3 Moves    Psychic                     Thunderbolt
                Seismic Toss                Ice Beam
                Thunder Wave                Soft-Boiled
                Recover                     Thunder Wave

In designing the teams, we noted that the three Pokémon, Snorlax, Tauros, and Chansey, can be seen on almost every 6v6 team in OU. The Type or Types of a Pokémon usually cause them to have particular advantages or disadvantages against different Pokémon, but these three being of the Normal Type means that there are fewer moves that have type advantage against them. Similarly, their moves are less likely to be highly effective or ineffective against other types. It happens that these three also have some of the highest stat totals, making them some of the strongest Pokémon in isolation.

Another reason for their high usage is their synergy in performing different roles for a team. Snorlax is one of the bulkiest Pokémon in the game and has the damage to threaten any opponent. Tauros has both damage and speed, making it able to quickly finish off weakened opponents. Used effectively, it can force enemies to switch out, allowing for a free hit on the new Pokémon. Chansey specializes in inflicting negative status conditions and can recover health faster than a lot of Pokémon can deal damage. Snorlax and Chansey can also be seen using a variety of movesets depending on what best complements the rest of the team.

For these reasons, we believed it would be interesting to see these three together as a complete team, where Snorlax and Tauros can both be given OHKO moves. The only other OHKO move users in OU are Rhydon and Slowbro, so we put them on the other team. Rhydon is advantageous for its ability to absorb Normal moves, while Slowbro can inflict paralysis with Thunder Wave. Paralysis is critical to the team because OHKO moves will not work if the user has less speed than the target. It also seemed interesting to make use of Slowbro, since it is one of the least used and lowest rated Pokémon categorized into OU.

Because these two are particularly slow, their team would benefit from having a fast Pokémon that can also use Thunder Wave. There were several nominees for this role, but we selected Alakazam for its slightly better offense, which would help compensate for the absence of Tauros from that team.

4.2. The Moves

When determining the moves for the control teams, we wanted to balance using the most common moves with making all Pokémon reasonably useful for the battle. All three Pokémon on the Normal team are usually seen with Ice moves, which is redundant and counters Rhydon too well. Snorlax is the one most commonly seen without its Ice move, so we gave it Self-Destruct to account for other difficult situations.

A Pokémon can only know four moves, so a move would have to be replaced on each Pokémon that would be given an OHKO move. The fixed damage nature of these moves makes it seem reasonable to replace a damaging move that is usually only used situationally. For Snorlax, we replaced Self-Destruct. For Tauros, we replaced Earthquake, which is mostly just used against Gengar in OU.

We anticipated that Rhydon would have no reason to use Rock Slide or Body Slam when it has Earthquake, so we just replaced Rock Slide. Slowbro usually only has one damaging move, so ideally we would replace Amnesia or Rest. Each of these moves in the absence of the other is less useful, as they are effective in combination. In the end, we decided to sacrifice Rest because it leaves the user vulnerable and more predictable.

It may be important to note that neither team was given moves to inflict the sleep condition on enemies. Smogon has always recognized sleep to be one of the most powerful mechanics given to some Pokémon, which is why they have a rule preventing a team from making two of their opponents be asleep at the same time. Our primary reason for not using these moves was that the Pokémon we were considering selecting could not use them. We avoided intentionally giving the teams access to these moves because we feared that they simply could not be balanced; there are not enough Pokémon on each team for players to not be devastated by the reduction in options. We could also argue that both teams have enough ability to inflict other status conditions, helping fill the void made by sleep's exclusion.

4.3. Software Used

To simulate Pokémon battles, we used the pkmn engine, a Pokémon battle simulation engine optimized for performance in larger scale projects [15]. This open source tool accurately implements battles as done in the original game code and the popular simulator Pokémon Showdown. Pokémon Showdown is sponsored and endorsed by Smogon for competitive battles, as it provides practical implementations of battles for all Pokémon generations and a practical interface for playing online. The pkmn engine currently only fully supports the first generation of Pokémon, but it is able to run faster than Showdown while having less unneeded overhead.

We also used and credit another open source project, Wrapsire, which provides a C++ interface for the compiled library produced by the pkmn engine [16]. We opted to use C++ for this project because of its advantages for memory management, multi-threading, and compiler optimizations.

4.4. Monte Carlo Tree Search

Pokémon is a stochastic turn-based game, where a turn is initiated by both players selecting an action independently but concurrently. In order to observe battles that are consistent with the skill expected of competitive players, action selection must be mindful of what gives the highest chances of winning. To accomplish this, we decided to use Monte Carlo Tree Search (MCTS).

We chose to use MCTS primarily because of its ability to handle uncertainty and large state spaces. MCTS has also been shown to be effective at simulating Pokémon battles [13]. Pokémon uses a random seed as a factor in many different calculations, most notably in standard damage calculation and secondary move effects. Because of this, a turn initiated from a given game state by given actions may have hundreds of possible unique resulting game states depending on the seed. This is not problematic for MCTS because it naturally weighs the value of an action based on the frequency with which different states result from it, and these states similarly develop better heuristics for move selection as the search continues.

The primary problem we faced with implementing MCTS for Pokémon battles was determining how to handle concurrent action selection. Minimax trees are commonly used for turn-based games like chess, but these assume that players alternate turns and know what the enemy previously selected. This assumption contradicts the nature of competitive Pokémon battles, which almost always involve players intentionally being unpredictable. This is due to a natural rock-paper-scissors-like relationship among strategies. For example, recovery may counter gradual damage, while applying buffs or statuses may counter recovery, and the right damaging move may render an enemy's buffs ineffective or make them lose before they gain the longer-term benefits.

To address this issue, we decided to randomly determine at every simulation of a turn who will select their action first, and who will pick based on that selected action. While this does not perfectly represent the distribution of move combinations used in competitive games, it effectively balances the scenarios where a player either picks their safest action or correctly predicts that their opponent is picking their safest action. This gives an equal advantage to each player, while also producing more consistency in game outcomes, which is helpful for assuring the integrity of our results. It is also significantly less computationally expensive than developing stochastic policies, which makes it easily applicable at all frequently visited states during a search.

Simulating forward in a game raises an additional concern: it gives each agent full knowledge of the opponent's team. Normally Pokémon is a partial information game, where one does not know anything about the opponent's team until they switch to each Pokémon and use their moves. If the battles in this experiment were done by people instead of ML agents, Team 2 not knowing that Alakazam could switch to Rhydon could end up disastrous for them if they told Snorlax to use Self-Destruct.

Representing partial observability not only makes the problem too complicated to get reasonable results but also may not change the results very much. Most Pokémon have a single set of moves that is used identically in over half of games. Additionally, teams in 6v6 battles tend to intentionally cover all of their weaknesses by having a Pokémon resistant to each commonly used move Type, basically following a recipe for team building. Essentially, players can expect their opponents to have some kind of counter until they have already defeated the counter, and it is not as important to know what exactly the counter is. All this leads to teams being rather predictable.

Other design notes had a lesser effect on the results. Primarily, since total health points and amounts of damage dealt are rather inflated, we bucketed states together when counting visits and related information, ignoring the bottom five bits of both Pokémon's health points when identifying a state during a search. This allows for much quicker convergence and is easily modifiable in our code.

4.5. Simulating Each Scenario

To observe the likelihood of Team 1 winning with reasonable precision, we decided to simulate 500 battles for each scenario, using MCTS at each turn. Each search consisted of 100,000 iterations, simulating from the active game state.

Because of the somewhat large depth and stochastic nature of Pokémon battles, we chose to limit the search depth to 25 turns from the active game state. Simulating past 25 turns of depth would likely be unnecessary since states would not get enough visits to do anything other than random rollouts, which only slowly progress to the end state and may favor certain movesets. After 25 turns, we terminated the simulated game and considered the team with the larger sum of percentage health the winner. When this happened, we gave the winner a modified reward as if it was an average of 9 wins and 1 loss, which was to help encourage actual wins over this method. Implementing this cutoff also reduced the time needed to complete each search.

We also encountered a problem with battles not ending when all Pokémon on a team were frozen. This was because the algorithm recognized that it could win on any future turn for equal reward. Rather than reducing the reward for winning after more turns, something that could theoretically negatively influence the decisions made, we decided that it was safe to assume that a team with all their Pokémon frozen will not win and let games be terminated at that point. None of these Pokémon had moves that deal damage on turns they do not attack, such as Toxic and Leech Seed.

When running the battles, we collected a detailed log of the actions selected at every turn and the values of relevant variables, such as health and status conditions. We used these to extract interesting information that may inform our discussion of how gameplay behavior changes when the external rules for team compositions change.

5. Results and Discussion

Table 2
Summary of battles in each scenario

Who Has OHKO Moves    Team 1 Win %    Avg. Turns    OHKO Sel/Bat    KO's/Bat
Neither Team          37.5%           52.754        0               0
Team 1                41.9%           42.448        2.666           0.588
Team 2                49.6%           49.426        1.15            0.100
Both Teams            60.6%           41.002        3.67            0.652

Observing the summary of the battles in Table 2, particularly the Team 1 Win % column, we can see notable changes in the outcome of games based on the change of movesets. In the control battles where neither team was given OHKO moves, we saw that the three popular Normal types, Team 2, won 5/8 of their games. We expected that this team would have an advantage, since they have high stats, few weaknesses, and seemingly good synergy.

Table 3
Pokémon summary when NEITHER has OHKO moves

Pokémon       Move Given       Rate Used    KO's/Battle
T1-Slowbro    Rest             2.29%        N/A
T1-Rhydon     Rock Slide       9.10%        N/A
T2-Snorlax    Self-Destruct    12.36%       N/A
T2-Tauros     Earthquake       10.90%       N/A

Table 4
Pokémon summary when Team 1 has OHKO moves

Pokémon       Move Given       Rate Used    KO's/Battle
T1-Slowbro    Fissure          21.61%       0.482
T1-Rhydon     Horn Drill       13.56%       0.106
T2-Snorlax    Self-Destruct    16.11%       N/A
T2-Tauros     Earthquake       10.34%       N/A

Table 5
Pokémon summary when Team 2 has OHKO moves

Pokémon       Move Given       Rate Used    KO's/Battle
T1-Slowbro    Rest             1.86%        N/A
T1-Rhydon     Rock Slide       6.89%        N/A
T2-Snorlax    Fissure          4.00%        0.056
T2-Tauros     Horn Drill       15.22%       0.044

Table 6
Pokémon summary when BOTH have OHKO moves

Pokémon       Move Given       Rate Used    KO's/Battle
T1-Slowbro    Fissure          11.98%       0.424
T1-Rhydon     Horn Drill       8.43%        0.096
T2-Snorlax    Fissure          4.98%        0.100
T2-Tauros     Horn Drill       14.14%       0.032

When we tried replacing moves from the already advantaged Team 2 with OHKO moves, they lost some of their advantage. This suggests that the moves we replaced were more useful in this battle than the OHKO moves were. In fact, they collectively managed only 1 successful OHKO per 10 battles. Taking away Tauros's Earthquake likely was not a big deal since it still had Body Slam, but we suspect that the main reason for the change in performance is Self-Destruct, which knocks out the user to deal massive damage.

As a quick side note, Self-Destruct caused ties in 4 of the 2,000 battles by knocking out the final two Pokémon. Each of these was treated as half of a win when calculating Team 1's win ratio.

Snorlax's Self-Destruct can knock out Alakazam in one hit or take out 3/4 of Slowbro's health. 220 of the 482 times Snorlax fainted in the control scenario were from using this move. This indicates that the move was worth having and using against this team. On the other hand, Self-Destruct is more risky in a 6v6 battle because the opponent is more likely to be able to switch to a Pokémon like Gengar that can absorb the blow and leave Snorlax knocked out. Fissure did not help Snorlax here, but it might find more use with the right moveset and team.

When we tried putting OHKO moves into the movesets of just Team 1, we saw their win ratio go up from 37.5% to 41.9%. This suggests that we successfully found a use case for OHKO moves in competitive battles. Slowbro averaged about 1 successfully hit Fissure per 2 battles, while Rhydon hit about 1 Horn Drill per 10 battles. Rhydon's success rate was not very impressive, but this Pokémon's popularity within OU would probably be enough to cause Horn Drill to show up as an alternative to catch opponents off guard.

Slowbro had the most interesting results with OHKO moves. We replaced Rest, which Smogon quotes as being important for use with Amnesia. Slowbro probably would have used Amnesia and Rest a lot more if it had been starting against a special attacker instead of the physical attacker Snorlax. Against a different team with more special attackers, Slowbro might have done better with the recommended moveset. On the other hand, Slowbro's increase in ability to use alternate movesets would make it more useful in different scenarios, which could raise its usage rate. Opponents would not know whether to expect the standard Slowbro or one with Fissure, causing them to have trouble switching appropriately.

The scenario where both teams had OHKO moves is not as interesting because we already showed a disadvantage with the movesets for Team 2. However, it offers a little insight into how some things change with different circumstances. Most notably, the change in win rate was significantly greater when both changes were present together. Without going into too much detail, a strong hypothesis could be that the absence of Self-Destruct was more devastating for Team 2 when it had a Fissure-using Slowbro to deal with. Slowbro used over twice as many total moves in battles where Snorlax did not have Self-Destruct.

Aside from the four Pokémon already in OU that can use OHKO moves, this rule affects some Pokémon that are just below the cutoff for this tier. Dragonite may be the most notable of these. In OU, Dragonite is the holder of the highest base stat total but lacks the moves to back it up. Dragonite is relatively fast and can use Thunder Wave. Perhaps the only thing stopping it from becoming one of the strongest Pokémon with Horn Drill is its weakness to Ice. Ice moves are very useful and can be known by most Pokémon in OU.

Lapras is also a strong candidate for OU viability with Horn Drill, as it can put Pokémon to sleep with Sing, is very bulky, and outspeeds other bulky Pokémon. The rule limiting a team to putting just one opponent to sleep at once currently stops Lapras from being viable in OU, as usually having one sleeper is seen as ideal. With the addition of Horn Drill, Lapras might be a better replacement for another sleeper, especially with Blizzard to take down Rhydon pretty consistently. Smogon recommends using 3 damaging moves for Lapras, even though it is not very offensive; one of these is likely easily replaceable by Horn Drill.

Win rates aside, the battles can be seen changing in terms of the experience for players. Earlier we mentioned that the OHKO rule may exist to keep games interesting. One thing that can be noted from Table 2 is that, as the number of teams with OHKO moves increased, the turn counts decreased. When Team 1 had the moves and performed better, the games were about 4/5 as long, even though the win ratio reflected that it was a more even match.

Game length does not necessarily translate to more or less fun for the players, but this helps show how the game had more of a twist toward randomly rewarding a team with a large advantage that would cause it to end more decisively. A 30-70 gamble with a high reward likely does this. It makes sense that skilled players would appreciate their win ratio having higher correlation to the skill they put a lot of time into mastering, so this seems like a strong potential motive for the rule.

We also show in Table 2 the average number of OHKO moves used by any Pokémon per battle in the column OHKO Sel/Bat. This helps us gain a sense of whether Pokémon are overutilizing OHKO moves with respect to other moves. If a Pokémon is disproportionately using OHKO moves, this could lead to the opposing player becoming annoyed or frustrated with the battle since OHKO moves generally involve luck to succeed. Three OHKO moves being selected in a battle lasting 41 turns is not likely to be seen as overusage or annoying. It may be important to acknowledge that this battle used all four Pokémon in the current OU tier that are able to use OHKO moves, so this can be seen as an estimated upper bound on OHKO move usage. Given this, the luck-based nature of these moves seems less concerning.

Under these almost ideal conditions for OHKO move usage, we observed this strategy not posing a notable problem to player experience. OHKO moves can be observed usually requiring setup by using Thunder Wave or Body Slam, which shows that the strategy still requires a degree of skill and team-building decisions to make effective. Furthermore, we can see that allowing these moves may give more benefit to less commonly used Pokémon, and we can anticipate that some Pokémon that are not considered strong enough for this battle format may become more relevant.

The results of this experiment do not show a strong justification for the rule banning the use of OHKO moves, but that is not to say that an alternate problematic case does not exist. Of course, it must be acknowledged that this is a limited case study designed to show how we can collect and analyze data to test the concerns associated with a rule. Should the rule be seriously put into question by the Smogon community, it would be wise to observe more cases than this and in a less anecdotal way.

6. Conclusion

Additional rule sets are often used in competitive games to create a healthy competitive metagame. This ensures that all participants are playing on a level field. These rule sets, however, are often designed using intuition and post-hoc usage statistics/analysis. In this paper, we explore how artificial intelligence techniques can be used to empirically evaluate competitive rule sets in the game Pokémon.

To do this, we use Monte Carlo Tree Search to simulate 3v3 battles using a relaxed version of the Smogon generation 1 rule set, allowing Pokémon to use OHKO moves. Results show that OHKO moves can have a visible effect [...] paper has applications to games outside of Pokémon. Any competitive game that can be simulated could very easily make use of the approaches that we use in this paper. We hope this encourages further research into how artificial intelligence and machine learning can be applied in competitive games and metagames.

References

[1] S. Reis, R. Novais, L. P. Reis, N. Lau, An adversarial approach for automated pokémon team building and meta-game balance, IEEE Transactions on Games (2023).
[2] S. P. R. A. Reis, Artificial intelligence methods for automated difficulty and power balance in games (2024).
[3] D. Crane, Z. Holmes, T. T. Kosiara, M. Nickels, M. Spradling, Team counter-selection games, in: 2021 IEEE Conference on Games (CoG), IEEE, 2021, pp. 1–8.
[4] R. Ferdous, F. Kifetew, D. Prandi, I. Prasetya, S. Shirzadehhajimahmood, A. Susi, Search-based automated play testing of computer games: A model-based approach, in: International Symposium on Search Based Software Engineering, Springer, 2021, pp. 56–71.
[5] P. L. P. de Woillemont, R. Labory, V. Corruble, Automated play-testing through rl based human-like play-styles generation, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 18, 2022, pp. 146–154.
[6] S. Stahlke, A. Nova, P. Mirza-Babaei, Artificial players in the design process: Developing an automated testing tool for game level and world design, in: Proceedings of the Annual Symposium on Computer-Human Interaction in Play, 2020, pp. 267–280.
[7] S. Stahlke, A. Nova, P. Mirza-Babaei, Artificial playfulness: A tool for automated agent-based playtesting, in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–6.
[8] C. Holmgård, M. C. Green, A. Liapis, J. Togelius, Automated playtesting with procedural personas through mcts with evolved heuristics, IEEE Transactions on Games 11 (2018) 352–362.
[9] B. Horn, J. Miller, G. Smith, S. Cooper, A monte carlo approach to skill-based automated playtesting, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 14, 2018, pp. 166–172.
[10] D. Simoes, S. Reis, N. Lau, L. P. Reis, Competitive deep reinforcement learning over a Pokémon battling simulator, in: 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), IEEE, 2020, pp. 40–45.
on how the battles play out, but that effect is nuanced in [11] G. Rodriguez, E. Villanueva, J. Baldeón, Enhancing many ways. Our results indicate that while OHKO are used pokémon vgc player performance: Intelligent agents in battle when they’re available, they likely are not used through deep reinforcement learning and neuroevo- enough to cause annoyance. This, however, is not definitive, lution, in: International Conference on Human- and more work must be done before any conclusions can be Computer Interaction, Springer, 2024, pp. 275–294. drawn. Overall, this approach gives designers critical con- [12] D. Huang, S. Lee, A self-play policy optimization ap- text on how rules affect the potential metagame surrounding proach to battling pokémon, in: 2019 IEEE conference competitive play. on games (CoG), IEEE, 2019, pp. 1–4. Overall, we feel that these preliminary results are promis- [13] H. Ihara, S. Imai, S. Oyama, M. Kurihara, Implemen- ing and provide evidence that further work on utilizing tation and evaluation of information set monte carlo automated playtesting techniques to evaluate competitive tree search for pokémon, in: 2018 IEEE international rulesets has merit. In addition, the approach we use in this conference on systems, man, and cybernetics (SMC), IEEE, 2018, pp. 2182–2187. [14] J. Wang, Winning at Pokémon Random Battles Using Reinforcement Learning, Ph.D. thesis, Massachusetts Institute of Technology, 2024. [15] K. Scheibelhut, libpkmn, 2021-24. URL: https://github. com/pkmn/engine. [16] pasyg, wrapsire, 2023-24. URL: https://github.com/ pasyg/wrapsire.