Liquid Snake: a test environment for video game testing agents

Pablo Gutiérrez-Sánchez1,∗, Marco A. Gómez-Martín1,∗, Pedro A. González-Calero1,∗, Pedro P. Gómez-Martín1,∗

1 Complutense University of Madrid, Madrid, Spain

I Congreso Español de Videojuegos, December 1–2, 2022, Madrid, Spain
∗ Corresponding author.
pabgut02@ucm.es (P. Gutiérrez-Sánchez); marcoa@fdi.ucm.es (M. A. Gómez-Martín); pagoncal@ucm.es (P. A. González-Calero); pedrop@fdi.ucm.es (P. P. Gómez-Martín)
ORCID: 0000-0002-6702-5726 (P. Gutiérrez-Sánchez); 0000-0002-5186-1164 (M. A. Gómez-Martín); 0000-0002-9151-5573 (P. A. González-Calero); 0000-0002-3855-7344 (P. P. Gómez-Martín)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
In recent years, a number of benchmarks and test environments have been proposed for research on AI algorithms, making it possible to evaluate and accelerate progress in this field. There is, however, an absence of environments in which to evaluate the feasibility of such algorithms in the context of games under continuous development, in particular for regression testing and automatic error detection in commercial video games. In this paper we propose a new test-bed, Liquid Snake: a 3D third-person stealth game prototype designed to conveniently integrate autonomous agent-driven quality control mechanisms into the development life cycle of a video game, built on the open source ML-Agents library in Unity3D. Focusing on regression testing of the unexpected changes that altering the AI of enemies can induce in a game, we argue that this environment lends itself to use as a sample test environment for automated QA methodologies, thanks to the complexity and variety of NPC behaviors naturally present in stealth titles.

Keywords
Benchmark, QA, regression testing

1. Introduction

In the continuous development of a commercial title, it is common to find numerous enemies and NPCs reused across different sections of the game and evolving over time. In this context, the evolution of the AI controlling these agents can end up inducing non-negligible modifications in the gameplay of a level, contrary to the intentions of the original designs. These changes are not always straightforward to detect: on the one hand, development teams often lack the resources to perform sufficiently exhaustive testing; on the other, these modifications do not necessarily "break" sections of the game, but may simply alter the user experience in a more or less subtle way and thus go unnoticed even by testers with extensive experience in the game.

The tests involved in determining whether a feature or design that was correct in the past continues to work properly after the project has progressed are known as regression tests, and their cost of execution grows dramatically as the code base and the number of elements in the game increase, since they essentially entail the recurring repetition of previously executed test batteries or checks.
These tests may raise functional questions ("does the enemy still approach the player when it encounters them at a distance below a certain threshold?"), or more abstract questions linked to the game design ("is it possible to complete the level in less than 10 minutes without losing health points?" or "is it possible to traverse the level while picking up all the collectibles without being detected by enemies in the process?", to name a few examples). While questions of the first kind can often be addressed with tools such as unit tests, for which most commercial engines offer good support, this is typically not the case for the second kind, which generally requires humans to play the game repeatedly in an attempt to answer the question posed by the test. Since the latter constitutes both an economic and a logistical bottleneck, different strategies have been proposed in recent years to automate these checks and alleviate the burden they place on a QA team. These include replaying game traces recorded by human players during testing [1] and training autonomous AI-based agents capable of interacting with the game in specific ways, each approach being more or less feasible to implement in a real development context [2, 3, 4].

While there are nowadays numerous standardized test-beds and benchmarks for machine learning techniques, intended to act as "collective challenges" that guide community research efforts towards specific problems and to serve as environments for testing new algorithms, the same cannot be said for applications of such techniques to development cycles within commercial studios. Regression testing techniques described in the literature are often obtuse or inaccessible from the perspective of development teams, relying on proprietary examples or technologies that are difficult to transfer to new environments. It is therefore relevant, from our point of view, to propose a test-bed that serves as a testing framework not only for machine learning algorithms, but also for their application to quality control strategies in commercial games and to the development of support tools for automated testing.

Having a common environment also enables a shared vocabulary in the community: for instance, it becomes possible to compare the results of two ways of approaching a regression testing problem on specific, shareable modifications (such as "does my method detect that it is more difficult to evade level 2 enemies after altering a node of their behavior tree?"). From the developers' point of view, a simple reference environment in which to try out automatic testing mechanisms allows them to experiment with different strategies in a smaller external project and validate them before taking the step of integrating them into their own games. At the same time, this test-bed can serve as an architectural reference for teams that are considering automatic testing but are held back by the complexities and technical unknowns associated with the problem.

With this in mind, in this paper we present Liquid Snake, a third-person 3D prototype belonging to the stealth genre and developed in Unity3D, intended to act as a common test-bed for regression testing and automatic quality control, as well as an architectural reference for the integration of such methods in commercial projects. The rest of the paper is structured as follows.
Section 2 discusses related work in this field. Section 3 introduces Liquid Snake along with the most relevant high-level dynamics within the game, to provide a general understanding of the prototype. Section 4 details the technical and architectural features that, we argue, make the project suitable for the uses described above, and section 5 walks through the testing process on a sample level. Section 6 closes with conclusions and future work.

2. Related work

In view of the problem described above, a number of strategies have been proposed in recent years to implement automatic testing methods, typically employing autonomous control agents that interact with a level over a large number of simulations while collecting metrics that must fall within specific ranges for the user experience to be considered unaltered [5, 6]. The same principle has been used in platforms and engines such as Unity3D to introduce tools supporting the design and balancing of video games [7].

To create these control agents, one of the most straightforward alternatives is simply to use segments recorded manually by a human performing the specified tasks, replaying them periodically over the original environment to check that the player's trace can still complete the set objective [8]. However, when the structure of the environment is modified, or the environment includes random elements that do not remain constant between runs, these strategies are no longer valid, motivating the need for agents with a certain capacity to adapt to changes in their surroundings. This is where AI-based methods come into play, offering policies that are more reactive and adaptive to changing environments. Examples of such techniques for automatic testing in video games include machine learning algorithms based on Deep Reinforcement Learning (DRL) [9, 10], Imitation Learning (IL) [11], and even hybrids of these strategies with control structures such as behavior trees (BTs) [5].

Nonetheless, at present the adoption of these methodologies in commercial developments suffers from several integration problems that discourage developers from employing them in their projects. First, the generation of autonomous agents capable of playing a given game naturally is complex and requires non-trivial machine learning knowledge to implement. This is mitigated to some extent by new tools such as ML-Agents [12] in Unity3D or MindMaker [13] in Unreal Engine, which allow policies to be trained in a more developer-friendly way, but these are mostly aimed at research or small-scale proofs of concept. Secondly, to our knowledge there is a dearth of accessible benchmarks and environments that serve as showcases and examples for the use of automatic testing techniques in games intended for continuous development. Although there is a wide variety of benchmarks and environments designed to test different types of machine learning algorithms, such as DeepMind Lab [14], MineRL [15], OpenAI Gym [16] or the StarCraft II Learning Environment [17], to name a few, all of them stem from a closed game in which the aim is to find a policy capable of fulfilling a certain fixed objective (or maximizing a given performance metric), as opposed to the problem that concerns us: starting from a game in open development and using agents to perform regression tests as it is modified.
In this paper we present a prototype stealth game implemented in Unity3D, Liquid Snake, as a contribution in progress to this field, intended to serve as an open source testing environment equipped with the necessary tools to experiment with automatic testing methodologies on a project under development.

3. Liquid Snake

Liquid Snake is a game with mechanics inspired mostly by the stealth genre: the player must navigate through different 3D rooms in an attempt to find the exit, avoiding the enemies that patrol the area in search of intruders and collecting as much loot as possible along the way. The player has a finite number of health points, which are reduced every time they are hit by an enemy projectile; the player is defeated when these reach 0.

Players may choose to walk, run, crouch, or crawl to navigate a room, thus modifying the height at which enemies perceive them, their movement speed, and the noise they produce (which influences the enemies' ability to detect them). The game considers two different heights for the player, as shown in figure 1.

Figure 1: Height system in Liquid Snake. Captures from player in crawling (A) and crouching (B) states.

When walking or running, the player maintains the default height H2, the only difference between the two states being the character's movement speed and the noise produced (running being the faster but louder action). In the crouching and crawling states, the player adopts a height H1, which allows them to conceal themselves behind low obstacles, such as the trunks in the figure, and go unnoticed by enemies operating at height H2. As with the walking-running pair, the only difference between crouching and crawling is the player's movement speed and the noise generated, crawling being the slower but stealthier of the two.

The game also features interactions with objects in the environment, as well as shooting with limited ammo to deal with certain vulnerable enemies. Some of the available interactions in the sample environments are shown in figure 2: picking up a collectible item (figure 2-A), stealthily eliminating an enemy by interacting with it from its back before being detected (figure 2-B), or jumping over a low obstacle, leaving the player positioned on the opposite side of the object (figure 2-C).

Figure 2: Sample interaction scenarios in Liquid Snake: picking up a collectible item (A), stealthily attacking an enemy from their back (B) and jumping over a log (C).

In all cases, the player must approach the element with which they wish to interact and press an interaction key to execute the corresponding action. If there is more than one object with which an interaction can be performed, the one closest to the character at that time is selected (illustrated in the sketch below).

The current project also includes some preconfigured levels that can be used as a reference for testing. All of them include at least one enemy with a predefined patrolling route and a number of collectibles hidden in less accessible areas of the level, as well as other interactive elements such as jumpable logs. Some examples can be found in figures 3 and 4.

Figure 3: Sample level 1: disjoint enemy patrol paths separated by columns.
Figure 4: Sample level 2: clockwise overlapping enemy patrol paths.
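To make the closest-candidate interaction rule concrete, the following is a minimal sketch in Unity C#. It is illustrative only: the IInteractable interface, the Interactor component and the way candidates are tracked are assumptions for the sketch, not the project's actual API.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical interface for anything the player can interact with
// (collectibles, vulnerable enemies, jumpable logs, ...).
public interface IInteractable
{
    void Interact();
}

// Hypothetical player component resolving the "closest candidate wins"
// rule when the interaction key is pressed.
public class Interactor : MonoBehaviour
{
    // Candidates currently in range, e.g. maintained via trigger colliders.
    private readonly List<MonoBehaviour> candidates = new List<MonoBehaviour>();

    public void TryInteract()
    {
        IInteractable closest = null;
        float best = float.PositiveInfinity;
        foreach (var candidate in candidates)
        {
            float d = Vector3.Distance(transform.position,
                                       candidate.transform.position);
            if (candidate is IInteractable interactable && d < best)
            {
                best = d;
                closest = interactable;
            }
        }
        // Execute the action on the nearest interactable, if any.
        closest?.Interact();
    }
}
```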
The decision to take a stealth game as a starting point for this test-bed is based on the fact that these environments are particularly appealing: they feature mechanics that are prevalent across a wide variety of genres and games, and NPCs whose behaviors significantly determine the gameplay of a level. Indeed, in games of this type it is common to design enemies in very specific ways to motivate certain player strategies, but also to progressively increase the complexity of their AI to adapt it to new design needs, which is why we believe them to be an ideal candidate for regression testing. The source code for the project, along with some usage examples, may be found in [18].

4. Testing environment features

In this section we summarize the most relevant features of the proposed testing environment: its native integration with the ML-Agents library in Unity3D, the architecture chosen for the input and action schemes to conveniently switch between control mechanisms, and the use of behavior trees to support growing enemy policies.

4.1. Natural integration with ML-Agents

Liquid Snake is natively integrated with the ML-Agents library for training the autonomous agents that act as automatic testers and, as described later, makes it possible to switch smoothly between manual control of the character and control by agents, whether already trained or still in training.

Each level of the project is wrapped by a LevelManager that orchestrates the initialization and reset processes of the environment between training episodes. To coordinate the resetting of components within the environment, an IResetteable interface is provided with a single Reset method, to be implemented by whichever level constituents need to return to their initial configuration at the beginning of a new training episode, or when respawning the player after death. The LevelManager subscribes to all events in the environment that may result in a level reset (player death, time limit exceeded, etc.) and maintains references to all elements implementing the IResetteable interface, calling their Reset methods each time one of these events is triggered in-game. This way, it is sufficient to implement this interface to ensure that a new level element is restarted when necessary, without establishing explicit dependencies between components, thereby facilitating the scalability of the project.

A CharacterEvents component is also provided for global notification of the different events of interest involving the character. This is particularly useful when linking occurrences with training rewards (for instance, granting a positive reward after the event of reaching the room exit) or with logistical operations on the scene (such as restarting the level from the LevelManager after the character's death). In practice, all the events triggered by the various components that define the character's behavior (health, interactors, movement managers, etc.) end up being intercepted by the CharacterEvents component, which lifts and unifies them to provide a single external access point to the catalog of events exposed by the character. As a result, it is only necessary to subscribe to the events of this component rather than accessing each component of interest individually. A minimal sketch of this reset and event wiring is shown below.
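The sketch assumes the interface, method and component names described above (IResetteable, Reset, LevelManager, CharacterEvents); the specific event names, the CharacterEvents stand-in and the way resettable elements are gathered are illustrative assumptions, not the repository's exact implementation.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Interface described above: a single Reset method, implemented by any
// level constituent that must return to its initial configuration.
public interface IResetteable
{
    void Reset();
}

// Minimal stand-in for the CharacterEvents component described above;
// the real component unifies many more character events.
public class CharacterEvents : MonoBehaviour
{
    public event System.Action OnDeath;             // Illustrative names.
    public event System.Action OnTimeLimitExceeded;
    public event System.Action OnGoalReached;

    public void RaiseDeath() => OnDeath?.Invoke();
    public void RaiseTimeLimitExceeded() => OnTimeLimitExceeded?.Invoke();
    public void RaiseGoalReached() => OnGoalReached?.Invoke();
}

// Wraps a level and orchestrates resets between training episodes.
public class LevelManager : MonoBehaviour
{
    [SerializeField] private CharacterEvents characterEvents;
    private readonly List<IResetteable> resetteables = new List<IResetteable>();

    private void Awake()
    {
        // Gather every component in the scene that opted into resets
        // (the actual project may register them differently).
        foreach (var behaviour in FindObjectsOfType<MonoBehaviour>())
            if (behaviour is IResetteable resetteable)
                resetteables.Add(resetteable);

        // Any event that should restart the level funnels into ResetLevel.
        characterEvents.OnDeath += ResetLevel;
        characterEvents.OnTimeLimitExceeded += ResetLevel;
    }

    private void ResetLevel()
    {
        foreach (var resetteable in resetteables)
            resetteable.Reset();
    }
}
```

Note how a new level element only needs to implement IResetteable to participate in resets; nothing else in the scene has to know about it.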
Once a trained model is available, it is possible to run simulations on the level of interest while collecting different customizable metrics related to agent performance. At the moment, the project provides a set of Unity3D Play Mode tests that can be used as a guideline to load scenes automatically, instantiate a standalone controller, associate it with the character object, and then run the corresponding scene a fixed number of times while collecting event-driven metrics. This makes it possible to gather simulation metrics from a fair number of levels in a convenient way, avoiding manual switches and executions. One limitation at the moment, however, is that no historical record is kept of the metrics collected in the tests, which makes it necessary to maintain external storage in which to place and analyze them retrospectively.

4.2. Decoupled input-actions scheme

One of the main bottlenecks when integrating a machine learning model into a test environment is the difficulty of conveniently switching between control schemes. The most classic examples of this situation are the need to alternate between manipulating the character manually, using a policy trained by reinforcement learning, executing traces prerecorded by human demonstrators, or applying hand-scripted routines that specify desired behaviors. With this in mind, we consider it essential to start from a model in which the command modes mentioned above are kept strictly decoupled from the actual control logic of the character, by means of mechanisms natural to a programmer familiar with the engine. The proposed scheme is summarized in figure 5 for the case in which one wishes to alternate between manual control and control through an ML-Agents agent (via heuristics, a trained model, or a model undergoing training).

Figure 5: Control scheme in Liquid Snake.

In this scheme, an abstraction layer is always present within the character and acts as an interaction API over its set of supported actions, given by a CharacterActionsController component attached to the object to be handled. On top of this layer it is possible to place specific controllers that make direct use of this component to manipulate the character (ManualCharacterController and MLCharacterController in figure 5). For those cases in which a controller requires human input, such as conventional controls or the heuristic mode of ML-Agents, we introduce an InputHandler object in the scene, responsible for translating human input registered in Unity's input system into methods of whichever controller implements the ICharacterController interface (methods such as "Move", "Shoot" or "Interact", which are more manageable and understandable than raw input).

It is worth mentioning that the heuristic mode differs from traditional control in that the agent must receive the commands encoded in the representation used by the RL model, as an array of actions. This enables the collection of human demonstrations that can later be used to train learning-by-demonstration models or to reproduce historical traces. In this way we can run a single input system but operate it in different ways depending on our needs (for instance, one would not expect to use heuristic control as the main control in a commercial game).
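To make the layering concrete, here is a minimal sketch of the decoupling. The interface name and the "Move", "Shoot" and "Interact" methods come from the description above; the parameter types, the legacy input axes and the way the controller reference is injected are illustrative assumptions.

```csharp
using UnityEngine;

// Action API mentioned above, implemented by every controller flavor
// (manual, heuristic, trained model...). Parameter types are illustrative.
public interface ICharacterController
{
    void Move(Vector2 direction);
    void Shoot();
    void Interact();
}

// Scene object translating raw human input into controller method calls.
// Swapping the referenced controller changes the control scheme without
// touching the character object itself.
public class InputHandler : MonoBehaviour
{
    // Assigned in the inspector; must implement ICharacterController.
    [SerializeField] private MonoBehaviour controllerBehaviour;
    private ICharacterController controller;

    private void Awake()
    {
        controller = (ICharacterController)controllerBehaviour;
    }

    private void Update()
    {
        // Legacy input axes for brevity; an "Interact" button would need
        // to be defined in the Input Manager (or the new Input System
        // used instead).
        var direction = new Vector2(Input.GetAxis("Horizontal"),
                                    Input.GetAxis("Vertical"));
        controller.Move(direction);
        if (Input.GetButtonDown("Fire1")) controller.Shoot();
        if (Input.GetButtonDown("Interact")) controller.Interact();
    }
}
```

Under this split, a ManualCharacterController would forward these calls directly to the CharacterActionsController, while an MLCharacterController would instead map ML-Agents action arrays onto the same component.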
To implement and deploy a character controller, we opt for a separation between the character actor object itself (a game object in the scene that exposes a CharacterActionsController) and the controller, which is nothing more than an empty object in the Unity scene with a component that implements the ICharacterController interface and holds a reference to the object representing the character. In this way the character object acts as a "pawn", and the controller can be modified at will without restructuring references from other objects in the scene to the player. This is convenient because in a game of these characteristics it is very common to have a multitude of elements that reference the player fairly directly, enemies being perhaps the most typical case: they need a reference to the player to know what to chase and attack, and to gather information from the player at run-time. If the controller lived in the character object itself, changing the scheme would require either replacing the object in use with another one carrying different control components, or allowing numerous controllers to coexist in the same object together with some kind of manager dedicated to enabling and disabling them as required, with the corresponding clutter and logistical complications.

Additionally, when using a controller derived from the Agent class of ML-Agents, it is common to attach different sensor components to the agent to enable more sophisticated forms of perception, such as visual observations from a camera, or spatial observations by means of sensors based on raycasts around the object or on grids centered on the agent, to detect the relative positions of nearby elements in the scene. These sensors must be attached as components to the object hosting the controller and will always dispatch their observations, with no ability to disable them. Since it is usually desirable to experiment with various sensor configurations and parameters, it seems natural to keep each controller as an independent object that can easily be replaced whenever one wishes to adopt a new perception system. Note that if one intends to use a sensor centered on the player's object, it becomes necessary to ensure that the controller's position matches the character's; this is typically straightforward, but important to keep in mind.

4.3. Behavior trees for rich NPCs

As mentioned before, in the development of commercial games with enemies and other NPCs it is common to start with a relatively simple AI that meets the design needs of the initial levels and then evolve and expand it to adapt to new requirements as the project progresses. These AIs are usually coded using behavior trees (BTs) or state machines, which is why Liquid Snake includes the Behavior Bricks library [19] for specifying enemy behavior in the form of BTs. With this library it is possible to design arbitrarily complex flows that can be expanded during development. The current project features a suite of pre-designed behavior trees defining the flows of several enemies in the example levels, which may be used as a reference for changes or expansions. Included among them are nodes encoding common conditions and checks such as "is target in sight?" and actions such as "advance to the next patrol point" or "shoot at target".
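As an illustration, a custom action node in Behavior Bricks follows the attribute-based pattern sketched below. This is modeled on the library's documented API; the node name, parameter and movement logic are ours and do not correspond to a specific node in the repository.

```csharp
using Pada1.BBCore;         // Node registration attributes
using Pada1.BBCore.Tasks;   // TaskStatus
using BBUnity.Actions;      // GOAction base class
using UnityEngine;

// Illustrative "advance to the next patrol point" style action node.
[Action("LiquidSnake/MoveToPatrolPoint")]
[Help("Moves the enemy towards the given patrol point.")]
public class MoveToPatrolPoint : GOAction
{
    // Input parameter wired up from the behavior tree editor.
    [InParam("target")]
    public Transform target;

    public override TaskStatus OnUpdate()
    {
        // Direct movement for brevity; real nodes would typically
        // delegate to the enemy's navigation components.
        gameObject.transform.position = Vector3.MoveTowards(
            gameObject.transform.position, target.position,
            2f * Time.deltaTime);

        bool arrived = Vector3.Distance(gameObject.transform.position,
                                        target.position) < 0.1f;
        return arrived ? TaskStatus.COMPLETED : TaskStatus.RUNNING;
    }
}
```

Condition nodes such as "is target in sight?" follow the same registration pattern, returning a boolean check instead of a task status.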
5. Testing process

Following the descriptions given in the previous sections, the process of creating a battery of tests on one of the project's environments is as follows:

1. Set up the level on which one wishes to perform a regression test as a scene within the game. Currently the levels in figures 3 and 4 are provided as testing samples, together with pre-trained controllers and Play Mode tests that make use of them.

2. Configure a controller able to perform the desired behavior automatically on that level. This can be done in a number of ways: with a manually programmed AI, with traces of human inputs or, as in the examples included in the project, by training an agent through the ML-Agents library to take control of the player. The behavior to be generated need not be limited to successfully completing the level; it can specify more precise tasks, such as defeating a particular enemy or reaching a sequence of points in order. In the example cases, the goal is always to reach an escape point before dying or running out of time, with a reward function that penalizes death and damage at the hands of enemies, and awards positive stimuli for approaching the goal, reaching an exit point, or retrieving a collectible item. The implementation details of these controllers (observations, reward functions, configuration of the underlying neural network, etc.) can be consulted in the project repository, where a quick start guide explaining a simple training process is also included.

3. Write a Play Mode test in Unity's Test Runner that instantiates the implemented controller on the scene to be tested and binds it to the player object in the environment (a minimal sketch of such a test is shown after this list). In general, a test runs the controller simulation over the scene a set number of times (high enough, in proportion to the variability of the level, to collect as many occurrences as possible) and collects execution metrics configured by the test designer, such as the number of objects retrieved or the number of times the player is detected by an enemy. These metrics can be built up from the player's CharacterEvents component as standardized events are received. In the example cases, the damage-to-player and goal-reached events are used to compute, per episode, the health remaining at the end of the level and the time taken to escape the room. The test executor is also responsible for compiling the execution metrics and dumping them into a log file, thus enabling subsequent analysis.

4. After applying a structural change to the scene, re-run the previous test to obtain a new set of simulation metrics with the same agent. At this point it is possible to perform a comparative analysis of the metric distributions (applying, for example, a non-parametric test such as Mann-Whitney to contrast the equality of the pre- and post-change distributions) or, if preferred, simply check whether they fall within acceptable limits given by the design team.
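The following is a minimal sketch of what such a Play Mode test can look like. The scene, object and event names ("SampleLevel1", "Character", OnGoalReached), the episode budget and the acceptance threshold are illustrative assumptions; the repository's own tests follow the same overall structure with the project's actual identifiers.

```csharp
using System.Collections;
using NUnit.Framework;
using UnityEngine;
using UnityEngine.SceneManagement;
using UnityEngine.TestTools;

public class SampleLevelRegressionTest
{
    private const int Episodes = 30;          // Scale with level variability.
    private const float EpisodeTimeout = 60f; // Budget per episode, in seconds.

    [UnityTest]
    public IEnumerator AgentStillEscapesTheRoom()
    {
        // Load the scene under test and let it initialize for one frame.
        SceneManager.LoadScene("SampleLevel1");
        yield return null;

        // Hook a metric to a standardized character event
        // (object and event names are illustrative).
        var events = GameObject.Find("Character")
                               .GetComponent<CharacterEvents>();
        int escapes = 0;
        events.OnGoalReached += () => escapes++;

        // The pre-trained controller drives the character; the LevelManager
        // resets the scene after each episode (death, timeout or escape).
        for (int episode = 0; episode < Episodes; episode++)
            yield return new WaitForSeconds(EpisodeTimeout);

        // Dump the metric and check it against a design limit.
        Debug.Log($"Escape rate: {escapes / (float)Episodes:P0}");
        Assert.Greater(escapes / (float)Episodes, 0.8f,
            "Escape rate fell below the design team's acceptable limit.");
    }
}
```

A pre/post-change comparison of the resulting distributions (for instance, the Mann-Whitney test mentioned in step 4) would then be performed offline on the dumped logs.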
The use of a Test Runner such as the one integrated in Unity makes it possible to configure and launch large numbers of tests on an arbitrary number of scenes in an automated way, so that the capture of metrics can be repeated at the end of a development day as a first verification mechanism to check whether they remain at appropriate values.

6. Conclusions and future work

In this paper we present a test environment for automatic regression testing in Unity3D as a contribution to the field of quality control in commercial video games, arguing that it can be used both as a reference for the integration of automatic testing methodologies in new projects, and as a controlled, prepared environment in which to evaluate new testing algorithms before deciding to incorporate them into a proprietary project. In particular, we conclude that the architecture proposed in this paper offers a number of features that make it particularly suitable for automated testing: native integration with machine learning libraries such as ML-Agents, a decoupled scheme of character controllers that allows seamless switching between input and control mechanisms, and reference scenarios that exemplify its use in Unity's Play Mode tests for the automated execution of multiple batteries of simulations.

The test-bed described in this article is under active development, and we aim to incorporate new features that make it as easy as possible to perform tests on it, as well as to develop thoroughly documented application examples of automatic testing techniques such as those described in [5] as a showcase for the environment. In the short term, the highest priority is to improve the tools for conveniently configuring and automatically executing large numbers of tests with autonomous agents, and for collecting the results of such tests in a structured and manageable way. This will enable a good degree of control over the evolution of the set of game sections while avoiding the tedium of launching each regression test manually.

7. Acknowledgments

This work was supported by the Ministry of Science and Innovation (PID2021-123368OB-I00).

References

[1] M. Ostrowski, S. Aroudj, Automated Regression Testing within Video Game Development, GSTF Journal on Computing (JoC) 3 (2013) 10. URL: http://www.globalsciencejournals.com/article/10.7603/s40601-013-0010-4. doi:10.7603/s40601-013-0010-4.
[2] J. Pfau, J. D. Smeddinck, R. Malaka, Automated Game Testing with ICARUS: Intelligent Completion of Adventure Riddles via Unsupervised Solving, in: Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play, ACM, Amsterdam, The Netherlands, 2017, pp. 153–164. URL: https://dl.acm.org/doi/10.1145/3130859.3131439. doi:10.1145/3130859.3131439.
[3] S. Ariyurek, A. Betin-Can, E. Surer, Automated Video Game Testing Using Synthetic and Humanlike Agents, IEEE Transactions on Games 13 (2021) 50–67. doi:10.1109/TG.2019.2947597.
[4] J. Bergdahl, C. Gordillo, K. Tollmar, L. Gisslén, Augmenting Automated Game Testing with Deep Reinforcement Learning, in: 2020 IEEE Conference on Games (CoG), 2020, pp. 600–603. doi:10.1109/CoG47356.2020.9231552. ISSN: 2325-4289.
[5] P. Gutiérrez-Sánchez, M. A. Gómez-Martín, P. A. González-Calero, P. P. Gómez-Martín, Reinforcement Learning Methods to Evaluate the Impact of AI Changes in Game Design, Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 17 (2021) 10–17. URL: https://ojs.aaai.org/index.php/AIIDE/article/view/18885.
[6] P. Gutiérrez-Sánchez, M. A. Gómez-Martín, P. A. González-Calero, P. P. Gómez-Martín, A proposal for combining reinforcement learning and behavior trees for regression testing over gameplay metrics, VII Congreso de la Sociedad Española para las Ciencias del Videojuego 2021 (2021). URL: http://ceur-ws.org/Vol-3082/paper13.pdf.
[7] Optimize your game balance with Unity Game Simulation, 2020. URL: https://blog.unity.com/technology/optimize-your-game-balance-with-unity-game-simulation.
[8] M. Ostrowski, S. Aroudj, Automated Regression Testing within Video Game Development, GSTF Journal on Computing (JoC) 3 (2013) 10. URL: https://doi.org/10.7603/s40601-013-0010-4. doi:10.7603/s40601-013-0010-4.
[9] J. Bergdahl, C. Gordillo, K. Tollmar, L. Gisslén, Augmenting Automated Game Testing with Deep Reinforcement Learning, in: 2020 IEEE Conference on Games (CoG), IEEE, Osaka, Japan, 2020, pp. 600–603. URL: https://ieeexplore.ieee.org/document/9231552/. doi:10.1109/CoG47356.2020.9231552.
[10] J. Pfau, J. D. Smeddinck, R. Malaka, Automated Game Testing with ICARUS: Intelligent Completion of Adventure Riddles via Unsupervised Solving, in: Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play, ACM, Amsterdam, The Netherlands, 2017, pp. 153–164. URL: https://dl.acm.org/doi/10.1145/3130859.3131439. doi:10.1145/3130859.3131439.
[11] S. Ariyurek, A. Betin-Can, E. Surer, Automated Video Game Testing Using Synthetic and Humanlike Agents, IEEE Transactions on Games 13 (2021) 50–67. URL: https://ieeexplore.ieee.org/document/8869824/. doi:10.1109/TG.2019.2947597.
[12] A. Juliani, V.-P. Berges, E. Teng, A. Cohen, J. Harper, C. Elion, C. Goy, Y. Gao, H. Henry, M. Mattar, D. Lange, Unity: A General Platform for Intelligent Agents, arXiv:1809.02627 [cs, stat] (2020). URL: http://arxiv.org/abs/1809.02627.
[13] A. Krumins, MindMaker, 2020. URL: https://github.com/krumiaa/MindMaker.
[14] C. Beattie, J. Z. Leibo, D. Teplyashin, T. Ward, M. Wainwright, H. Küttler, A. Lefrancq, S. Green, V. Valdés, A. Sadik, J. Schrittwieser, K. Anderson, S. York, M. Cant, A. Cain, A. Bolton, S. Gaffney, H. King, D. Hassabis, S. Legg, S. Petersen, DeepMind Lab, 2016. URL: http://arxiv.org/abs/1612.03801. doi:10.48550/arXiv.1612.03801. arXiv:1612.03801 [cs].
[15] A. Kanervisto, S. Milani, K. Ramanauskas, N. Topin, Z. Lin, J. Li, J. Shi, D. Ye, Q. Fu, W. Yang, W. Hong, Z. Huang, H. Chen, G. Zeng, Y. Lin, V. Micheli, E. Alonso, F. Fleuret, A. Nikulin, Y. Belousov, O. Svidchenko, A. Shpilman, MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned, 2022. URL: http://arxiv.org/abs/2202.10583. doi:10.48550/arXiv.2202.10583. arXiv:2202.10583 [cs].
[16] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, OpenAI Gym, 2016. arXiv:1606.01540.
[17] O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. S. Vezhnevets, M. Yeo, A. Makhzani, H. Küttler, J. Agapiou, J. Schrittwieser, J. Quan, S. Gaffney, S. Petersen, K. Simonyan, T. Schaul, H. van Hasselt, D. Silver, T. Lillicrap, K. Calderone, P. Keet, A. Brunasso, D. Lawrence, A. Ekermo, J. Repp, R. Tsing, StarCraft II: A New Challenge for Reinforcement Learning, 2017. URL: http://arxiv.org/abs/1708.04782. doi:10.48550/arXiv.1708.04782. arXiv:1708.04782 [cs].
[18] P. Gutiérrez-Sánchez, M. A. Gómez-Martín, P. A. González-Calero, P. P. Gómez-Martín, Liquid Snake, 2022. URL: https://github.com/UCM-GAIA/Liquid_Snake.
[19] PadaOne Games, Behavior Bricks, 2021. URL: http://bb.padaonegames.com/.