=Paper=
{{Paper
|id=Vol-2862/paper8
|storemode=property
|title=Synthesizing Retro Game Screenshot Datasets for Sprite Detection
|pdfUrl=https://ceur-ws.org/Vol-2862/paper8.pdf
|volume=Vol-2862
|authors=Chanha Kim,Jaden Kim,Joseph C. Osborn
|dblpUrl=https://dblp.org/rec/conf/aiide/KimKO20
}}
==Synthesizing Retro Game Screenshot Datasets for Sprite Detection==
Chanha Kim, Jaden Kim, Joseph C. Osborn
Formal Analysis of Interactive Media Lab, Pomona College, 185 East 6th Street, Claremont, California 91711
{chanha.kim, jaden.kim, joseph.osborn}@pomona.edu

==Abstract==
Scenes in 2D videogames generally consist of a static terrain and a set of dynamic sprites which move around freely. AI systems that aim to understand game rules (for design support or automated gameplay) must be able to distinguish moving elements from the background. To this end, we re-purposed an object detection model from the deep learning literature, developing along the way the YOLO Artificial Retro-game Data Synthesizer, or YARDS, which efficiently produces semi-realistic, retro-game sprite detection datasets without manual labeling. Provided with sprites, background images, and a set of parameters, the package uses sprite frequency spaces to create synthetic gameplay images along with their corresponding labels.

Figure 1: Real (left) vs. synthetic (right) screenshot from Super Mario Bros. on NES

==Introduction==
Many videogames employ a visual language which presents a static terrain (with background and foreground elements) juxtaposed with dynamic, animated, and freely moving sprites. Sprites are often game characters or objects of interest: potential threats, powerups, or the player's character. Knowledge about these sprites (e.g. their type, location, or speed) is vital for AI systems meant to understand games, especially in systems such as automated game design learning (Osborn, Summerville, and Mateas 2017) and learning-based level generation (Guzdial and Riedl 2016; Summerville et al. 2016a), as well as general videogame playing and automated approaches to accessibility.

Besides manual image labeling, current methods for obtaining sprite segmentations generally involve game-specific image processing or deep instrumentation of game emulators, as in CHARDA (Summerville, Osborn, and Mateas 2017). These techniques can be difficult to generalize and may be expensive, slow, or potentially fragile (e.g. the locations of hardware sprites in memory do not correspond exactly to the positions of human-legible sprites). In this work, we instead generate synthetic images starting from readily accessible spritesheets and sprite-free background images. We can then use the synthetic datasets generated from these resources to train sprite detection models for their corresponding games, which should generalize better than classical computer vision techniques or emulator instrumentation.

While our approach still requires that users collect spritesheets and background images, it eliminates the need for manually labeling images or writing computer vision code. We have packaged this generator in a Python package called the YOLO Artificial Retro-game Data Synthesizer (YARDS).
YARDS is a command-line tool which, given sprites and background images, generates synthetic training images that mimic patterns found in real game screenshots. YARDS pre-formats the synthetic data for integration with YOLO and can generate 1,000 labeled images in 10 seconds, a task which took the authors 10 long hours to do by hand.

In this paper, we present our two key contributions. First, we apply recent progress made in deep learning and computer vision to aid in producing semi-realistic datasets suitable for training sprite detection models. Second, we introduce a software package that can rapidly generate large datasets. In the following sections, we will discuss our approach for synthesizing retro-game screenshots, demonstrate how YARDS works, and evaluate several models trained on synthetic data against those trained on manually labeled images.

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

==Related Works==
The intersection of computer vision and games is a growing area of research that can benefit both designers and players. In 2018, Luo et al. demonstrated how transfer learning could improve the task of extracting player experiences directly from gameplay videos. In the same year, Zhang et al. introduced the problem of content-based retrieval of game moments and presented a prototype search engine for retrieving such moments based on user-provided game screenshots (2018). Our research contributes to this area by applying synthetic data generation techniques to the task of training sprite detection models and by introducing a software package that supports end-users in synthesizing their own datasets.

Figure 2: Examples of synthetic images generated with YARDS

In our research, we use YOLO (You Only Look Once) to detect sprites in four retro games: Super Mario Bros. (NES), Earthbound (SNES), Super Street Fighter II (SNES), and Super Mario World (SNES). YOLO is a well-known object detection model (Redmon et al. 2015) which has seen several iterations (Redmon and Farhadi 2016; 2018; Bochkovskiy, Wang, and Liao 2020). The original model's architecture is inspired by GoogLeNet (Szegedy et al. 2014) and has 24 convolutional layers followed by 2 fully connected layers (see Fig. 3). This end-to-end model allows for efficient training and detection of objects in both images and videos. The specific YOLO version that we use is Ultralytics' YOLOv5 (Ultralytics 2020), a recent implementation of YOLO in PyTorch.

Figure 3: Adapted diagram of the original YOLO architecture (Koylu, Zhao, and Shao 2019)

The use of synthetic data to train models is a well-known practice in computer vision. Motivations for generating synthetic datasets include the high cost of manually labelling real images (Roig et al. 2020), privacy concerns when using real user data (Triastcyn and Faltings 2018; Shaked and Rokach 2020), and the shortage of real training examples for rare cases (Beery et al. 2019). Other arguments for using synthetic data include the lack of extensive datasets in niche application domains (Wong et al. 2019) and the under-representation of the full target distribution in small datasets (Lateh et al. 2017). For such reasons, researchers have suggested numerous approaches to generating synthetic data over the past decade (Nikolenko 2019; Seib, Lange, and Wirtz 2020).

One common approach is to first generate a synthetic image and then stylize the image to be more realistic (Dwibedi, Misra, and Hebert 2017; Georgakis et al. 2017; Wong et al. 2019). For example, Wong et al. 2019 generated photorealistic synthetic images using a virtual 3D object-environment reconstruction method and style transfer techniques. Hinterstoisser et al. 2019 used 3D CAD models and pose curricula to generate foreground-background compositions, then made those compositions photorealistic via rendering techniques.

A significant issue arising from synthetic datasets is the synthetic-to-real domain gap (Tremblay et al. 2018; Yun et al. 2019b). This gap occurs when the synthetic images used to train a model are not representative of the target image distribution. In terms of model performance, research shows that models trained with both real and synthetic images achieve the best performance, followed by models trained with purely real images (Rozantsev, Lepetit, and Fua 2015; Dwibedi, Misra, and Hebert 2017; Georgakis et al. 2017; Rajpura, Bojinov, and Hegde 2017; Yun et al. 2019a). These studies also demonstrate that training with purely synthetic images seems to detract from model performance. However, comparable performance is achievable in image segmentation tasks (Di Cicco et al. 2017), and fine-tuning models trained on synthetic data with additional real images can yield better performance than mixed training (Nowruzi et al. 2019).

Our synthetic data generation approach is most similar to the one presented by Dwibedi et al. 2017. Since our games of interest use a pixelated art style, we can skip the step of increasing the realism of the generated images; it is enough to simply paste the sprites onto the backgrounds in a reasonable distribution. We therefore focus on developing an efficient technique for pasting sprites onto background images based on sprite frequency distributions observed in real gameplay images.
==Synthetic Data Generation Approach==
Our approach involves two steps: first, collect the sprites and background images for a given game; and second, paste sprites onto background images using their frequency distributions. Both sprites and background images are easily obtainable by extracting the data from an emulator, borrowing from archives compiled by fans, or utilizing approaches proposed by researchers like Summerville et al. 2016b. Additional methods for obtaining sprites and background images include scripting game-specific image extraction code or providing the assets directly if the user is the one developing the game.

One reason why our approach is so effective in the videogame domain is that we are working with much smaller image spaces (sets of possible images) than the real-world image spaces typically used in computer vision tasks. While the immense complexity of real-world images can depend on virtually anything, from lighting conditions to object textures, our focus on low-resolution, retro-game screenshots allows us to synthesize realistic screenshots just by pasting sprites onto background images.

Figure 4: Example of sprite clipping in the worst-case scenario of an L-shaped sprite in the bottom left corner of the screen. Midpoints are where the sprite will be cropped.

===Sprite Frequency Spaces===
Even though we are working with low-resolution images, synthesizing images that roughly mimic those seen in real gameplay is a nontrivial task. We generate synthetic images by providing the sprite frequency space for each class of sprites to be detected. We define the sprite frequency space for a given class as the discrete probability distribution over the frequency of appearances for that class on a given gameplay image. That is, it is a function mapping the numbers of sprites in a class to the probabilities that those numbers of sprites actually appear on a screenshot.

These sprite frequency spaces can either be defined by the user (as in our reported results) or approximated by feeding pre-labeled images into YARDS. At runtime, YARDS will approximate the sprite frequency spaces by counting the frequencies of the desired sprite classes in the image labels. Using these sprite frequency spaces, we can determine the number of times each sprite class should appear on each generated image. For example, in Super Mario Bros. (NES), there should almost always be one player on the screen, while there may be any number of enemy sprites from zero to 16, each number appearing at a different rate.

For each output image, we choose the number of times each class of sprites should appear by sampling the class's corresponding sprite frequency space. We thereby incorporate the given or estimated frequencies with which our target sprite classes appear in authentic gameplay images. This allows the YOLO model to train on data that roughly mimics our target distribution and avoids the previously discussed domain gap issues.

Figure 5: Example of top-left, top-right, bottom-left, and bottom-right quadrants in a clipped sprite

===Edge Handling with Transparency Quadrants===
Given that sprites often have transparent pixels, we must ensure that sprites are at least partially visible in screenshots no matter their shape. For example, an L-shaped sprite placed in the lower left corner of the screen might have no visible (i.e., opaque) pixels in the screenshot (see Fig. 4). Training a model would be very difficult if our training set asserts that background pixels are in fact part of a character. To handle these edge cases, we propose an approach that relies on transparency quadrants for determining whether enough of the sprite is visible in the screenshot.

For a given sprite, we first determine its transparency quadrants by using the locations of the first and last visible (non-transparent) pixels along each of its border axes. The upper-left quadrant (Fig. 5) is determined by the first visible pixel in the first row of visible pixels and the first visible pixel in the first column of visible pixels. Corresponding horizontal and vertical lines are drawn from each pixel, and their intersection defines the quadrant. Similarly, the bottom-right quadrant (Fig. 5) is determined by the last visible pixel in the last row of visible pixels and the last visible pixel in the last column of visible pixels. The upper-right and bottom-left quadrants are formed analogously. In general, each quadrant is defined by the intersection of the horizontal and vertical lines drawn from the relevant pair of pixels.

Once we have the sprite's transparency quadrants, we determine where the sprite is clipped by the boundaries of the screenshot.
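To make this construction concrete, the following sketch (our own illustration of the quadrant rule described above, not code from the YARDS package; the function name and return format are hypothetical) computes the four quadrant corner points from a sprite's alpha mask:

```python
import numpy as np

def transparency_quadrants(alpha):
    """Corner points of a sprite's four transparency quadrants.

    `alpha` is a 2-D array (H x W) whose nonzero entries mark visible
    (opaque) pixels. Each corner is the intersection of the horizontal
    and vertical lines drawn through a first/last visible pixel in the
    first/last visible row and column. Returns {name: (row, col)}.
    """
    visible = alpha > 0
    rows = np.flatnonzero(visible.any(axis=1))  # rows containing visible pixels
    cols = np.flatnonzero(visible.any(axis=0))  # columns containing visible pixels
    first_row = np.flatnonzero(visible[rows[0]])     # visible columns in first row
    last_row = np.flatnonzero(visible[rows[-1]])     # visible columns in last row
    first_col = np.flatnonzero(visible[:, cols[0]])  # visible rows in first column
    last_col = np.flatnonzero(visible[:, cols[-1]])  # visible rows in last column
    return {
        "top_left":     (int(first_col[0]),  int(first_row[0])),
        "top_right":    (int(last_col[0]),   int(first_row[-1])),
        "bottom_left":  (int(first_col[-1]), int(last_row[0])),
        "bottom_right": (int(last_col[-1]),  int(last_row[-1])),
    }

# Worst-case L-shaped sprite in the spirit of Fig. 4 (1 = opaque pixel):
L_sprite = np.array([[1, 0, 0, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 0],
                     [1, 1, 1, 1]])
corners = transparency_quadrants(L_sprite)
```

For the L-shaped example, the top-left corner coincides with the sprite's visible corner at (0, 0) and the bottom-right corner with (3, 3); the inner edges of these quadrants are what the clipping rule below compares against the visible portion of the sprite.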
For example, if the sprite is partially off the left side of the screenshot, we check the right-most quadrants (Fig. 5). Then, if the larger width of the two quadrants (i.e., the transparency space) is greater than the width of the sprite that is visible after clipping (i.e., the clipping space), not enough of the sprite is within the boundaries of the screenshot. For instance, assume the transparency space is greater than or equal to the clipping space for a sprite being clipped off the left of an image. In this case, we average the x-positions of the inner vertical edges of all four transparency quadrants. Then, we crop the sprite to run from the resulting average x-position to the sprite's rightmost border and paste the cropped sprite into the screenshot with the sprite's left border aligned with the screenshot's left border. Because we use all four transparency edges to determine how to crop the sprite, we know that enough of the sprite's useful information will appear on the generated image. We perform an analogous procedure for each screen boundary that the sprite overlaps.

==Using YARDS==
Integrating YARDS into object detection projects is simple. The development pipeline with YARDS involves three main steps: preprocessing, synthetic data generation, and model training. In preprocessing, we gather sprite and background images for a given game and define the corresponding folder locations in the configuration file. During synthetic data generation, we define the parameters in the rest of the configuration file and run YARDS via the command line to generate the images. The command-line package for YARDS takes up to two parameters: the configuration parameter (--config or -c) defines the path to the configuration file, and the visualize parameter (--visualize or -v) tells the package to draw bounding boxes for a sample of images. Finally, in model training, we train the model and validate it in a conventional machine learning pipeline.

===Configuration Parameters===
YARDS has multiple configuration parameters that the user must define prior to using the package. Table 1 summarizes what each parameter does, and the complete documentation is available in the project's source code repository.

Table 1: YARDS configuration parameters
{| class="wikitable"
! Parameter !! Description
|-
| game_title || The game's title, which is prepended to each image's filename to avoid naming conflicts.
|-
| num_images || The total number of images to generate.
|-
| train_size || The proportion of total images which should be included in the train set.
|-
| mix_size || The proportion of total real images which should be included in the train set.
|-
| label_all_classes || Determines whether all classes should be labeled or only specific classes.
|-
| labeled_classes || Determines which classes to label if label_all_classes is false. Useful for focusing attention on a single sprite and introducing noise in the form of other sprites or random images.
|-
| max_sprites_per_class || The maximum number of sprites per class which can appear in any given image. If set to -1, no cap is applied. Provides a means for limiting noise; useful primarily with the random classification scheme, as it allows more control over the distribution.
|-
| transform_sprites || Another means for introducing noise. If set to true, transforms sprites by rotating a multiple of ninety degrees, mirroring, or scaling to twice their original size. Scaling is fixed at double size because pixel art is distorted by any non-double scaling.
|-
| clip_sprites || Determines whether to keep all sprites entirely on screen or to allow some sprite clipping.
|-
| classification_scheme || Determines the classification scheme by which to place sprites.
|}

clip_sprites and classification_scheme are the parameters that control our synthetic data generation approach. clip_sprites determines whether or not the synthetic screenshots should have clipped sprites. classification_scheme accepts one of four keywords that define different methods for characterizing the sprite frequency space: distribution, mimic-real, random, and discrete. The distribution method takes a set of predefined classes such as player, enemy, or item and a corresponding sprite frequency space for each class, represented by an array. For instance, player: [0.20, 0.40, 0.40] means that for the player class, zero sprites should appear twenty percent of the time, one sprite should appear forty percent of the time, and two sprites should appear forty percent of the time. The mimic-real method analyzes a set of pre-labeled images to approximate the sprite distribution in a dataset; it takes as input an array of class numbers, which correspond to the class numbers in the image labels, and then uses the approximated distributions to generate the images. The random method samples each class from a uniform distribution, given the maximum number of sprites for each class. The discrete method takes inspiration from games like Street Fighter II where each screen has a constant number of sprites, and it places a constant number of sprites on each screenshot.
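To make the distribution scheme concrete, here is a minimal sketch of the sampling-and-pasting loop: sample a per-class sprite count from its frequency space, then alpha-paste sprites and record YOLO-style labels. This is our own illustration under assumed names (the frequency values, `sample_counts`, and `paste` are invented for the example; it is not the YARDS implementation):

```python
import numpy as np

# Sprite frequency spaces in the style of the `distribution` scheme:
# index i holds the probability that exactly i sprites of that class
# appear on a screenshot (values invented for illustration).
FREQUENCY_SPACES = {
    "player": [0.20, 0.40, 0.40],
    "enemy":  [0.10, 0.30, 0.30, 0.20, 0.10],
}

def sample_counts(spaces, rng):
    """Sample, per class, how many sprites to paste on one image."""
    return {cls: int(rng.choice(len(p), p=p)) for cls, p in spaces.items()}

def paste(background, sprite_rgba, x, y):
    """Alpha-paste an RGBA sprite onto an RGB background (in place) and
    return a YOLO-style label (x_center, y_center, width, height)
    normalized to the image dimensions; (x, y) is the top-left corner."""
    img_h, img_w, _ = background.shape
    h, w, _ = sprite_rgba.shape
    opaque = sprite_rgba[..., 3:] > 0  # broadcast over the 3 color channels
    np.copyto(background[y:y + h, x:x + w], sprite_rgba[..., :3], where=opaque)
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

rng = np.random.default_rng(0)
background = np.zeros((192, 256, 3), dtype=np.uint8)  # blank screen-sized canvas
sprite = np.full((16, 16, 4), 255, dtype=np.uint8)    # solid stand-in sprite
labels = []
for cls, count in sample_counts(FREQUENCY_SPACES, rng).items():
    for _ in range(count):
        x = int(rng.integers(0, 256 - 16))
        y = int(rng.integers(0, 192 - 16))
        labels.append((cls, *paste(background, sprite, x, y)))
```

A real generator would draw varied sprites from a spritesheet and honor clip_sprites and the edge-handling rules above; the sketch only shows how a frequency space drives per-image sprite counts and labels.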
==Tests and Results==
To evaluate our dataset synthesizer, we compare model performance for different datasets from a single game, train binary classifiers to detect real versus synthetic data, and demonstrate the generalizability of our approach.

===Training on Synthetic, Real, and Mixed Datasets===
To compare the performance of models trained on synthetic data to those trained on real data, we trained nine YOLO models on various datasets for Super Mario Bros. (screenshot dimension: 256×192). Each YOLOv5 model (Ultralytics 2020) was trained for 200 epochs with batch-size 32, and the best weights for each model (i.e. the weights that yielded the best model performance in training) were validated on 250 real images. We used mAP@0.5, a mean average precision metric for measuring the performance of object detection models, as our single-valued evaluation metric.

Figure 6: Detection results of models trained on various datasets for Super Mario Bros.

Table 2: Mean average precision of YOLO models trained on various datasets for Super Mario Bros. (Better-than-baseline performance is in bold.)
{| class="wikitable"
! Training Set Composition !! mAP@0.5
|-
| 750 real (baseline) || 0.856
|-
| 375 real + 375 synthetic || '''0.886'''
|-
| 75 real + 675 synthetic || 0.841
|-
| 750 synthetic || 0.677
|-
| 750 synthetic + fine-tuning w/ 750 real || '''0.881'''
|-
| 375 real + 3,375 synthetic || '''0.956'''
|-
| 3,750 synthetic || 0.816
|-
| 500 real + 9,500 synthetic || '''0.963'''
|-
| 10,000 synthetic || '''0.935'''
|}

Table 2 confirms the results shown by researchers in other image recognition domains, suggesting that training on mixed datasets of real and synthetic images yields the best model performance, although the models trained on 3,750 and 10,000 synthetic images show that training with large, purely synthetic datasets can also work well. The model trained on 750 synthetic images also improved significantly after fine-tuning with 750 real images for 41 epochs. We initialized fine-tuning to train the model's weights for 200 epochs, but the package fast-forwarded the fine-tuning process to the last 41 epochs. Based on these results, we recommend using YARDS to generate either very large synthetic datasets or smaller supplementary synthetic datasets to boost manually labeled images.

===Binary Classification of Synthetic vs. Real Data===
To test whether computer vision models could distinguish between real and synthetic data, we trained LeNet-5 (Lecun et al. 1998), AlexNet (Krizhevsky, Sutskever, and Hinton 2012), and ResNet-50 (He et al. 2015), three classic CNN architectures of increasing complexity, to classify real versus synthetic images. Each architecture was modified to take inputs of 256×256×3, configured with binary cross-entropy loss and the Adam optimizer, and trained for 50 epochs with batch-size 64. LeNet-5 was modified to use max-pooling and ReLU activation. We trained and tested the classifiers on 4,000-image datasets composed of real and synthetic training images from Super Mario Bros.

Table 3: Accuracy of binary classifiers trained to classify synthetic versus real Super Mario Bros. gameplay images
{| class="wikitable"
! Model !! Accuracy
|-
| LeNet-5 || 0.5000
|-
| AlexNet || 0.8490
|-
| ResNet-50 || 0.9690
|}

Contrary to our original hypothesis that each model would achieve roughly 50% accuracy, Table 3 suggests that architectures with many weights (e.g. AlexNet and ResNet-50) are able to distinguish between real and synthetic images, while smaller ones like LeNet-5 are not. We therefore need to develop a training strategy to account for the discrepancy between synthetic and real data; in the future, we may be able to leverage these binary classifiers to guide further improvements to our synthetic data generation approach.

===Generalization===
We trained models on two very large datasets composed of synthetic data for three separate games: Super Mario World, Super Street Fighter II, and Earthbound (screenshot dimension: 256×224). We generated 20,000 images for each game, combining them into a total dataset of 60,000 images. We split the dataset at a 0.8 train-test ratio and generated two variants: one with clipping and one without. The results can be seen in Table 4. Model performance dropped for games outside of those the model was trained on, which may be due to the lack of sprites representing the larger sphere of NES/SNES games. Given a wider variety of sprites and games, however, we believe in the possibility of training a general detection model for most NES/SNES games.

Table 4: Mean average precision of datasets containing a mixture of generated data from Super Mario World, Super Street Fighter II, and Earthbound
{| class="wikitable"
! Dataset !! mAP@0.5
|-
| 60k imgs w/ clipping || 0.9525
|-
| 60k imgs w/o clipping || 0.9635
|}
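For reference, mAP@0.5, the metric used throughout these experiments, counts a predicted box as a true positive only when its intersection-over-union (IoU) with a matching ground-truth box is at least 0.5. A minimal IoU helper (our own illustration, not part of YARDS or YOLOv5):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection that exactly covers a ground-truth box scores 1.0, while a box offset by half its width and height scores 1/7 (about 0.14), well below the 0.5 threshold, and so would count as a miss under mAP@0.5.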
Figure 7: Detection results of models trained on the large dataset with clipping for Super Mario Bros., Super Street Fighter II, and Earthbound

==Discussion==
Although models trained on purely synthetic datasets do not perform as well as those trained on purely real datasets, our results suggest that training models with mixed synthetic-and-real datasets can increase overall sprite detection performance. Furthermore, synthesizing artificial data is much more efficient than collecting real gameplay screenshots and accelerates object-detection model development, ultimately allowing for better-performing models. Additionally, large synthetic datasets may outweigh the advantages of using real images and make sprite detection tools more accessible.

Training videogame object detectors on mixed datasets can be useful for many applications. For example, it may accelerate research in automated game design learning and help relax the requirement of deep visibility into the inner workings of emulated game hardware for distinguishing game sprites from the level geometry. We also envision methods for game developers to improve the accessibility of their games by verifying that a trained model recognizes sprites and their labels in a way which is consistent with a designer's intention (e.g. that enemies "read" as enemies, that a character is not easy to misinterpret as background texture, etc.). Such a model could help predict whether future players would be able to make the same assumptions and easily identify the playable parts of the game.

Synthetic data generation could also be useful as a feature extraction tool for reinforcement learning or other general game-playing agents. By training models that can accurately identify features of sprites belonging to classes like helpful, harmful, item, enemy, etc. (perhaps borrowed from an affordance grammar like that of Bentley and Osborn 2019), sprite detection models may help reinforcement learning agents train faster and generalize more effectively.

That being said, there are numerous ways to improve our approach. First, we suggest using model visualization techniques (e.g. class activation maps, occlusion sensitivity, gradient ascent) to visualize what the models see when trained with synthetic versus real data. Second, an addition that could greatly improve our current approach would be to define spatial curves in addition to sprite frequency spaces; using spatial curves to paste sprites into the regions where they would appear in real gameplay images could increase the realism of the synthetic images. Third, trying out existing approaches, such as domain randomization (Liu, Liu, and Luo 2020; Borrego et al. 2018; Tremblay et al. 2018), increasing the accuracy of images in relation to natural data (Liu, Liu, and Luo 2020), using generative models or GANs (Goodfellow et al. 2014; Liu, Liu, and Luo 2020; Bailo, Ham, and Shin 2019; Triastcyn and Faltings 2018), and procedural content generation (Nikolenko 2019), may provide further insight into how to refine our approach.

Our YARDS implementation could also benefit from additional features. First, incorporating multiprocessing would greatly increase the speed of synthetic data generation. Our current package generates 80 images per second on a single core with no GPU acceleration for Super Mario Bros., and parallelizing this task would increase the package's efficiency. Second, adding basic image rendering and filtering functions such as blurring or pixelating sprites may be useful for videogames that do not use the pixelated style and resolution common to the four games we examined in this work. Third, color filtering functions may help the object detection models learn the sprites' essential features and avoid overfitting to their color patterns. Fourth, we would like to see added support for games in a wider variety of gameplay styles and genres. Finally, adding text detection functions may help with including basic user-interface elements.

In summary, this paper has introduced an application of existing synthetic data generation research to the problem of sprite detection, along with a software package that enables an end-user to rapidly generate large synthetic training sets based on sprite frequency spaces and edge handling. An open-source, working prototype of YARDS is available at https://github.com/faimSD/yards. We hope that our paper and software package will inspire further research in sprite detection and in computer vision and games.

==References==
Bailo, O.; Ham, D.; and Shin, Y. M. 2019. Red blood cell image generation for data augmentation using conditional generative adversarial networks.

Beery, S.; Liu, Y.; Morris, D.; Piavis, J.; Kapoor, A.; Meister, M.; Joshi, N.; and Perona, P. 2019. Synthetic examples improve generalization for rare classes.

Bentley, G. R., and Osborn, J. C. 2019. The videogame affordances corpus. In 2019 Experimental AI in Games Workshop.

Bochkovskiy, A.; Wang, C.-Y.; and Liao, H.-Y. M. 2020. YOLOv4: Optimal speed and accuracy of object detection.

Borrego, J.; Dehban, A.; Figueiredo, R.; Moreno, P.; Bernardino, A.; and Santos-Victor, J. 2018. Applying domain randomization to synthetic data for object category detection.

Liu, W.; Liu, J.; and Luo, B. 2020. Can synthetic data improve object detection results for remote sensing images?

Luo, Z.; Guzdial, M.; Liao, N.; and Riedl, M. 2018. Player experience extraction from gameplay video. CoRR abs/1809.06201.

Nikolenko, S. I. 2019. Synthetic data for deep learning.

Nowruzi, F. E.; Kapoor, P.; Kolhatkar, D.; Hassanat, F. A.; Laganiere, R.; and Rebut, J. 2019. How much real data do we actually need: Analyzing object detection performance using synthetic and real data.

Osborn, J. C.; Summerville, A.; and Mateas, M. 2017. Automated game design learning.

Rajpura, P. S.; Bojinov, H.; and Hegde, R. S. 2017. Object detection using deep CNNs trained on synthetic images.

Redmon, J., and Farhadi, A. 2016. YOLO9000: Better, faster, stronger.

Redmon, J., and Farhadi, A. 2018. YOLOv3: An incremental improvement.
Di Cicco, M.; Potena, C.; Grisetti, G.; and Pretto, A. 2017. Automatic model based dataset generation for fast and accurate crop and weeds detection. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Dwibedi, D.; Misra, I.; and Hebert, M. 2017. Cut, paste and learn: Surprisingly easy synthesis for instance detection.

Georgakis, G.; Mousavian, A.; Berg, A. C.; and Kosecka, J. 2017. Synthesizing training data for object detection in indoor scenes.

Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial networks.

Guzdial, M., and Riedl, M. 2016. Toward game level generation from gameplay videos.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2015. Deep residual learning for image recognition. CoRR abs/1512.03385.

Hinterstoisser, S.; Pauly, O.; Heibel, H.; Marek, M.; and Bokeloh, M. 2019. An annotation saved is an annotation earned: Using fully synthetic training for object instance detection.

Koylu, C.; Zhao, C.; and Shao, W. 2019. Deep neural networks and kernel density estimation for detecting human activity patterns from geo-tagged images: A case study of birdwatching on Flickr. ISPRS International Journal of Geo-Information 8(1).

Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Pereira, F.; Burges, C. J. C.; Bottou, L.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 25, 1097–1105. Curran Associates, Inc.

Lateh, M. A.; Muda, A. K.; Yusof, Z. I. M.; Muda, N. A.; and Azmi, M. S. 2017. Handling a small dataset problem in prediction model by employing artificial data generation approach: A review. Journal of Physics: Conference Series 892:012016.

Lecun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 2278–2324.

Redmon, J.; Divvala, S.; Girshick, R.; and Farhadi, A. 2015. You only look once: Unified, real-time object detection.

Roig, C.; Varas, D.; Masuda, I.; Riveiro, J. C.; and Bou-Balust, E. 2020. Unsupervised multi-label dataset generation from web data.

Rozantsev, A.; Lepetit, V.; and Fua, P. 2015. On rendering synthetic images for training an object detector. Computer Vision and Image Understanding 137:24–37.

Seib, V.; Lange, B.; and Wirtz, S. 2020. Mixing real and synthetic data to enhance neural network training – a review of current approaches.

Shaked, S., and Rokach, L. 2020. PrivGen: Preserving privacy of sequences through data generation.

Summerville, A.; Guzdial, M.; Mateas, M.; and Riedl, M. 2016a. Learning player tailored content from observation: Platformer level generation from video traces using LSTMs. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.

Summerville, A. J.; Snodgrass, S.; Mateas, M.; and Ontanón, S. 2016b. The VGLC: The video game level corpus. arXiv preprint arXiv:1606.07487.

Summerville, A.; Osborn, J.; and Mateas, M. 2017. CHARDA: Causal hybrid automata recovery via dynamic analysis. arXiv preprint arXiv:1707.03336.

Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2014. Going deeper with convolutions.

Tremblay, J.; Prakash, A.; Acuna, D.; Brophy, M.; Jampani, V.; Anil, C.; To, T.; Cameracci, E.; Boochoon, S.; and Birchfield, S. 2018. Training deep networks with synthetic data: Bridging the reality gap by domain randomization.

Triastcyn, A., and Faltings, B. 2018. Generating artificial data for private deep learning.

Ultralytics. 2020. YOLOv5.

Wong, M. Z.; Kunii, K.; Baylis, M.; Ong, W. H.; Kroupa, P.; and Koller, S. 2019. Synthetic dataset generation for object-to-model deep learning in industrial applications.

Yun, K.; Nguyen, L.; Nguyen, T.; Kim, D.; Eldin, S.; Huyen, A.; Lu, T.; and Chow, E. 2019a. Small target detection for search and rescue operations using distributed deep learning and synthetic data generation.

Yun, W.; Lee, J.; Kim, J.; and Kim, J. 2019b. Balancing domain gap for object instance detection.

Zhang, X.; Zhan, Z.; Holtz, M.; and Smith, A. M. 2018. Crawling, indexing, and retrieving moments in videogames. In Proceedings of the 13th International Conference on the Foundations of Digital Games, FDG '18. New York, NY, USA: Association for Computing Machinery.