=Paper=
{{Paper
|id=Vol-2763/CPT2020_paper_s6-6
|storemode=property
|title=Variable Realistic Image Synthesis for CNN Training Dataset Generation
|pdfUrl=https://ceur-ws.org/Vol-2763/CPT2020_paper_s6-6.pdf
|volume=Vol-2763
|authors=Vadim Sanzharov,Vladimir Frolov,Alexey Voloboy
}}
==Variable Realistic Image Synthesis for CNN Training Dataset Generation==
Variable photorealistic image synthesis for training dataset generation

V.V. Sanzharov (1), V.A. Frolov (2,3), A.G. Voloboy (3)
vs@asugubkin.ru | vladimir.frolov@graphics.cs.msu.ru | voloboy@gin.keldysh.ru
(1) Gubkin Russian State University of Oil and Gas, Moscow, Russia
(2) Moscow State University, Moscow, Russia
(3) Keldysh Institute of Applied Mathematics, Moscow, Russia

Photorealistic rendering systems have recently found new applications in artificial intelligence, specifically in computer vision, for generating image and video datasets. The main problem in this application is producing a large number of photorealistic images with high variability of 3d models and their appearance. In this work, we propose an approach based on combining existing procedural texture generation techniques and domain randomization to generate a large number of highly varied digital assets during the rendering process. This eliminates the need for a large pre-existing database of digital assets (only a small set of 3d models is required) and produces objects with unique appearance at the rendering stage, reducing the required post-processing of images and the storage requirements. Our approach uses procedural texturing and material substitution to rapidly produce a large number of variations of digital assets. The proposed solution can be used to produce training datasets for artificial intelligence applications and can be combined with most state-of-the-art methods of scene generation.

Keywords: photorealistic rendering, procedural generation, synthetic datasets, computer vision.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

There are two main challenges in training artificial intelligence models: data quantity and data quality. Data quantity concerns the availability of sufficient amounts of training and testing data. Training modern computer vision algorithms requires image datasets of significant volume: tens and hundreds of thousands of images for training on static images and an order of magnitude more for animation [1, 2]. By data quantity we also mean how balanced the data is: are all the different classes the model must recognize represented well enough? This can be a significant problem, because certain classes can be very rare in data obtained from the real world [3]. Data quality can mean many different characteristics, but one that is especially important for images is accurate markup. For example, if a model needs to detect certain objects in an image, then these objects must be accurately annotated in the training data. This is usually done manually or semi-automatically with the help of segmentation tools [4, 5]. Annotating a large image dataset manually is extremely expensive, and manual marking often lacks the necessary accuracy (automated markup options also suffer from insufficient accuracy and have their own disadvantages).

Using synthetic data (in this case, photorealistic images produced by rendering 3d scenes) can solve these problems. The data quantity problem can be addressed with algorithms that procedurally set the optical properties of materials and surfaces (displacement maps); this way it is possible to quickly generate an almost unlimited number of training examples with any distribution of objects (and therefore classes) in the generated images. In addition, one can create training examples that are scarce or almost non-existent in "real-life" datasets: for example, emergency situations on the road or in production, military operations, or objects that only exist as design projects or prototypes. Second, it is possible to produce pixel-perfect image annotation together with the rendered image.

But there are also several drawbacks to synthetic data generation. The main problem is producing 3d scene setups suitable for rendering an adequate image dataset. The first part of this problem is generating a 3d scene layout: meaningful placement of objects (3d models and light sources) and choosing which 3d models to include in the generated scene. The second part is setting the optical properties of the material models of objects in the scene so that they mimic real-life objects. A similar problem arises in digital content creation in the visual effects and video game industries, where several variations of the same digital asset (3d models, material models, textures, etc.) are created by 3d artists using software tools. However, for large dataset generation it is not feasible to produce variations of 3d assets manually. There is also a drawback associated with the time and computational resources required to render a dataset of thousands of images.
In this work, we propose an approach aimed at alleviating the latter problem by using procedural texturing and material substitution to produce a large number of variations from a small set of base digital assets.

2. Related Work

The drawback associated with computational resources can be mitigated by using fast and simple rasterization-based rendering solutions (usually OpenGL-based) [6], possibly in tandem with a global illumination approximation such as ambient occlusion [7].

Of course, rendering thousands of images or sequences requires significant computational resources. But while in simulators developed for training people a schematic image of a three-dimensional scene is often enough for a person (although real-time visualization must be provided), modern AI systems based on deep neural networks are trained according to different principles. It is important for AI systems to accurately model the data on which training is supposed to be carried out (but there is no real-time requirement). For synthetic data to closely approximate real-life datasets, it should simulate reality, i.e. the image should be photorealistic. Otherwise, there is no guarantee that the AI will "work" on similar examples in reality, since it is currently practically impossible to understand the causes of a failure in a multilayer neural network [8-10]. So, photorealistic rendering, which is usually based on the path tracing algorithm or one of its many variants, needs to be used. Works [11-13] demonstrate the advantages of using photorealistic rendering for synthetic dataset generation. Because the computational cost of physically correct rendering is still quite high, and rendering speed and the scaling of the training dataset generation system as a whole are important, solutions relying on photorealistic rendering are at a disadvantage in this regard. This is, however, alleviated by the recent advent of publicly available hardware-accelerated ray tracing, which can provide significant speedups for photorealistic rendering [14], as well as by denoisers [15].

Several approaches exist to automate the creation of 3d scenes for photorealistic image dataset generation. In [16-19] the authors use Augmented Reality (AR) based techniques to insert synthetic objects into photos. This approach requires a way to choose the position of the inserted objects: randomly with some distribution, using existing image annotation, or with additional reconstruction tools.

In [8] the 3d scene is generated by a set of rules which use randomized parameters to select some 3d models from a database and to procedurally generate others. A similar approach, called Domain Randomization (DR), is used in [20-21]. Domain randomization implies selecting parameters (aspects of the domain) which are randomized for each generated sample. Such parameters may include camera position and field of view, number of objects, number of lights, textures used for objects, etc.

In [22] physical simulation is used to achieve realistic placement of 3d models on a surface. There are also works that use a variety of approaches to scene description and generation, such as domain-specific languages [23], scene graphs [24], and stochastic grammars [25]. Finally, there are solutions that can generate a whole synthetic dataset similar to a specified real-world dataset [26].

These works mainly focus on composing realistic 3d scenes from existing digital assets: 3d models, textures, materials, etc. While in some cases [8] the digital assets themselves are randomized, this is done in a very limited manner, usually by changing only the base material color. Because of this, these approaches require large databases of digital assets to produce images with high variability of objects.

One of the methods to further increase the variety and realism of synthesized images and to match them more closely to real-life datasets is domain adaptation [27-28]. However, such techniques require an additional image processing stage which takes significant time and computational power, especially for images of relatively high resolution.

In this work, we propose an approach based on combining existing procedural texture generation techniques and domain randomization to generate a large number of highly varied digital assets during the rendering process. The proposed solution can be combined with most of the reviewed methods of scene generation.
3. Proposed solution

The motivation behind our solution is to produce many variants of the same digital asset (in particular, a 3d model with assigned materials and textures) in order to minimize the amount of manual and expensive work done by 3d artists. To achieve this, we propose the following generation pipeline (fig. 1).

Fig. 1. Architecture of the proposed image generation pipeline

The input scenario specifies the settings for the whole pipeline: what kind of scenes are to be generated (classes of objects to include; lighting type, such as indoor or outdoor, day or night, etc.); which AOVs (arbitrary output values) should be produced by the rendering system (instance segmentation masks, binary object masks, normals, depth, etc.); image post-processing (if any) to be done after rendering; and the randomization domain, i.e. which parameters should be randomized and with what distribution (material model parameters, procedural textures and effects, object class distributions, object placement and so on).

Cloud storage or a database contains the base digital assets:
1. 3d models with material markup, i.e. information about which parts of the model have or can have different material types.
2. Materials: base material types representing common BRDF blends, such as purely diffuse materials (e.g. brushed wood or rubber), reflective materials (e.g. polished metals), diffuse + reflective materials (e.g. plastics or brushed metals), reflective + refractive materials (e.g. glass), diffuse with two reflective layers (e.g. car paint with coating) and so on.
3. Textures: a collection of image textures and normal maps to be used in materials.
4. Environment maps: HDR spherical panorama images used for image-based lighting, representing a variety of lighting conditions.
5. Content metadata: information used by the domain randomization tools to select fitting digital assets from the storage according to the input scenario. This includes:
- correspondence of classes to 3d models (for example, which 3d models are models of cars, chairs or humans),
- correspondence of material classes to materials in the storage (for example, that stained glass and clear non-refractive glass are both of type "glass" and can therefore be assigned to a 3d model part marked as "glass"),
- correspondence of textures to material parameters (which textures can be used for which material parameters),
- information about HDR images (what lighting conditions a particular image represents) and so on.
A sketch of what the scenario and the metadata might look like is given below.
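For illustration only, the input scenario and the content metadata could be written down as plain Python dictionaries along the following lines. This is a minimal sketch: all field names, asset names and values here are our own hypothetical examples, not a format prescribed by the paper.

```python
# A minimal sketch of an input scenario and content metadata,
# assuming a simple JSON-like structure; all names are illustrative.
input_scenario = {
    "object_classes": ["car", "road_sign"],          # classes to include
    "lighting": {"type": "outdoor", "time": "day"},
    "aovs": ["color", "instance_mask", "normals", "depth"],
    "randomize": {
        "procedural_textures": ["rust", "dirt", "scratches"],
        "material_params": True,
        "object_placement": "uniform",               # randomization distribution
    },
    "num_scenes": 10000,
}

content_metadata = {
    # correspondence of classes to 3d models
    "models": {"car_01.obj": "car", "sign_stop.obj": "road_sign"},
    # correspondence of material classes to materials in the storage
    "materials": {"glass_clear": "glass", "glass_stained": "glass",
                  "oak": "wood", "steel_brushed": "metal"},
    # lighting conditions of each HDR environment map
    "env_maps": {"field_noon.hdr": {"type": "outdoor", "time": "day"},
                 "night_street.hdr": {"type": "outdoor", "time": "night"}},
}
```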
The domain randomization tools produce scene descriptions from the input scenario. This stage can query the digital asset storage and, using the content metadata, randomly or deterministically (depending on the input scenario) select appropriate digital assets and generate the requested number of scene descriptions. The generated scene descriptions are intended to be consumed by the rendering system directly.
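Continuing the hypothetical structures from the previous sketch, a minimal version of this stage could look as follows; in the real pipeline the output would be serialized to the renderer's scene description format rather than kept as dictionaries.

```python
import random

def generate_scene_descriptions(scenario, metadata):
    """Hypothetical sketch of the domain randomization stage: select
    assets that fit the input scenario and emit one scene description
    per requested scene."""
    # models whose class is requested by the scenario
    models = [m for m, cls in metadata["models"].items()
              if cls in scenario["object_classes"]]
    # environment maps matching the requested lighting conditions
    env_maps = [e for e, tags in metadata["env_maps"].items()
                if tags == scenario["lighting"]]
    for _ in range(scenario["num_scenes"]):
        yield {
            "model": random.choice(models),
            "env_map": random.choice(env_maps),
            # per-scene random parameters for the procedural effects
            "textures": {
                name: {"frequency": random.uniform(1.0, 8.0),
                       "amplitude": random.uniform(0.2, 1.0),
                       "persistence": random.uniform(0.3, 0.7)}
                for name in scenario["randomize"]["procedural_textures"]
            },
        }
```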
As the photorealistic rendering system in our work we used the open-source system Hydra Renderer [29], which uses an .xml scene description. The scene description also specifies which procedural effects should be used and what their input parameters are (if any). Hydra Renderer supports user extensions for procedural textures [30], and this functionality is one of the key elements of our solution.

There are several properties of procedural textures that make them a vital element of our generation pipeline:
1. Procedural textures can be parametrized with arbitrary values, and therefore it is possible to generate a large number of variations of the same texture.
2. It is possible to apply a texture to geometry without uv-unwrapping if the texture is parametrized by, for example, world-space or object-space position. This relaxes the requirements on 3d models and eliminates the predominantly manual work of uv-mapping them.
3. Finally, procedural textures have no fixed resolution (the resulting texture is infinite and has no seams), and because of this it is possible to produce detailed, high-quality materials suitable for application to a variety of 3d models of different scales.

As a part of this work we developed several procedural textures which allowed us to greatly increase the variation of 3d objects and also to increase the realism of their appearance.

The goal of the image post-processing tools is to adjust the images output by the rendering system or to produce additional data about these images. The tasks performed at this stage can involve:
1) measuring 2d bounding boxes for objects/instances (see the sketch after this list);
2) applying a variety of image-space effects to further increase the variety of the output images or to better match them to real-life datasets, for example:
- chromatic aberrations,
- barrel distortion,
- blur,
- transformations and warping of the image, including resampling for the purpose of anti-aliasing,
- noise,
- and others;
3) cutting objects out of the rendered image;
4) composing rendered objects with other images (as the Augmented Reality based solutions mentioned earlier do);
5) format conversions;
6) and others.

It is worth noting that all the listed tasks can be performed with simple python scripts or open-source compositing software like Natron [31, 32], and do not need complex, computation-intensive processing with neural networks.
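As an illustration of the first task, a tight 2d bounding box can be measured directly from the instance segmentation mask produced as an AOV by the renderer. A short numpy sketch of our own (not code from the paper):

```python
import numpy as np

def bounding_box(instance_mask, instance_id):
    """Measure the tight 2d bounding box (x_min, y_min, x_max, y_max)
    of one object instance in an instance segmentation mask."""
    ys, xs = np.nonzero(instance_mask == instance_id)
    if ys.size == 0:
        return None  # the instance is not visible in this image
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```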
In the described image generation pipeline architecture, the domain randomization tools stage can be replaced by any other of the reviewed approaches to scene generation: a scene graph produced by neural network processing of existing datasets, stochastic grammars, markup data from an existing dataset for employing augmented reality techniques, or any custom scene generation solution, for example placing objects inside an existing scene with respect to its depth buffer.

4. Object appearance variation techniques

Procedural textures

As we mentioned before, one of the key parts of our work is the use of procedural textures. In this section we describe the procedural textures developed for use in our generation pipeline.

The first problem we were trying to solve with procedural textures is providing additional detail on rendered 3d models to produce more realistic images, in contrast to the crisp and clean look of rendered objects. For this purpose we implemented several procedural textures simulating effects such as dirt, rust and scratches on material textures (fig. 2-5).

Fig. 2. Rust procedural texture variations on different models

Fig. 3. Dirt procedural texture variation on different models

Fig. 4. Scratches procedural texture variations. Also affects the normal map

Fig. 5. Rust and dirt procedural textures applied to the normal maps of road sign models

All these textures were parametrized in a way that allows the domain randomization tools to vary the appearance of the texture significantly by passing different (and possibly random) values for these parameters. Since the implementation of these procedural effects is predominantly based on noise functions, most of the parameters correspond to noise parameters such as amplitude, frequency and persistence. Another common parameter is the relative height (or another dimension) of the object that the effect reaches; this allows us to dynamically control how far the effect spreads on a 3d model, which is impossible with ordinary textures.
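To make this parametrization concrete, a noise-driven effect mask could be built along the following lines. This is a simplified Python sketch of our own: it uses a cheap hash-based stand-in for a proper smooth noise function, while the actual textures are implemented as renderer extensions [30].

```python
import math

def value_noise(x, y, z, seed=0.0):
    """Cheap hash-based value noise in [0, 1); a stand-in for the
    smooth noise functions (e.g. Perlin noise) used by real textures."""
    h = math.sin(x * 127.1 + y * 311.7 + z * 74.7 + seed) * 43758.5453
    return h - math.floor(h)

def effect_mask(pos, amplitude, frequency, persistence,
                max_height, octaves=4):
    """Fractal noise mask for a rust/dirt-like effect, parametrized as
    described above: noise amplitude, frequency and persistence, plus
    the relative height up to which the effect spreads on the model."""
    x, y, z = pos
    if y > max_height:              # effect only reaches this height
        return 0.0
    value, amp, freq = 0.0, amplitude, frequency
    for _ in range(octaves):
        value += amp * value_noise(x * freq, y * freq, z * freq)
        amp *= persistence          # each octave contributes less
        freq *= 2.0                 # and varies at a finer scale
    return min(1.0, value)          # blend weight: base vs. effect material
```

Passing different amplitude, frequency, persistence and height values per scene is what lets the domain randomization stage produce a distinct appearance for every rendered image.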
The developed procedural textures can affect not only colors or blending masks between different materials, but also normal maps (fig. 6), and they can be used as displacement maps to slightly deform the object (fig. 7). Changing the geometry in this way also changes the object's silhouette and therefore its segmentation masks.

Fig. 6. Procedural displacement. Top: without displacement; bottom: with mild displacement, warped regions highlighted

Fig. 7. Material substitution example

Material substitution

In addition to procedural textures, another technique implemented in the proposed solution to increase the variability of 3d models is material substitution. In the proposed data generation pipeline, the digital content storage contains materials, while 3d models are marked up with material types. This makes it possible to specify a collection of materials: manually created, pre-generated or imported from one of the existing open libraries. These materials can then be classified into several categories such as "wood", "metal", "car paint", "plastic" and so on. During the scene generation phase, the domain randomization tools can let a 3d model use random materials from the classes specified as possible for that model. For example, chairs can have materials from the "wood" or "metal" classes, while "glass" type materials are unlikely to be assigned to a chair model.

In the reviewed existing solutions this technique is mostly used in a very rudimentary form: only the color is changed, not the material (i.e. the BRDF or BRDF blend) itself.
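A small Python sketch of how such class-based material substitution could work; the material library, class names and model markup below are illustrative examples of our own, not the paper's actual asset database.

```python
import random

# Illustrative material library grouped into material classes
MATERIALS = {
    "wood":  ["oak", "pine", "walnut"],
    "metal": ["steel_brushed", "aluminum", "chrome"],
    "glass": ["glass_clear", "glass_stained"],
}

# Material markup of a 3d model: part name -> allowed material classes
CHAIR_MARKUP = {"seat": ["wood"], "frame": ["wood", "metal"]}

def substitute_materials(markup, rng=random):
    """Assign every marked-up model part a random material drawn from
    one of the material classes allowed for that part."""
    return {part: rng.choice(MATERIALS[rng.choice(classes)])
            for part, classes in markup.items()}

print(substitute_materials(CHAIR_MARKUP))
# e.g. {'seat': 'pine', 'frame': 'chrome'}
```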
5. Results and conclusion

The proposed image dataset generation pipeline architecture can create many variations of the same 3d model using procedural texturing and material substitution. Other 3d scene parameters commonly varied in existing solutions, such as lighting (HDR spherical maps for image-based lighting, point and area lights), can also be varied in the proposed pipeline. This makes it possible to use a much smaller pre-existing collection of digital assets while producing output images of high variability.

Consider the case of a simple scene with only one 3d model in it. In existing solutions the appearance of a single 3d model is most commonly varied by a single parameter: the base color. The proposed solution additionally allows specifying several procedural textures, each with at least 3 parameters (related to noise functions), which significantly alter the appearance of objects (fig. 2-5). So, for a single 3d model with a single material, each procedural texture introduces another 3 dimensions of appearance variability, compared to the single "color" dimension in existing solutions. This allows our solution to produce exponentially more variations of a single 3d model with a single material. With material substitution we can also vary the material types suitable for a particular 3d model.

Existing works on synthetic image dataset generation are mostly concerned with scene generation and rely on large collections of digital assets from which to construct the scenes. While large 3d model collections such as ShapeNet [33] exist, their content is usually of poor quality compared to assets created by experienced 3d artists. With the proposed approach of using procedural textures and material substitution, it is possible to produce many variations out of a small set of high-quality models that portray real-life objects more accurately. At the same time, the proposed solution does not exclude lower-quality models and can deal with 3d models without texture coordinates, thanks to the procedural texturing techniques used.

Finally, the proposed image generation pipeline can be integrated with any of the reviewed solutions for scene generation.

6. References

[1] Karpathy, Andrej, et al. Large-scale video classification with convolutional neural networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[2] Wu, Zuxuan, et al. Deep learning for video classification and captioning // Frontiers of Multimedia Research. 2017. P. 3-29.
[3] Faizov B.V., Shakhuro V.I., Sanzharov V.V., Konushin A.S. Classification of rare traffic signs // Computer Optics, Vol. 44, No. 2, 2020. (In Russian)
[4] Moehrmann, Julia, and Gunther Heidemann. Efficient annotation of image data sets for computer vision applications // Proceedings of the 1st International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications. 2012.
[5] Gao, Chao, Dongguo Zhou, and Yongcai Guo. Automatic iterative algorithm for image segmentation using a modified pulse-coupled neural network // Neurocomputing 119 (2013): 332-338.
[6] Su, Hao, et al. Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3d model views // Proceedings of the IEEE International Conference on Computer Vision. 2015.
[7] Kirsanov, Pavel, et al. DISCOMAN: Dataset of Indoor Scenes for Odometry, Mapping And Navigation // arXiv preprint arXiv:1909.12146 (2019).
[8] Nguyen, Anh, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[9] Zhang, Chiyuan, et al. Understanding deep learning requires rethinking generalization // arXiv preprint arXiv:1611.03530 (2016).
[10] Montavon, Grégoire, Wojciech Samek, and Klaus-Robert Müller. Methods for interpreting and understanding deep neural networks // Digital Signal Processing 73 (2018): 1-15.
[11] Movshovitz-Attias, Yair, Takeo Kanade, and Yaser Sheikh. How useful is photo-realistic rendering for visual learning? // European Conference on Computer Vision. Springer, Cham, 2016.
[12] Tsirikoglou, Apostolia, et al. Procedural modeling and physically based rendering for synthetic data generation in automotive applications // arXiv preprint arXiv:1710.06270 (2017).
[13] Zhang, Yinda, et al. Physically-based rendering for indoor scene understanding using convolutional neural networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[14] Sanzharov V., Gorbonosov A., Frolov V., Voloboy A. Examination of the Nvidia RTX // CEUR Workshop Proceedings, vol. 2485 (2019), p. 7-12.
[15] Ershov S.V., Zhdanov D.D., Voloboy A.G., Galaktionov V.A. Two denoising algorithms for bi-directional Monte Carlo ray tracing // Mathematica Montisnigri, Vol. XLIII, 2018, p. 78-100. https://lppm3.ru/files/journal/XLIII/MathMontXLIII-Ershov.pdf
[16] Alhaija, Hassan Abu, et al. Augmented reality meets computer vision: Efficient data generation for urban driving scenes // International Journal of Computer Vision 126.9 (2018): 961-972.
[17] Dosovitskiy, Alexey, et al. FlowNet: Learning optical flow with convolutional networks // Proceedings of the IEEE International Conference on Computer Vision. 2015.
[18] Varol, Gul, et al. Learning from synthetic humans // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[19] Chen, Wenzheng, et al. Synthesizing training images for boosting human 3d pose estimation // 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016.
[20] Tobin, Josh, et al. Domain randomization for transferring deep neural networks from simulation to the real world // 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017.
[21] Prakash, Aayush, et al. Structured domain randomization: Bridging the reality gap by context-aware synthetic data // 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[22] Mitash, Chaitanya, Kostas E. Bekris, and Abdeslam Boularias. A self-supervised learning system for object detection using physics simulation and multi-view pose estimation // 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017.
[23] Fremont, Daniel J., et al. Scenic: a language for scenario specification and scene generation // Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2019.
[24] Armeni, Iro, et al. 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera // Proceedings of the IEEE International Conference on Computer Vision. 2019.
[25] Jiang, Chenfanfu, et al. Configurable 3d scene synthesis and 2d image rendering with per-pixel ground truth using stochastic grammars // International Journal of Computer Vision 126.9 (2018): 920-941.
[26] Kar, Amlan, et al. Meta-Sim: Learning to generate synthetic datasets // Proceedings of the IEEE International Conference on Computer Vision. 2019.
[27] Hoffman, Judy, et al. CyCADA: Cycle-consistent adversarial domain adaptation // arXiv preprint arXiv:1711.03213 (2017).
[28] French, Geoffrey, Michal Mackiewicz, and Mark Fisher. Self-ensembling for visual domain adaptation // arXiv preprint arXiv:1706.05208 (2017).
[29] Ray Tracing Systems, Keldysh Institute of Applied Mathematics, Moscow State University. Hydra Renderer. Open source rendering system, 2019. https://github.com/Ray-Tracing-Systems/HydraAPI
[30] Sanzharov V.V., Frolov V.F. Level of Detail for Precomputed Procedural Textures // Programming and Computer Software, 2019, V. 45, Issue 4, pp. 187-195. DOI: 10.1134/S0361768819040078
[31] Natron, Open Source Compositing Software For VFX and Motion Graphics. https://natrongithub.github.io/
[32] Bondarev A.E. On visualization problems in a generalized computational experiment // Scientific Visualization 11.2 (2019): 156-162. DOI: 10.26583/sv.11.2.12. http://www.sv-journal.org/2019-2/12/
[33] Chang, Angel X., et al. ShapeNet: An information-rich 3d model repository // arXiv preprint arXiv:1512.03012 (2015).

About the authors

Vadim Sanzharov, senior lecturer, Gubkin Russian State University of Oil and Gas. E-mail: vs@asugubkin.ru.
Vladimir Frolov, PhD, senior researcher at Keldysh Institute of Applied Mathematics RAS, researcher at Moscow State University.
Alexey Voloboy, D.Sc., PhD, leading researcher at Keldysh Institute of Applied Mathematics RAS.