=Paper=
{{Paper
|id=Vol-2763/CPT2020_paper_s6-5
|storemode=property
|title=Procedural interior generation for artificial intelligence training and computer graphics
|pdfUrl=https://ceur-ws.org/Vol-2763/CPT2020_paper_s6-5.pdf
|volume=Vol-2763
|authors=Egor Feklisov,Mihail Zingerenko,Vladimir Frolov
}}
==Procedural interior generation for artificial intelligence training and computer graphics==
E. D. Feklisov¹, M. V. Zingerenko¹, V. A. Frolov¹·², M. A. Trofimov¹
egor.feklisov@gmail.com | liahimzer@gmail.com
¹ Moscow State University, Moscow, Russia;
² Keldysh Institute of Applied Mathematics
Since the creation of computers, there has been a lingering problem of storing and creating data for various tasks. In computer graphics and video games, there has been a constant need for assets. Although nowadays storage space is not one of the developers' prime concerns, the ability to automate asset creation is still relevant. The graphical fidelity that modern audiences and applications demand requires a lot of work from artists and designers, which is costly. Automatic generation of 3D scenes is of critical importance for training Artificial Intelligence (AI) in robotics, where the data generated during training cannot even be reviewed by a single person due to the sheer amount needed by machine learning algorithms. A separate, but nevertheless necessary, task for an integrated solution is furniture generation and placement together with material and lighting randomization. In this paper we propose an interior generator for computer graphics and robotics learning applications. The suggested framework is able to generate and render furnished interiors at photo-realistic quality. We combined existing algorithms for generating plans and arranging interiors and added material and lighting randomization. Our solution contains a semantic database of 3D models and materials, which allows the generator to produce realistic randomized scenes with per-pixel masks for training detection and segmentation algorithms.
Keywords: procedural generation, machine learning, AI training, light processing, tessellation, modeling
1. Introduction

Since the creation of computers, there has been a lingering problem of storing and creating data for various tasks. In computer graphics and video games, there has been a constant need for assets. Although nowadays storage space is not one of the developers' prime concerns, the ability to automate asset creation is still relevant. The graphical fidelity that modern audiences and applications demand requires a lot of work from artists and designers, which is costly. Automatic generation of 3D scenes is of critical importance for training Artificial Intelligence (AI) in robotics, where the data generated during training cannot even be reviewed by a single person due to the sheer amount needed by machine learning algorithms. A separate, but nevertheless necessary, task for an integrated solution is furniture generation and placement together with material and lighting randomization.

A number of industries use virtual reproductions of indoor scenes: interior design, architecture, gaming and virtual reality are a few. A computer model that understood the structure of such scenes well enough to generate new ones could support these industries by enabling fully or semi-automatic population of indoor environments.

Since the advent of neural network algorithms, creating training data sets has been a major problem. Computer vision and robotics researchers have begun turning to virtual environments to train data-hungry models for scene understanding and autonomous navigation. Indoor scene synthesis could therefore also be used to automatically synthesize large-scale virtual training data for various vision and robotics tasks.

2. Related work (plan generation)

Interior layout generation can be divided into three main subtasks: plan generation, layout generation and 3D content generation. By "plan" we mean a general representation of a storey that describes the types of rooms, their dimensions and possible neighboring chambers, stored in a separate data file. The "layout" is the final blueprint showing all rooms accurately placed and connected.

Plan generation can be done either by means of machine learning [1], or by creating a general list of room placements that is used to randomly assemble the result [2]. Depending on the type of layout constructor, the rules can be either strict or more of a loose guideline.

Layout assembly is the most complicated task of the three. The idea is to take a predetermined area (except for some cases) and separate it into subareas based on the previously generated plan. Many different ways to solve this problem have been developed over the years, some of which are listed below.

Tiling

The "tiling" approach works similarly to a toy constructor. It abstracts the whole area into small equal-sized chunks [3]. These pieces are usually placed in a grid, and the process of layout generation comes down to placing the tiles in an appropriate manner. The blocks do not contain any information about rooms and are merely used for determining the overall shape.

Most grid-based modern computer games [4, 5] use tiles for constructing environments to provide immense replayability and moderate challenge to the players. The method is also popular among independent developers since it allows them to easily create level geometry and graphics on a tight budget. The advantage of the tiling approach is its simplicity and universality: tile primitives can serve as basic building blocks in other methods. The drawbacks are the clearly visible structure of the resulting model and difficulties with objects smaller than the tile size.
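As an illustration (not the authors' code), the tiling idea of marking occupied cells on a grid to determine the overall shape might be sketched as follows; the growth-from-centre strategy and all names are assumptions for the sketch:

```python
# Hypothetical sketch of the tiling idea: the floor area is abstracted
# as a grid of equal-sized chunks; layout generation reduces to marking
# which grid cells are occupied. Sizes and names are illustrative only.
import random

def generate_tile_mask(width, height, n_tiles, seed=0):
    """Grow a connected blob of n_tiles cells on a width x height grid."""
    rng = random.Random(seed)
    occupied = {(width // 2, height // 2)}          # start from the centre
    while len(occupied) < n_tiles:
        x, y = rng.choice(list(occupied))           # pick an occupied cell
        nx, ny = rng.choice([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
        if 0 <= nx < width and 0 <= ny < height:
            occupied.add((nx, ny))                  # claim the neighbour
    return occupied

mask = generate_tile_mask(10, 8, 20)
print(len(mask))  # -> 20
```

The resulting cell set only fixes the overall shape; as the text notes, the tiles themselves carry no information about which rooms they belong to.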
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Dense packing

The "dense packing" method makes use of rooms with predetermined shapes and sizes and attempts to place them within a confined space with preset dimensions in an optimal manner [6]. It is based on a class of optimization problems with the same name. Rooms can be represented with tiles for ease of modeling. The advantage of this method is that it is based on a well-known mathematical problem for which many different solutions have been developed. It is also useful when particular room sizes are required. The main disadvantage is that the algorithm may need to regenerate rooms if placing them in the area is impossible, so the final solution may take a lot of time, even without taking the complexity of the optimization problem into account.

Growth

The "growth" algorithm proceeds in three phases [7]. In preparation, the area is divided into small sections like tiles, each given an initial numeric value. The "seeds" of each section/room are then planted in the positions with the highest values, each changing the numbers on the grid based on the probabilities of other room types being adjacent. The rooms are later iteratively grown around their origin in a rectangular manner. All unused space is finally consumed by the existing chambers. Advantages are the straightforward implementation and the ability to work with non-rectangular target spaces. The main disadvantage of the algorithm is the inability to control the size of the rooms.

Inside-out

The "inside-out" approach (known as "growth" in some sources) is based on placing rooms in an optimal manner and then creating the outer wall of the house around the resulting shape [8]. The placement of rooms can be implemented in different ways. For example, the algorithm can choose the first primary chamber and place the other rooms around it. This algorithm does not restrict the resulting area's size and shape, and adjacent rooms can be calculated and placed more accurately. However, chambers can be placed in an unoptimized manner, resulting in visible gaps, and the outer shape can turn out to be unrealistic.

Treemap

Another set of methods represents the floor plan as a graph and then recursively divides the rectangular area into subsections until all rooms are placed [9]. One of the implementations requires building a treemap (hence the name) of the graph from a rectangular area. This approach works reliably on office buildings. A disadvantage of these algorithms is that they can generate rooms with odd proportions, which however can be rectified by using a squarified treemap.

Machine learning approaches

The "machine learning" way is centered around building a Generative Adversarial Network (GAN) that generates floors based on a set of predefined layouts [1]. It consists of two networks: one generates layouts from random noise, while the other compares the result to existing layouts to determine whether it is appropriate. The most significant advantage of the machine learning approach is that, unlike other methods, it takes social aspects into account by default. The disadvantage is that training data sets are required, which is a fundamental problem.

3. Related work (furniture placement)

Early works in this field use simple statistical relationships between objects [10]. The next step was data-driven scene synthesis: learning priors over object occurrence and arrangement from examples. The first such method learned separate priors for occurrence and arrangement [11] but is limited to small-scale scenes due to the limited availability of training data and the learning methods available at the time. Various related methods have been proposed, modeling object occurrence with directed graphical models combined with Gaussian mixture arrangement patterns [12], and with activity-based object relation graphs [13].

With the availability of large datasets of indoor virtual scenes such as SUNCG [14], new data-driven methods have been proposed. [15] uses a directed graphical model for object selection but relies on heuristics for object layout. [16] uses a probabilistic grammar to model scenes, but also requires data about human activity in scenes (not readily available in all datasets) as well as manual annotation of important object groups.

The most relevant papers at the moment use deep convolutional networks to learn priors over which objects should be in a scene and how they should be arranged. [17] uses deep CNNs that operate on top-down image representations of scenes and synthesizes scenes by sequentially placing objects. [18] uses the same idea but reduces the number of inference steps.

Synthetic training data from virtual indoor scenes is quickly becoming an essential source of learning data for computer vision and robotics systems. Several recent works have shown that indoor scene understanding models can be improved by training on large amounts of synthetically generated images from virtual indoor scenes. At the intersection of vision and robotics, researchers working on visual navigation often rely on virtual indoor environments. Our model can complement these simulators by automatically generating new environments in which to train such intelligent visual reasoning agents.

Recently a novel dataset for training and benchmarking semantic SLAM methods was published [19], based on the SUNCG dataset rendered with ambient occlusion and photon mapping. The authors of [19] mainly focus on sampling trajectories that simulate the motions of a simple home robot.

4. Suggested approach

The main difference of our work is that our system works not only with separate rooms, but is also capable of creating the layout of the building itself and then filling the rooms with the necessary furnishing, which can be useful both in
the field of architecture and in generating large amounts of synthetic data for training. It also supports different lighting and material models, which makes the result photo-realistic.

Floor (plan) generation

The first route we took was a mix of the "dense packing" [6] and "inside-out" [8] methods. A floor plan was generated based on input rules and room sizes in JSON format, then approximate dimensions for the first floor were calculated. Based on them, the packing algorithm tried to fill the area with rooms, ending by drawing the outer wall around the layout and proceeding to the next floor. Instead of tiles it used a rectangular room of arbitrary size as a primitive.

1) Plan generation algorithm

This solution is based on some real-world knowledge and can be further developed to be more realistic. All the random distributions used throughout the algorithm have been assembled into a database manually.
1. The algorithm starts with a random number of floors and rooms of different types.
2. It checks whether there are enough rooms of each required type (for example, bathrooms). Otherwise it goes to step 1.
3. Rooms are randomly distributed across floors, and specific dimensions are assigned to them. The algorithm also ensures that each floor has at least one bathroom. In addition, it adds a ladder to each floor except the last one.
4. The algorithm goes back to step 3:
o if some floors are empty;
o if the floor above is larger than the floor below;
o if some floors contain no rooms except bathrooms.
5. If the generation takes too long, it goes back to step 1.
6. It goes through the rooms floor by floor and randomly links them together.
An example of the resulting file:
"floor 1": {
  "bathroom 1": {
    "X": 8,
    "Y": 6
  },
  "link 1": "bedroom-living room",
  "living room 1": {
    "X": 92,
    "Y": 61
  }
}

2) Layout assembly algorithm

The implementation is based upon Blender, uses its API to generate the final 3D layout, and uses simple auto-generated shapes to approximate objects.
1. The algorithm goes through the previously generated plan floor by floor.
2. It searches for linked rooms and assembles them into clusters. Each cluster gets its overall space calculated, and the chambers are then placed based on a simplified dense-packing algorithm.
3. Clusters, single rooms and a ladder are then packed within the floor space similarly to step 2.
4. The outer wall is drawn around the structure generated above.
5. The algorithm moves to the next floor. This time the floor space is reduced by the space of the ladder from the previous floor, and a hole is placed above it.

Our implementation thus has further advantages. First, it produces highly variable and realistic results while being more flexible than the machine learning approaches, since it does not require gathering real data to get realistic layouts. Next, we can generate multi-storey buildings with connecting ladders. Finally, we support non-rectangular room and floor shapes. However, our implementation has several restrictions:
1. Adding new rules (fed as input) for plan generation can be rather challenging because the rules are coupled together.
2. Walls colliding with each other result in a visual glitch that was hard to deal with.
3. In comparison to tile-based methods, our algorithm has difficulties with adding detailed geometry to architectural elements: while tile-based methods efficiently use baked/precomputed geometry for windows, doors, etc., our approach requires such geometry to be generated on the fly for the target layouts, which is generally a non-trivial task.

Furniture layout

As a first approximation for creating a virtual interior scene, a rather simple algorithm was selected for laying out office furniture in a room.

1) Rotation layout algorithm

The idea is simple: traverse the edges of the office's perimeter. If an edge is shorter than the width of a desk, ignore it - a constraint relaxed in some of our other algorithms. If it is sufficiently long to place a desk, start from one end of the edge and lay down as many desks as possible along that edge. This algorithm is run three times, the only difference being the order in which the edges are traversed:
1.1) Clockwise: start from the edge left of the main door and run clockwise along the perimeter.
1.2) Counterclockwise: start from the edge right of the main door and run counterclockwise along the perimeter.
1.3) Sort by length: sort the edges by length and process them from longest to shortest.

2) Left-right layout algorithm

The left-right layout algorithm is very similar to the rotation algorithm, with two key differences. First, it traverses all the sufficiently long edges to the left of the door edge first and then the edges to the right of the door; left and right are determined by a line perpendicular to the door edge, running through its center. Second, when laying down desks, it always works from the bottom up, so that the resulting layout tends to be more symmetrical and closer to how our architects tend to lay out desks.
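The perimeter-traversal idea behind these algorithms can be sketched as follows (simplified 2D geometry; the desk width and helper function are assumptions for illustration, not the authors' code):

```python
# Illustrative sketch: walk a room's polygon edges and count how many
# desks of a fixed width fit along each sufficiently long edge.
import math

DESK_WIDTH = 1.4  # metres; an assumed desk footprint

def desks_along_perimeter(corners, desk_width=DESK_WIDTH):
    """corners: room polygon vertices in order. Returns desk count per edge."""
    counts = []
    n = len(corners)
    for i in range(n):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % n]
        edge_len = math.hypot(x1 - x0, y1 - y0)
        if edge_len < desk_width:
            counts.append(0)                      # edge too short: skip it
        else:
            counts.append(int(edge_len // desk_width))
    return counts

# 6 x 4 m rectangular office: 4 desks fit along each long wall, 2 along each short one
print(desks_along_perimeter([(0, 0), (6, 0), (6, 4), (0, 4)]))  # -> [4, 2, 4, 2]
```

The clockwise, counterclockwise and sort-by-length variants differ only in the order in which the edges are fed to such a loop.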
The left-right layout algorithm is run twice. The first time we enforce that desks must be completely touching the wall and cannot hang off a short wall such as a mullion. That is, we ignore all walls that are less than a desk width long (as described above). However, many offices have indentations, columns, and other edge conditions resulting in walls less than a desk width in length. Consequently, we run the algorithm again, but this time we attempt to lay down desks on all edges, irrespective of their length, and we allow a desk to overhang an edge. After all the algorithms have been run, the code determines the highest capacity found.

3) Brute force layout algorithm

The brute force layout algorithm is roughly two orders of magnitude more computationally expensive and is therefore only run when the above perimeter-based algorithms do not sufficiently fill the space.

This algorithm assumes that for each edge, desks are either placed in a line facing the wall (FW) or exist as a set of back-to-back banks of desks extending into the space.

The question is which edges should be set as back-to-back. As there are no obvious heuristics, we take a brute force approach, trying all possible combinations with one, two, or three edges designated as back-to-back and the remaining edges wall-facing. Examples of our algorithm can be found in fig. 1.
Fig. 1. Examples of generated basic 3D models of interior layouts (left) and results of our furniture placement algorithm (right)
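The brute-force search over back-to-back designations can be sketched as follows; the capacity function is a toy stand-in for the real desk-packing computation, not the authors' model:

```python
# Minimal sketch of the brute-force search: try every way of designating
# one, two, or three edges as back-to-back (BB) banks while the rest stay
# facing the wall (FW), and keep the assignment with the best capacity.
from itertools import combinations

def capacity(edges, bb_set):
    # toy model: a BB edge holds twice as many desks as an FW edge
    return sum((2 if i in bb_set else 1) * length
               for i, length in enumerate(edges))

def best_layout(edges):
    best = (capacity(edges, set()), frozenset())    # all edges wall-facing
    for k in (1, 2, 3):
        for bb in combinations(range(len(edges)), k):
            c = capacity(edges, set(bb))
            if c > best[0]:
                best = (c, frozenset(bb))
    return best

score, bb_edges = best_layout([6.0, 4.0, 6.0, 4.0])
print(score, sorted(bb_edges))  # the best assignment uses three BB edges
```

With n walls this tries O(n^3) combinations, which stays cheap; the three-options-per-wall variant discussed next grows as 3^n and is why the text caps it at four walls.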
We also try a variant where, for each edge that is longer than a desk width, we consider three options: no desks, face wall, and back-to-back. The "no desks" option can be useful to allow a bank of desks on other walls to grow.

Unfortunately, having three options per wall leads to a combinatorial explosion in which the number of combinations to try grows very quickly with the number of walls. Thus, we only use this option if the number of walls longer than a desk width is 4 or less, because 3^4 = 81 combinations is manageable, while 3^5 = 243 is too many for our current computational resources.

Materials, lighting and rendering

This was actually one of the most time-consuming problems we had to solve. The serious difficulties center around the fact that modern rendering systems use exclusively their own lighting and material models, which are inconsistent with each other. Realistic-looking computer graphics content is created for a target rendering system and cannot be used directly in others. So there is no such thing as an open database of realistic 3D models, because importing/exporting 3D content from one rendering system to another is a non-trivial task. Taking into account the required randomization, we had to build our own content creation pipeline to adapt existing 3D models. For this purpose we used the GPU-accelerated open source Hydra Renderer [20]. We chose this solution because it is one of the few open rendering systems with a full-fledged industrial-level pipeline for creating content (including material conversion scripts from other popular rendering systems: VRay, Mental Ray, Corona), while the rendering engine itself has high performance and works completely on the GPU on both Windows and Linux, which is essential for training data set generation due to the large number of required images and the available Linux servers with GPUs.

For the purpose of material and lighting randomization, we adjusted the artist's workflow for randomized content creation via custom 3ds Max plugins that help the artist set up randomized materials and assign them to object parts (fig. 2). The artist determines the logic of randomization by setting special material parameters (fig. 2), which are later exported to an SQL-based database. This allows us to limit the randomization and keep it realistic on average. For example, when the "Target" parameter (fig. 3, bottom left) acquires some definite value, the material can only be used on a specific part of a certain class of models. We did not choose any modern AI-based or automatic methods for 3D content generation because our main requirement is a high degree of control over the generated result, which is a problem for neural network based methods. Finally, we created an export tool that automatically adds all created 3D models to our SQL-based database, and then built a 3D model randomizer on top of this database (see fig. 4).
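The SQL-backed randomization could look roughly like the following sketch; the schema, table and material names are invented for illustration and are not the authors' actual database:

```python
# Hypothetical sketch of the randomization database idea: material
# variants are stored with a "target" tag naming the model part they may
# be applied to, and the randomizer samples among the matching rows.
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE materials (name TEXT, target TEXT)")
con.executemany("INSERT INTO materials VALUES (?, ?)", [
    ("oak_light", "desk_top"), ("oak_dark", "desk_top"),
    ("steel_grey", "desk_leg"), ("steel_black", "desk_leg"),
])

def pick_material(target, rng=random.Random(0)):
    """Sample one material allowed for the given object part."""
    rows = con.execute(
        "SELECT name FROM materials WHERE target = ?", (target,)).fetchall()
    return rng.choice(rows)[0]

print(pick_material("desk_top"))   # one of the two oak variants
```

Constraining the sample to rows with a matching target is what keeps the randomization realistic on average: a table-top material never ends up on a chair leg.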
Fig. 2. Our material randomization plugin GUI and checks for the artist in 3ds Max. This is essential for the randomized results to be realistic in the target application, since the artist can verify that the customized distribution works in the expected way
Fig. 3. Examples of randomized furniture objects from our database
Fig. 4. Early version of our furniture placement algorithm that was prototyped in Unity
Generating datasets

We used Python scripts to run a specific generation scenario on a Linux server with 8 K100 GPUs. In fact, this process was not automatic because CV engineers ask for very different scenarios for their experiments each time. The scripts run different parts of our generator (floor plan, furniture layout or picking 3D models from the database) and connect everything together via files. Our solution is able to generate approximately 10 images per hour on a single GPU, and thus ~2 days are usually needed to generate a full training dataset.

5. Conclusion and future work

In this paper we have presented a procedural house interior generator that is able to produce interior images with high quality and speed. Examples of generated interiors can be found in figs. 5-7. However, we were not able to build a complete industrial-level solution. Our system is highly fragmented, connecting everything together with scripts and files, and the biggest problem is that these scripts actually have to be changed (sometimes mostly created from scratch) for each dataset generation scenario, because CV engineers' requests are very different in practice. Despite the fact that we can generate a full dataset in 2 days, it takes us about 2 weeks to create a new scripting scenario and debug it with the full pipeline. So we believe that using real-time rendering engines for training AI is almost useless in practice today: the bottleneck is always the human beings. Nevertheless, having gone all the way towards a realistic 3D generator and rendering for AI training, we would like to share our experience and state a set of problems which are, in general, not solved today; since this area of research is quite new, during our work we got more questions than answers.
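The file-based glue between generator stages described above can be sketched as follows; the stage function, file names and JSON structure are invented for illustration, not our actual scripts:

```python
# Sketch of a file-based pipeline stage: each generator stage reads the
# previous stage's output file and writes its own result file.
import json, os, tempfile

def run_stage(stage_fn, in_path, out_path):
    with open(in_path) as f:
        data = json.load(f)
    result = stage_fn(data)            # e.g. floor plan -> furniture layout
    with open(out_path, "w") as f:
        json.dump(result, f)

def add_furniture(plan):               # placeholder stage for the sketch
    plan["furniture"] = ["desk"] * plan.get("rooms", 0)
    return plan

tmp = tempfile.mkdtemp()
plan_path = os.path.join(tmp, "plan.json")
scene_path = os.path.join(tmp, "scene.json")
with open(plan_path, "w") as f:
    json.dump({"rooms": 3}, f)
run_stage(add_furniture, plan_path, scene_path)
print(json.load(open(scene_path))["furniture"])  # ['desk', 'desk', 'desk']
```

The flexibility of this scheme is exactly what makes each new scenario expensive: every change in a CV engineer's request means rewriting the glue around such stages.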
Fig. 5. Example of a render (top left), generated layout (top right), object masks (bottom left) and object masks in layout view (bottom right)
Fig. 6. More examples of rendered interior layouts and object masks
Fig. 7. Examples of different randomization results for a single furniture layout and object masks
Tightly integrated framework

In our case at least 3 different people participate in the dataset generation process: (1) an artist, who creates and checks input 3D content, (2) a scripting person, who creates scenarios for the generator, and (3) a CV engineer, who controls the result. These people need very different skills and knowledge, and we do not think that the number of participants can be reduced. However, their work could be organized better by putting them into a unified ecosystem with an interface convenient for each participant. Our 3ds Max plugins are the first step in this direction, but in general this is an open problem even for the restricted area of AI training.

Unoptimized data path, memory and disk bottleneck

In our case different algorithms (for example, floor plan generation and further 3D model construction, or renderer output and further Natron post-processing) communicate via files. The Linux cache and a fast SSD on the server amortize this problem, but only a little. According to our estimates, any object such as a mesh or an image is copied from 4 to 6 times on average due to loading, storing in memory, putting to the GPU or saving back to disk in different formats. This format conversion madness makes any attempt to speed up rendering useless in practice. However, we were able to optimize this process for some cases where we formed a scene library and put it to the GPU once (i.e. we do not load new 3D models or images to the GPU for several subsequent frames). This gives an essential benefit even for our prototype with off-line rendering, but it is of critical importance for systems that are going to use real-time rendering. We believe that the generation scenario should take care of this problem in combination with some caching system, feeding the generated images directly to the neural network on the same GPU without storing them to disk (except a small portion kept for debugging). We also suppose that modern denoising algorithms [21] could significantly accelerate the generation process.

Absence of rendering standards and open 3D content

The available bases of 3D models (like the well-known ShapeNet) are not ready even for rendering: their quality is low and the segmentation of parts by materials is rough. In the case of randomizing materials, we need to manually process them anyway and assign relations to our database. The recent story with SUNCG [19] (which is far from photorealistic quality anyway) confirms the need for open content libraries.
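The scene-library caching described in the data-path discussion above can be sketched as follows; the class and the in-memory stand-in for GPU storage are hypothetical:

```python
# Sketch of the caching idea: keep already-uploaded assets resident
# (here, a dict standing in for GPU memory) so that subsequent frames
# skip the expensive load/convert/upload path entirely.
class SceneCache:
    def __init__(self):
        self._gpu = {}                 # asset id -> uploaded data
        self.misses = 0

    def load_from_disk(self, asset_id):
        self.misses += 1               # expensive path: disk + conversion
        return f"mesh-data:{asset_id}"

    def get(self, asset_id):
        if asset_id not in self._gpu:  # upload once, reuse afterwards
            self._gpu[asset_id] = self.load_from_disk(asset_id)
        return self._gpu[asset_id]

cache = SceneCache()
for _ in range(100):                   # 100 frames reusing the same chair
    cache.get("chair_01")
print(cache.misses)  # -> 1
```

The same principle applies to feeding rendered images straight to the network on the same GPU: every avoided disk round-trip removes one of the 4-6 copies estimated above.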
Procedural approaches

Unfortunately, in this work we did not manage to use procedural approaches [22] for textures, which could additionally increase the variability of the generated content.

References

[1] Merrell P., Schkufza E., Koltun V. Computer-generated residential building layouts. ACM SIGGRAPH Asia 2010 papers, 2010, pp. 1-12.
[2] Bengtsson D., Melin J. Constrained procedural floor plan generation for game environments. 2016.
[3] Cerny Green M., Khalifa A., Alsoughayer A., Surana D., Liapis A., Togelius J. Two-step Constructive Approaches for Dungeon Generation. 2019.
[4] Firaxis Games. Sid Meier's Civilization VI. 2016.
[5] Triumph Studios. Age of Wonders III. 2014.
[6] Koenig R., Knecht K. Comparing two evolutionary algorithm based methods for layout generation: Dense packing versus subdivision. 2014.
[7] Guo Z., Li B. Evolutionary approach for spatial architecture layout design enhanced by an agent-based topology finding system. 2017.
[8] Martin J. Procedural House Generation: A method for dynamically generating floor plans. 2016.
[9] Fernando M. Automatic Real-Time Generation of Floor Plans Based on Squarified Treemaps Algorithm. 2010.
[10] Yu L.-F., Yeung S.-K., Tang C.-K., Terzopoulos D., Chan T. F., Osher S. J. Make It Home: Automatic Optimization of Furniture Arrangement. SIGGRAPH 2011.
[11] Fisher M., Ritchie D., Savva M., Funkhouser T., Hanrahan P. Example-based Synthesis of 3D Object Arrangements. SIGGRAPH Asia 2012.
[12] Henderson P., Ferrari V. A Generative Model of 3D Object Layouts in Apartments. 2017.
[13] Fu Q., Chen X., Wang X., Wen S., Zhou B., Fu H. Adaptive Synthesis of Indoor Scenes via Activity-associated Object Relation Graphs. 2017.
[14] Song S., Yu F., Zeng A., Chang A. X., Savva M., Funkhouser T. Semantic Scene Completion from a Single Depth Image.
[15] Henderson P., Subr K., Ferrari V. Automatic Generation of Constrained Furniture Layouts.
[16] Qi S., Zhu Y., Huang S., Jiang C., Zhu S.-C. Human-centric Indoor Scene Synthesis Using Stochastic Grammar.
[17] Wang K., Savva M., Chang A. X., Ritchie D. Deep Convolutional Priors for Indoor Scene Synthesis. SIGGRAPH 2018.
[18] Ritchie D., Wang K., Lin Y. Fast and Flexible Indoor Scene Synthesis via Deep Convolutional Generative Models.
[19] Kirsanov P., Gaskarov A., Konokhov F., Sofiiuk K., Vorontsova A., Slinko I., Zhukov D., Bykov S., Barinova O., Konushin A. DISCOMAN: Dataset of Indoor SCenes for Odometry, Mapping And Navigation. arXiv:1909.12146, September 2019.
[20] Frolov V., Sanzharov V., Galaktionov V. Open Source rendering system Hydra Renderer. https://github.com/Ray-Tracing-Systems/HydraAPI
[21] Ershov S. V., Zhdanov D. D., Voloboy A. G., Galaktionov V. A. Two denoising algorithms for bi-directional Monte Carlo ray tracing. Mathematica Montisnigri, Vol. XLIII, 2018, pp. 78-100. https://lppm3.ru/files/journal/XLIII/MathMontXLIII-Ershov.pdf
[22] Sanzharov V. V., Frolov V. F. Level of Detail for Precomputed Procedural Textures. Programming and Computer Software, 2019, V. 45, Issue 4, pp. 187-195. DOI:10.1134/S0361768819040078

About the Authors

Egor Feklisov, student at Moscow State University, Faculty of Computational Mathematics and Cybernetics, Computer Graphics and Multimedia lab. E-mail: egor.feklisov@gmail.com.
Mihail Zingerenko, student at Moscow State University, Faculty of Computational Mathematics and Cybernetics, Computer Graphics and Multimedia lab. E-mail: liahimzer@gmail.com.
Vladimir Frolov, Ph.D., researcher at the Keldysh Institute of Applied Mathematics and Moscow State University.