Neurosymbolic Generation of 3D Animal
Shapes through Semantic Controls
Vivian Liu, Lydia Chilton
Columbia University, New York, USA



Abstract

While there have been many advancements in generative models for 3D design, there has been limited user interface work in this co-creation domain. The user interface controls and interaction paradigms emerging in this field tend to be unintuitive and hard to standardize, as they are often based upon complicated work related to latent space disentanglement, dimensionality reduction, and other bespoke computational techniques.

We demo a user interface that provides intuitive controls for the generation of basic 3D animal shapes. These controls, a set of semantic sliders, map to simple and universal operations such as scale and rotation. By adjusting these parameters over animal limbs, users can semantically guide generative models towards their goals, optimizing the mapping between AI action and user intention.

Our user interface operates over a generative model that implements Wei et al.'s semi-supervised architecture for learning semantically meaningful embeddings [1]. To train it, we collected artist data and generated synthetic data by authoring a parametric animal shape generator. This generator produces low-fidelity, abstracted animal shapes we refer to as metashapes.

Our system is an instance of a neurosymbolic generative system: one that learns both from data and from symbolic, algorithmic constraints. We conclude with an analysis of the benefits and drawbacks of neurosymbolic generation for 3D animal shapes and the utility of metashapes for user control over AI.

Keywords
neurosymbolic, generative models, semantic controls, human-AI interaction, 3D user interfaces


1. Introduction

1.1. Related Work

Prior work has shown that generative networks such as variational autoencoders and GANs can learn latent spaces and generatively produce 3D shapes [2]. However, the dimensions of these latent spaces are often highly entangled and too high-dimensional to be human-interpretable. Reverse engineering meaningful signals from latent spaces has become an active area of research. Techniques such as beta-variational autoencoders, InfoGAN [3], and latent space factorization have been developed for the purpose of disentangling the latent space. However, all these methods work to varying degrees of success and tend to be contingent upon the curation and training of large datasets.

Recently, Wei et al. (2020) proposed an architecture that challenges latent spaces with a semi-supervised model that learns a semantic space. A semantic space can generate instances of 3D object classes after being trained jointly on synthetic and real data; within this space, users can carry out 3D shape editing operations [1].

Joint Proceedings of the ACM IUI 2021 Workshops, April 13–17, 2021, College Station, USA
vivian@cs.columbia.edu (V. Liu); chilton@cs.columbia.edu (L. Chilton)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Figure 1: Our user interface generates animal metashapes, which are generic, low-fidelity animal shapes. Above are nine animal metashapes arrived at through our user interface. In the top left rectangle, we picture six of the "semantic sliders" used to generate these shapes. These sliders give users control over semantically meaningful parameters: torso length, neck length, neck rotation, tail length, tail rotation, and leg length. These parameters operate on the shape outputs through intuitive mental operations like scale and rotation.

The demo we present implements the architecture and methods proposed by Wei et al. on the domain of quadruped animals. Wei et al. demonstrate the success of their method using state-of-the-art academic datasets on well-established generative task domains such as chairs, airplanes, and human bodies. We choose to focus on the domain of animals because animals are one of the most common classes of 3D assets created. They display a high variance in their shapes, which makes them far harder to statistically parameterize than easier shapes like human silhouettes. In spite of this variance, animals share structural similarities that humans can intuitively characterize. For example, we can generalize quadrupeds to be four-legged animals with a head, a neck, and a tail, and teach this abstraction to our system. We refer to this abstraction as a metashape: a generic, low-fidelity shape that can abstractly characterize a class of 3D objects. We utilize these metashapes and the aforementioned architecture for our neurosymbolic generative system.

2. System

Our system utilizes point clouds as our 3D shape representation, as neural networks have shown success on a number of 3D tasks involving point clouds, from instance segmentation [4] to shape editing to interpolation [2].

To create a dataset of animals, we first web-scraped 228 3D mesh assets made public on Sketchfab by artists [5] and sampled point clouds from these assets. Using open3d, a Python package for 3D graphics, we rotated, normalized, and scaled our data to fit within a unit sphere, center around the 3D origin, and face the same direction.

2.1. Metashape Generator

To create a synthetic dataset of 20,000 animal metashapes in accordance with Wei et al.'s architecture, we utilized Blender's module for Python scripting to spawn metaballs that coagulate into basic animal shapes. Metaballs are 3D primitives common to computer
graphics software that can additively or subtractively react to one another to form organic-looking shapes. Our metashapes were symbolically parameterized by vector directions, limb lengths, and limb rotations. Our inspiration for this approach comes from an idea long theorized in cognitive science: that 3D shapes can be decomposed into more basic primitives known as geons [6].

We created two versions of the generative model for this demo that work with one consistent user interface. The first is parameterized by six semantic axes, the second by twenty-one. These parameters corresponded to length, width, height, rotation, radius, position, spacing, and other labels that characterized the primitives corresponding to parts of the animal metashape. The exact labels can be found in the appendix.

These labels supervise the learning of the semantic space and teach the model 3D operations such as scale and rotation over specific parts. While generating a synthetic dataset from a template is a source of inductive bias, we attempted to mitigate this by informing our template with results from Superquadrics, a recent part-segmentation model that has been applied successfully to animal meshes [7].

3. Results

We present a user interface designed in the Unity3D game engine, which abstracts over the generative model and allows users to interact with it in real time using semantic sliders. These sliders map to the original axes of the semantic supervision and offer explainability for the model's actions. We additionally demo preliminary features for direct manipulation, camera view movement, user history, generative AI history, and transparency. The real-time nature of the interactions, the explainability of the model through the semantic sliders, and the concept of memory are all in accordance with best practices proposed by Llano et al. for explainable computational creativity systems [8].

Author interactions with the system efficiently exposed the generative model's abilities and shortcomings. It was moderately successful at learning from its semantic supervision and was able to produce transformations in scale and rotation over torso length, neck length, and tail length. Exploration through the user interface produced the varied results pictured in Figure 1. However, the model, in its first iteration with six semantic parameters, failed to completely disentangle the semantic space. The model showed an on-and-off ability to control parameters such as neck rotation, tail rotation, leg length, and tail length. By on-and-off, we mean that while for certain clusters of parameters the 3D transformations over rotation and scale were accurate, for other clusters the sliders produced suboptimal behavior. For example, one specific problem was that the model sometimes mixed up the posterior extrusion of the neck with the anterior extrusion of the tail. Another problem was that the model seemed unable to capture the extreme ends of our synthetic training data (i.e., long legs). More examples of malformed edits are illustrated and captioned in Figure 2.

4. Discussion

In this section, we discuss the following learnings from this demo.

4.0.1. Efficient design space exploration

Though many generative models can now be interacted with in real time [9, 10], it is often still intractable to completely visualize the design space these models sample over. However, our design space is low-dimensional: six-dimensional in one version, 21-dimensional in another.
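The scale of this space can be made concrete: a coarse three-level grid over six semantic axes is small enough to enumerate exhaustively. Below is a minimal sketch; the axis names follow our Iteration 1 parameters, the normalization to [-1, 1] is an illustrative assumption, and nothing here calls the trained model:

```python
from itertools import product

# Iteration 1 semantic axes (see appendix); slider values assumed
# normalized to [-1, 1] for illustration.
AXES = ["torso_length", "neck_length", "neck_rotation",
        "tail_rotation", "leg_length", "tail_length"]

def grid(levels=(-1.0, 0.0, 1.0)):
    """Enumerate a coarse grid over the six-dimensional semantic space."""
    for combo in product(levels, repeat=len(AXES)):
        yield dict(zip(AXES, combo))

# A 3-level grid yields 3**6 = 729 slider settings -- small enough that
# a user sweeping the sliders can plausibly cover the space in minutes.
settings = list(grid())
print(len(settings))  # 729
```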
Figure 2: Cases of bad output animal metashapes for three semantic axes. Left: edits on tail rotation result in changes of neck rotation; the model mixes up the posterior extrusion of the neck with the anterior extrusion of the tail. Center: editing tail length leads to a "negative" tail length, which appears as a posterior indent in the animal shape. Right: maximizing the leg length parameter leads to an outward, noisy extension of the legs. The affected areas in the images are saturated, and the affected parameters are highlighted.

Users have access to the entire design space and can traverse through it within minutes. They can find best- and worst-case outputs within seconds. The efficient exploration that we allow implements the following principle established by Gero et al.: interactions between humans and AI are improved when humans can efficiently explore and understand the global knowledge distributions underlying generative models [11].

4.0.2. Using metashapes as abstractions between users and AI

We argue that metashapes are an ideal abstraction between users and AI. In our system, metashapes gave users ways to operate over the design space with universal concepts like scale and rotation, mental operations that are intuitive and shared by everyone. The semantic meaning attached to each slider optimized the translation of user intention into AI output. While metashapes are 3D concepts, they translated well to a user interface that would be intuitive even to non-technical end users. The minimal interface is a good counterexample to the many heavier generative model user interfaces which encourage users to interact with lower-level complexities like data distributions [12] and hyperparameters [10].
4.0.3. Challenges for neurosymbolic generation

We acknowledge that there are limitations to neurosymbolic generation with metashapes. One of the most significant challenges is finding the right metashape abstraction to encapsulate a class of 3D shapes. While methods to find these abstractions do exist [13, 14, 7], it is hard to evaluate the correctness of their abstractions. Furthermore, these methods do not often lend themselves to intuitive user interface controls and metaphors.

Additionally, by incorporating symbolic constraints from a template, we put a concrete number on the degrees of freedom users may access. While we set this number to generate a tractable, low-dimensional design space, a number that is too small limits users and a number that is too big can overwhelm them. Even if we gave users an exorbitant number of degrees of freedom, users could still very well want to go beyond what is offered and define their own axes.

The neurosymbolic generation in this system could also benefit from understanding the interplay between real artist data and synthetic data. For example, the synthetic dataset behind our first iteration of the generative model created straight legs that pointed downwards. However, the generative model altered the parameter of leg length by extending legs not only downwards but also outwards, reproducing patterns present in the shapes of amphibians and reptiles. We optimistically believe that the model was able to generalize some natural variance and establish some correlation between real and synthetic data. However, there were certainly corner cases in which the output was distinctly "real" or "synthetic". More work could be done to investigate and mitigate overfitting, as the generative model was built on a highly asymmetric composition of datasets.

5. Conclusion

We present a demo of a neurosymbolic generative system that allows users to create 3D animal shapes with semantically meaningful controls. Additionally, we illustrate how symbolically generated metashapes can be a useful abstraction going forward for human-AI interaction.

Acknowledgments

Vivian Liu is supported by a National Science Foundation Graduate Research Fellowship. The authors thank Panos Achlioptas (@optas) for open-sourcing the loss function used for model training, Felix Herbst (@herbst) for contributing to the coloring function on the point clouds, and David O'Reilly for open-sourcing a large pack of 3D animal assets.

References

[1] F. Wei, E. Sizikova, A. Sud, S. Rusinkiewicz, T. Funkhouser, Learning to infer semantic parameters for 3d shape editing, 2020. arXiv:2011.04755.
[2] P. Achlioptas, O. Diamanti, I. Mitliagkas, L. Guibas, Learning representations and generative models for 3d point clouds, 2018. arXiv:1707.02392.
[3] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, Infogan: Interpretable representation learning by information maximizing generative adversarial nets, 2016. arXiv:1606.03657.
[4] C. R. Qi, H. Su, K. Mo, L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, 2017. arXiv:1612.00593.
[5] D. O'Reilly, Animals, 2020. URL: http://www.davidoreilly.com/library.
[6] I. Biederman, Recognition-by-components: a theory of human image understanding, Psychological Review 94(2) (1987) 115–147.
[7] D. Paschalidou, A. O. Ulusoy, A. Geiger, Superquadrics revisited: Learning 3d shape parsing beyond cuboids, in: Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[8] M. T. Llano, M. d'Inverno, M. Yee-King, J. McCormack, A. Ilsar, A. Pease, S. Colton, Explainable computational creativity, in: ICCC, 2020.
[9] A. Ghosh, R. Zhang, P. K. Dokania, O. Wang, A. A. Efros, P. H. S. Torr, E. Shechtman, Interactive sketch & fill: Multiclass sketch-to-image translation, 2019. arXiv:1909.11081.
[10] Runway AI, Inc., RunwayML. URL: https://runwayml.com.
[11] K. I. Gero, Z. Ashktorab, C. Dugan, Q. Pan, J. Johnson, W. Geyer, M. Ruiz, S. Miller, D. R. Millen, M. Campbell, S. Kumaravel, W. Zhang, Mental Models of AI Agents in a Cooperative Game Setting, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1–12. URL: https://doi.org/10.1145/3313831.3376316.
[12] J. Matejka, M. Glueck, E. Bradner, A. Hashemi, T. Grossman, G. Fitzmaurice, Dream Lens: Exploration and visualization of large-scale generative design datasets, 2018, pp. 1–12.
[13] K. Genova, F. Cole, A. Sud, A. Sarna, T. Funkhouser, Local deep implicit functions for 3d shape, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4857–4866.
[14] R. Huang, P. Achlioptas, L. Guibas, M. Ovsjanikov, Limit shapes – a tool for understanding shape differences and variability in 3d model collections, Computer Graphics Forum 38 (2019) 187–202. doi:10.1111/cgf.13799.
A. Appendix

A.0.1. Generative model: Iteration 1

Iteration 1, with 6 semantic parameters, consisted of the following: torso length, neck length, neck rotation, tail rotation, leg length, tail length.

A.0.2. Generative model: Iteration 2

Iteration 2, with 21 semantic parameters, drew from the following set: torso length, (front) torso width, (front) torso height, (back) torso width, (back) torso height, a choice between head type 1 (which emphasizes ear variation) and head type 2 (which emphasizes jaw variation), head size, head feature (ear/jaw) prominence, mouth angle, neck length, neck rotation, neck size, leg length, position of front legs, position of back legs, leg gap, leg angle, tail length, tail rotation, tail radius, tail variance, a choice between a tail that increases or decreases in width, and leg radius.
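The flavor of the symbolic side of the generator can be conveyed outside Blender. The sketch below builds a metaball-style density field for a crude quadruped and rejection-samples a point cloud from it; the blob placement, radii, and threshold are illustrative assumptions, and our actual generator uses Blender's metaball objects rather than this hand-rolled field:

```python
import math
import random

def blob(p, center, radius):
    """Gaussian-like contribution of one metaball at point p."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, center))
    return math.exp(-d2 / (radius ** 2))

def quadruped_density(p, leg_length=0.4):
    """One torso blob plus four leg blobs; a fuller generator would
    add neck, head, and tail blobs driven by the semantic parameters."""
    parts = [((0.0, 0.0, 0.5), 0.45)]              # torso
    for x in (-0.3, 0.3):                          # front/back leg pairs
        for y in (-0.15, 0.15):
            parts.append(((x, y, 0.5 - leg_length), 0.15))
    return sum(blob(p, c, r) for c, r in parts)

def sample_metashape(n=500, threshold=0.5, seed=0):
    """Rejection-sample a point cloud where the blobs coagulate."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0, 1))
        if quadruped_density(p) > threshold:
            points.append(p)
    return points

cloud = sample_metashape()
```

Varying `leg_length` (or any other blob parameter) and recording its value alongside the sampled cloud is what yields the labeled synthetic pairs the semantic supervision needs.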