Neurosymbolic Generation of 3D Animal Shapes through Semantic Controls

Vivian Liu, Lydia Chilton
Columbia University, New York, USA
vivian@cs.columbia.edu (V. Liu); chilton@cs.columbia.edu (L. Chilton)

Joint Proceedings of the ACM IUI 2021 Workshops, April 13–17, 2021, College Station, USA. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
While there have been many advancements in generative models for 3D design, there has been limited user interface work in this co-creation domain. The user interface controls and interaction paradigms emerging in this field tend to be unintuitive and hard to standardize, as they are often based upon complicated work related to latent space disentanglement, dimensionality reduction, and other bespoke computational techniques. We demo a user interface that provides intuitive controls for the generation of basic 3D animal shapes. These controls, a set of semantic sliders, map to simple and universal operations such as scale and rotation. By adjusting these parameters over animal limbs, users can semantically guide generative models towards their goals, optimizing the mapping between AI action and user intention. Our user interface operates over a generative model that implements Wei et al.'s semi-supervised architecture for learning semantically meaningful embeddings [1]. To train it, we collected artist data and generated synthetic data by authoring a parametric animal shape generator. This generator produces low-fidelity, abstracted animal shapes we refer to as metashapes. Our system is an instance of a neurosymbolic generative system: one that learns both from data and from symbolic, algorithmic constraints. We conclude with an analysis of the benefits and drawbacks of neurosymbolic generation for 3D animal shapes and of the utility of metashapes for user control over AI.

Keywords
neurosymbolic, generative models, semantic controls, human-AI interaction, 3D user interfaces

1. Introduction

1.1. Related Work

Prior work has shown that generative networks such as variational autoencoders and GANs can learn latent spaces and generatively produce 3D shapes [2]. However, the dimensions of these latent spaces are often highly entangled and too hyperdimensional to be human interpretable. Reverse engineering meaningful signals from latent spaces has become an active area of research. Techniques such as beta-variational autoencoders, InfoGAN [3], and latent space factorization have been developed for the purpose of disentangling the latent space. However, all these methods work to varying degrees of success and tend to be contingent upon the curation and training of large datasets.

Recently, Wei et al. (2020) proposed an architecture that challenges latent spaces with a semi-supervised model that learns a semantic space instead. A semantic space can generate instances of 3D object classes after being trained jointly on synthetic and real data; within this space, users can carry out 3D shape editing operations [1].

Figure 1: Our user interface generates animal metashapes, which are generic, low-fidelity animal shapes. Above are nine animal metashapes arrived at from our user interface. In the top left rectangle, we picture six of the "semantic sliders" used to generate these shapes. These sliders give users control over semantically meaningful parameters such as torso length, neck length, neck rotation, tail length, tail rotation, and leg length. These parameters operate on the shape outputs through intuitive mental operations like scale and rotation.
The demo we present implements the architecture and methods proposed by Wei et al. on the domain of quadruped animals. Wei et al. demonstrate the success of their method using state-of-the-art academic datasets on well-established generative task domains such as chairs, airplanes, and human bodies. We choose to focus on the domain of animals because animals are one of the most common classes of 3D assets created. They display a high variance in their shapes, which makes them far harder to statistically parameterize than easier shapes like human silhouettes. In spite of this variance, animals share structural similarities that humans can intuitively characterize. For example, we can generalize quadrupeds to be four-legged animals with a head, neck, and tail, and teach this abstraction to our system. We refer to this abstraction as a metashape: a generic, low-fidelity shape that can abstractly characterize a class of 3D objects. We utilize these metashapes and the aforementioned architecture for our neurosymbolic generative system.

2. System

Our system utilizes point clouds as its 3D shape representation, as neural networks have shown success on a number of 3D point cloud tasks, from instance segmentation [4] to shape editing and interpolation [2].

To create a dataset of animals, we first web-scraped 228 3D mesh assets made public on Sketchfab by artists [5] and sampled point clouds from these assets. Using open3d, a Python package for 3D graphics, we rotated, normalized, and scaled our data to fit within a unit sphere, center around the 3D origin, and face the same direction. A minimal sketch of this kind of preprocessing is shown below.
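The following sketch illustrates the preprocessing described above using open3d; the sampling density, rotation angles, and file path are illustrative assumptions rather than the exact values used in our pipeline.

```python
import numpy as np
import open3d as o3d

def mesh_to_normalized_pcd(mesh_path, n_points=2048, rotation_xyz=(0.0, 0.0, 0.0)):
    """Sample a point cloud from a mesh, then orient, center, and scale it."""
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    pcd = mesh.sample_points_uniformly(number_of_points=n_points)

    # Rotate the asset so it faces a canonical direction (angles chosen per asset).
    R = o3d.geometry.get_rotation_matrix_from_xyz(np.asarray(rotation_xyz))
    pcd.rotate(R, center=(0.0, 0.0, 0.0))

    # Center the point cloud around the 3D origin.
    pcd.translate(-pcd.get_center())

    # Scale so that every point lies within the unit sphere.
    radius = np.max(np.linalg.norm(np.asarray(pcd.points), axis=1))
    pcd.scale(1.0 / radius, center=(0.0, 0.0, 0.0))
    return pcd
```

Normalizing the artist assets and the synthetic metashapes into the same unit sphere keeps the two data sources directly comparable during the joint training that Wei et al.'s architecture requires.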
2.1. Metashape Generator

To create a synthetic dataset of 20,000 animal metashapes in accordance with Wei et al.'s architecture, we utilized Blender's Python scripting module to spawn metaballs that coagulate into basic animal shapes. Metaballs are 3D primitives common to computer graphics software that can additively or subtractively react to one another to form organic-looking shapes. Our metashapes were symbolically parameterized by vector directions, limb lengths, and limb rotations. Our inspiration for this approach comes from an idea long theorized in cognitive science: that 3D shapes can be decomposed into more basic primitives known as geons [6]. A simplified sketch of this metaball scripting follows.
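As a rough illustration, the sketch below assembles one quadruped-like metashape from metaball elements using Blender's bpy data API. The helper names, the chain-of-balls construction, and the specific limb directions, lengths, and radii are simplified assumptions; the actual generator exposes a richer set of symbolic parameters (see the appendix).

```python
import math
import bpy

def add_limb(mball, origin, direction, length, radius, n_elements=4):
    """Lay a chain of metaball elements along a limb direction.

    Neighboring elements react additively, so the chain coagulates
    into a smooth, organic-looking limb.
    """
    for i in range(n_elements):
        t = (i + 1) / n_elements
        elem = mball.elements.new(type='BALL')
        elem.co = (origin[0] + direction[0] * length * t,
                   origin[1] + direction[1] * length * t,
                   origin[2] + direction[2] * length * t)
        elem.radius = radius

def build_metashape(torso_length=2.0, neck_length=1.0,
                    neck_rotation=math.radians(45),
                    leg_length=1.2, tail_length=1.0):
    """Spawn a very simplified quadruped metashape (illustrative parameters only)."""
    mball = bpy.data.metaballs.new("MetashapeData")
    obj = bpy.data.objects.new("Metashape", mball)
    bpy.context.collection.objects.link(obj)

    # Torso: a chain of metaballs along the +X axis.
    add_limb(mball, (0, 0, 0), (1, 0, 0), torso_length, radius=0.5)

    # Neck: extrudes from the front of the torso at the given rotation.
    neck_dir = (math.cos(neck_rotation), 0, math.sin(neck_rotation))
    add_limb(mball, (torso_length, 0, 0), neck_dir, neck_length, radius=0.3)

    # Tail: extrudes from the back of the torso.
    add_limb(mball, (0, 0, 0), (-1, 0, 0.3), tail_length, radius=0.2)

    # Four legs pointing straight down from the torso.
    for x in (0.2, torso_length - 0.2):
        for y in (-0.3, 0.3):
            add_limb(mball, (x, y, 0), (0, 0, -1), leg_length, radius=0.2)
    return obj
```

Because every metashape is generated from known parameter values, each synthetic point cloud sampled from it carries exact labels for the semantic supervision described next.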
We created two versions of the generative model for this demo that work with one consistent user interface. The first is parameterized by six semantic axes, the second by twenty-one. These parameters corresponded to length, width, height, rotation, radius, position, spacing, and other labels that characterized the primitives corresponding to parts of the animal metashape. The exact labels can be found in the appendix. These labels supervise the learning of the semantic space and teach the model 3D operations such as scale and rotation over specific parts. While generating a synthetic dataset from a template is a source of inductive bias, we attempted to mitigate this by informing our template with results from Superquadrics, a recent part-segmentation model that was applied successfully to animal meshes [7].

3. Results

We present a user interface designed in the Unity3D game engine, which abstracts over the generative model and allows users to interact with it in real time using semantic sliders. These sliders map to the original axes of the semantic supervision and offer explainability for the model's actions. We additionally demo preliminary features for direct manipulation, camera view movement, user history, generative AI history, and transparency. The real-time nature of the interactions, the explainability of the model through the semantic sliders, and the concept of memory are all in accordance with best practices proposed by Llano et al. for explainable computational creativity systems [8].

Author interactions with the system efficiently exposed the generative model's abilities and shortcomings. It was moderately successful at learning from its semantic supervision and was able to produce transformations in scale and rotation over torso length, neck length, and tail length. Exploration through the user interface produced the varied results pictured in Figure 1. However, the model, in its first iteration with six semantic parameters, failed to completely disentangle the semantic space. The model showed an on-and-off ability to control parameters such as neck rotation, tail rotation, leg length, and tail length. By on and off, we mean that while in certain clusters of parameters the 3D transformations over rotation and scale were accurate, in other clusters the sliders produced suboptimal behavior. For example, one specific problem was that the model sometimes mixed up the posterior extrusion of the neck with the anterior extrusion of the tail. Another problem was that the model seemed unable to capture the extreme ends of our synthetic training data (i.e., long legs). More examples of malformed edits are illustrated and captioned in Figure 2.

Figure 2: Cases of bad output animal metashapes for three semantic axes. Left: edits on tail rotation result in changes of neck rotation; the model mixes up the posterior extrusion of the neck with the anterior extrusion of the tail. Center: editing tail length leads to a "negative" tail length, which appears as a posterior indent in the animal shape. Right: maximizing the leg length parameter leads to an outward, noisy extension of the legs. The affected areas in the images and parameters are saturated and highlighted respectively.

4. Discussion

In this section, we discuss the following learnings from this demo.

4.0.1. Efficient design space exploration

Though many generative models can now be interacted with in real time [9, 10], it is often still intractable to completely visualize the design space these models sample over. However, our design space is low-dimensional: six-dimensional in one version, 21-dimensional in the other. Users have access to the entire design space and can traverse through it within minutes. They can find best- and worst-case outputs within seconds. The efficient exploration that we allow implements the following principle established by Gero et al.: interactions between humans and AI are improved when humans can efficiently explore and understand the global knowledge distributions underlying generative models [11].

4.0.2. Using metashapes as abstractions between users and AI

We argue that metashapes are an ideal abstraction between users and AI. In our system, metashapes gave users ways to operate over the design space with universal concepts like scale and rotation, mental operations that are intuitive and shared by everyone. The semantic meaning attached to each slider optimized the translation of user intention into AI output.

While metashapes are 3D concepts, they translated well to a user interface that would be intuitive even to non-technical end users. The minimal interface is a good counterexample to the many heavier generative model user interfaces, which encourage users to interact with lower-level complexities like data distributions [12] and hyperparameters [10].

4.0.3. Challenges for neurosymbolic generation

We acknowledge that there are limitations to neurosymbolic generation with metashapes. One of the most significant challenges is finding the right metashape abstraction to encapsulate a class of 3D shapes. While methods to find these abstractions do exist [13, 14, 7], it is hard to evaluate the correctness of their abstractions. Furthermore, these methods do not often lend themselves to intuitive user interface controls and metaphors.

Additionally, by incorporating symbolic constraints from a template, we put a concrete number on the degrees of freedom users may access. While we set this number to generate a tractable, low-dimensional design space, a number that is too small limits users and a number that is too big can overwhelm them. Even if we gave users an exorbitant number of degrees of freedom, users could still very well want to go beyond what is offered and define their own axes.

The neurosymbolic generation in this system could also benefit from a better understanding of the interplay between real artist data and synthetic data. For example, the synthetic dataset behind our first iteration of the generative model created straight legs that pointed downwards. However, the generative model altered the parameter of leg length by extending the legs not only downwards but also outwards, reproducing patterns present in the shapes of amphibians and reptiles. We optimistically believe that the model was able to generalize some natural variance and establish some correlation between real and synthetic data. However, there were certainly corner cases in which the output was distinctly "real" or "synthetic". More work could be done to investigate and mitigate overfitting, as the generative model was built on a highly asymmetric composition of datasets.
5. Conclusion

We present a demo of a neurosymbolic generative system that allows users to create 3D animal shapes with semantically meaningful controls. Additionally, we illustrate how symbolically generated metashapes can be a useful abstraction going forward for human-AI interaction.

Acknowledgments

Vivian Liu is supported by a National Science Foundation Graduate Research Fellowship. The authors thank Panos Achlioptas (@optas) for open-sourcing the loss function used for model training, Felix Herbst (@herbst) for contributing to the coloring function on the point clouds, and David O'Reilly for open-sourcing a large pack of 3D animal assets.

References

[1] F. Wei, E. Sizikova, A. Sud, S. Rusinkiewicz, T. Funkhouser, Learning to infer semantic parameters for 3d shape editing, 2020. arXiv:2011.04755.
[2] P. Achlioptas, O. Diamanti, I. Mitliagkas, L. Guibas, Learning representations and generative models for 3d point clouds, 2018. arXiv:1707.02392.
[3] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, 2016. arXiv:1606.03657.
[4] C. R. Qi, H. Su, K. Mo, L. J. Guibas, PointNet: Deep learning on point sets for 3d classification and segmentation, 2017. arXiv:1612.00593.
[5] D. O'Reilly, Animals, 2020. URL: http://www.davidoreilly.com/library.
[6] I. Biederman, Recognition-by-components: a theory of human image understanding, Psychological Review 94 (2) (1987) 115–147.
[7] D. Paschalidou, A. O. Ulusoy, A. Geiger, Superquadrics revisited: Learning 3d shape parsing beyond cuboids, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[8] M. T. Llano, M. d'Inverno, M. Yee-King, J. McCormack, A. Ilsar, A. Pease, S. Colton, Explainable computational creativity, in: ICCC, 2020.
[9] A. Ghosh, R. Zhang, P. K. Dokania, O. Wang, A. A. Efros, P. H. S. Torr, E. Shechtman, Interactive Sketch & Fill: Multiclass sketch-to-image translation, 2019. arXiv:1909.11081.
[10] Runway AI, Inc., RunwayML, n.d. URL: https://runwayml.com.
[11] K. I. Gero, Z. Ashktorab, C. Dugan, Q. Pan, J. Johnson, W. Geyer, M. Ruiz, S. Miller, D. R. Millen, M. Campbell, S. Kumaravel, W. Zhang, Mental models of AI agents in a cooperative game setting, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1–12. URL: https://doi.org/10.1145/3313831.3376316.
[12] J. Matejka, M. Glueck, E. Bradner, A. Hashemi, T. Grossman, G. Fitzmaurice, Dream Lens: Exploration and visualization of large-scale generative design datasets, 2018, pp. 1–12.
[13] K. Genova, F. Cole, A. Sud, A. Sarna, T. Funkhouser, Local deep implicit functions for 3d shape, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4857–4866.
[14] R. Huang, P. Achlioptas, L. Guibas, M. Ovsjanikov, Limit shapes – a tool for understanding shape differences and variability in 3d model collections, Computer Graphics Forum 38 (2019) 187–202. doi:10.1111/cgf.13799.

A. Appendix

A.0.1. Generative model: Iteration 1

Iteration 1, with 6 semantic parameters, consisted of the following: torso length, neck length, neck rotation, tail rotation, leg length, and tail length.

A.0.2. Generative model: Iteration 2

Iteration 2, with 21 semantic parameters, drew from the following set: torso length, (front) torso width, (front) torso height, (back) torso width, (back) torso height, a choice between head type 1 (which emphasizes ear variation) and head type 2 (which emphasizes jaw variation), head size, head feature (ear/jaw) prominence, mouth angle, neck length, neck rotation, neck size, leg length, position of front legs, position of back legs, leg gap, leg angle, tail length, tail rotation, tail radius, tail variance, a choice between a tail that increases or decreases in width, and leg radius.
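For concreteness, the six iteration-1 parameters listed above could be bundled into a small configuration object like the sketch below; the class name, default values, and [0, 1] normalization are illustrative assumptions rather than the exact encoding used by our generator or by the semantic space.

```python
from dataclasses import dataclass, asdict

@dataclass
class MetashapeParamsV1:
    """The six iteration-1 semantic parameters, one field per slider.

    Values are assumed to be normalized to [0, 1] by the user interface
    and rescaled to each parameter's physical range before being passed on.
    """
    torso_length: float = 0.5
    neck_length: float = 0.5
    neck_rotation: float = 0.5
    tail_rotation: float = 0.5
    leg_length: float = 0.5
    tail_length: float = 0.5

    def to_vector(self):
        # Fixed ordering so slider indices line up with semantic axes.
        return [self.torso_length, self.neck_length, self.neck_rotation,
                self.tail_rotation, self.leg_length, self.tail_length]

# Example: a long-necked, short-tailed configuration.
params = MetashapeParamsV1(neck_length=0.9, tail_length=0.2)
print(params.to_vector())
print(asdict(params))
```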