=Paper=
{{Paper
|id=Vol-2282/EXAG_115
|storemode=property
|title=Towards 3D Neural Style Transfer
|pdfUrl=https://ceur-ws.org/Vol-2282/EXAG_115.pdf
|volume=Vol-2282
|authors=Jo Mazeika,Jim Whitehead
|dblpUrl=https://dblp.org/rec/conf/aiide/MazeikaW18
}}
==Towards 3D Neural Style Transfer==
Jo Mazeika, Jim Whitehead
Computer Science Department, UC Santa Cruz, Santa Cruz, CA 95064 USA
{jmazeika, ejw}@soe.ucsc.edu

===Abstract===

Neural Style Transfer was first unveiled by (Gatys, Ecker, and Bethge 2015), and since then has produced fantastic results working with 2D images. One logical extension of this field would be to move from 2D images into 3D models, and be able to transfer a notion of 3D style from one model to another. Here, we provide steps towards both understanding what style transfer in a 3D setting would look like, as well as demonstrating our own attempts towards one possible implementation.

===Introduction===

(Gatys, Ecker, and Bethge 2015) introduced a technique for transferring the style of one image onto another, exploiting properties of convolutional networks to extract the information required. Since then, the technique has been refined and expanded upon, with impressive results. A survey paper of the field (Jing et al. 2017), containing references up through March 2018, has over one hundred different papers listed. Because of this interest, it is natural to wonder where this technique could be applied to next. Here, we describe our attempts to apply the techniques underlying style transfer to the domain of 3D point cloud models.

At its core, 2D style transfer works by optimizing a noise vector to minimize a function describing its distance from both the style and content images at different layers within a neural network. In the original paper, the authors utilize VGG-19, a network trained to classify images in the ImageNet data set, and compare the floating-point activations that the images produce inside it. Once the network is chosen, the system's designer picks a particular set of layers of the network, and gets the values of the input images as well as the output image at each of the different layers. From there, the distances between those values are computed and turned into a single loss value, which is used to compute the gradient by which we transform the noise image, and we continue the process. Once a specified number of iterations is completed, the noise vector (which conveniently is chosen to be an N x M x 3 vector) can be interpreted as an N x M bitmap and rendered into an image using standard image libraries.
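To make this loop concrete, the sketch below shows one optimization step in TensorFlow. It is an illustration only, not the code of (Gatys, Ecker, and Bethge 2015) or of the system described here; the feature_model callable, the layer names, and the loss weights are placeholder assumptions.

```python
import tensorflow as tf

def gram_matrix(acts):
    # Flatten spatial positions and correlate channels: (..., H, W, C) -> (C, C).
    flat = tf.reshape(acts, (-1, acts.shape[-1]))
    return tf.matmul(flat, flat, transpose_a=True)

def style_transfer_step(feature_model, output_image, content_feats, style_grams,
                        content_layers, style_layers,
                        alpha=1.0, beta=1e-3, step_size=0.01):
    """One gradient step on the output image (a tf.Variable).

    feature_model is assumed to map an image tensor to a dict of layer
    activations (e.g. a wrapper around VGG-19); content_feats and style_grams
    are precomputed targets for the content and style inputs.
    """
    with tf.GradientTape() as tape:
        feats = feature_model(output_image)
        content_loss = tf.add_n([
            tf.reduce_mean(tf.square(feats[name] - content_feats[name]))
            for name in content_layers])
        style_loss = tf.add_n([
            tf.reduce_mean(tf.square(gram_matrix(feats[name]) - style_grams[name]))
            for name in style_layers])
        loss = alpha * content_loss + beta * style_loss
    grad = tape.gradient(loss, output_image)
    output_image.assign_sub(step_size * grad)  # plain gradient descent for clarity
    return loss
```

Repeating a step like this for a fixed number of iterations and then reading the optimized variable back out as a bitmap reproduces the procedure described above.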
To implement this for the 3D case, we first explored what style transfer in this domain would mean, given that 3D models carry a lot of information in their positional data. We chose an existing classification network for point clouds—an analog of the network chosen for the 2D case—and explored the different layers to identify which would be the best to use for style transfer. We additionally show the results of our experimentation with different 3D models.

3D style transfer would be useful as a design and ideation tool — by creating models that embody various styles, and a generic model, a designer could use a style transfer system to create versions of the original object in the different styles. These, while not perfect, would give said designer a springboard to create finalized versions of these 3D models. Additionally, having a system like this would also allow for interesting manipulation of scanned 3D spaces — since laser scanners produce point cloud models, being able to perform style transfer on the results could allow for interesting bespoke spaces.

In this paper, we provide the following contributions:
• An analysis of how to represent style in a 3D space
• An implementation of a style transfer system using a network designed for 3D models
• An analysis of our results and what future work will be required to make a system like this a reality.

===Related Work===

====Deep Learning and Point Clouds====

Deep learning systems operate natively over large arrays of numbers, which makes point clouds a natural way of encoding 3D models for neural processing — unlike polygonal models, point clouds exist as a simple list of 3D points in space. Additionally, several common techniques for analyzing real world 3D spaces map those spaces into point clouds, making them an important target to understand and process.

In our work here, we focused on examining PointNet (Qi et al. 2016), a network designed primarily to classify point clouds into several different object classes. Since its publication, it has seen a number of follow-up works, including one extension by the original authors (Qi et al. 2017) as well as others that extended it beyond preselected 3D models (Zhou and Tuzel 2017). PointNet is not the only system family for neural analysis of point clouds, however. It utilizes and benchmarks itself against the ModelNet dataset (Engelmann et al. 2017) of 40 different classes of point clouds for systems to distinguish between. ModelNet's website (http://modelnet.cs.princeton.edu) lists over thirty different systems and their relative accuracy rankings on this benchmark dataset.

====3D style systems====

While neural style transfer has not been implemented in this sense, there are a number of other systems that attempt to transfer the style of one 3D object onto another. For instance, (Zheng, Cohen-Or, and Mitra 2013; Lun et al. 2016) focus on handling style by breaking each object apart into individual components and reassembling them based on their structural similarity. In contrast, (Ribeiro et al. 2003) looks at style as a conceptual blend, using an external knowledge base instead of looking purely at the structural similarity to build the comparisons. (Hu et al. 2017) focuses on identifying decorative elements that convey stylistic features across an array of different objects.
(Ma et al. 2014) takes an analogy approach to style, starting from a set of example models (one initial model, with one structural variation and one stylistic variation) and then extrapolating those variations into a new model. In contrast, (Kalogerakis et al. 2012) learns a probabilistic model of an object class, allowing it to generate different models that exist within that space, and in that way defines a style of model.

===3D styles===

When we look at the results from (Gatys, Ecker, and Bethge 2015), we can quickly understand what aspects of an image the algorithm understands as being its style. These aspects include the color palette, the length and shape of the strokes used in painting, as well as some parts of the content - an image generated from van Gogh's Starry Night retains its bright stars in various portions of the sky. The other image - the content image - provides most of the underlying structure and direction of the image itself; the geometry of that image is preserved and we see that image as "stylized" in the style of the other.

However, for 3D models, the question of how to modify one model to match the style of another is non-trivial, especially given that it is unclear what a 3D model is comprised of in general. A 3D model's geometry could be constructed from a number of triangles or from a point cloud, to pick two examples. Then, some models feature textures while others don't. With all of these differences in how 3D models are encoded, it is unclear how we would transfer style between two sufficiently different models.

Furthermore, even if we have two similarly structured models, we need to figure out how to transfer styles between the two. In this paper, our system was implemented to follow (Gatys, Ecker, and Bethge 2015) fairly directly; however, this is only one way that the style of a 3D model could be interpreted.
Given the nature of 3D models, making adjustments to their structure - moving a few things here or there, adjusting the size of a small part of the model, etc. - can have a large impact on how the model is interpreted by an observer. Additionally, 3D models often have critical, functional parts that impact how they are perceived—an airplane without wings could be nearly impossible to identify as an airplane without some other strong 'airplane-like' features.

Given that our models were point clouds with no texture information, the only changes we could attempt to make were modifications to the positions of the different points in 3D space. This limited our possible options for stylistic features to consider during the transfer process; however, we came up with several different visions of what 3D style transfer could look like.

First, we have the conceptually simplest version, which we call exemplar style transfer. In this, the target object is modified to look like the style object, but purely on a cosmetic level, without interfering with its functionality. For instance, an umbrella could be blended with a sword by transforming the handle of the umbrella into the hilt of the sword, or a glass mug could be molded to look like any number of existing objects. In this way, the target object resembles the style, but maintains all of its own properties.

Secondly, we have the converse of the above, which we call functional part transfer. Here, instead of transferring the cosmetic features of the style object, we instead transfer the functional parts of said object. For instance, we could take a skateboard model and add strings and frets from a guitar along its body as our way of blending the two objects. Again, the target object still maintains most of its own identity, but takes on features of the style object to produce the blend.

Next, we have mix-in style transfer, where the two models are joined together, creating a model that features pieces of both attached together. For instance, we could have the blend of an airplane and an apple where the airplane has a stem coming out of it, or the apple has wings and the tail of the airplane.

Another possibility is part-blend style transfer, where the parts of one model are made out of similar parts of the other. In this way, we focus on transferring the local aspects of style, while keeping the broader model's structure fundamentally the same. One example of this would be Lego models — while the broader structure of the model is kept, the local features of the model must now conform to Lego bricks.

Our final concept for style transfer would be to create an abstract style definition and apply that directly. This is probably the most complex approach to implement, as it requires the system to be able to translate the abstract style information into the transformations that the model should undergo; however, it does allow for the most control over what elements of style are actually transferred onto the new model. This is most similar to the work on Lego models done in (Mazeika and Whitehead 2017).

These are not meant to be conclusive; rather, they form a possibility space of how 3D style transfer could occur. For the purposes of aligning with the 2D case, we chose to focus our system on the mix-in style, using the neural networks to find the relationships between the models and their parts.

Figure 1: A style transfer system diagram. The center box represents the PointNet network, which is comprised of several intermediate stages (a more detailed diagram can be found in (Qi et al. 2016)). We select particular layers from the network to use to compute our loss function.

===Style Transfer Implementation===

For our system, we utilized PointNet, an existing network designed to classify 3D models into different categories. In the original style transfer paper, the authors chose VGG-19, a network for classifying 2D images, and so we chose an analogous network for our system. PointNet operates in two different modes — one that classifies point clouds into different classes, and one that learns to segment point clouds, labeling each point of a cloud with a domain-specific label (i.e. the wings versus the body of an airplane, or the wheels versus the handlebars of a motorcycle). PointNet's classification network is structurally similar to VGG-19, comprised of multiple convolutional layers, but it also includes two particular layers: one that learns a three-by-three matrix product (intended to account for rotations in the model) and one that is intended to learn a permutation function (as point cloud data is invariant to the input order).
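As a rough illustration of the classification branch described above, a PointNet-style network applies the same small MLP to every point and then collapses the point dimension with a symmetric max-pooling operation. The Keras sketch below omits the learned three-by-three input transform and uses illustrative layer widths; it is not the published PointNet architecture or the exact network trained for this work.

```python
from tensorflow import keras
from tensorflow.keras import layers

def pointnet_like_classifier(num_points=1024, num_classes=40):
    """A PointNet-flavored sketch: per-point shared MLP plus symmetric max pool.

    Conv1D with kernel size 1 applies the same dense transform to every point,
    and GlobalMaxPooling1D makes the aggregated feature independent of point order.
    """
    inputs = keras.Input(shape=(num_points, 3))
    x = layers.Conv1D(64, 1, activation="relu")(inputs)
    x = layers.Conv1D(128, 1, activation="relu")(x)
    x = layers.Conv1D(1024, 1, activation="relu")(x)   # per-point features
    x = layers.GlobalMaxPooling1D()(x)                 # order-invariant aggregation
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```

Intermediate activations from layers like these are what the loss functions described below compare.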
The segmentation network in PointNet uses the classification network as its basis, and adds an extra four layers for producing the output labels. This version of PointNet has a major drawback for our purposes: it must be trained on each individual class of model, rather than on all classes at once. While that means that we can't use a trained segmentation network for general style transfer, it makes sense that we could use it as a tool for modeling individual classes of objects.

Once we have a fully trained model, we then need to pick the set of layers to consider for computing the loss function. When a model is evaluated by the network, it builds more abstract representations of the model at each of the subsequent layers. In the original style transfer system, the authors considered earlier layers for the structure and later layers for the style of the various images. Here, we use a similar approach for picking the layers to use for our system, and show the results of exploring this space later.

====Loss Functions====

Once the layers have been chosen, it then falls to pick a loss function to evaluate how far the generated image is from the inputs on those particular layers. To do this, the 2D system uses the square-mean distance between the layers for the content image and the sum of the differences between the Gram matrices of the style image and the output.

For our 3D version, we investigated using these loss functions, but also other ones found in our exploration of metrics designed for point clouds. To this end, we included the Hausdorff distance and the Chamfer distance as metrics to consider for our loss functions. The Hausdorff distance is defined in our system as

d_H(X, Y) = \max\left\{ \max_{x \in X} \min_{y \in Y} d(x, y),\ \max_{y \in Y} \min_{x \in X} d(x, y) \right\}

where d(x, y) is the Euclidean distance between the vectors x and y. Similarly, we define the Chamfer distance in our system as

d_C(X, Y) = \sum_{x \in X} \min_{y \in Y} d(x, y) + \sum_{y \in Y} \min_{x \in X} d(x, y)

using the notation from above. In English, the Hausdorff distance looks at the minimum distance from each vector to any vector in the other set and returns the overall maximum of these values, while the Chamfer distance gives the sum of these minimal distances instead.

Both of these metrics look for outliers within the space — the Hausdorff distance is maximized when a single vector is far away from the ones in the other set, while Chamfer looks more at the average distance for all of the vectors in both sets. Importantly, they are also both differentiable, which is a strict requirement for our loss functions.
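A direct (if memory-hungry) way to compute both distances on small clouds is sketched below in TensorFlow, which keeps them differentiable as required; this is one straightforward reading of the definitions above, not the code used for the experiments.

```python
import tensorflow as tf

def pairwise_distances(X, Y):
    # Euclidean distance between every x in X (n, 3) and y in Y (m, 3) -> (n, m).
    diff = tf.expand_dims(X, 1) - tf.expand_dims(Y, 0)
    return tf.norm(diff, axis=-1)

def hausdorff_distance(X, Y):
    d = pairwise_distances(X, Y)
    return tf.maximum(tf.reduce_max(tf.reduce_min(d, axis=1)),   # max over x of min over y
                      tf.reduce_max(tf.reduce_min(d, axis=0)))   # max over y of min over x

def chamfer_distance(X, Y):
    d = pairwise_distances(X, Y)
    return (tf.reduce_sum(tf.reduce_min(d, axis=1)) +
            tf.reduce_sum(tf.reduce_min(d, axis=0)))
```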
Finally, one of the key components of any system that hopes to combine multiple components into a single value is weighting. Since different functions over different layers can produce values on wildly different orders of magnitude, we normalize the values both to balance the different components against each other and to provide some bias towards either the structure or the style.

====Implementation====

While PointNet is publicly published in Tensorflow (https://github.com/charlesq34/pointnet), we chose to implement our system in Keras instead, due to familiarity. We used an existing Keras implementation (https://github.com/garyloveavocado/pointnet-keras) as a reference, and we used the style transfer code provided in (Chollet 2017) as the starting point and reference for our implementation. Our reimplementation required us to train PointNet ourselves, and we did so using the ModelNet data set provided by (Wu et al. 2015). For the labeled segmentation data, we used the data set provided by (Yi et al. 2016). Our overall accuracy results for training were comparable with the original paper.

===Results===

====Exploration of Layers and Loss Functions====

As we began our exploration of style transfer for 3D models, we started by optimizing our input against a single layer to see how the different layers respond to different loss functions, hoping to identify layers that correspond to stylistic or structural features to optimize for. Our intuition comes from the 2D example, where we can extrapolate what the individual layers have learned by optimizing for them directly, as shown in (Rupprecht 2017).

We considered all of the convolutional layers of the classification model of PointNet and used a fixed random noise input with the Squared Sum, Gram Matrix, Hausdorff and Chamfer loss functions.

However, most of our layer-function pairs led to fuzzy noise-spheres, as seen in Figure 2. On the other hand, we did have a few pairs that led to interesting results. Some of the lower layers simply produced a twisted version of the original model (see Figure 3). Here, the model is vastly deformed and rotated upside down, which we took as a possible candidate to consider. And, finally, a few layer-function pairs simply reproduced the initial model instead, with some small variations due to the fuzziness of the optimization process.

Figure 2: The common result of optimizing on arbitrary layer-function pairs.

Figure 3: A model and its twisted counterpart (3rd convolutional layer using the Gram loss function).

This was disappointing to see; while we could reproduce the initial model in the very early layers, we had hoped to see the noise clouds showing abstract features of the particular model class. This may have resulted from our choice of network — since PointNet attempts to learn how models are structured, regardless of how the points are arranged and permuted, it could be the case that the abstract features are being represented in a way that is imperceptible to humans. In the 2D case, we're able to see patterns and variations between different images, as those appear as variations in color, but the 3D case solely considers the positions of the different points, meaning that there might be relations being learned that are not actually the ones we want to express.

====Classification Model Results====

With our results from the previous exploratory work, we attempted to blend models with one of the layer-loss pairs that simply reproduced the initial model; specifically the Chamfer loss on the third convolutional layer of the classification model. One of the key features of style transfer is balancing the different loss values against each other — most of the 2D systems feature a weighting system in which the different sides of the function (the similarity to the style and the similarity to the initial content) are balanced against each other to get the desired blend.

To this end, we took two models and blended them at different ratios between the different sides — we fixed the content weight at 1, and scaled the style weights through different powers of 2. We used the third convolutional layer with Chamfer loss for the content and Gram loss for the style as the basis for our loss function. While the Chamfer loss reproduced the original model at that layer, the Gram loss created a twisted version of the model, as seen in Figure 3. Here, we hypothesized that the twistedness was due to having a slightly more abstract understanding of the original, and that we would see this conveyed in the style transfer process.
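In sketch form, the blending objective just described combines a Chamfer content term and a Gram-matrix style term at a single PointNet layer, with the style weight swept over powers of 2. The code below reuses chamfer_distance and gram_matrix from the earlier sketches and assumes a hypothetical pointnet_layer callable that returns that layer's activations; it illustrates the shape of the loss rather than the exact implementation.

```python
import tensorflow as tf

def blend_loss(output_points, content_layer_feats, style_gram, pointnet_layer,
               style_weight, content_weight=1.0):
    # Activations of the chosen convolutional layer for the current output cloud.
    feats = pointnet_layer(output_points)
    # Content term: Chamfer distance (earlier sketch) to the content model's activations.
    content_term = chamfer_distance(feats, content_layer_feats)
    # Style term: distance between Gram matrices (earlier sketch) of output and style model.
    style_term = tf.reduce_mean(tf.square(gram_matrix(feats) - style_gram))
    return content_weight * content_term + style_weight * style_term

# Style weights swept through powers of 2 (2^-4 up to 2^15), content weight fixed at 1.
style_weights = [2.0 ** k for k in range(-4, 16)]
```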
The results of blending an airplane model and a model of a woman in a dress are shown in Figure 6. At the extreme values (2^15 and 2^-4), our system effectively optimizes for one model or the other, as the loss value is dominated by the output's distance from that model. In the middle, we see blends of the two models; however, this occurs merely on the level of the points' actual distances from each other — no abstract qualities are carried between the two models.

One of the clear qualities here is that the blends are contained in, effectively, the intersection in space between the two models. This suggests that one of the big issues here is orientation, since the airplane lies flat while the person is standing upright. Additionally, using other layers turns the output into a fuzzball, so this proved to be a dead end.

Figure 6: Two models (airplane and woman in a dress) blended at different ratios of style weight and content weight. Labeled values are the style weight compared to a content weight of 1.

====Segmentation Model Results====

Finally, we attempted to utilize the segmentation version of PointNet to transfer the underlying model class's style onto an unrelated model. To do so, we first trained the network to perform segmentation on airplanes. We then took a motorcycle model (seen in Figure 4) and assigned each one of its labels a particular label from the airplane label set (i.e., the body of the motorcycle corresponded with the body of the airplane; the wheels corresponded with the engines, etc.).

Figure 4: A ground-truth segmentation for a motorcycle model.

From here, we built our loss function to optimize for the original structure of the motorcycle against each point receiving its correct label under the classification system. The intent here was to create a new motorcycle such that each one of its components was interpreted as part of an airplane.
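A hedged sketch of that objective is given below: a structure term (Chamfer distance to the original motorcycle points, reusing the earlier sketch) traded off against a term that pushes each point toward its assigned airplane part label. The segmentation_model callable, the use of cross-entropy for the label term, and the weighting are assumptions for illustration, not the exact loss used in these experiments.

```python
import tensorflow as tf

def segmentation_transfer_loss(output_points, original_points, target_labels,
                               segmentation_model, style_weight, content_weight=1.0):
    # Structure term: keep the output close to the original motorcycle geometry
    # (chamfer_distance is the function from the earlier sketch).
    structure_term = chamfer_distance(output_points, original_points)
    # Label term: per-point probabilities over airplane part labels, scored against
    # the airplane label each motorcycle point was mapped to (body, engines, ...).
    probs = segmentation_model(output_points)          # (num_points, num_part_labels)
    label_term = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(target_labels, probs))
    return content_weight * structure_term + style_weight * label_term
```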
Unfortunately, again, the results were suboptimal. We again tested various weights of style and content, as seen in Figure 5, but optimizing for style merely leads to the model expanding and not, as hoped, being reshaped.

Figure 5: The results of segmentation style transfer on the motorcycle model with different weights.

===Discussion===

Since our experiments produced negative results, the question then becomes "why?" — what led to our results, and what can we learn from these experiments? Fundamentally, there are two high-level cases for these failures: either the system had some errors of design, or there are theoretical factors that prevent this approach from working at all.

Fundamentally, style transfer in the 2D domain relies on being able to detect the edges (for the content image) and the color palette (for the style image) of an image in a way that can be optimized for. Doing so requires the system to examine relationships between spatially related pixels, and then shift a noise vector until it grows closer and closer to these features. This works well for images, since the images are represented as a 2D spatial matrix (with a third dimension representing color values). On the other hand, the point clouds are merely a list of different points in space. While PointNet itself attempts to compensate for the input order (by learning a reordering function about halfway through the system), this may cause the issues with the loss function.

Additionally, one of the common stylistic features of the 2D style transfer images (independent of the style image itself) is a certain amount of fuzziness in the output—edges often have a certain amount of fuzziness to them, a result of the optimization process. But, because of the overall shape of the output and the color patches included, humans are still able to recognize the contents of the image without much issue. However, in our 3D case there is no color channel that we can rely on for context. As such, the only information we have to work with in the system are the locations of the different points. Because of this, the results are highly sensitive to points being moved around, which means that the fuzziness that results from the style transfer can lead to results that are impossible for humans to interpret correctly.

For neural style transfer to be possible for 3D point clouds, several adjustments would need to be made. First of all, a different neural network would likely need to be considered. One of the key aspects of PointNet is that the network tries to learn a permutation function that allows it to detect the models correctly regardless of the order their points are in. This factor may be part of the issue we have in producing visible results, and other networks on point clouds may feature clearer results.

Secondly, a different set of loss functions would need to be considered. While we tried to include loss functions relevant to point clouds in general, it might be the case that others exist that would work better with the particular network or with point clouds in general, and that other functions would have produced intelligible results. We chose our functions based on metrics that were used previously in style transfer and known functions for examining point clouds, and ran an exhaustive search over them.

Finally, in this work, we only considered one layer at a time — most of the existing work on style transfer looks at multiple layers at once and averages the loss between all of them. This provides more consistent results, and the generated images benefit from these multilayer views. However, in the 2D case, it is clear what sorts of features are captured by the different layers of VGG-19; our attempts to visualize the PointNet network were inconclusive. As we ran an exhaustive set of experiments over the layer-loss pairs, it is unlikely that we missed any layer that would drastically change our results, and reduplicating layers is the equivalent of doubling the value of the weight.

However, the deeper issue remains of what style even means for point clouds. When we look at style transfer for images, we see clearly what aspects of style are picked up by the system—colors, line curviness, patterns, etc.—and how those are applied to the content image. With 3D style, it is unclear what features a neural network would pick up on—would it key into the shapes of the different parts, the relative positions of things, the orientation of the model itself, or the patterns of points within the model itself?

===Conclusions and Future Work===

In this paper, we attempted to apply the neural style transfer techniques that have seen so much success in the domain of 2D images to the domain of 3D point clouds. Despite this, our system was unable to perform style transfer as is seen in the 2D style transfer systems. While negative results do not necessarily provide conclusive evidence, our exhaustive exploration provides a strong argument against this being possible.

However, there is room to explore style transfer within point cloud models. One idea would be to use a segmentation model (such as PointNet) to design a parts-based style transfer system, similar to (Lun et al. 2016). Additionally, exploring other neural networks and other ways of encoding style in a 3D space could provide interesting results as well.
===References===

Chollet, F. 2017. Deep Learning with Python. Manning Publications Co.

Engelmann, F.; Kontogianni, T.; Hermans, A.; and Leibe, B. 2017. Exploring spatial context for 3d semantic segmentation of point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 716–724.

Gatys, L. A.; Ecker, A. S.; and Bethge, M. 2015. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.

Hu, R.; Li, W.; Kaick, O. V.; Huang, H.; Averkiou, M.; Cohen-Or, D.; and Zhang, H. 2017. Co-locating style-defining elements on 3d shapes. ACM Transactions on Graphics (TOG) 36(3):33.

Jing, Y.; Yang, Y.; Feng, Z.; Ye, J.; Yu, Y.; and Song, M. 2017. Neural style transfer: A review. arXiv preprint arXiv:1705.04058.

Kalogerakis, E.; Chaudhuri, S.; Koller, D.; and Koltun, V. 2012. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics (TOG) 31(4):55.

Lun, Z.; Kalogerakis, E.; Wang, R.; and Sheffer, A. 2016. Functionality preserving shape style transfer. ACM Transactions on Graphics (TOG) 35(6):209.

Ma, C.; Huang, H.; Sheffer, A.; Kalogerakis, E.; and Wang, R. 2014. Analogy-driven 3d style transfer. In Computer Graphics Forum, volume 33, 175–184. Wiley Online Library.

Mazeika, J., and Whitehead, J. 2017. Solving for bespoke game assets: Applying style to 3d generative artifacts.

Qi, C. R.; Su, H.; Mo, K.; and Guibas, L. J. 2016. Pointnet: Deep learning on point sets for 3d classification and segmentation. arXiv preprint arXiv:1612.00593.

Qi, C. R.; Yi, L.; Su, H.; and Guibas, L. J. 2017. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413.

Ribeiro, P.; Pereira, F. C.; Marques, B. F.; Leitão, B.; Cardoso, A.; Polo, I.; and de Marrocos, P. 2003. A model for creativity in creature generation. In GAME-ON, 175.

Rupprecht, P. 2017. Understanding style transfer. https://ptrrupprecht.wordpress.com/2017/12/05/understanding-style-transfer/.

Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; and Xiao, J. 2015. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1912–1920.

Yi, L.; Kim, V. G.; Ceylan, D.; Shen, I.-C.; Yan, M.; Su, H.; Lu, C.; Huang, Q.; Sheffer, A.; and Guibas, L. 2016. A scalable active framework for region annotation in 3d shape collections. SIGGRAPH Asia.

Zheng, Y.; Cohen-Or, D.; and Mitra, N. J. 2013. Smart variations: Functional substructures for part compatibility. In Computer Graphics Forum, volume 32, 195–204. Wiley Online Library.

Zhou, Y., and Tuzel, O. 2017. Voxelnet: End-to-end learning for point cloud based 3d object detection. arXiv preprint arXiv:1711.06396.