<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Models Classification with Use of Convolution Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karyna Khorolska</string-name>
          <email>k.khorolska@knute.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bohdan Bebeshko</string-name>
          <email>b.bebeshko@knute.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alona Desiatko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitaliy Lazorenko</string-name>
          <email>v.lazorenko@knute.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kyiv National University of Trade and Economics</institution>, <addr-line>Kyiv</addr-line>, <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>0</volume>
      <fpage>1</fpage>
      <lpage>03</lpage>
      <abstract>
        <p>Nowadays the most urgent challenge in computer image recognition is the problem of three dimensional reconstruction of the world or environment from a two dimensional representation such as 2D images. This tendency is especially obvious in the requirements of architecture companies. Commonly one has no access to a ready-to-use 3D model, and therefore has to somehow recognize and reconstruct a three dimensional model or object from its two dimensional representation captured from different viewports. A novel approach is one that uses not only 2D images for feature extraction and classification but also ready 3D models or objects of the same type. In such circumstances one can train neural network recognition models on distinguishing features such as voxel occupancy or surface curvature. Most scientists and researchers studying this area commonly develop algorithms for 2D image recognition and extract features from the same 2D images for further use in image recognition systems, with the aim of classifying them and reconstructing the 3D model. One can, however, build a classifier of three dimensional shapes using not only two dimensional images but also 3D models. Therefore in this paper we propose a new multi presentational 3D model classification framework. Precisely, in this work multiple two dimensional images of a three dimensional model were used as input for the cross presentational information, and the high level cross presentational information was extracted using multiple 2D CNNs in separated mode.</p>
      </abstract>
      <kwd-group>
        <kwd>2D</kwd>
        <kwd>3D</kwd>
        <kwd>image recognition</kwd>
        <kwd>models classification</kwd>
        <kwd>convolutional neural network</kwd>
        <kwd>multi view</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays the most urgent challenge in computer image recognition is the problem of
three dimensional reconstruction of the world or environment from a two dimensional representation
such as 2D images. This tendency is especially obvious in the requirements of architecture companies.
Commonly one has no access to a ready-to-use 3D model, and therefore has to somehow recognize and
reconstruct a three dimensional model or object from its two dimensional representation captured from
different viewports. Model or object classification thus becomes the next critical problem in computer
image recognition. Classified objects can be very helpful later for tasks like 3D reconstruction, object
detection or object tracking. The traditional approach to classifying objects or models is to extract
features (e.g., SURF, HOG) or descriptors and then classify them (e.g., with a Bayes classifier or an
SVM).</p>
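      <p>As an illustration of the traditional pipeline just mentioned, the following sketch computes a toy HOG-style descriptor (a single global gradient orientation histogram; real HOG additionally uses cells and block normalization) that could then be fed to any classifier such as an SVM. The images and sizes here are purely illustrative.</p>

```python
import numpy as np

def hog_descriptor(image, bins=9):
    """Toy HOG-style descriptor: a global histogram of gradient
    orientations weighted by gradient magnitude (real HOG also
    uses cells and block normalization)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientations in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Two toy "renderings": a vertical edge and a horizontal edge.
vertical_edge = np.zeros((8, 8))
vertical_edge[:, 4:] = 1.0
horizontal_edge = vertical_edge.T

d1 = hog_descriptor(vertical_edge)
d2 = hog_descriptor(horizontal_edge)
print(np.argmax(d1) != np.argmax(d2))  # the two edge directions are separable
```

      <p>Even this crude descriptor places the two edge directions in different histogram bins, which is the property a downstream classifier relies on.</p>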
      <p>One can outline three main types of input used to perform three dimensional model classification:
3D voxels, point clouds and multi presentation images.</p>
      <p>Concluding from the above, it becomes obvious that most scientists and researchers working in this
area commonly develop algorithms for 2D image recognition and extract features from the same 2D
images for further use in image recognition systems, with the aim of classifying them and reconstructing
the 3D model.</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>
        A novel approach is one that uses not only 2D images for feature extraction and
classification but also ready 3D models or objects of the same type. In such circumstances one can train
neural network recognition models to utilize 3D models, especially their distinguishing features such as
voxel occupancy or surface curvature. One can therefore build a classifier of three dimensional shapes
using not only two dimensional images but also 3D models. Moreover, such approaches have become
easy to implement owing to the emergence of a large variety of 3D object repositories (e.g.,
Shapeways, TurboSquid and others). Some authors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in their scientific studies cover such an
approach by presenting a 3D object classifier built on a DNN architecture that in turn was trained on
voxel models. Basically it was a classifier processing 3D objects to recognize and build up three
dimensional shapes. Such an approach is entirely consistent. Nevertheless, in this paper another way is
proposed: to build a 3D object classifier using the two dimensional images that are renderings
of the 3D model. Using such an approach one can greatly increase performance and outperform
approaches that use direct 3D representations. Looking ahead, a convolutional neural network (CNN)
trained on an N-size set of prerendered 3D model presentations, with only a single presentation
at a test iteration, increases the accuracy of category recognition compared to the model proposed by the
authors of the scientific paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which was trained directly on the 3D objects. It should also be noted
that by increasing the number of presentations used at the test iteration one can increase performance,
though this requires more computational resources. Furthermore, one can concatenate the
information of a range of 2D presentations into a single descriptor using a multi-view CNN. Such a
descriptor in turn contains as much information for classification purposes as the full collection of
view-based descriptors of the model. In fact it also amplifies retrieval efficiency when using either
a similar 3D model or a simple picture, whether digital or hand-drawn, without resorting
to tremendously slower methods based on pairwise image comparisons.
      </p>
      <p>
        Over the last decades Deep Convolutional Neural Networks (DCNNs) have become popular and made
huge advances due to their ability to classify images. Dozens of images are classified using DCNNs
into thousands of possible categories. As opposed to a single presentation DCNN, a multi presentation
CNN learns convolutional models in a parametric setting where a range of presentation data is
available. In other words, it integrates the discriminative information of different presentations, which in
turn produces a much more exhaustive representation for the sequential learning process. Setting aside
the above mentioned scientific paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which proposes shape descriptors from the voxel-based
view of a model through 3D convolutional neural networks, previous research in the field of 3D
model descriptors consisted largely of pioneering sketches built around a particular geometric property of
the model surface or volume. As an example, model shapes can be interpreted as a histogram or set of
the model features they were constructed of, such as distances, angles, triangles and normals [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which are in turn
gathered at predefined surface points [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], as well as properties of functions defined on a volumetric
grid [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], local model diameters measured relative to the surface points [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], kernel signatures plotted
on the polygon meshes [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] or SIFT and SURF feature descriptor extensions for voxel grids.
Therefore any development of classifiers or other supervised machine learning models on the
basis of the previously mentioned model descriptors entails a range of accompanying problems to be
solved. The biggest issue is that the size of well organized repositories with labeled 3D objects is much
more limited than the image datasets available for research purposes. To make it clear: the ModelNet
repository counts nearly 170 thousand models, whereas the ImageNet database [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] stores millions of
labeled images. The next issue concerns the 3D model descriptors themselves: they are
very high-dimensional, which results in overfitting. Another example, mostly common in
computer graphics setups, is the Light Field descriptor [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The Light Field descriptor extracts a set of Fourier
and geometric parameters from silhouettes of the object rendered in different viewports. Moreover,
the object's silhouette itself can be decomposed into parts and then represented in the form of an
acyclic graph. The authors of [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] defined resemblance metrics based on curve
matching and thereby grouped similar presentations, called 3D model aspect graphs [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In
study [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] the authors attempted to compare human drawn sketches with line drawings of 3D models
created from several different presentations based on local Gabor filters. In [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] the author proposed to
use Fisher vectors on SIFT model features for representing shapes in human sketches. Nevertheless, the
mentioned descriptors are mostly narrowly designed and therefore do not transfer across domains.
      </p>
      <p>
        Concerning the view-based methods, they translate 3D models into a range of 2D representations
and then use the features extracted by a two dimensional classification CNN. As an example, the Multi
View Convolutional Neural Network (MVCNN) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] uses a range of two dimensional representations
of the rendered 3D model as its input. View-based descriptors have a range of desirable
characteristics: they are relatively low-dimensional compared to those mentioned above, they
are efficient during the evaluation process, and they are robust to 3D model representation artifacts
such as holes, flipped polygon mesh tessellations or noisy surfaces. Moreover, the rendered model
representations can be directly matched with any other two dimensional picture, image, sketch or even
silhouette. Early research on view-based models was demonstrated in the scientific paper [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
The proposed model was able to recognize models by comparing their appearance in parametric
eigenspaces built from large sets of three dimensional models rendered to two dimensional images in
various poses, at various angles and under different illuminations. According to the scientific paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the proposed
decomposition method outperforms MVCNN, but only at the cost of increased computational resource
consumption at the training stage. Such decomposition approaches utilize two CNNs: one for
presentation pair selection purposes and a second for pair labeling. Each of the approach's CNNs uses the
CNN-M model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], therefore they have to be trained separately. In addition to MVCNN,
RotationNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] explores multiple presentations from various angles, taking a part of the entire multi
presentation image of a 3D model as input, and determines the category of the model through the rotation
process. The scientific paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] defines multiple presentation group information and in turn proposes
the Group View Convolutional Neural Network (GVCNN). GVCNN groups the presentation level
features to generate so-called group level features. The group level features are then combined
to produce the model level feature. The recursive clustering and pooling layer introduced by
the authors in the scientific research [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was developed to concatenate the multi presentation features,
which in turn provide more exhaustive capabilities for 3D model classification. In contrast to MVCNN,
this paper only retrieves information from the range of similar presentations.
      </p>
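      <p>The view-pooling step at the core of MVCNN-style methods can be sketched numerically: an element-wise max across per-view feature vectors collapses the views into one shape descriptor. The random vectors below are stand-ins for real CNN activations, so only the shapes and the pooling behavior are meaningful.</p>

```python
import numpy as np

def view_pool(view_features):
    """MVCNN-style view pooling: an element-wise max across the
    per-view feature vectors collapses M views into one descriptor."""
    return np.max(view_features, axis=0)

rng = np.random.default_rng(0)
views = rng.normal(size=(12, 64))  # 12 stand-in per-view CNN features, D = 64

descriptor = view_pool(views)
print(descriptor.shape)                                 # (64,)
# The pooled descriptor is invariant to the order of the views:
print(np.allclose(descriptor, view_pool(views[::-1])))  # True
```

      <p>The order invariance shown at the end is exactly why such pooling layers disregard which presentation contributed which feature, the weakness the grouping and weighting schemes above try to address.</p>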
      <p>
        Concerning the volume-based methods, they simply apply three dimensional CNNs to the
voxelized shapes of the objects. As an example, the authors of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] utilize 3D ShapeNets and
propose the application of a Convolutional Deep Belief Network (cDBN) in order to
interpret a three dimensional geometry as a probability distribution over a three dimensional voxel grid.
In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the authors describe VoxNet as an extension upcasting a 2D convolutional neural network kernel
to a 3D convolutional neural network kernel. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] researchers introduced the VRN Ensemble, presenting
generative and discriminative voxel-based deep convolutional network models. In the same
research the authors explore issues of voxel-based model representations. In research [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
the authors present 3D-A-Nets, developing an adversarial neural network for 3D purposes with the aim of
efficiently solving problems concerning the processing of 3D volumetric data. However, every
convolution centric model for 3D purposes has a huge disadvantage in terms of excessive technical
complexity and even more exorbitant GPU resource requirements.
      </p>
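      <p>To make the idea of upcasting a 2D convolution kernel to 3D concrete, here is a minimal, naive 3D cross-correlation over a voxel occupancy grid. This is purely illustrative: real VoxNet-style models use learned kernels and GPU implementations, and the grid and kernel below are toy values.</p>

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive 'valid' 3D convolution (implemented as cross-correlation)
    over a voxel grid -- the core operation that volume-based models
    lift from 2D to 3D."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+kd, j:j+kh, k:k+kw] * kernel)
    return out

# A 4x4x4 occupancy grid with a single filled 2x2x2 corner block.
grid = np.zeros((4, 4, 4))
grid[:2, :2, :2] = 1.0
response = conv3d(grid, np.ones((2, 2, 2)))  # counts occupied voxels per window
print(response.shape, response.max())        # (3, 3, 3) 8.0
```

      <p>The cubic growth of the sliding window (eight multiply-adds per output voxel here, versus four for the 2D analogue) is a small-scale view of the complexity and GPU cost noted above.</p>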
      <p>The most rapid and primitive way to obtain a solution to multi presentation 3D model classification
with 2D CNNs would be to merge all of the presentations of the model, in the form of features, into a
single presentation as the neural network's input. But there is one significant disadvantage: such merged
input results in reduced, inconsistent interpretability of the presentation information among different
presentations. Some presented models instead run 2D convolutional neural networks on the
presentations separately and then concatenate them in a pooling layer; nevertheless, such pooling
models commonly disregard the content relationships among different presentations. To solve these
issues this paper proposes an idea for learning the discriminative cross presentational information
simultaneously, preserving the content relationship among the range of presentations.
Moreover, the proposed idea integrates the mentioned two sorts of
information using a multi presentation loss fusion method for end-to-end three dimensional model
classification.</p>
      <p>Therefore in this paper we propose a new multi presentational 3D model classification framework.
Precisely, in this work multiple two dimensional images of a three dimensional model were used as input
for the cross presentational information, and the high level cross presentational information was
extracted using multiple 2D CNNs in separated mode. This is helpful for defining the
intrinsic attributed information of each presentation. For the cross presentational information, the
information of a single presentation and that of all the other presentations are
used in this scientific paper to evaluate the outer products, which in turn yield the correlation
matrices between different parameters of each presentation pair. Further, the amplified
correlation matrix is obtained by a maximization operation at the corresponding locations of the
correlation matrices along the direction of the various presentation pairs. Finally, a one dimensional
convolution and a completely connected (CC) transformation over the amplified correlation matrix are
applied in order to obtain the high-level cross presentational information of each presentation. This serves
to describe the content relationship among the presentations. Once the above information is gained,
it is merged and given as an input parameter to the presentation specific CC layer, which in turn
produces the presentation specific loss value as well as the label prediction. For the cross presentational
loss fusion method, an ℓ0-constrained optimization problem was formulated with regard to the weights
of the various presentations, and the optimal weight distribution was thereby obtained. It is valuable to
select the discriminative and informative presentations, determined by high weights, and use
their corresponding predictions to build a joint decision. The main goals of this research can be
summarized as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>Present and propose a new multi presentation framework that stores the discriminative
information, with its relationships among presentations, for different presentations and designs a
presentation set mechanism using the multi presentational fusion method in order to perform
end-to-end 3D model classification.</p>
        </list-item>
        <list-item>
          <p>Propose the discriminative information with relationships by merging the cross presentational
and intrinsic presentational information, where the latter is generated by applying a
one dimensional convolution as well as a CC transformation over the amplified correlation
matrix, which in turn is obtained by the outer product and presentation pair pooling.</p>
        </list-item>
        <list-item>
          <p>Propose a multi-presentational loss fusion method, solving an ℓ0-constrained
optimization to build a joint decision for inferring the category.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>2. Previous researches</title>
      <p>
        For instance, in the scientific research [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] the authors describe MVCNN, where multi presentation images
obtained by 3D rotations are passed through shared convolutional neural networks separately,
then merged at the presentation pooling layer and further used as input for another
convolutional neural network. Nevertheless, the disadvantage of MVCNN is that its pooling layer
disregards the divergence between different presentations, where some of the presentations are
distinctive while others carry common information. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the authors proposed the Group View Convolutional
Neural Network, introducing the presentation, group, and shape level descriptors, and
providing a grouping scheme to divide the presentations in terms of their discrimination weights.
However, the threshold parameters of the grouping weights inside the grouping module cannot be
guided by more discriminative information.
      </p>
      <p>
        Convolutional Neural Networks have shown promising results for 3D geometry prediction. They
can make predictions from very little input data, such as a single color image. A major limitation of such
approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface
of the objects well. The authors propose a general framework, called hierarchical surface prediction (HSP),
which facilitates the prediction of high resolution voxel grids. The main insight is that it is sufficient to
predict high resolution voxels around the predicted surfaces [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
      </p>
      <p>As for 2D image classification and data extraction, there are also several scientific works.</p>
      <p>
        Fei-Fei, L., Fergus, R., &amp; Perona, P. used the Caltech 101 set, which was among the first standardized
datasets for multi-category image classification; it contains 101 object classes with generally 30 training
sample images per class. Later the set was reworked by Griffin, Holub and Perona
and became Caltech 256, a set with the number of object classes increased
to 256 and with added images of greater scale and background variability. However, since none of the
mentioned data sets have been manually verified, they contain many errors that in a way make
them unsuitable for precise algorithm evaluation activities [
        <xref ref-type="bibr" rid="ref28 ref33">28, 33</xref>
        ].
      </p>
      <p>
        A few publications [
        <xref ref-type="bibr" rid="ref31 ref32">31, 32</xref>
        ] state that it is possible to generate a 3D model using a single input image
and convolutional neural networks (CNNs) (Figure 1). One can assume that this can be considered
simple enough for the third, consumer-friendly requirement. The assumption that requires testing is the
limitation of the output resolution and the swiftness of the generation time, which has never been specified
in any of the papers mentioned above. The key purpose of this work is to define the possible limitations
of using convolutional neural networks for 3D model generation, taking into account output resolution
and generation swiftness. Moreover, this work also focuses on balancing an increase in
resolution quality against the time consumed for output generation.
      </p>
      <p>Figure 1: CNN pipeline for 3D model generation from a single input image: input image, convolutional layer, pooling layer, convolutional layer, pooling layer, fully-connected layer, predicted output.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Formulation of the proposed method</title>
      <p>
        As mentioned above, the goal of this research is to develop a presentation based descriptor
for 3D models that is trainable, produces informative representations for recognition and evaluation
problems, is not complex, and is efficient to compute. In this section of the article, the
proposed method is illustrated in detail. It is a joint multi-presentation 2D CNN learning framework
aimed at integrating the cross presentation and presentation information of the 3D models through the
multi presentation convolutional representation with loss fusion. The input data of the proposed method is
rendered as multiple 2D presentations of a 3D model, which belongs to the presentation based
approach. 12 rendered presentations were created by placing 12 virtual cameras around the mesh, every
30 degrees. The reason for rendering from such viewports is that it is not known exactly which one can
provide a good representative overview of the model. In this research multiple 2D
presentations are used to describe a 3D model, with one 2D image per presentation. It was found that the
multi presentation representation contains rich information about 3D models and can be applied to various
practical problems. For the CNN features, ResNet-18 [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] was used as the base architecture, which
consists of 17 convolutional layers followed by one CC layer, in order to capture the cross presentation
information for each presentation. The ResNet-18 was pre-trained on the ImageNet repository image
set consisting of 1000 categories and then was calibrated on all 2D presentations of the 3D models in the
training set. The CNN features can capture the high-level information of each presentation, which
results in better classification performance compared with some previously proposed descriptors
[
        <xref ref-type="bibr" rid="ref25 ref26 ref27 ref29 ref30">25-27, 29-30</xref>
        ].
      </p>
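      <p>The 12-camera rendering setup described above can be sketched as follows. The orbit radius and the fixed elevation angle are assumptions made for illustration, since the text only fixes the 30 degree azimuth step between cameras.</p>

```python
import numpy as np

def camera_positions(n_views=12, radius=2.0, elevation_deg=30.0):
    """Place n_views virtual cameras on a circle around the model,
    every 360/n_views degrees of azimuth, at a fixed elevation
    (radius and elevation are assumptions, not from the paper)."""
    az = np.deg2rad(np.arange(n_views) * 360.0 / n_views)
    el = np.deg2rad(elevation_deg)
    x = radius * np.cos(el) * np.cos(az)
    y = radius * np.cos(el) * np.sin(az)
    z = np.full(n_views, radius * np.sin(el))
    return np.stack([x, y, z], axis=1)

cams = camera_positions()
print(cams.shape)                                      # (12, 3)
print(np.allclose(np.linalg.norm(cams, axis=1), 2.0))  # all cameras on the sphere
```

      <p>Each camera would then look at the model's centroid, and the resulting 12 renderings form the multi presentation input of the method.</p>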
      <p>Based on the above subsection, given a 3D model, one first takes a set of M two dimensional input
images captured from different angles, and each image is passed through a 2D CNN to get the high
level representation at the presentation level. Assume that x_v ∈ R^(H×W×3) is the input image of the
v-th presentation and f_v = CNN(x_v) ∈ R^D is the feature vector learned by the CNN before the CC
layer, where H, W, and D denote the height, width, and channel dimension. For the v-th presentation, a
set S_v containing its presentation pairs is defined as</p>
      <p>S_v = {(v, v̄) | v̄ ∈ {1, …, M} \ {v}},     (1)</p>
      <p>where M is the number of two dimensional input images and (v, v̄) = (v̄, v). The proposed cross
presentational information x_v^inter for the v-th presentation is calculated using the outer product,
presentation pair pooling, and a one dimensional convolution:</p>
      <p>C_(v,v̄) = f_v ⊗ f_v̄,  v̄ ∈ {1, …, M} \ {v},     (2)</p>
      <p>Ĉ_v = P(C_v),     (3)</p>
      <p>x_v^inter = g(Ĉ_v),     (4)</p>
      <p>where C_(v,v̄) ∈ R^(D×D) is the outer product of the presentation pair (v, v̄): it captures the
correlations by multiplying each element of f_v by each element of f_v̄. Extending to all the
presentation pairs of the v-th presentation, the tensor C_v ∈ R^(D×D×(M−1)) stores the correlation
information of the v-th presentation with respect to the other M−1 presentations. The presentation pair
pooling operation P maximizes the correlations of the M−1 presentation pairs in C_v along the direction
of the different presentation pairs, producing the amplified correlation matrix Ĉ_v ∈ R^(D×D). The high
level cross presentational information x_v^inter ∈ R^K is generated by applying the transformation g
over Ĉ_v, which consists of two steps: each row of Ĉ_v is transformed into a K-dimensional vector by a
one dimensional convolution (with kernel size 1), and the resulting D K-dimensional vectors are merged
and projected through a CC layer.</p>
      <p>Afterwards, x_v^intra (the intrinsic information of the v-th presentation learned by the CNN) and
x_v^inter are combined by a merging operation and then used as input parameters for the presentation
specific CC layer in order to obtain the corresponding loss and label prediction of each presentation:</p>
      <p>z_v = CC(x_v^intra ⊕ x_v^inter),     (5)</p>
      <p>p_v = softmax(z_v),     (6)</p>
      <p>where z_v defines the comprehensive information of each presentation and p_v ∈ R^K indicates the
probability distribution over the possible classes for each presentation, K being the number of
categories. A new adaptive-weighting loss fusion method with proper sparseness for the multiple
predictions {p_v}, v = 1, …, M, is then proposed to build a joint decision and implement the multi
presentation 3D model classification:</p>
      <p>min_α Σ_(v=1..M) α_v^γ L(z_v, y),  s.t.  α⊤1 = 1, α ≥ 0, ||α||_0 = s,     (7)</p>
      <p>L(z_v, y) = −Σ_k y_k log p_(v,k),     (8)</p>
      <p>where α ∈ R^M is a weight vector corresponding to the multiple presentations, y defines the common
label information of all the presentations of an object, and L(z_v, y) ∈ R is the cross-entropy loss of
the v-th presentation. γ &gt; 1 is the power exponent parameter of the weight α_v, which adjusts the
weight distribution of the different views flexibly and avoids the trivial solution of α during
classification. ||α||_0 = s is used to constrain the sparseness of the weight vector α, where s ∈ Z_+
denotes the number of nonzero elements in α. Crucially, the ℓ0-norm constraint is able to capture the
global relations among different views and to achieve presentation-wise sparseness, such that only a
few discriminative and informative views are selected during the optimization to make decisions.</p>
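      <p>The tensor shapes in the cross presentational pipeline can be checked with a small numerical sketch. Random vectors stand in for the CNN features f_v, and random matrices stand in for the learned kernel-size-1 convolution and the CC layer, so only the shapes and the data flow, not the learned values, are meaningful here.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
M, D, K = 4, 8, 5                         # toy sizes: views, feature dim, classes

feats = rng.normal(size=(M, D))           # stand-ins for the per-view features f_v
W_conv = rng.normal(size=(D, K)) * 0.1    # stand-in for the kernel-size-1 convolution
W_cc = rng.normal(size=(D * K, K)) * 0.1  # stand-in for the CC-layer projection

def cross_presentation_info(v):
    """x_v^inter: outer products with the other M-1 views, element-wise
    max over the pairs (pair pooling), a per-row projection, and a final
    merge into a K-dimensional vector."""
    pairs = np.stack([np.outer(feats[v], feats[u])
                      for u in range(M) if u != v])  # C_v: (M-1, D, D)
    pooled = pairs.max(axis=0)                       # amplified matrix: (D, D)
    rows = pooled @ W_conv                           # per-row projection: (D, K)
    return rows.reshape(-1) @ W_cc                   # merged: (K,)

x_inter = np.stack([cross_presentation_info(v) for v in range(M)])
print(x_inter.shape)   # (4, 5): one K-dimensional vector per presentation
```

      <p>The max over presentation pairs is what makes the amplified correlation matrix independent of how the other views are ordered, mirroring the pooling step described above.</p>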
      <p>
        Afterwards, the proposed method was evaluated on the ModelNet40 dataset and compared with
several state-of-the-art methods. As is common, classification in 3D is mainly based on
Computer Aided Design (CAD) models. One widely used repository, ModelNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], has more
than 130 thousand 3D CAD models from over 600 categories. ModelNet40 [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], provided on the
Princeton ModelNet website, is a subset of ModelNet and has around 12 thousand models from 40
common categories. Figure 1 selects 4 kinds of simple categories to intuitively show 6 2D presentations
rendered from 3D models, where the 6 presentations are generated over 360 degrees with an interval of 60
degrees (for the experiment itself, 12 presentations of the same models with an interval of 30
degrees were used).
      </p>
      <p>The proposed method was compared with several state-of-the-art methods for multi presentation 3D
model classification, including the combined presentation and volume-based methods MVCNN-MultiRes and
Minto, the common volume-based method 3DShapeNets, the common point-based method PointNet, and
the common presentation based method MVCNN.</p>
      <p>
        Considering the FLOPs of deeper convolutional neural networks and its better trade-off
between accuracy and memory consumption compared to other classical CNNs, ResNet-18 is
a good backbone choice, although the method is not limited to this architecture. To evaluate the base
architectures, the results of the Multi-View Convolutional Neural Network (MVCNN) were compared with
those of ResNet-18, as shown in Table 1. Evidently, using ResNet-18 increases the
performance of MVCNN: for example, MVCNN
(ResNet-18) with 12 views achieves a 3.3% improvement over the
standalone MVCNN. For the proposed method, the parameters of
ResNet-18 were calibrated on the ModelNet40 dataset, using Adam with a learning rate of
5 × 10⁻⁶, β₁ = 0.7, β₂ = 0.933, a weight decay of 0.0001, a batch size of 8, and 60 epochs for
optimization. Furthermore, the proposed method has two parameters, s and p, where s represents
the number of nonzero elements in the view-weight vector and p is the power exponent applied to each of its elements. s was calibrated
in the range of [
        <xref ref-type="bibr" rid="ref12 ref6">6, 12</xref>
        ] with step 1 to select a few discriminative and informative presentations to build
a joint decision during classification. p was varied from 1.5 to 10 with a step of 1 to explore the
influence of different values of p on classification accuracy. With the resulting parameters s = 9
and p = 2.5, one can train an optimal model that improves the performance of classifying 3D models
considerably (see Table 1).
      </p>
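The optimizer settings above (whose symbols were garbled in typesetting are read here as the standard Adam hyperparameters) can be made concrete with a single hand-rolled update step. This is a minimal numpy sketch under that assumption; in practice a library optimizer would be used:

```python
import numpy as np

# Reported training settings, read as standard Adam hyperparameters:
LR, BETA1, BETA2, EPS, WEIGHT_DECAY = 5e-6, 0.7, 0.933, 1e-8, 1e-4
# (plus batch size 8 and 60 epochs in the training loop itself)

def adam_step(w, grad, m, v, t):
    """One Adam update with L2 weight decay folded into the gradient."""
    grad = grad + WEIGHT_DECAY * w            # L2 regularisation term
    m = BETA1 * m + (1 - BETA1) * grad        # first-moment estimate
    v = BETA2 * v + (1 - BETA2) * grad**2     # second-moment estimate
    m_hat = m / (1 - BETA1**t)                # bias correction, step t
    v_hat = v / (1 - BETA2**t)
    w = w - LR * m_hat / (np.sqrt(v_hat) + EPS)
    return w, m, v

# one step on a toy two-parameter "network"
w = np.array([1.0, -2.0])
m = v = np.zeros_like(w)
w, m, v = adam_step(w, grad=np.array([0.5, -0.5]), m=m, v=v, t=1)
```

At t = 1 the bias correction cancels the moment decay exactly, so the first update moves each weight by roughly LR in the direction opposite its gradient.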
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
<p>First, all of the multi-presentation methods outperform the single-presentation method, which confirms
the advantages of multi-presentation representations. Second, the classification accuracy of the proposed
(weighting + sparsity) variant is better than that of the proposed (weighting) and proposed (sparsity) variants,
respectively; considering the weight distribution and the sparsity of multiple presentations simultaneously
is therefore both reasonable and effective. Finally, the proposed (weighting + sparsity + cross-presentation)
variant obtains better performance than any other method, which shows that information shared across
presentations also plays an important role. The experimental results of the different methods and their
comparisons are reported in Table 3. The proposed method (weighting + sparsity + cross-presentation)
with 12 presentations achieves the best classification accuracy. Firstly, compared with combined view-
and volume-based methods such as MVCNN-MultiRes, the proposed method gains a 0.81% improvement;
although the inputs of those methods contain both 2D and 3D information, how well the two work together
still needs to be improved. Secondly, compared with volume-based methods, the proposed method obtains
a 7.28% improvement; these methods still cannot process 3D volumetric data effectively. Thirdly, compared
with point-based methods, 3D model classification performance improves by 4.67%, although the problem
of effectively modeling point clouds remains open. This verifies the superiority of the proposed
method in merging cross-presentation information and a selective, adaptive weighting strategy
into a unified multi-presentation framework. The weights of the different views on the ModelNet40 dataset
were also evaluated: a higher weight indicates that a presentation provides more valuable information and
makes a larger contribution to multi-presentation 3D model classification.</p>
      <p>In this paper, a new
2D-CNN-based multi-presentation framework for 3D object classification was proposed. It takes
multiple 2D images rendered from a 3D CAD model as input. It not only captures the
discriminative information in the relationships among presentations but also provides a novel
merging mechanism that fuses multiple presentations into a joint decision for
classifying 3D models. The experimental results verify the superiority and effectiveness of the proposed
method in 3D model classification.</p>
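One plausible reading of the selective, adaptive weighting strategy (with the parameters s and p calibrated above) is sketched below: keep the s most confident views, sharpen their weights with the power exponent p, and average the per-view class scores into a joint decision. The confidence measure and normalization here are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def fuse_views(view_scores, s=9, p=2.5):
    """Combine per-view class scores into one joint decision.

    view_scores : (V, C) array, one softmax score row per rendered view.
    s           : number of views kept (nonzero weights) -- sparsity.
    p           : power exponent sharpening the weight of confident views.
    """
    confidence = view_scores.max(axis=1)       # how sure each view is
    keep = np.argsort(confidence)[-s:]         # the s most confident views
    weights = np.zeros(len(view_scores))
    weights[keep] = confidence[keep] ** p      # power-exponent weighting
    weights /= weights.sum()                   # normalise to a convex combination
    return weights @ view_scores               # weighted average -> (C,) scores

# 12 views over 40 ModelNet40 classes, random softmax rows for illustration
scores = np.random.default_rng(0).dirichlet(np.ones(40), size=12)
joint = fuse_views(scores)
pred = int(joint.argmax())
```

Because the kept weights are normalized, the fused scores remain a valid probability vector, and views below the confidence cut-off contribute nothing to the joint decision.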
    </sec>
    <sec id="sec-5">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , R. Cheng,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>"Hierarchical Graph Attention Based Multi-View Convolutional Neural Network for 3D Object Recognition,"</article-title>
          <source>in IEEE Access</source>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>33323</fpage>
          -
          <lpage>33335</lpage>
          ,
          <year>2021</year>
          , doi: 10.1109/ACCESS.2021.3059853.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Hang</given-names>
            <surname>Su</surname>
          </string-name>
          , Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller.
          <article-title>Multi-view convolutional neural networks for 3d shape recognition</article-title>
          .
          <source>In ICCV</source>
          , pages
          <fpage>945</fpage>
          -
          <lpage>953</lpage>
          ,
          <year>2015</year>
          . https://arxiv.org/abs/1505.00880
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Johns</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leutenegger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Davison</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Pairwise Decomposition of Image Sequences for Active Multi-view Recognition</article-title>
          .
          <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <fpage>3813</fpage>
          -
          <lpage>3822</lpage>
          . DOI: 10.1109/CVPR.2016.414
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Chatfield</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Return of the Devil in the Details: Delving Deep into Convolutional Nets</article-title>
          . ArXiv, abs/1405.3531. DOI: 10.5244/C.28.6
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Kanezaki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsushita</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Nishida</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints</article-title>
          .
          <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <fpage>5010</fpage>
          -
          <lpage>5019</lpage>
          . DOI: 10.1109/CVPR.2018.00526
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition</article-title>
          .
          <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <fpage>264</fpage>
          -
          <lpage>272</lpage>
          . DOI: 10.1109/CVPR.2018.00035
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelillo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Siddiqi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition</article-title>
          . ArXiv, abs/1906.01592. DOI: 10.5244/C.31.64
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>3D ShapeNets: A deep representation for volumetric shapes</article-title>
          .
          <source>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <fpage>1912</fpage>
          -
          <lpage>1920</lpage>
          . DOI: 10.1109/CVPR.2015.7298801
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Maturana</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Scherer</surname>
          </string-name>
          ,
          <article-title>"VoxNet: A 3D Convolutional Neural Network for real-time object recognition,"</article-title>
          <source>2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>922</fpage>
          -
          <lpage>928</lpage>
          , doi: 10.1109/IROS.2015.7353481.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Brock</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritchie</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Generative and Discriminative Voxel Modeling with Convolutional Neural Networks</article-title>
          . ArXiv, abs/1608.04236.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Mengwei</surname>
            <given-names>Ren</given-names>
          </string-name>
          , Liang Niu, and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Fang</surname>
          </string-name>
          .
          <article-title>3d-a-nets: 3d deep dense descriptor for volumetric shapes with adversarial networks</article-title>
          .
          <source>arXiv preprint arXiv:1711.10108</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. K. P.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <article-title>"Extended Gaussian images,"</article-title>
          <source>in Proceedings of the IEEE</source>
          , vol.
          <volume>72</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1671</fpage>
          -
          <lpage>1686</lpage>
          , Dec.
          <year>1984</year>
          , doi: 10.1109/PROC.1984.13073.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Ma</surname>
            , Lingfei &amp; Li,
            <given-names>Ying</given-names>
          </string-name>
          &amp; Li,
          <string-name>
            <surname>Jonathan</surname>
          </string-name>
          &amp; Tan,
          <string-name>
            <surname>Weikai</surname>
          </string-name>
          &amp; Yu, Yongtao &amp; Chapman,
          <string-name>
            <surname>Michael.</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Multi-Scale Point-Wise Convolutional Neural Networks for 3D Object Segmentation From LiDAR Point Clouds in Large-Scale Environments</article-title>
          .
          <source>IEEE Transactions on Intelligent Transportation Systems. PP</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi: 10.1109/TITS.2019.2961060.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Wenju</given-names>
          </string-name>
          &amp; Cai, Yu &amp; Wang,
          <string-name>
            <surname>Tao.</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Multi-view dual attention network for 3D object recognition</article-title>
          .
          <source>Neural Computing and Applications</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . doi: 10.1007/s00521-021-06588-1.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Chaudhuri</surname>
            ,
            <given-names>Siddhartha</given-names>
          </string-name>
          &amp; Koltun,
          <string-name>
            <surname>Vladlen.</surname>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Data-Driven Suggestions for Creativity Support in 3D Modeling</article-title>
          .
          <source>ACM Transactions on Graphics</source>
          , 29. doi: 10.1145/1866158.1866205.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kokkinos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bronstein</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Litman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bronstein</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Intrinsic shape context descriptors for deformable shapes</article-title>
          .
          <source>2012 IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <fpage>159</fpage>
          -
          <lpage>166</lpage>
          . DOI: 10.1109/CVPR.2012.6247671
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Mogalapalli</surname>
            ,
            <given-names>Harshit</given-names>
          </string-name>
          &amp; Abburi, Mahesh &amp; Balan, Nithya &amp; Bandreddi,
          <string-name>
            <surname>Surya.</surname>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Classical-Quantum Transfer Learning for Image Classification</article-title>
          .
          <source>SN Computer Science</source>
          , 3. doi: 10.1007/s42979-021-00888-y.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Vyas</surname>
            ,
            <given-names>Shantanu</given-names>
          </string-name>
          &amp; Chen,
          <string-name>
            <surname>Ting-Ju</surname>
            &amp; Mohanty, Ronak &amp; Jiang, Peng &amp; Krishnamurthy,
            <given-names>Vinayak.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Latent Embedded Graphs for Image and Shape Interpolation</article-title>
          .
          <source>Computer-Aided Design</source>
          ,
          <volume>140</volume>
          , 103091. doi: 10.1016/j.cad.2021.103091.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Wenju</given-names>
          </string-name>
          &amp; Cai, Yu &amp; Wang,
          <string-name>
            <surname>Tao.</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Multi-view dual attention network for 3D object recognition</article-title>
          .
          <source>Neural Computing and Applications</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . doi: 10.1007/s00521-021-06588-1.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>Junjie</given-names>
          </string-name>
          &amp; Huang, Ningning &amp; Tang, Jing &amp; Fang, Mei-e. (
          <year>2020</year>
          ).
          <article-title>Recognition of 3D Shapes Based on 3V-DepthPano CNN</article-title>
          .
          <source>Mathematical Problems in Engineering</source>
          ,
          <year>2020</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . doi: 10.1155/2020/7584576.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Koenderink</surname>
            ,
            <given-names>J.J</given-names>
          </string-name>
          ., van
          <string-name>
            <surname>Doorn</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          <article-title>The singularities of the visual mapping</article-title>
          .
          <source>Biol. Cybernetics</source>
          <volume>24</volume>
          ,
          <fpage>51</fpage>
          -
          <lpage>59</lpage>
          (
          <year>1976</year>
          ). doi: 10.1007/BF00365595
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Eitz</surname>
            ,
            <given-names>Mathias</given-names>
          </string-name>
          &amp; Hildebrand, Kristian &amp; Boubekeur, Tamy &amp; Alexa,
          <string-name>
            <surname>Marc.</surname>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Sketch-Based Shape Retrieval</article-title>
          .
          <source>ACM Transactions on Graphics - TOG</source>
          , 31. doi: 10.1145/2185520.2185527.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>Rosalia</given-names>
          </string-name>
          &amp; Tuytelaars,
          <string-name>
            <surname>Tinne.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Sketch Classification and Classification-driven Analysis Using Fisher Vectors</article-title>
          .
          <source>ACM Trans. Graph.</source>
          , 33, 174:1-174:9. doi: 10.1145/2661229.2661231.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          . DOI: 10.1109/CVPR.2016.90
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Yue</given-names>
          </string-name>
          &amp; Wu, Yuwei &amp; Chen, Caihua &amp; Lim,
          <string-name>
            <surname>Andrew.</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks</article-title>
          . CVPR 2020. https://arxiv.org/abs/2002.12222
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Bebeshko</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khorolska</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotenko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharchenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhyrova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Use of neural networks for predicting cyberattacks</article-title>
          .
          <source>Paper presented at the CEUR Workshop Proceedings</source>
          ,
          <volume>2923</volume>
          <fpage>213</fpage>
          -
          <lpage>223</lpage>
          . http://ceur-ws.org/Vol-2923/paper23.pdf
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Zhirong</surname>
            <given-names>Wu</given-names>
          </string-name>
          , Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and
          <string-name>
            <given-names>Jianxiong</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>3d shapenets: A deep representation for volumetric shapes</article-title>
          .
          <source>In CVPR</source>
          , pages
          <fpage>1912</fpage>
          -
          <lpage>1920</lpage>
          ,
          <year>2015</year>
          . https://arxiv.org/pdf/1406.5670.pdf
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Bebeshko</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khorolska</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotenko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desiatko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagyndykova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tyshchenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>3D modelling by means of artificial intelligence</article-title>
          .
          <source>Journal of Theoretical and Applied Information Technology</source>
          ,
          <volume>99</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1296</fpage>
          -
          <lpage>1308</lpage>
          . http://www.jatit.org/volumes/Vol99No6/5Vol99No6.pdf
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>Amarjeet</given-names>
          </string-name>
          &amp; Hampson, Gary &amp; Rayment, Tom. (
          <year>2021</year>
          ).
          <article-title>Adaptive subtraction using a convolutional neural network</article-title>
          .
          <source>First Break</source>
          .
          <volume>39</volume>
          .
          <fpage>35</fpage>
          -
          <lpage>45</lpage>
          . doi: 10.3997/1365-2397.fb2021066.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Khorolska</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lazorenko</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bebeshko</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desiatko</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharchenko</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yaremych</surname>
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2022</year>
          )
          <article-title>Usage of Clustering in Decision Support System</article-title>
          . In: Raj J.S.,
          <string-name>
            <surname>Palanisamy</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perikos</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            <given-names>Y</given-names>
          </string-name>
          . (eds) Intelligent
          <source>Sustainable Systems. Lecture Notes in Networks and Systems</source>
          , vol
          <volume>213</volume>
          . Springer, Singapore. https://doi.org/10.1007/978-981-16-2422-3_49
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Wu</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freeman</surname>
            <given-names>W. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tenenbaum</surname>
            <given-names>J. B.</given-names>
          </string-name>
          .
          <article-title>Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling</article-title>
          ,
          In
          <source>NIPS</source>
          , pages
          <fpage>82</fpage>
          -
          <lpage>90</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Häne</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tulsiani</surname>
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Malik</surname>
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Hierarchical Surface Prediction for 3D Object Reconstruction</article-title>
          .
          <source>2017 International Conference on 3D Vision (3DV)</source>
          ,
          <fpage>412</fpage>
          -
          <lpage>420</lpage>
          . DOI:
          10.1109/3DV.2017.00054
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Igor</given-names>
            <surname>Smirnov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alexey</given-names>
            <surname>Kutyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Kiktev</surname>
          </string-name>
          .
          <article-title>Neural network for identifying apple fruits on the crown of a tree</article-title>
          .
          <source>E3S Web Conf.</source>
          <volume>270</volume>
          , 01021
          (
          <year>2021</year>
          ) DOI:
          10.1051/e3sconf/202127001021
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>