<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Software Package for Evaluation of Stereo Camera Calibration for 3D Reconstruction in a Robotic Grasping System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alona Vitiuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatoliy Doroshenko</string-name>
          <email>doroshenkoanatoliy2@mail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>40</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Software Systems of the National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>Akademika Glushkova Avenue</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kyiv</institution>
          ,
          <addr-line>03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>37, Prosp. Peremohy</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>1</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>The paper considers an approach for assessing the accuracy of the object model for the problem of stable grasping in a combined system of grasp proposal and reconstruction of the three-dimensional model of the object. Many studies indicate that robot learning from demonstration is a promising way to improve grasping performance, but complete automation of the grasping task in unforeseen circumstances remains difficult, and its accuracy can be affected at each stage of the grasp planning task. The combined system allows stable grasping of objects of any shape, without restrictions on the types of shapes in the training data set. Novel approaches to surface reconstruction of the object are based on restoring the depth of points from a pair of images from two cameras. The quality of the 3D reconstruction is affected by several factors: the movement of the camera and environmental objects, spatial quantization of the image coordinates, correspondence of key points, camera calibration parameters, unaccounted camera distortions, and the numerical and statistical properties of the selected reconstruction method. Camera parameter errors can be minimized by improving the calibration procedure, so the impact of these errors on the quality of the 3D model was investigated. The deviation of the model from a plane is chosen as the metric for quality assessment; to calculate it, the point cloud is processed by plane identification and segmentation. An experiment with the software package for accuracy estimation was conducted to obtain the dependence of the accuracy of the reconstructed planes on the errors of the camera parameters. The impact of calibration errors on 3D reconstruction was evaluated by comparing metrics for individual planes at different levels of artificially introduced error and evaluating the impact of the error on these metrics.
Modeling the error of the camera calibration parameters with a given noise level shows that the calibration parameters deteriorate as the noise level increases. In particular, it was established that an increase in measurement error leads to an increase in the error of the estimated calibration parameters. In addition, the orientation parameters (rotation and translation) are more complex and therefore more sensitive to measurement noise than the other parameters.</p>
      </abstract>
      <kwd-group>
        <kwd>three-dimensional reconstruction</kwd>
        <kwd>camera calibration</kwd>
        <kwd>stable grasping</kwd>
        <kwd>point cloud</kwd>
        <kwd>manipulator</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Planning of stable grasping and efficient movement of objects in an unstructured environment is
an urgent problem
in robotics. In
man-made conditions, the
process of three-dimensional
reconstruction based on multiple images can be used to help robots determine their position in space,
build a three-dimensional map of the environment, and recognize surrounding objects. In addition,
robust grasp planning can be applied to build a complex system that includes gesture recognition and
prediction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The system for capturing an unknown object by a mobile robot consists of a camera, a
robotic arm, and a gripping limb. Based on the video stream from the camera, a map of the
environment and reconstruction of the model of the object is constructed. In the process of scanning,
the system receives important information about the object's structure, shape, size and orientation in
space. Such processing and model building should run in real time.
      </p>
      <p>
        Recently data-driven approaches that perform grasp planning directly on the basis of sensor data
(without intermediate state) have led to significant progress in the implementation of grasps and
generalization of their movement planning approaches. Existing methods use convolutional neural
network architectures and expect an input RGB-D image [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. They can be applied in a high-reliability
robotic system, but require large volumes of grasping data and 3D object models. The size of this data
directly affects the percentage of successful grasps. The Contact-GraspNet system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses this
approach to efficiently generate the distribution of parallel grasps for 6 degrees of freedom directly
from scene depth data. Other systems take a perceptually driven approach and often use a
representation of the problem in pixel space, such as learning pixel accessibility maps [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or
constraining capture to the normal of the image plane [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Pixel-space representations have obvious computational advantages, but there are also obvious
physical advantages to generating full 6-degree-of-freedom grips when interacting with real objects.
In addition, in some cases it is useful to have information about the presence of a grasping point that
is not directly visible in the observed image, primarily because it represents the best opportunity for
grasping (e.g., a cup handle) or because of a grasping constraint (e.g., grasping by only visible points
can make it difficult to place the object in the required configuration after grasping). Many existing
data-driven methods generate a grasp by selecting a visible pixel as the attachment point of the working
limb, limiting grasp planning to visible points on the object.</p>
      <p>
        This shortcoming can be eliminated using an integrated approach to grasp planning that combines
a grasp proposal subsystem with reconstruction of the object's shape. However, existing
systems using machine learning techniques for both modules (a trained grasp proposal network and a
trained shape reconstruction network [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) have limitations in grasping objects of unknown
shapes that were not used during the training of the 3D reconstruction network. The use of analytical
approaches to 3D reconstruction combined with learning approaches for grasp suggestion will allow
planning the grasp of objects of any shape to increase the accuracy of the manipulator. A sequence of
frames (video) is fed to the input of such a reconstruction system, with the help of which it is possible
to obtain an image of the object from different angles and calculate a three-dimensional model of the
object of sufficient density.
      </p>
      <p>A monocular camera provides information from sensors in the form of two-dimensional images.
Therefore, the depth of each pixel is estimated from the correspondence of real-world point coordinates between
images from different camera positions. Such correspondences are detected by comparing
photometric patterns on neighboring pixels of each individual pixel. When using such an approach,
inaccuracies arise: pixels on low-texture areas cannot be accurately mapped on images, and accurate
3D reconstruction is usually limited to areas of images with large gradients. Photometric calibration
can significantly improve the performance of direct visual odometry methods to improve
reconstruction quality.</p>
      <p>This article demonstrates how stereo camera calibration errors affect the overall quality of building
a three-dimensional model. After all, both internal and external parameters of the camera can
introduce an additional error during the reprojection of pixels. In addition, it must be taken into
account that pixels corresponding to the same 3D point may have different intensities in different
images due to camera optical vignetting, automatic gain and automatic exposure settings. Existing
photometric calibration approaches are reviewed in order to restore image intensity values and
establish pixel-to-pixel correspondence for stereo images. An analysis of the reconstruction obtained
using a mobile robot positioning algorithm and an analysis of the impact of photometric calibration
errors on the quality of direct visual odometry methods as part of a three-dimensional reconstruction
system was carried out.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Grasp planning system</title>
      <p>The ability to autonomously grasp unknown objects can greatly assist robots in performing a wide
range of tasks. However, a robot in an unstructured environment may encounter objects about which
it has only limited experience or knowledge. In such cases, successful grasping requires complex
perception, planning and control. Because each of these problems is complex in itself, fully
autonomous grasping of unforeseen objects in an unstructured environment remains an unsolved
problem.</p>
      <p>This section considers a model of a combined system of grasp proposal and reconstruction of a
three-dimensional model of an object of unknown shape, intended for the
implementation of stable grasping and subsequent manipulation of the object in the environment.</p>
      <p>The system consists of two modules (Figure 1: Combined grasp planning system), the results of
which are combined by a refinement module for grasp planning. The input data is a stereo image.
The segmented image from the first camera and the depth map are fed to the grasp proposal network
for processing, and the stereo pair with the camera parameters is fed to the 3D reconstruction module.</p>
      <p>For the first module, an implementation of the GPNet grasp proposal network was taken, which
outputs the grasp pose relative to the camera frame. The 3D reconstruction
subsystem outputs a reconstructed point cloud of the object's surface, providing information about the
object's shape and visible parts. The outputs of the two subsystems are combined by projecting the
grasp proposal onto the nearest point of the reconstructed point cloud, yielding the improved
grasp proposal. Since the position of the camera relative to the robot is known, the grasp in
the camera coordinate system can be transformed into the robot coordinate system for execution by the
manipulator.</p>
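      <p>The refinement step described above (snapping the grasp proposal to the nearest reconstructed point, then transforming it into the robot frame) can be sketched in a few lines of numpy; the function names and example data below are illustrative, not taken from the original software package.</p>

```python
import numpy as np

def refine_grasp(grasp_point_cam, cloud_cam):
    """Project a proposed grasp point onto the nearest point of the
    reconstructed cloud (both given in camera coordinates)."""
    d = np.linalg.norm(cloud_cam - grasp_point_cam, axis=1)
    return cloud_cam[np.argmin(d)]

def cam_to_robot(p_cam, R_rc, t_rc):
    """Transform a point from the camera frame to the robot frame,
    given the known camera pose (R_rc, t_rc) relative to the robot."""
    return R_rc @ p_cam + t_rc

# illustrative reconstructed cloud and proposal, camera coordinates (metres)
cloud = np.array([[0.0, 0.0, 0.5],
                  [0.1, 0.0, 0.5],
                  [0.0, 0.1, 0.6]])
proposal = np.array([0.02, 0.01, 0.5])
refined = refine_grasp(proposal, cloud)
p_robot = cam_to_robot(refined, np.eye(3), np.array([1.0, 0.0, 0.0]))
```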
      <p>The image acquisition module represents the connection between the main system and the camera.
The service sends color images in a format supported by the Robot Operating System (ROS). The
image received from the camera is compressed for low bandwidth transmission. In addition, this
module also renders the received images.</p>
      <p>
        To rectify the original image, a three-dimensional calibration procedure is performed, which consists in
calculating the external and internal parameters of the camera. To find the projection of a
three-dimensional point on the image plane, it is first necessary to convert the point from the world coordinate
system to the camera coordinate system using the external parameters (rotation R and translation T). Next,
using the camera's internal parameters, we project the point onto the image plane [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Let P be a three-dimensional point with coordinate vector X_w in the world coordinate system. Its coordinate
vector in the camera coordinate system is X_c = R·X_w + T, where R is the rotation matrix corresponding to
the rotation vector om: R = rodrigues(om). Denote the resulting camera coordinates as X_c = (x, y,
z).</p>
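      <p>The rodrigues(om) mapping used above can be written out directly from the Rodrigues rotation formula; the sketch below is a plain-numpy stand-in for OpenCV's cv2.Rodrigues, with an illustrative rotation vector and translation.</p>

```python
import numpy as np

def rodrigues(om):
    """Rotation matrix from a rotation vector via the Rodrigues formula."""
    theta = np.linalg.norm(om)
    if theta < 1e-12:
        return np.eye(3)
    k = om / theta                       # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])     # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# world -> camera: X_c = R @ X_w + T
om = np.array([0.0, 0.0, np.pi / 2])     # 90 degrees about the z axis
R = rodrigues(om)
X_c = R @ np.array([1.0, 0.0, 2.0]) + np.array([0.0, 0.0, 0.1])
```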
      <p>The coordinates of the pinhole projection of the point P can be represented as (a, b) = (x/z, y/z).
Let us consider
r^2 = a^2 + b^2. Then the distortion model [14] applies.</p>
      <sec id="sec-2-1">
        <title>The coordinates of the distorted point represent a vector :</title>
      </sec>
      <sec id="sec-2-2">
        <title>The projection of point P on the image is a point</title>
        <p>:</p>
        <p>The purpose of the calibration process is to find the camera matrix K, the rotation matrix R and the
translation vector T using a set of known three-dimensional points and their
corresponding image coordinates. When the values of the internal and external parameters are
obtained, the camera is considered calibrated.</p>
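        <p>Putting the stages together (world-to-camera transform, pinhole projection, distortion, camera matrix), the forward projection model can be sketched as follows; all parameter values in the example are illustrative.</p>

```python
import numpy as np

def project(X_w, R, T, fx, fy, cx, cy,
            k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Project a 3D world point to pixel coordinates using the pinhole
    model with radial/tangential distortion described above."""
    x, y, z = R @ X_w + T                 # world -> camera
    a, b = x / z, y / z                   # pinhole projection
    r2 = a * a + b * b
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    a_d = a * radial + 2 * p1 * a * b + p2 * (r2 + 2 * a * a)
    b_d = b * radial + p1 * (r2 + 2 * b * b) + 2 * p2 * a * b
    return fx * a_d + cx, fy * b_d + cy   # apply camera matrix K

# undistorted camera looking along z: expect (a, b) = (0.05, -0.1)
u, v = project(np.array([0.1, -0.2, 2.0]), np.eye(3), np.zeros(3),
               800.0, 800.0, 320.0, 240.0)
```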
        <p>There are three types of camera calibration methods: template-based, geometric-key-based, and
deep-learning-based. The first approach is to obtain a set of images of a template object with known
dimensions and configuration. As the data set is collected, the camera moves and rotates around the
pattern to capture images from different viewpoints. This method is best suited for laboratory
conditions where it is possible to use a template object manufactured with high geometric accuracy.
Various schemes can be used as a template object pattern: checkerboard, circles and more complex
ArUco-type markers. The checkerboard patterns are clear and easy to recognize in the image. The
corners of the squares on the chessboard are ideal for their localization, as they have sharp gradients
in two directions. In addition, these corners are at the intersection of chess lines, and therefore form a
repeating structure. All these facts are used to reliably arrange the corners of the squares in a
checkerboard pattern.</p>
        <p>Calibration by geometric keys is possible when there are geometric clues on the scene under
investigation: straight lines, planes or vanishing points of the horizon. Deep learning-based methods
are appropriate when sufficient control over the image collection process cannot be exercised (for
example, when a single image of a scene is available). The accuracy of methods based on deep
learning is much lower.</p>
        <p>To obtain the parameters of the camera, it is necessary to collect a set of images of the calibration
template in different positions of the camera. Using methods from the OpenCV library, internal
camera parameters are obtained and applied to each input image to remove lens distortions. The
calibration algorithm is presented in Figure 2.</p>
        <p>The world coordinate system is defined by a template object with the image of a chessboard,
which is rigidly fixed in the scene. The three-dimensional object points are the corners of the squares on
the chessboard. Any corner of the board can be chosen as the origin of the world coordinate system.
The X and Y axes lie in the board plane, and the Z axis is perpendicular to it. Therefore, all points
on the chessboard lie in the XY plane (i.e., Z = 0 for a flat template).</p>
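        <p>Under these conventions the object points are fully determined by the board geometry; for example, the corner coordinates for an assumed 9×6 corner grid with a 25 mm square can be generated as follows (grid size and square length are illustrative).</p>

```python
import numpy as np

def chessboard_object_points(cols, rows, square_size):
    """3D coordinates of inner chessboard corners in the world system
    defined by the board: X and Y along the board, Z = 0 everywhere."""
    pts = np.zeros((rows * cols, 3))
    grid = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)  # (x, y) index pairs
    pts[:, :2] = grid * square_size
    return pts

obj = chessboard_object_points(9, 6, 0.025)   # 54 corners, metres
```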
        <p>Thus, the camera calibration algorithm has the following inputs and outputs:
• Input data: a set of images with key points, where the two-dimensional coordinates in the
image system and the three-dimensional coordinates in the world system are known.
• Output data: the camera matrix with internal parameters, and the rotation and translation for each
image.</p>
        <p>Obtaining three-dimensional information about the scene and estimating the current position of the
camera module can be done using the simultaneous localization and mapping module.
Implementation of such a system based on LSD-SLAM allows obtaining a dense cloud of points. The
average rates of receiving a new keyframe and estimating the position are approximately
5 Hz and 10 Hz, respectively. As shown in Figure 3, depth estimation occurs mainly on contours in the image. This
result is typical for the LSD-SLAM library. The principle of its operation consists in finding the
difference in image intensity and finding correspondences on texture contours depending on the
contrast of the scene.</p>
        <p>For optimal reconstruction of the scene, it is necessary to ensure its stability. This module provides
collection and fusion of reconstructed point clouds from different keyframes. Further optimization of
the reconstruction can be done by post-processing the obtained point cloud.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Construction of a plane model from a cloud of points and its accuracy</title>
      <p>This section deals with the processing of the cloud of points, which represents the surface of the
target object. To assess the quality of the built surface model, separate planes were selected, since the
deviation of the model from the plane was chosen as the quality criterion. To process the point cloud,
planes are identified and segmented, for which an algorithm based on the Random Sample Consensus
(RANSAC) method is considered.</p>
      <p>Sparse methods for surface reconstruction estimate the surface from a sparse point cloud. It is
sparse because it is computed from points of interest that have a non-uniform and sparse distribution
in the images. However, the points have decent confidence due to a standard pipeline including point
selection, robust and optimal reconstruction using RANSAC and nonlinear least squares.</p>
      <p>Sparse methods are useful for their low spatial and temporal complexity, especially for obtaining
compact models of large-scale environments. This makes them suitable for computing with limited hardware or for
applications that require high scalability and do not require the level of detail provided by dense
stereo. In addition, they can initialize dense stereo methods to improve accuracy and obtain more detailed
reconstructions if the experimental conditions are favorable for obtaining sufficient texture in the images.</p>
      <p>A classical method of plane segmentation from a point cloud is the RANSAC algorithm. This
method estimates the parameters of a mathematical model for a set of observed data containing a large
number of outliers. It randomly selects a minimal set of points to estimate model parameters. From
the random samples, it selects the one that best matches the full set of points. According to its general
formulation, RANSAC can be easily applied to describe any primitive geometric shapes. However,
the basic RANSAC approach assumes that the input data can belong to only one model.</p>
      <p>The principle of the RANSAC algorithm is to find the best plane among a 3D cloud of points. At
the same time, it reduces the number of iterations, even if the number of points is very large. To do
this, it randomly selects three points and calculates the parameters of the corresponding plane. It then
detects all points of the input cloud belonging to the computed plane according to a given distance threshold.
After that, it repeats these procedures N times; in each of them it compares the obtained result with
the last saved one. If the new result is better, the saved result is replaced by the new one.</p>
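      <p>A minimal sketch of this plane-segmentation loop (random 3-point samples, a distance threshold t, keeping the best-supported plane) might look as follows; the threshold, iteration count and synthetic test cloud are illustrative.</p>

```python
import numpy as np

def ransac_plane(points, t=0.01, n_iter=200, seed=0):
    """RANSAC plane segmentation as described above: sample 3 points,
    fit a plane, count inliers within distance t, keep the best support."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)             # plane normal
        if np.linalg.norm(n) < 1e-12:              # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        inliers = np.abs((points - p0) @ n) < t    # point-to-plane distances
        if inliers.sum() > best.sum():
            best = inliers
    return best

# synthetic cloud: a noisy z = 0 plane plus a few gross outliers
rng = np.random.default_rng(1)
plane = np.column_stack([rng.uniform(-1, 1, (100, 2)),
                         rng.normal(0.0, 0.002, 100)])
cloud = np.vstack([plane, rng.uniform(-1, 1, (10, 3))])
mask = ransac_plane(cloud)
```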
      <p>
        A prioritization function with a soft threshold [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], based on two weighting functions, is used to
improve the segmentation quality; it takes into account both the distance from the points to the
plane and the consistency between the normal vectors. However, this requires estimating the normal
vector at each point, which is inefficient for dense point clouds.
      </p>
      <p>According to estimates, the time complexity of RANSAC depends on the size of the subset, the
proportion of outliers, and the number of points in the set. RANSAC runtime can be excessively long
in some cases. Therefore, a modification of the algorithm is considered for more effective detection of
shapes in point clouds, including flat shapes. Octrees are used to establish spatial proximity
between samples, and the scoring function considers only a local subset of samples. Local sampling
by selecting points inside each node is used to avoid incorrect results.</p>
      <p>The quality of 3D reconstruction is affected by several factors: the movement of the camera and
environmental objects, spatial quantization of the image coordinates, correspondence of key points,
camera calibration parameters, unaccounted camera distortions, as well as numerical and statistical
properties of the selected reconstruction method. The impact of calibration errors on
three-dimensional reconstruction was evaluated by comparing metrics for individual planes at different
levels of artificially introduced error and evaluating the impact of the error on these metrics. For this
purpose, the mean square deviation of the cloud of points was calculated and flatness was estimated
based on it.</p>
      <p>The RANSAC algorithm requires four inputs:
• A three-dimensional point cloud;
• The tolerance threshold t on the distance between the selected plane and other points; it is related
to the accuracy of the point cloud;
• The foreseeable support (the maximum probable number of points belonging to a single plane); it
is derived from the density of points and the maximum estimated surface of the plane;
• The probability α (the minimum probability of finding at least one good set of observations in N
trials); it is usually between 0.90 and 0.99.</p>
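      <p>The probability α fixes the number of trials N: if w denotes the expected fraction of inliers, a random 3-point sample is all-inlier with probability w^3, so N = ln(1 − α) / ln(1 − w^3). A small sketch (the inlier ratio here is an assumed value):</p>

```python
import math

def ransac_trials(alpha, w, sample_size=3):
    """Number of trials N so that, with probability alpha, at least one
    sample of sample_size points contains only inliers (inlier ratio w)."""
    return math.ceil(math.log(1 - alpha) / math.log(1 - w ** sample_size))

n = ransac_trials(0.99, 0.5)   # 50% inliers, 99% confidence
```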
      <p>We have a cloud of points, which is a set of points given in the plane coordinate system.
This coordinate system is chosen so that the z axis is perpendicular to the plane. The transformation that
translates a point from the global coordinate system of the reconstruction to the local coordinate
system of the plane can be represented as a rotation and translation, P' = R_p·P + T_p.</p>
      <p>We define a point O as the centroid of the cloud: O = (1/n)·Σ P'_i. Consider the point O as the origin of the new local system. The coordinates of
points in the new system can then be represented as p_i = P'_i − O.</p>
      <p>Let us represent the plane in the local coordinate system as z = ax + by, where a and b can be
estimated by least squares, minimizing Σ (z_i − a·x_i − b·y_i)^2 (assuming that the deviations are measured along the z
axis).</p>
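      <p>Assuming deviations along the z axis, the coefficients a and b follow from an ordinary least-squares fit; a minimal sketch with synthetic points:</p>

```python
import numpy as np

def fit_plane_z(points):
    """Least-squares estimate of a, b in z = a*x + b*y after shifting
    the origin to the centroid O, as described above."""
    p = points - points.mean(axis=0)              # local system centred at O
    a_b, *_ = np.linalg.lstsq(p[:, :2], p[:, 2], rcond=None)
    return a_b                                    # (a, b)

pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.5],
                [0.0, 1.0, -0.2],
                [1.0, 1.0, 0.3]])                 # exactly z = 0.5x - 0.2y
a, b = fit_plane_z(pts)
```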
      <sec id="sec-3-1">
        <title>Calculate the deviation between the measured points and the segmented plane:</title>
        <p>The local deviation of each measured point from the segmented plane is e_i = z_i − (a·x_i + b·y_i). The flatness
deviation (FD) can be determined as the sum of the maximum positive local deviation (TP) and the
maximum modulus of the negative local deviation (FP): FD = TP + |FP|.</p>
        <p>The completeness (C1), correctness (C2) and quality (Q) metrics for evaluating the presented method
are expressed as C1 = TP/(TP + FN), C2 = TP/(TP + FP), Q = TP/(TP + FN + FP).</p>
        <p>Here, TP is the number of valid planes that are correctly detected, FN is the number of planes that
are unrecognized, FP is the number of incorrectly recognized planes.</p>
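        <p>The flatness deviation and the detection metrics can be computed directly from the definitions above; the plane counts in the example are illustrative (note that TP and FP denote extreme deviations in the flatness formula but plane counts in the metrics).</p>

```python
import numpy as np

def flatness_deviation(points, a, b):
    """FD = max positive local deviation + |max negative local deviation|,
    with deviations e = z - (a*x + b*y) measured along the z axis."""
    e = points[:, 2] - (a * points[:, 0] + b * points[:, 1])
    return max(e.max(), 0.0) + abs(min(e.min(), 0.0))

def quality_metrics(tp, fn, fp):
    """Completeness C1, correctness C2 and quality Q from plane counts."""
    return tp / (tp + fn), tp / (tp + fp), tp / (tp + fn + fp)

pts = np.array([[0.0, 0.0, 0.01],
                [1.0, 0.0, -0.02],
                [0.0, 1.0, 0.00]])
fd = flatness_deviation(pts, 0.0, 0.0)        # deviations from the plane z = 0
c1, c2, q = quality_metrics(tp=8, fn=1, fp=1)
```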
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results of experiments</title>
      <p>This section presents the results of point cloud reconstruction experiments on three different
scenes. For each case, the segmentation of the studied plane was carried out and its model was
obtained.</p>
      <p>The main parameters of the data sets are presented in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Index</th><th>Length, m</th><th>Width, m</th><th>Height, m</th><th>Number of points</th><th>Average density (points/m3)</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>7</td><td>5.2</td><td>3.2</td><td>45361</td><td>389</td></tr>
            <tr><td>2</td><td>8.5</td><td>9.4</td><td>2.6</td><td>38265</td><td>184</td></tr>
            <tr><td>3</td><td>7.3</td><td>7.2</td><td>2.5</td><td>37472</td><td>285</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>As can be seen from Fig. 2, the obtained three-dimensional models have sufficient density. This is
due to the fact that the method used by the algorithm for obtaining a cloud of points uses all the
information in the image, including contours. This provides high accuracy and reliability in
low-textured environments using a single monocular camera.</p>
      <p>For each scene, plane segmentation was performed and point clouds, which represent sets of
measurements belonging to one plane, were isolated. Next, the parameters of the mathematical model
of each of the planes were evaluated. An example of a separate cloud of points corresponding to a
noisy plane model is shown in Figure 4.</p>
      <p>For each obtained plane and its corresponding point cloud, flatness deviations were evaluated and
indicators of metric completeness, correctness and quality were calculated. Next, the effect of
changing the calibration parameters on the metric data was investigated. For this, an analysis of the
sensitivity of the camera parameters was carried out, in which the pixel values on the image plane
were distorted by noise with a standard deviation of 0.05 to 1.0 pixels. Table 2 shows the sample
results of the sensitivity analysis. In the simulated system, the camera was positioned in such a way
that the direction of the z axis of the global and local coordinate systems coincided.</p>
      <p>On the basis of the obtained noisy camera parameters, the process of reconstruction and
segmentation of planes from the obtained models was carried out. For each such set of input data,
flatness deviations were estimated. The dependence of the flatness deviation on the level of the
introduced error for each of the three data sets is presented in the Figure 5.</p>
      <p>This experiment demonstrated the dependence of the accuracy of the reconstructed point cloud on
the deviations of the internal and external parameters of the cameras in the stereo system. It was found
that increasing the accuracy of camera calibration can improve the accuracy of the
three-dimensional model by up to 60%.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The method of assessing the accuracy of the object model for the problem of stable grasping using
a combined system of grasping proposal and reconstruction of the three-dimensional model of the
object, which allows stable grasping of objects of unknown shape, is considered. The deviation of the
model from the plane is chosen as the metric for accuracy assessment, so the point cloud is processed
by plane identification and segmentation, for which an algorithm based on the RANSAC method is
considered. An experiment was conducted to obtain the dependence of the accuracy of the
reconstructed planes on the errors of the camera parameters. The impact of calibration errors on
three-dimensional reconstruction was evaluated by comparing metrics for individual planes at different
levels of artificially introduced error and evaluating the impact of the error on these metrics. Modeling
the error of the camera calibration parameters with a given noise level shows that the calibration
parameters deteriorate as the noise level increases. In particular, after analyzing the established
flatness metrics, it was established that the error in determining the center of the image is proportional
to the measurement error. It follows that an increase in the error contributes to an increase in the error
in the estimation of the calibration parameters. In addition, orientation parameters (rotation and
translation) are more complex and therefore more sensitive to measurement noise than other
parameters.</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doroshenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Novak</surname>
          </string-name>
          ,
          <article-title>Gesture simulator programming using statistical modeling</article-title>
          .
          <source>Problems of programming</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <source>Deep Residual Learning for Image Recognition</source>
          ,
          <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          , doi: 10.1109/CVPR.2016.90.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundermeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mousavian</surname>
          </string-name>
          et al.,
          <source>Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes</source>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/pdf/2103.14127.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Song</surname>
          </string-name>
          et al.,
          <article-title>Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/1710.01330
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Staub</surname>
          </string-name>
          et al.,
          <article-title>Dex-Net MM: Deep Grasping for Surface Decluttering with a Low-Precision Mobile Manipulator</article-title>
          ,
          <source>2019 IEEE 15th International Conference on Automation Science and Engineering (CASE)</source>
          , Vancouver, BC, Canada,
          <year>2019</year>
          , pp.
          <fpage>1373</fpage>
          -
          <lpage>1379</lpage>
          , doi: 10.1109/COASE.2019.8842901.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kornaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barabash</surname>
          </string-name>
          ,
          <article-title>Capturing unknown objects by a mobile robot using visual information</article-title>
          , Scientific notes of V. I. Vernadsky Taurida National University. Series: Technical Sciences,
          <year>2018</year>
          , Vol.
          <volume>29</volume>
          (
          <issue>68</issue>
          ), No.
          <volume>1</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>93</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tosun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eisner</surname>
          </string-name>
          et al.,
          <source>Robotic Grasping through Combined Image-Based Grasp Proposal and 3D Reconstruction</source>
          ,
          <source>2021 IEEE International Conference on Robotics and Automation (ICRA)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>6350</fpage>
          -
          <lpage>6356</lpage>
          , doi: 10.1109/ICRA48506.2021.9562046
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          et al.,
          <source>Geometrical Segmentation of Multi-Shape Point Clouds Based on Adaptive Shape Prediction and Hybrid Voting RANSAC. Remote Sens</source>
          ,
          <year>2022</year>
          . URL: https://doi.org/10.3390/rs14092024
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>