Surface recognition of machine parts based on the results of
optical scanning

                M A Bolotov1, V A Pechenin1, N V Ruzanov1 and E J Kolchina2


                1
                  Samara National Research University, Institute of Engines and Power Plants, Moskovskoe
                Shosse, 34A, Samara, Russia, 443086
                2
                  Stock company «RKC «Progress», Zemeca street, 18, Samara, Russia, 443009



                e-mail: vadim.pechenin2011@yandex.ru


                Abstract. To predict the quality parameters of products (in particular, the assembly
                parameters), mathematical models were implemented in the form of computer models. To
                ensure the adequacy of the calculations, information about the actual geometry of the parts is
                required; it can be obtained by non-contact measurement of the parts of an assembly.
                Measuring parts and components with an optical or laser scanner produces a large array of
                measured points. After standard processing (e.g. noise removal, merging of scans, smoothing,
                creation of a triangulation mesh), the individual surfaces of the parts must be recognized. This
                paper presents a neural network model that recognizes such elements from the array of
                measured points obtained by scanning.



1. Introduction
The least automated step in industry is the assembly of single-unit and small-series products of
medium and high complexity. Such products include aircraft engines. They are not made in large
quantities, as cars are; they are characterized by a high degree of customization and increased
requirements for complexity and accuracy. The share of assembly labour in the total
labour-intensiveness of such products reaches 25% and largely determines their quality. Several
reasons make it difficult to fully automate the assembly of these products. One significant reason is
the difficulty of determining parameters of robot-performed operations that are guaranteed to ensure
the specified accuracy and quality of the products. The assembly of medium- and high-complexity
products is a unique operation, during which the course of operations changes according to the results
of measurements and the geometric analysis of the assembled parts. The geometry is measured by
both contactless and contact methods.
    To partially automate engine assembly processes, it is necessary to recognize both the individual
parts and the surfaces of the parts along which the assembly will take place. Such face recognition is
possible using computer vision approaches [1, 2, 3]. The aim of this work is to create a model, based
on neural networks, for recognizing the surfaces of engineering parts after their measurement with an
optical or laser scanner.








2. Object of research
To test the model, a simulator of a real engine part was designed and manufactured: a spacer from the
turbine of an aircraft engine. A drawing of the part is shown in Figure 1. The part contains cylindrical
and flat faces, as well as threaded holes.




                                     Figure 1. Drawing of the "spacer simulator" part.

    Automated recognition of the elements (surfaces) of measured parts using neural networks solves
two tasks: 1) segmentation of a part’s components into types of surfaces (plane, cylinder, cone, etc.);
2) additional refinement of the boundaries of the triangulated surfaces based on deviations of the facet
normal vectors. To solve the first task, a convolutional neural network was used.

3. Neural network model of surface recognition
Convolutional neural networks (CNNs) are a very broad class of architectures whose main idea is to
reuse parts of the neural network across different small, local input areas [4]. The main area of
application of convolutional architectures is image processing [5, 6].
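    To illustrate the idea of weight reuse, below is a minimal sketch (PyTorch is an illustrative choice
of framework; the paper does not specify one) in which a single set of 3x3 kernels is slid over every
local patch of an image:

```python
import torch
import torch.nn as nn

# One convolution layer: the same 8 small 3x3 kernels are reused
# across every local patch of the input image.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

image = torch.randn(1, 1, 256, 256)   # one grayscale projection image
features = conv(image)                # shared weights applied at every location
print(features.shape)                 # torch.Size([1, 8, 256, 256])
```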
    At present, many approaches to the recognition of three-dimensional objects have been developed
in the computer vision literature. These approaches can be divided into two groups: those that work
directly on native three-dimensional representations of objects, such as polygonal meshes, voxel
representations and point arrays, and those based on features and metrics that describe the shape of a
three-dimensional object, i.e. "what it looks like" in a collection of 2D projections [7].
    Except for the recent work by Wu et al. [8], who learned shape descriptions from a voxel-based
object representation through three-dimensional convolutional networks, previous three-dimensional
shape descriptions were mostly hand-crafted around a specific geometric property of the shape’s
surface or volume. For example, shapes can be represented by histograms or bag-of-features models
built from surface normals and curvatures [9]; distances, angles, areas of triangles or volumes of
tetrahedra computed for sampled surface points [10]; properties of spherical functions defined in
volumetric grids [11]; local shape diameters measured at densely sampled surface points [12]; and
heat kernel signatures on polygonal meshes [13, 14]. Developing supervised machine learning
algorithms on top of such descriptions of three-dimensional shapes raises several problems. First, the
size of organized databases with annotated 3D models is rather limited compared
to image data sets. For example, ModelNet contains about 150 thousand objects, whereas the
ImageNet database [15] already includes tens of millions of annotated images. Secondly, such
hand-crafted features and metrics of three-dimensional shapes tend to be very high-dimensional,
which makes the algorithms prone to overfitting.
    One of the latest works on object classification and the segmentation of individual parts, in which
the array of measured points is fed directly into the network, is [16]. Developed by authors from
Stanford University, the network is named PointNet; the main idea of the approach is to learn the
spatial features of each point and then merge all the individual features into a labelled overall point
cloud. The network is based on a convolutional architecture. The main disadvantage of this network is
that its input must always contain the same number of points for every object, which is not achievable
in practice, so one has to resort to an artificial "distortion" of the data.
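    A common workaround, sketched below with NumPy, is to randomly drop or duplicate points so
that every cloud reaches a fixed size (the helper name resample_points is hypothetical, not part of
PointNet):

```python
import numpy as np

def resample_points(points: np.ndarray, n: int) -> np.ndarray:
    """Force a point cloud to exactly n points by random sampling.

    Downsampling drops points; upsampling duplicates them. This is the
    kind of artificial "distortion" of the data referred to above.
    """
    replace = points.shape[0] < n              # duplicate only when too few points
    idx = np.random.choice(points.shape[0], size=n, replace=replace)
    return points[idx]

cloud = np.random.rand(3721, 3)                # a scan with an arbitrary number of points
fixed = resample_points(cloud, 2048)           # shape (2048, 3): a fixed-size input
```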
    Based on the literature review and the specifics of the problem being solved, the decision was
made to use an approach based on 2D projections of objects. In this approach, a convolutional neural
network, U-Net [17], is used for segmentation.
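    As a minimal sketch of the encoder-decoder idea behind U-Net (the channel counts, image size
and number of classes below are illustrative assumptions, not the configuration used in this work):

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """A one-level U-Net-style encoder-decoder with a skip connection."""
    def __init__(self, n_classes):
        super().__init__()
        self.down = double_conv(1, 16)
        self.pool = nn.MaxPool2d(2)
        self.bottom = double_conv(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.head = nn.Sequential(double_conv(32, 16), nn.Conv2d(16, n_classes, 1))

    def forward(self, x):
        d = self.down(x)                       # encoder features, kept for the skip
        b = self.bottom(self.pool(d))          # contracted representation
        u = self.up(b)                         # upsample back to input resolution
        return self.head(torch.cat([u, d], 1)) # skip connection: concatenate, classify

model = TinyUNet(n_classes=4)                  # e.g. plane, cylinder, cone, background
logits = model(torch.randn(1, 1, 256, 256))    # per-pixel class scores: (1, 4, 256, 256)
```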
    Let us describe the main idea and the stages of the developed model for the segmentation of
individual faces. The main idea is to create projections (images) of the faceted model of the measured
data, segment the faces on the projections, and identify the facets from the segmented images. The
stages of the approach are shown in Figure 2.
                  1. Loading measured points and creating a triangulation mesh (loading the stl file).

                  2. Formation of file projections.

                  3. Saving projection images.

                  4. Image segmentation using a neural network.

                  5. Identification of image pixels with facets in the projection.

                  6. Recreation of the complete model; additional refinement of the surface
                     boundaries in the direction of the normal vectors.

         Figure 2. A flowchart of the face recognition model using projections of measured data.
    Consider the face recognition steps in more detail.

3.1. Loading measured points and creating a triangulation mesh (loading the stl file)
As noted in the introduction, the model is designed to recognize geometry after measurement with
optical and laser scanners. After measurement with the scanner and preliminary data processing, a file
in the *.stl format is created containing the coordinates of the points united into a faceted surface. The
file contains the following data: Vg×3 (matrix of coordinates of the vertices of the stl-model mesh),
Fm×3 (matrix of triples of vertices forming the facets of the surfaces), and Nm×3 (matrix of
coordinates of the facet normals).
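    As a sketch, these three matrices can be read from an *.stl file with, for example, the trimesh
library (an illustrative choice; the file name part.stl is a placeholder):

```python
import trimesh

mesh = trimesh.load('part.stl')    # 'part.stl' is a placeholder file name
V = mesh.vertices                  # (g, 3): coordinates of the mesh vertices
F = mesh.faces                     # (m, 3): vertex indices forming each facet
N = mesh.face_normals              # (m, 3): unit normal of each facet
print(V.shape, F.shape, N.shape)
```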

3.2. Formation of file projections
To enable the semantic segmentation of the facets into separate surfaces using deep neural networks,
projections of the 3D surfaces onto the coordinate planes must be created. Roberts’ algorithm [18]
was used to prepare the projections.






    Roberts’ algorithm is the first known solution to the problem of hidden-line removal. It is a
mathematically elegant method that works in object space. The algorithm first removes from each
body the edge or edges that are screened by the body itself. Then each visible edge of each body is
compared with each of the remaining bodies to determine which of its parts, if any, are shielded by
those bodies. The computational complexity of Roberts’ algorithm therefore grows, in theory, with
the square of the number of objects.
    Roberts’ algorithm operates in two stages:
    1. Determination of the back (non-front-facing) faces of each body separately.
    2. Identification and removal of the invisible edges.
    To prepare the data, only the first stage of the algorithm was used. The second stage is not needed
for the subsequent processing; it is more complex and requires additional facet computations.
    When projections onto the coordinate planes are created, orthogonal projections in the viewing
plane are obtained. For example, when projecting onto the XOY plane, the structure Fm×3 is
preserved, while the matrix of vertices Vg×3 is converted into Vg×2, keeping only the coordinates
along the x- and y-axes. Traversing the vertices of the projected facets in the same sequence as in the
original, one can divide the facets into two types: those whose vertices are oriented counter-clockwise,
which means that we are looking at the facet from the outside of the body, and those oriented
clockwise, which means that we are looking at the facet from the inside. The order of the vertices
determines the direction of the normal. Thus, if the component of the facet normal perpendicular to
the projection plane (in this example, the z component) is negative, we are looking at the facet from
the inside. Since the object is bounded by a closed surface, we cannot observe faces from the inside;
they are invisible. It is therefore necessary to exclude the facets identified in this way from the Fm×3
structure, obtaining the projection structure Fm1×3, where m1 ≤ m is the number of facets that remain
visible.
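    A minimal NumPy sketch of this back-face test for a projection onto the XOY plane (the matrix
names follow the notation of section 3.1; the function name visible_facets_xoy is hypothetical):

```python
import numpy as np

def visible_facets_xoy(V, F):
    """Keep only the facets visible in the projection onto the XOY plane.

    A facet is front-facing if its vertices, traversed in their original
    order, remain counter-clockwise after projection, i.e. the z component
    of the facet normal is positive.
    """
    a, b, c = V[F[:, 0]], V[F[:, 1]], V[F[:, 2]]
    # z component of the cross product of two facet edges, (b - a) x (c - a)
    nz = ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
          - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
    return F[nz > 0]          # the Fm1×3 structure of visible facets

# Usage with the matrices of section 3.1:
# F1 = visible_facets_xoy(V, F)   # visible facets, m1 <= m rows
# V2 = V[:, :2]                   # Vg×2: x and y coordinates only
```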