Surface recognition of machine parts based on the results of optical scanning

M A Bolotov1, V А Pechenin1, N V Ruzanov1 and E J Kolchina2

1 Samara National Research University, Institute of Engines and Power Plants, Moskovskoe Shosse, 34А, Samara, Russia, 443086
2 Stock company «RKC «Progress», Zemeca street, 18, Samara, Russia, 443009

e-mail: vadim.pechenin2011@yandex.ru

Abstract. To predict the quality parameters of products (in particular, assembly parameters), mathematical models were implemented in the form of computer models. To ensure the adequacy of the calculations, information about the actual geometry of the parts is required, which can be obtained by non-contact measurement of the parts to be assembled. Measuring parts and components with an optical or laser scanner produces a large array of measured points. After standard processing (e.g. noise removal, merging of scans, smoothing, creation of a triangulation mesh), the individual surfaces of the parts must be recognized. This paper presents a neural network model that recognizes surface elements from an array of measured points obtained by scanning.

1. Introduction
The least automated stage in industry is the assembly of single-unit and small-batch products of medium and high complexity, such as aircraft engines. Unlike cars, these products are not manufactured in large quantities; they are characterized by a high degree of customization and increased requirements for complexity and accuracy. Assembly is labour-consuming, accounting for up to 25% of the total labour input of such products, and largely determines their quality. Several factors make it difficult to fully automate the assembly of these products. One significant factor is the difficulty of determining parameters for the operations performed by robots that are guaranteed to ensure the specified accuracy and quality. The assembly of medium- and high-complexity products is a unique operation, during which the course of operations is changed according to the results of measurements and the geometric analysis of the assembled parts. Geometry is measured by both contactless and contact methods. To partially automate engine assembly processes, it is necessary to recognize both the individual parts and the surfaces of the parts along which assembly will take place. Surface recognition is possible using computer vision approaches [1, 2, 3]. The aim of this work is to create a model based on neural networks, designed to recognize the surfaces of engineering parts after their measurement with an optical or laser scanner.

2. Object of research
To test the model, a simulator of a real engine part was designed and manufactured: a spacer from the turbine of an aircraft engine. Its drawing is shown in Figure 1. The part contains cylindrical and flat faces, as well as threaded holes.

Figure 1. Drawing of the "spacer simulator" part.

Automated recognition of the elements (surfaces) of measured parts using neural networks solves two tasks:
1) segmentation of a part's elements into types of surfaces (plane, cylinder, cone, etc.);
2) additional refinement of the boundaries of the triangulated surfaces based on deviations of the facet normal vectors (a sketch of such a check follows this list).
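The second task reduces to comparing the normals of facets that share a common edge: where the angle between neighbouring facet normals exceeds a threshold, the edge is treated as a boundary between surfaces. Below is a minimal sketch of such a check, assuming numpy arrays F (facet vertex indices) and N (unit facet normals) as read from the *.stl file described in section 3.1; the 15° threshold is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def boundary_edges(F, N, angle_deg=15.0):
    """Return mesh edges across which adjacent facet normals deviate
    by more than angle_deg degrees (candidate surface boundaries)."""
    # Map each undirected edge to the indices of the facets sharing it.
    edge_faces = {}
    for fi, (a, b, c) in enumerate(F):
        for edge in ((a, b), (b, c), (c, a)):
            edge_faces.setdefault(tuple(sorted(edge)), []).append(fi)

    cos_thr = np.cos(np.radians(angle_deg))
    boundaries = []
    for edge, faces in edge_faces.items():
        if len(faces) == 2:  # interior edge shared by two facets
            # For unit normals, the dot product is the cosine of the
            # angle between them.
            if np.dot(N[faces[0]], N[faces[1]]) < cos_thr:
                boundaries.append(edge)
    return boundaries
```

Facets on opposite sides of such edges are assigned to different surfaces during the boundary refinement step.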
To solve the first task, a convolutional neural network was used.

3. Neural network model of surface recognition
Convolutional neural networks (CNNs) are a very broad class of architectures whose main idea is to reuse the same parts of the network across different small, local regions of the input [4]. The main application area of convolutional architectures is image processing [5, 6].

At present, many approaches to recognizing three-dimensional objects have been developed in works devoted to computer vision. These approaches can be divided into two groups: those that work directly on native three-dimensional representations of objects, such as polygonal meshes, voxel representations and point arrays, and those based on features and metrics that describe the shape of a three-dimensional object, i.e. "what it looks like" in a collection of 2D projections [7]. Except for the recent work by Wu et al. [8], who learned shape descriptors from a voxel-based object representation using three-dimensional convolutional networks, previous three-dimensional shape descriptors were mostly hand-crafted around a specific geometric property of the shape's surface or volume. For example, shapes can be represented by histograms or bag-of-features models built from surface normals and curvatures [9]; by distances, angles, triangle areas or tetrahedron volumes computed for sampled surface points [10]; by properties of spherical functions defined in volumetric grids [11]; by local shape diameters measured at densely sampled surface points [12]; or by heat kernel signatures on polygonal meshes [13, 14]. Building supervised machine learning algorithms on top of such descriptions of three-dimensional shapes raises several problems. First, the size of curated databases of annotated 3D models is rather limited compared to image data sets: ModelNet, for example, contains about 150 thousand objects, whereas the ImageNet database [15] already includes tens of millions of annotated images. Secondly, such hand-crafted features and metrics of three-dimensional shapes tend to be very high-dimensional, which makes the algorithms prone to overfitting.

One of the latest works on object classification and the segmentation of individual parts in which an array of measured points is fed directly into the network is PointNet [16], developed at Stanford University. Its main idea is to learn the spatial features of each point and then merge all the individual features into a labelled point cloud; the network is based on a convolutional architecture. The main disadvantage of this network is that it requires the same number of input points for every object, which is not achievable in practice, so the data have to be artificially resampled.

Based on the literature review and the specifics of the problem being solved, it was decided to use an approach based on 2D projections of objects. In this approach, a convolutional neural network, U-net [17], is used for segmentation. We will now describe the main idea and the stages of the developed model for the segmentation of individual faces.
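The exact network configuration used in the work is not reproduced in the text, so the depth, channel counts and number of surface classes below are assumptions; the sketch only illustrates the U-net encoder-decoder structure with skip connections [17], here in PyTorch.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU, as in the original U-net.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    def __init__(self, n_classes=4):  # e.g. plane/cylinder/cone/background
        super().__init__()
        self.enc1 = conv_block(1, 32)   # single-channel projection image
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottom = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)  # 64 (skip) + 64 (upsampled)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottom(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # (batch, n_classes, H, W)
```

For example, `MiniUNet()(torch.zeros(1, 1, 128, 128))` yields a (1, 4, 128, 128) tensor of per-pixel class scores; at this depth the image sides must be divisible by 4. Training such a network requires projection images with per-pixel annotations of the surface types.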
The main idea is to create projections (images) of the facet model of the measured data, segment the faces in the projections, and then identify the facets from the segmented images. The stages of the approach are shown in Figure 2:
1. loading measured points and creating a triangulation mesh (loading the stl file);
2. formation of file projections;
3. saving the projection images;
4. image segmentation using a neural network;
5. identification of image pixels with facets in the projection;
6. recreation of the complete model and additional refinement of the surface boundaries in the direction of the normal vectors.

Figure 2. A flowchart of the face recognition model using projections of measured data.

Let us consider the face recognition steps in more detail.

3.1. Loading measured points and creating a triangulation mesh: loading the stl file
As noted in the introduction, the model is designed to recognize geometry after measurement with optical and laser scanners. After measurement with the scanner and preliminary data processing, a file in the *.stl format is created with the coordinates of points united into a facet surface. The file contains the following data: Vg×3 (the matrix of coordinates of the vertices of the stl-model mesh), Fm×3 (the matrix of triples of vertex indices forming the facets of the surfaces) and Nm×3 (the matrix of coordinates of the facet normals).

3.2. Formation of file projections
To enable semantic segmentation of the facets into separate surfaces using deep neural networks, projections of the 3D surfaces onto the coordinate planes must be created. To prepare the projections, Roberts' algorithm was used [18]. Roberts' algorithm is the first known solution to the problem of removing invisible lines. It is a mathematically elegant method that works in object space. The algorithm first removes from each body the edge or edges that are screened by the body itself; then each visible edge of each body is compared with each of the remaining bodies to determine which part or parts, if any, are shielded by those bodies. The computational complexity of Roberts' algorithm therefore grows, in theory, with the square of the number of objects. Roberts' algorithm operates in two stages:
1. determination of the back-facing facets of each body separately;
2. identification and removal of invisible edges.
To prepare the data, only the first stage of the algorithm was used; the second stage is not needed for the subsequent processing, being more complex and requiring additional computation. Projecting onto the coordinate planes yields orthographic projections in the viewing plane. For example, when projecting onto the XOY plane, the structure Fm×3 is preserved, while the matrix of vertices Vg×3 is converted into Vg×2, containing only the x and y coordinates.
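A minimal sketch of these two steps, assuming the trimesh library and a hypothetical file name part.stl; the array names mirror the Vg×3, Fm×3 and Nm×3 structures above:

```python
import numpy as np
import trimesh  # assumed library for reading the *.stl file

# Step 3.1: load the facet model produced after scanning.
mesh = trimesh.load('part.stl')    # hypothetical file name
V = np.asarray(mesh.vertices)      # Vg×3: vertex coordinates
F = np.asarray(mesh.faces)         # Fm×3: vertex-index triples per facet
N = np.asarray(mesh.face_normals)  # Nm×3: unit facet normals

# Step 3.2: orthographic projection onto the XOY plane.
# The facet structure F is preserved; only the x and y vertex
# coordinates are kept.
V_xy = V[:, :2]                    # Vg×2
```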
Traversing the vertices of the projected facets in the same sequence as in the original, the facets can be divided into two types: those whose vertices are ordered counter-clockwise, meaning that the facet is viewed from outside the body, and those ordered clockwise, meaning that it is viewed from the inside. The order of the vertices determines the direction of the normal. Thus, if the component of the normal vector along the axis perpendicular to the projection plane (in this example, the z-component) is negative, the facet is viewed from the inside. Since the object is bounded by a closed surface, its facets cannot be observed from the inside: such facets are invisible. The facets identified in this way are therefore excluded from the Fm×3 structure, giving the projection structure Fm1×3, where m1 ≤ m.
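Continuing the sketch above, the back-facing facets can be removed either by the sign of the z-component of the facet normal or, equivalently, by the sign of the signed area of the projected vertex triple, which is negative for clockwise traversal:

```python
# Normal-based test: a facet whose normal has a negative z-component
# is seen from the inside and is invisible in the XOY projection.
visible = N[:, 2] > 0
F_proj = F[visible]                # Fm1×3, where m1 <= m

# Equivalent vertex-order test: the signed area of a projected facet
# is positive when its vertices are traversed counter-clockwise.
a, b, c = V_xy[F[:, 0]], V_xy[F[:, 1]], V_xy[F[:, 2]]
signed_area = 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                     - (c[:, 0] - a[:, 0]) * (b[:, 1] - a[:, 1]))
ccw = signed_area > 0  # matches `visible` except for degenerate,
                       # exactly edge-on facets
```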