Investigation of algorithms for generating surfaces of 3D models based on an unstructured point cloud

E.S. Glumova, A.D. Filinskikh
glumova.ek@yandex.ru | alexfil@yandex.ru
Nizhny Novgorod State Technical University n.a. R.E. Alekseev

Methods of creating a 3D object model on the basis of an unstructured (sparse) point cloud are considered in the paper. The issues of combining point cloud compaction methods with subsequent surface generation are described. A comparative analysis of surface generation algorithms is carried out in order to reveal the more effective method when depth maps obtained from the sparse point cloud are used as input data. The comparison is made by qualitative, quantitative and temporal criteria, and the optimal method of creating a 3D object model from an unstructured (sparse) point cloud and depth map data is chosen. A mathematical description of the point cloud compaction method based on stereo matching is provided, using a two-level view selection algorithm and depth map extraction by Multi-View Stereo for Community Photo Collections applied to the source image set. The method is implemented in practice with the open-source software Regard3D.

Key words: 3D model, photogrammetry, surface generation, point cloud, depth maps.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Today, the development of computing and surface reconstruction technologies makes it possible to recreate 3D models of objects with high accuracy and quality. One such technology is laser scanning. Laser scanners can capture geometry with high accuracy, but the devices that achieve accuracy of hundredths of a millimeter cost tens and hundreds of millions of rubles. Another type of non-contact scanning of objects is photogrammetry. Here the cost of the equipment for obtaining geometric data about an object is hundreds of times lower than that of laser-based equipment, and the main burden of obtaining high-quality models falls on the software.

3D object models are widely used in parametric architecture [1], in the computer video game and animation industry [2], in the development of scenes for VR applications, as well as in mobile development [3]. The quality of the model plays an important role in all of these areas, so it is important to distribute the computational resources of the software correctly.

There is quite a lot of software on the market for processing images and obtaining 3D models from series of images, both paid packages costing about one hundred thousand rubles and free open-source tools. In both cases, different algorithms are used at all stages, from photo processing to obtaining the 3D model.

2. SfM Principles

One of the photogrammetry methods is building a 3D structure from a set of images - Structure from Motion (SfM). A feature of the method is the automatic determination of camera parameters [4]: it recovers both the extrinsic calibration (the orientation and position of the camera) and the intrinsic calibration (focal length, radial distortion of the lens).

The first step of an SfM implementation is to detect and match point features in the input images. Special points (the term varies between sources) are, informally, "well detectable" fragments of an image: points (pixels) with a characteristic (distinctive) neighborhood, i.e. different from all neighboring points. Examples of local features are corner points, isolated point features, contours, etc. The keypoints are described by descriptors - feature vectors computed from the intensity/gradients or other characteristics of the neighborhood points.

The most popular feature descriptors used in modern image processing systems are given in [5]. A-KAZE (nonlinear diffusion filtering for detecting and describing 2D features) is used here to solve the keypoint detection problem.

Then the camera positions are estimated and a low-density (sparse) point cloud is built. Keypoints in multiple images are matched using approximate nearest neighbor search and linked into 'tracks' connecting specific keypoints across a set of pictures. Tracks comprising a minimum of two keypoints and three images are used for point-cloud reconstruction, with those which fail to meet these criteria being automatically discarded [6]. After that, triangulation is used to estimate the three-dimensional positions of the points and to gradually reconstruct the scene geometry in a relative coordinate system.
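For illustration, this detection and matching step can be sketched in a few lines with OpenCV's A-KAZE implementation; the image file names and the ratio-test threshold below are illustrative assumptions, not settings taken from Regard3D.

```python
# Minimal sketch of A-KAZE keypoint detection and matching (OpenCV).
# File names and the ratio-test threshold are illustrative assumptions.
import cv2

img1 = cv2.imread("view_01.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_02.jpg", cv2.IMREAD_GRAYSCALE)

akaze = cv2.AKAZE_create()                       # nonlinear-diffusion detector/descriptor
kp1, desc1 = akaze.detectAndCompute(img1, None)  # keypoints + binary descriptors
kp2, desc2 = akaze.detectAndCompute(img2, None)

# Approximate the nearest-neighbour matching step with a brute-force Hamming
# matcher and Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
knn = matcher.knnMatch(desc1, desc2, k=2)
good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

print(f"{len(kp1)} / {len(kp2)} keypoints, {len(good)} tentative matches")
```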
An enhanced-density point cloud can then be derived by applying the Multi-View Stereo (MVS) algorithm [7], based on depth maps; the Clustering Views for Multi-View Stereo algorithm (CMVS) [8]; the Patch-based MVS algorithm (PMVS2) [9]; or the Shading-Aware Multi-View Stereo algorithm (SMVS) [10], which combines stereo and shape-from-shading energies in a single optimization scheme. The camera positions obtained together with the sparse point cloud are used here as input data, and the result of this additional processing is a significant increase in point density.

The color and texture information is then transferred to the point cloud, after which the final 3D model is rendered. The simplified process of obtaining a 3D model from a set of images is shown in Fig. 1.

The stage of surface generation from the unstructured point cloud produced by the 3D reconstruction method MVS (Multi-View Stereo) is considered separately [7].

MVS is based on reconstructing a depth map for each view (image). Despite the large redundancy of the output data, the method has proven to be well suited for recovering the detailed geometry of sufficiently large scenes. Another advantage of depth maps as an intermediate representation is that the geometry is parameterized in its natural domain, and per-view data (such as color) is directly available from the images. The excessive redundancy in the depth maps can cause problems, not so much in terms of storage as in terms of computational power [11].

MVS includes three stages:
˗ SfM, which reconstructs the parameters of the cameras;
˗ MVS proper, which establishes the dense geometry;
˗ surface generation (meshing), which merges the MVS geometry into a globally consistent, colored mesh.

Fig. 1. Simplified process of obtaining a 3D model based on a set of images

3. Point Cloud Compaction

In accordance with Fig. 1, once the camera parameters are known, dense geometry reconstruction is performed by Multi-View Stereo for Community Photo Collections (MVSCPC) [12], which reconstructs a depth map for each image. A depth map is a two-dimensional one-channel image containing information about the distance from the sensor plane to the scene objects [13].
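Since the depth map is the central intermediate representation in what follows, a minimal sketch of how such a one-channel depth image maps back to 3D points under a pinhole camera model may be useful; the intrinsic parameters and the synthetic depth values below are placeholders, not data produced by the SfM stage of this paper.

```python
# Sketch: back-project a one-channel depth map to 3D points (pinhole model).
# Intrinsics below are placeholders; real values come from the SfM stage.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (distance along the optical axis) into an
    (N, 3) array of points in the camera coordinate frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[z.reshape(-1) > 0]          # drop pixels with no depth estimate

depth = np.random.uniform(0.5, 2.0, (480, 640))   # synthetic stand-in for a real map
cloud = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)
```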
The method is based on the idea of selecting images from the collection so that they match both at the per-view and the per-pixel level. An appropriate choice of views ensures reliable matches even with strong differences between images. The stereo matching algorithm takes as input the sparse 3D points reconstructed by SfM and iteratively grows surfaces from these points. Optimizing for surface normals with a photoconsistency measure significantly improves the matching results. The quality of the depth map is also assessed.

Stereo matching is performed at each pixel by optimizing for both depth and normal, starting from an initial estimate provided by the sparse point cloud. During stereo optimization, poorly matching views can be discarded and new ones added according to the local view selection criteria. Pixels can be revisited and their depths updated if a more accurate match is found [11].

MVSCPC computes a depth map for each input image - each image serves as a reference view exactly once - using a two-level view selection algorithm. At the image level, global view selection determines for each reference view a set of good neighbor images to use for stereo matching; at the pixel level, local view selection chooses the views actually used for matching each pixel.

Global view selection. For each reference view R, global view selection seeks a set N of neighboring views that are good candidates for stereo matching in terms of scene content, appearance, and scale. In addition, the neighboring views should provide sufficient parallax (a change in the apparent position of an object relative to a distant background, depending on the position of the observer) with respect to R and to each other in order to enable a stable match. A scoring function is designed to measure the quality of each candidate neighboring view based on these desiderata: since the matches and the sparse point cloud extracted in the SfM phase are not sufficient indicators for accurate surface reconstruction (they are extracted based only on the similarity of the scene content), an additional assessment of the reliability of image matches is required.

A global score g_R for each view V within a candidate neighborhood N (which includes R) is computed as a weighted sum over the features shared with R:

    g_R(V) = \sum_{f \in F_V \cap F_R} w_N(f) \cdot w_s(f),    (1)

where F_X is the set of feature points observed in view X, and the weight functions are described below.

To encourage a good range of parallax within a neighborhood, the weight function w_N(f) is defined as a product over all pairs of views in N:

    w_N(f) = \prod_{V_i, V_j \in N,\; i \ne j,\; f \in F_{V_i} \cap F_{V_j}} w_\alpha(f, V_i, V_j),    (2)

where w_\alpha(f, V_i, V_j) = \min\left((\alpha / \alpha_{max})^2, 1\right) and α is the angle between the lines of sight from V_i and V_j to f. The function w_α downweights triangulation angles below α_max, which is usually set to 10 degrees. The quadratic form of the weight serves to counteract the tendency of views to share more features as the angle between them decreases.

The weighting function w_s(f) measures the similarity in resolution of images R and V at feature f. The diameter s_V(f) of a sphere centered at f, whose projected diameter in V equals the pixel spacing in V, is computed to estimate the 3D sampling rate of V in the vicinity of the feature f. Similarly, s_R(f) is calculated for R, and the scale weight w_s(f) is defined from the ratio r = s_R(f) / s_V(f) as

    w_s(f) = \begin{cases} 2/r, & 2 \le r \\ 1, & 1 \le r < 2 \\ r, & r < 1 \end{cases}    (3)

This weight function prefers views with a resolution equal to or higher than that of the reference view.

Having defined the global score of a view V given a neighborhood N, one can find the best N of a given size (usually |N| = 10) by maximizing the sum of view scores \sum_{V \in N} g_R(V). For efficiency, a greedy algorithm [14] is used: the neighborhood is grown incrementally by iteratively adding to N the highest-scoring view given the current N (which initially contains only R).
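A compact sketch of this global scoring and greedy neighborhood growth is given below. It assumes the caller supplies feats[V] (the set of feature ids seen in view V), angle(f, Vi, Vj) (the triangulation angle in radians) and s(V, f) (the 3D sampling rate); these helper names are assumptions for illustration, not part of the MVSCPC implementation.

```python
# Sketch of the global view selection score (Eqs. 1-3) with greedy growth of N.
import math

ALPHA_MAX = math.radians(10.0)

def w_alpha(angle_rad):
    return min((angle_rad / ALPHA_MAX) ** 2, 1.0)

def w_scale(r):
    # Eq. (3): prefer views with equal or higher resolution than the reference.
    if r >= 2.0:
        return 2.0 / r
    if r >= 1.0:
        return 1.0
    return r

def global_score(R, V, N, feats, angle, s):
    score = 0.0
    for f in feats[V] & feats[R]:                       # features shared with R
        w_n = 1.0
        views = [x for x in N if f in feats[x]]
        for i, Vi in enumerate(views):                  # product over view pairs (Eq. 2)
            for Vj in views[i + 1:]:
                w_n *= w_alpha(angle(f, Vi, Vj))
        score += w_n * w_scale(s(R, f) / s(V, f))       # Eq. (1)
    return score

def select_neighborhood(R, candidates, feats, angle, s, size=10):
    # Greedy growth of N, which initially contains only R.
    N = {R}
    while len(N) - 1 < size and candidates - N:
        best = max(candidates - N,
                   key=lambda V: global_score(R, V, N | {V}, feats, angle, s))
        N.add(best)
    return N - {R}
```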
Rescaling views. Although the global view selection algorithm tries to select neighboring views with compatible scale, some inconsistencies in scale are unavoidable due to differences in resolution within the collection of photos, and they may negatively affect stereo matching. There are ways to adapt the scale of all views, either by filtering to a common, narrow range or by a global, per-pixel adjustment. The first method is used in this research to avoid resizing the matching window in different areas of the depth map. This approach finds the lowest-resolution view V_min ∈ N relative to R, resamples R to approximately match that lower resolution, and then resamples higher-resolution views to match R.

In particular, the resolution scale of a view V relative to R is estimated from their common features as

    scale_R(V) = \frac{1}{|F_V \cap F_R|} \sum_{f \in F_V \cap F_R} \frac{s_R(f)}{s_V(f)}.    (4)

Then V_min simply equals \arg\min_{V \in N} scale_R(V). If scale_R(V_min) is less than the threshold value t (t = 1, which roughly corresponds to matching the 5×5 reference window against a 3×3 window in the neighboring view with the lowest relative scale), the reference view is rescaled so that after rescaling scale_R(V_min) = t. Then all neighboring views with scale_R(V) > 2 are resampled to match the scale of the reference view (which itself may have been changed in the previous step). It is important that all modified versions of the images are discarded when moving on to the depth map computation for the next reference view.
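Under the same assumed helpers as in the previous sketch, the rescaling test of Eq. (4) can be written as follows; the returned factor is only one possible way to express the decision, not the exact formulation used in [12].

```python
# Sketch of the view rescaling test (Eq. 4); feats and s are the assumed helpers above.
def scale_R(R, V, feats, s):
    shared = feats[V] & feats[R]
    return sum(s(R, f) / s(V, f) for f in shared) / max(len(shared), 1)

def rescaling_factor(R, N, feats, s, t=1.0):
    # Find the lowest-resolution neighbour and decide whether the reference view
    # has to be downsampled before stereo matching.
    v_min = min(N, key=lambda V: scale_R(R, V, feats, s))
    sc = scale_R(R, v_min, feats, s)
    return (sc / t) if sc < t else 1.0      # factor (<1 means downsample R)
```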
Local view selection. Global view selection determines a set N of well-suited candidates for a reference view and matches their scale. Instead of using all of these views for stereo matching at a specific location in the reference view, a smaller set A ⊂ N of active views is selected (usually |A| = 4). Using this subset naturally speeds up the computation of the depth map.

During stereo matching, A is iteratively updated using a set of local view selection criteria designed to select views that, given the current depth and normal estimates of a pixel, are photometrically consistent and provide a sufficiently wide range of observation directions. To measure photometric consistency, the mean-removed normalized cross correlation (NCC) between pixels within a window about the given pixel in R and the corresponding window in V is used. If the NCC score is above a fixed threshold, then V is a candidate for addition to A.

The angular distribution could be measured by looking at gaps in the directions from which the given scene point (based on the current depth estimate for the reference pixel) is observed. In practice, the angular spread of the epipolar lines [15] is considered instead, obtained by projecting each viewing ray passing through the reference point into the reference view. When deciding whether to add a view V to the active set A, the local score is calculated as

    l_R(V) = g_R(V) \cdot \prod_{V' \in A} w_e(V, V'),    (5)

where w_e(V, V') = \min(\gamma / \gamma_{max}, 1) and γ is the acute angle between the corresponding pair of epipolar lines in the reference view as described above; γ_max = 10 degrees is accepted.

The local view selection algorithm is then performed in the following way. Starting from the initial depth of the pixel, the view V with the highest l_R(V) value is found. If this view has a sufficiently high NCC score (threshold 5 is used), it is added to A; otherwise, the view is rejected. The process is repeated until either the set A reaches the desired size or no undecided views remain. During stereo matching, the depth (and normal) are optimized, and a view may be removed from A (and marked as rejected), after which a replacement view is added. The algorithm is guaranteed to complete, since rejected views are never reconsidered.
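The per-pixel selection loop can be sketched as follows; the ncc and epipolar_angle helpers, the g_R score table and the numeric acceptance threshold are assumptions for illustration only.

```python
# Sketch of local view selection (Eq. 5). ncc(R, V, pixel) and
# epipolar_angle(V, Vp, pixel) are assumed helpers returning the mean-removed
# NCC score and the acute angle between epipolar lines in the reference view.
import math

GAMMA_MAX = math.radians(10.0)
NCC_MIN = 0.5          # placeholder acceptance threshold, not the paper's value

def w_e(gamma):
    return min(gamma / GAMMA_MAX, 1.0)

def local_score(R, V, A, g_R, epipolar_angle, pixel):
    score = g_R[V]                          # global score of V with respect to R
    for Vp in A:
        score *= w_e(epipolar_angle(V, Vp, pixel))
    return score

def select_active_views(R, N, g_R, ncc, epipolar_angle, pixel, size=4):
    A, undecided = set(), set(N)
    while len(A) < size and undecided:
        V = max(undecided, key=lambda V: local_score(R, V, A, g_R, epipolar_angle, pixel))
        undecided.discard(V)
        if ncc(R, V, pixel) > NCC_MIN:      # photometric consistency check
            A.add(V)                        # rejected views are never reconsidered
    return A
```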
4. Surface Generation

After computing, for each image, the arrays containing the best matching candidates, one can move on to the surface generation step. Merging the individual depth maps into a single polygonal surface is a labor-intensive task. The depth maps inherit the multi-scale properties of the original images, which leads to vastly different sampling rates of the reconstructed surfaces.

Many approaches to depth map fusion have been proposed [16-20]. Among them, FSSR (Floating Scale Surface Reconstruction) [18] and SPSR (Screened Poisson Surface Reconstruction) [19] were considered as surface generation methods, as they provide high detail of the reconstructed 3D model.

FSSR is widely used for outdoor scene reconstruction, where data is often too sparse for a reliable reconstruction. In this case the method does not hallucinate geometry in incomplete regions, which would otherwise require manual intervention, but leaves holes in these areas (i.e. these areas contain gaps). The approach draws on a simple yet efficient mathematical formulation that constructs an implicit function as the sum of compactly supported basis functions. The implicit function has a spatially continuous "floating" scale and can be evaluated directly without any preprocessing. The final surface is extracted as the zero-level set of the implicit function. One of the key properties of the approach is that it is virtually parameter-free even for complex, mixed-scale datasets [18].

The FSSR method combines all depth maps into one large point cloud. At this stage a scale value is attached to each point, indicating the actual size of the surface area from which the point was measured; this value is derived from the size of the regions identified in the MVS phase. FSSR then computes a multi-scale 3D surface from this cloud.

SPSR is an improvement of the approach that treats surface reconstruction as a spatial Poisson problem [20]. The approach explicitly incorporates the points as interpolation constraints. Unlike other image processing and geometry processing methods, the screening term is defined over a sparse set of points rather than over the whole domain. These sparse constraints can nevertheless be integrated efficiently: since the modified linear system retains the same finite-element discretization, its sparse structure is unchanged and the system can still be solved with a multigrid approach. In addition, screened Poisson surface reconstruction introduces several algorithmic improvements that together reduce the time complexity of the solver to linear in the number of points, thus enabling faster and higher-quality surface reconstruction [19].
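To make the idea of a floating-scale implicit function concrete, the toy sketch below sums signed, compactly supported contributions with per-point support radii and extracts the zero-level set with marching cubes. It replaces FSSR's actual basis functions with a much simpler radial weight, so it illustrates only the principle, not the published algorithm, and is meant for small toy inputs (a few hundred oriented samples).

```python
# Toy sketch of a "floating scale" implicit function: each oriented sample
# (position, normal, per-point scale) contributes a signed, compactly
# supported term; the surface is the zero-level set of the weighted sum.
import numpy as np
from skimage.measure import marching_cubes   # scikit-image is assumed available

def implicit(samples, normals, scales, queries):
    """Signed implicit value at query points (M, 3); NaN where nothing has support."""
    d = queries[:, None, :] - samples[None, :, :]          # (M, N, 3) offsets
    dist = np.linalg.norm(d, axis=-1)                      # (M, N)
    support = 3.0 * scales[None, :]                        # support radius grows with point scale
    w = np.clip(1.0 - dist / support, 0.0, None) ** 2      # compactly supported weights
    signed = np.einsum("mnk,nk->mn", d, normals)           # signed offset along each normal
    wsum = w.sum(axis=1)
    vals = (w * signed).sum(axis=1) / np.maximum(wsum, 1e-9)
    return np.where(wsum > 0, vals, np.nan)

def reconstruct(samples, normals, scales, res=32, pad=0.1):
    lo, hi = samples.min(0) - pad, samples.max(0) + pad
    axes = [np.linspace(lo[i], hi[i], res) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), -1).reshape(-1, 3)
    vol = implicit(samples, normals, scales, grid).reshape(res, res, res)
    mask = ~np.isnan(vol)                                  # unsupported regions stay holes
    verts, faces, _, _ = marching_cubes(np.nan_to_num(vol), level=0.0, mask=mask)
    return verts, faces
```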
5. Algorithm Comparison

Let us consider the combinations MVS-FSSR and MVS-SPSR in more detail. The implementation is studied on the example of 21 photos of a statuette (Fig. 2), using the freely distributed software Regard3D and MeshLab.

Fig. 2. Set of original photos

Fig. 3 shows the keypoints detected by Regard3D; this image contains 14,486 keypoints. The image features require invariance to image scaling, rotation, noise and changes in illumination. These keypoints are then matched to establish sparse correspondences between images (Fig. 4).

Fig. 3. Object keypoints

Fig. 4. The result of keypoint matching for a pair of original images

The results of the pairwise matching are then combined and unfolded into multiple views, creating feature tracks. The next step in the SfM implementation is the incremental triangulation algorithm. It estimates the relative pose of a well-matched initial pair of images, and then all tracks visible in both images are triangulated. The next matching images are incrementally added to the reconstruction until all reconstructable views become part of the scene. The lens distortion parameters are estimated during the reconstruction; the performance of the subsequent algorithms is significantly improved by removing distortions from the original images. In Regard3D's "ideology", this method is called New Incremental. The result is a sparse point cloud (Fig. 5).

Fig. 5. The result of triangulated point cloud computing using the New Incremental method

21 cameras (according to the number of uploaded images) have been calibrated by the program, i.e. the 3D positions and parameters of all images have been found. A cloud of 13,583 points was obtained, covering not only the model but also some part of the environment. The computation time of the sparse point cloud was slightly less than 30 s.

We then proceed to the compaction of the sparse point cloud by the MVS method. The result is shown in Fig. 6. The computation time was 40.42 minutes; 4,756,185 points were created. As can be seen, the point cloud has holes on the side of the figure (Fig. 6). The corresponding depth map was obtained using the MeshLab program (Fig. 7).

Fig. 6. Dense point cloud using the MVS method

Fig. 7. Depth map

Surface generation by FSSR and SPSR. Fig. 8 presents the results of the calculations in the Regard3D command line, illustrating the iterative algorithm of finding the best candidates for matching described above. In total, 21 reports were produced - according to the number of uploaded images. Each report lists the views recommended for matching with the given reference view, as well as the number of optimized points, i.e. points whose depth map data and normals have been updated in accordance with the described algorithm.

Fig. 8. Regard3D command line. FSSR implementation

Fig. 9 shows the result of surface construction by the FSSR method. The calculation time was 17.02 min. The final surface contains 1,369,758 points. The model contains small amounts of noise and has gaps. In the right picture, one can see that the model has a large hole. This is due to the fact that a shadow falls on this area in the original images. The lighting change is interpreted by the program as a lack of data for point reconstruction, because the shaded area is visible in only 2-3 views out of 21, which led to its rejection from revision and surface reconstruction.

Fig. 10 shows the result of surface reconstruction using the SPSR method. The calculation time was 1.22 min. The final surface contains 301,497 points. The model also contains a little noise and has gaps. In the right picture, one can see that the model has an even larger gap than with the previous method.

Fig. 9. MVS model - Floating Scale Surface Reconstruction

Fig. 10. MVS model - Screened Poisson Surface Reconstruction

We compare the obtained models by several indicators (Table 1).

Table 1. Comparison of final models.

                               FSSR                                SPSR
Visual assessment of details   Denser mesh with smaller gap area   Less dense mesh with larger gap area
Calculation time               17.02 min                           1.22 min
Number of points               1,369,758                           301,497
Model size (.obj)              401 MB                              85.1 MB
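The quantitative rows of Table 1 can be reproduced for any pair of exported meshes with a few lines of Python; the file names below are placeholders, and trimesh is just one convenient loader, not a tool used in the experiment.

```python
# Sketch: report vertex/face counts and file size for two exported .obj meshes.
import os
import trimesh

for name in ("model_fssr.obj", "model_spsr.obj"):   # placeholder file names
    mesh = trimesh.load(name, force="mesh")
    size_mb = os.path.getsize(name) / 2**20
    print(f"{name}: {len(mesh.vertices)} vertices, {len(mesh.faces)} faces, {size_mb:.1f} MB")
```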
6. Conclusion

Two surface generation algorithms were considered during the research: the point-based Screened Poisson Surface Reconstruction approach and the Floating Scale Surface Reconstruction approach. Combined with the same point cloud compaction method, the considered algorithms showed different temporal and quantitative results. The comparison of the final 3D models generated by these methods shows that the reduction in computation time with the SPSR method does not yield a qualitative result: the MVS - Screened Poisson Surface Reconstruction model is a much less dense mesh than the MVS - Floating Scale Surface Reconstruction model. On the basis of the obtained data it can be concluded that, to obtain high-quality 3D models from an unstructured point cloud, a surface generation algorithm that accounts for the changing scale of the images should be used. The point-based surface generation algorithm can be used for small collections of photos that do not contain multi-scale images; the reduced computational load during model preparation, as well as the small model size, can be useful, for example, for low-polygonal modeling in mobile applications.

In the future, it is planned to conduct a comparative analysis of existing algorithms based on depth map data, as well as approaches that take into account changes in illumination in the photographs.

References

[1] Sosnina, O., Filinskikh, A., Lozhkina, N.: Analysis of the virtual nontrivial forms models creation methods (in Russian). Information Technologies, Vol. 25 (11), pp. 679-681 (2019).
[2] Sosnina, O., Filinskikh, A., Korotaeva, A.S.: Comparison of the low-polygonal 3D model creation methods (in Russian). Information Technologies, Vol. 23 (8), pp. 564-568 (2017).
[3] Malysheva, A., Tomchinskaya, T.: Features of the low-polygon modeling and texturing in the mobile applications (in Russian). Conference KOGRAF-2019, ISBN 978-5-502-01200-3, pp. 51-54.
[4] Westoby, M.J., et al.: 'Structure-from-Motion' photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology, 179, pp. 300-314 (2012).
[5] Kublanov, V., et al.: Biomedical signals and images in digital healthcare: storage, processing and analysis: a training manual (in Russian), pp. 193-195 (2020).
[6] Snavely, N.: Scene reconstruction and visualization from internet photo collections. University of Washington, USA (2008).
[7] Fuhrmann, S., Langguth, F., Goesele, M.: MVE - a multi-view reconstruction environment. GCH (2014).
[8] Clustering views for multi-view stereo (CMVS), https://www.di.ens.fr/cmvs/. Last accessed 10 May 2020.
[9] Furukawa, Y., Ponce, J.: Accurate, Dense, and Robust Multi-View Stereopsis (PMVS). IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2007).
[10] Langguth, F., et al.: Shading-aware multi-view stereo. European Conference on Computer Vision, Springer, Cham, pp. 469-485 (2016).
[11] Seitz, S.M., et al.: A comparison and evaluation of multi-view stereo reconstruction algorithms. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), Vol. 1, pp. 519-528 (2006).
[12] Goesele, M., et al.: Multi-view stereo for community photo collections. 2007 IEEE 11th International Conference on Computer Vision, pp. 1-8 (2007).
[13] Voronin, V., Fisunov, A., Marchuk, V., Svirin, I., Petrov, S.: Restoration of the depth map based on the combined processing of a multi-channel image (in Russian). Modern Problems of Science and Education, 6 (2014).
[14] Greedy algorithms, https://habr.com/ru/post/120343/. Last accessed 26 June 2020.
[15] Basics of stereo vision, https://habr.com/ru/post/130300/. Last accessed 26 May 2020.
[16] Curless, B., Levoy, M.: A volumetric method for building complex models from range images. Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 303-312 (1996).
[17] Fuhrmann, S., Goesele, M.: Fusion of depth maps with multiple scales. ACM Transactions on Graphics (TOG), Vol. 30 (6), pp. 1-8 (2011).
[18] Fuhrmann, S., Goesele, M.: Floating scale surface reconstruction. ACM Transactions on Graphics (TOG), Vol. 33 (4), pp. 1-11 (2014).
[19] Kazhdan, M., Hoppe, H.: Screened Poisson surface reconstruction. ACM Transactions on Graphics (TOG), Vol. 32 (3), pp. 1-13 (2013).
[20] Kazhdan, M., Bolitho, M., Hoppe, H.: Poisson surface reconstruction. Proceedings of the Fourth Eurographics Symposium on Geometry Processing, Vol. 7 (2006).

About the authors

Glumova Ekaterina S. - student of Nizhny Novgorod State Technical University n.a. R.E. Alekseev, e-mail: glumova.ek@yandex.ru.
Filinskikh Aleksandr D. - Ph.D. in Technology, Associate Professor, Nizhny Novgorod State Technical University n.a. R.E. Alekseev, e-mail: alexfil@yandex.ru.