An Approach for Managing Hybrid Supercomputer Resources in Photogrammetric Tasks

Nikita Voinov1[0000−0002−0140−1178], Ivan Selin1, Pavel Drobintsev1, and Vsevolod Kotlyarov1

1 Peter the Great St.Petersburg Polytechnic University, Saint Petersburg, Russia
voinov@ics2.ecd.spbstu.ru

Abstract. This paper describes an approach to managing supercomputer resources for the effective execution of stereo photogrammetric tasks with the Agisoft PhotoScan software. A performance study was conducted to establish the characteristics needed for a proper deployment.

Keywords: Photogrammetry · High performance computing · Hybrid supercomputer

1 Introduction

Stereo photogrammetry tasks belong to the class of Big Data analysis and processing tasks. The aim of photogrammetry is to build a 3-dimensional (3D) model from a set of 2-dimensional (2D) images (photos). Precise modelling of terrain from data acquired by drones is particularly hard to perform because of the huge amount of input data, which can reach tens and hundreds of thousands of photos. A supercomputer is needed to process such a task effectively, so that the result is obtained in a reasonable time. The approach to managing supercomputer resources is studied on the Agisoft PhotoScan software [1], which is used as a benchmark (this software was chosen due to the requirements of the customer who sponsored this project). PhotoScan allows the user to build different types of 3D models, such as elevation, tiled and polygonal models, from a set of 2D images. The process itself is very complex and is divided into several stages with different hardware requirements [2]. Performance analysis of different deployment configurations helps to determine bottlenecks and ways to overcome them. Therefore, a proper deployment can yield a significant performance and stability gain on this type of task.
2 Obtaining Models in PhotoScan

The PhotoScan calculation process contains two mandatory steps (aligning photos and building the model):

1. Determining camera positions. The first stage is camera alignment. At this stage PhotoScan searches for common points on photographs and matches them; it also finds the position of the camera for each picture and refines the camera calibration parameters. As a result, a sparse point cloud and a set of camera positions are formed. The sparse point cloud represents the results of photo alignment and is not directly used in the further 3D model construction procedure (except for the sparse point cloud based reconstruction method). However, it can be exported for further use in external programs; for instance, the sparse point cloud model can be used in a 3D editor as a reference. The set of camera positions, on the other hand, is required for further 3D model reconstruction by PhotoScan.

2. Building the dense cloud. The next stage is building the dense point cloud. Based on the estimated camera positions and the pictures themselves, a dense point cloud is built by PhotoScan. The dense point cloud may be edited and classified prior to export or to proceeding to 3D mesh model generation.

3. The next steps depend on the type of model the user wants to obtain and have their own specifics. In general, PhotoScan reconstructs a 3D model representing the object surface based on the dense or sparse cloud, according to the user's choice. Two algorithmic methods are available in PhotoScan for 3D mesh generation: Height Field, for planar type surfaces, and Arbitrary, for any kind of object [3].

In this study we focus on obtaining the orthomosaic, therefore the calculation process consists of the following steps:

1. Match Photos.
2. Align Photos.
3. Build Dense Cloud.
4. Build DEM (Digital Elevation Model).
5. Build Orthomosaic.
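The five steps above form a strictly ordered pipeline: each stage consumes the output of its predecessor. The sketch below is a hypothetical stage runner, not PhotoScan's actual API; it only illustrates the ordering constraint.

```python
# Illustrative sketch: the orthomosaic pipeline as an ordered list of stages.
# Stage names follow the paper; the runner itself is hypothetical.

PIPELINE = [
    "Match Photos",
    "Align Photos",
    "Build Dense Cloud",
    "Build DEM",
    "Build Orthomosaic",
]

def run_pipeline(stage_fn):
    """Run every stage strictly in order; each stage receives its
    predecessor's result, so no stage can start ahead of the previous one."""
    result = None
    completed = []
    for stage in PIPELINE:
        result = stage_fn(stage, result)  # a stage consumes the previous output
        completed.append(stage)
    return completed
```

For example, `run_pipeline(lambda stage, prev: f"{stage} output")` returns the stage names in exactly the order listed above.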
Of all the processing stages, building the dense cloud is the most resource demanding [2, 4]; the other steps do not affect the overall time as much. Known PhotoScan performance studies address the single-machine use case, where all the processing is executed on the same machine where the user works. This is unsuitable for big projects because of the inability to process large sets of data.

3 Hybrid Supercomputer "Polytechnic"

Supercomputer "Polytechnic" is a hybrid complex with a peak performance of more than 1 PFlops, developed by the Russian company RSC [6]. The aims of the supercomputer (one of the most modern in Russia) are to improve the efficiency of the fundamental and applied scientific research of SPbPU; to train engineers with a high level of competence in the use of supercomputer technology for developing high-tech products; to set up a university-based regional center of competence in the field of supercomputer technology in knowledge-intensive sectors of the economy (power plant engineering, aircraft engineering, bioengineering, radioelectronics); etc.

The hybrid supercomputer consists of three clusters: Tornado [7], Numascale [8] and PetaStream [9] (Fig. 1). In our study we deploy PhotoScan on the Tornado and Numascale clusters.

Fig. 1. Hybrid supercomputer "Polytechnic"

Tornado specs: 668 nodes, each with two Intel E5-2697 v3 CPUs and 64 GB RAM; IB FDR interconnect; 56 nodes also have 2 NVIDIA Tesla K40 GPGPU each. Overall: 1336 CPUs; 18704 x86 cores; 112 GPGPU; 42752 GB RAM.

Numascale specs: 64 nodes, each with three AMD Opteron 6380 CPUs and 192 GB RAM; cache-coherent non-uniform memory access (CC-NUMA) [5]; IB FDR, 3-dimensional torus topology. Overall: 192 CPUs; 3072 x86 cores; 12288 GB RAM. Nodes can be combined into groups of four and more nodes with memory access between the nodes.

The Data Storage System consists of two parts: a parallel Lustre Data Storage System (DSS) with 1 PB capacity and a modular DSS for the cloud with 0.5 PB capacity.
4 Performance Analysis and Deployment

The suggested approach is based on the ability of the supercomputer to connect clusters through InfiniBand. By dividing the data analysis and processing tasks into several chunks we can combine supercomputer resources in a more efficient way, which results in faster processing. The application of this approach to the Agisoft PhotoScan software is shown below, with different deployment configurations of the supercomputer clusters. The size of the input data used for the experiments varies from 10000 photos (100 GB) to 100000 (1 TB).

4.1 10000-50000 Photos Processing Results

The following configurations were used in this experiment:

1. 8 Numascale nodes.
2. 8 Tornado nodes.
3. 8 Tornado nodes with GPGPU (Tornado-k40).

Table 1 shows the processing times for each step of a 10000-photo project with different configurations. The time is shown in minutes.

Table 1. Processing times for 10000 photos project

Step               8xNUMA  8xTornado  8xTornado-k40
Match Photos          149         73             73
Align Photos           82         32             32
Build Dense Cloud     667        399            118
Build DEM              19         10             10
Build Orthomosaic      50         33             33
Overall               967        547            266

Tornado and Tornado-k40 differ only in the Build Dense Cloud step, because GPGPU is only used at this step [5]. Using GPGPU greatly increases performance at this step and thus overall performance: the final processing time is nearly halved, because building the dense cloud is the heaviest step in the pipeline. Despite the fact that Numascale nodes have three times more RAM, this does not boost performance: Numascale nodes are 1.7-4 times slower than Tornado nodes. This trend holds for projects with sizes from 10000 to 50000 photos. The obtained performance evaluation will later be used for proper deployment.

4.2 Processing More Than 50000 Photos

Some problems can be encountered when processing large projects, which contain more than 50000 photos.
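The Table 1 timings can be cross-checked with a few lines of arithmetic. The dictionary below simply copies the table's values (in minutes); the derived ratios are illustrative arithmetic, not new measurements.

```python
# Per-step times (minutes) from Table 1, 10000-photo project.
times = {
    "Match Photos":      {"numa": 149, "tornado": 73,  "tornado_k40": 73},
    "Align Photos":      {"numa": 82,  "tornado": 32,  "tornado_k40": 32},
    "Build Dense Cloud": {"numa": 667, "tornado": 399, "tornado_k40": 118},
    "Build DEM":         {"numa": 19,  "tornado": 10,  "tornado_k40": 10},
    "Build Orthomosaic": {"numa": 50,  "tornado": 33,  "tornado_k40": 33},
}

def total(cfg):
    """Total processing time for one configuration (the 'Overall' row)."""
    return sum(step[cfg] for step in times.values())

# GPGPU roughly triples Build Dense Cloud throughput (399 / 118 min) ...
gpu_speedup_dense = times["Build Dense Cloud"]["tornado"] / times["Build Dense Cloud"]["tornado_k40"]
# ... and, since that step dominates, nearly halves the total time (547 / 266 min).
overall_gpu_speedup = total("tornado") / total("tornado_k40")
```

Running this confirms the totals in the table: 967, 547 and 266 minutes, a dense-cloud speedup of about 3.4x, and an overall speedup of about 2.1x from GPGPU.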
These issues appear because of the lack of RAM on the Tornado nodes, which have only 64 GB available. Paper [4] shows that the steps of the PhotoScan process depend on each other and thus cannot be processed in parallel; this must be taken into account in resource management. During the experiments, it was observed that the processing steps consist of subtasks that are not equal. Some of these subtasks require sequential processing on only one node: in most cases such a subtask merges the results from all processing nodes, or otherwise requires synchronization with previous pieces of work.

It was found that during these subtasks the Tornado nodes were suspended. Log analysis revealed the cause of the suspension: memory overflow. The problematic step and subtask were also determined: the problems were occurring in the Align Photos stage, and experiments were successfully held to confirm this theory. To overcome the issue, an alteration in the cluster configuration was made: a Numascale node was added to the processing nodes, because it has three times as much memory and somewhat comparable performance. The Numascale node was included in the node list with top priority, so that the server which distributes the tasks always addresses the problematic subtasks to the Numascale node.

On the plus side, after adding the Numascale node it became possible to process projects of up to 75000 photos. On the other hand, performance slightly dropped, because the Numascale node is slower than Tornado and Tornado-k40. Keeping in mind Amdahl's law [10], the inclusion of the Numascale node reduces the overall performance of the system, as it is slower than the other nodes and the overall computing time cannot be smaller than the computing time of the slowest node.

However, a single Numascale node was not enough to process the 100000-photo project, so four Numascale nodes were unified into a group with a total memory of 768 GB.
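The Amdahl's-law effect mentioned above can be sketched in a few lines. The function is the standard bound; the fractions used in the comments are made-up examples, not measured PhotoScan numbers.

```python
# Standard Amdahl's-law speedup bound; example fractions are illustrative only.

def amdahl_speedup(parallel_fraction, n_workers):
    """Maximum speedup when only `parallel_fraction` of the work scales
    across `n_workers`; the rest runs serially (e.g. on one merge node)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)

# However many nodes run, a serial merge subtask caps the speedup at
# 1 / serial_fraction: e.g. a 5% serial part limits speedup to at most 20x.
```

This is why the serial, memory-heavy subtasks end up dictating overall behaviour: making the parallel part faster cannot compensate for a slow node holding the serial part.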
The project was completed successfully, but it was noted that the performance of the Numascale group is lower than that of a single Numascale node, reducing performance even more. The results of the experiments can be seen in Table 2. The time is shown in minutes.

Table 2. Align Photos performance on different projects and configurations

Step            50000           75000                  100000
                (tornado-k40)   (numa+40*tornado-k40)  (4*numa+40*tornado-k40)
Match Photos     99              178                    458
Align Photos    498             1159                   3087

Memory requirements and processing times can be seen in Fig. 2. From left to right (the blue column is the processing time in hours, the orange one the peak memory consumption in GB):

1. 50000 photos project; 40 Tornado-k40 nodes.
2. 75000 photos project; 40 Tornado-k40 nodes and one Numascale node.
3. 100000 photos project; 40 Tornado-k40 nodes and one Numascale node. No time is presented because the run failed: there was not enough memory to finish the process.
4. 100000 photos project; 40 Tornado-k40 nodes and four Numascale nodes unified into a group with shared memory.

Fig. 2. Performance of Align Photos step in different projects

The distribution of memory consumption can be seen in Fig. 3 (data acquired from the Numascale node). Usually memory consumption is low enough for the step to run on Tornado nodes, but there are peaks that require a lot more memory, as in Fig. 3, where consumption hits the mark of 120 GB.

Fig. 3. Memory consumption on Numascale node during Align Photos step of 75000 photos project

4.3 Processing Several Projects Simultaneously

Overall performance can be raised by processing several projects at once. Based on the conducted study, the following heuristics were derived: top-priority projects should be placed on Tornado-k40, while other projects run on Tornado. To enhance stability and be able to process large projects, it is better to include one Numascale node for each project, so that the most memory-demanding subtasks run on it.
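The heuristics above can be sketched as a simple placement routine. The node-pool and project structures below are hypothetical illustrations, not the actual dispatcher's interface.

```python
# Hypothetical sketch of the placement heuristics from Sect. 4.3: top-priority
# projects take Tornado-k40, the rest take Tornado, and each project is given
# one Numascale node for the memory-heavy subtasks.

def place_projects(projects, k40_pool, tornado_pool, numa_pool):
    """Assign each project a compute pool entry and one Numascale node.
    Projects are handled in descending priority order."""
    placement = {}
    for project in sorted(projects, key=lambda p: p["priority"], reverse=True):
        compute_pool = k40_pool if k40_pool else tornado_pool
        if not compute_pool or not numa_pool:
            break  # out of resources; remaining projects wait
        placement[project["name"]] = {
            "compute": compute_pool.pop(0),  # group of Tornado(-k40) nodes
            "memory": numa_pool.pop(0),      # dedicated Numascale node
        }
    return placement
```

With this ordering, the highest-priority projects consume the Tornado-k40 pool first, and every placed project keeps a dedicated high-memory node, mirroring the heuristics above.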
5 Conclusion

In this paper, an approach was suggested for managing the resources of a hybrid supercomputer to maximize its efficiency and reduce the overall processing time. The results of applying the approach are presented on the Agisoft PhotoScan software. PhotoScan performance was analyzed on different configurations and projects, and an effective configuration was determined which can process large projects in reasonable time. It was noted that the same project runs faster on the Tornado cluster than on Numascale. However, because it is impossible to process large projects using only Tornado nodes, a combined configuration with a Numascale node was suggested. Using a shared-memory cluster overcomes the memory overflow issues at the cost of reduced system performance: without node grouping the losses are modest (only about 10%), but with grouped nodes performance drops significantly. Performance can also be boosted by processing several projects simultaneously. The overall optimal configuration is to process several projects at the same time, where each project is computed on Tornado nodes plus one Numascale node to avoid memory consumption issues.

Acknowledgments. This work was financially supported by the Ministry of Education and Science of the Russian Federation in the framework of the Federal Targeted Programme for Research and Development in Priority Areas of Advancement of the Russian Scientific and Technological Complex for 2014-2020 (project No. 14.584.21.0022, ID RFMEFI58417X0022) and in the framework of the state assignment No. 2.9517.2017/8.9 (project theme "Methods and technologies for verification and development of software for modeling and calculations using HPC platform with extramassive parallelism").

References

1. Agisoft PhotoScan. http://www.agisoft.com
2. Ian Cutress. Scientific and Synthetic Benchmarks. 2D to 3D rendering – Agisoft PhotoScan.
http://www.anandtech.com/show/7852/intel-xeon-e52697-v2-and-xeon-e52687w-v2-review-12-and-8-cores/4
3. Agisoft PhotoScan User Manual. http://www.agisoft.com/pdf/photoscan-pro_1_2_en.pdf
4. Matt Bach. Agisoft PhotoScan Multi Core Performance. https://www.pugetsystems.com/labs/articles/Agisoft-PhotoScan-Multi-Core-Performance-709/
5. Matt Bach. Agisoft PhotoScan GPU Acceleration. https://www.pugetsystems.com/labs/articles/Agisoft-PhotoScan-GPU-Acceleration-710/
6. SPbSTU HPC Center Open Day. http://www.spbstu.ru/media/news/nauka_i_innovatsii/spbspu-open-day-supercomputer-center-polytechnic/
7. Creating "Polytechnic RSC Tornado" supercomputer for St. Petersburg State Polytechnical University. http://www.rscgroup.ru/ru/our-projects/240-sozdanie-superkompyutera-politehnik-rsk-tornado-dlya-spbpu
8. Einar Rustad. NumaConnect White Paper: A high level technical overview of the NumaConnect technology and products. https://www.numascale.com/numa_pdfs/numaconnect-white-paper.pdf
9. Creating "Polytechnic RSC PetaStream" supercomputer for St. Petersburg State Polytechnical University. http://www.rscgroup.ru/ru/our-projects/242-sozdanie-superkompyutera-politehnik-rsc-petastream-dlya-spbpu
10. Amdahl, Gene M. (1967). "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". AFIPS Conference Proceedings (30): 483–485. doi:10.1145/1465482.1465560.