<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Approach for Managing Hybrid Supercomputer Resources in Photogrammetric Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hybrid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Obtaining Models in PhotoScan</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Peter the Great St.Petersburg Polytechnic University</institution>
          ,
          <addr-line>Saint Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>12</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>This paper describes an approach to managing the resources of a supercomputer for effective execution of stereo photogrammetric tasks with the Agisoft PhotoScan software. A study was conducted to establish the performance characteristics in order to obtain a proper deployment. The PhotoScan calculation process contains two mandatory steps: aligning photos and building the model.</p>
      </abstract>
      <kwd-group>
        <kwd>Photogrammetry supercomputer</kwd>
        <kwd>High performance computing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Stereo photogrammetry tasks belong to the class of Big Data analysis and processing
tasks. The aim of photogrammetry is to build a 3-dimensional (3D)
model from a set of 2-dimensional (2D) images (photos). Precise modelling
of a terrain/land environment from data acquired by drones is especially hard to
perform due to the huge amount of input data, which can be measured in tens
and hundreds of thousands of photos. A supercomputer is needed to effectively
process the described task, so that the result is obtained in a reasonable time.</p>
      <p>
        An approach to managing supercomputer resources is studied using the
Agisoft PhotoScan software [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which serves as a benchmark (this software
was chosen due to the requirements of the customer sponsoring this project). PhotoScan
allows the user to build different types of 3D models, such as elevation, tiled and
polygonal models, from a set of 2D images. The process itself is very complex
and is divided into several stages with different hardware requirements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Performance analysis of different deployment configurations helps to determine
bottlenecks and ways to overcome them. Therefore, a proper deployment can result in
a significant performance and stability gain on this type of task.
      </p>
      <sec id="sec-1-1">
        <title>1. Determining camera positions.</title>
        <p>The first stage is camera alignment. At this stage PhotoScan searches for
common points on photographs and matches them; it also finds the
position of the camera for each picture and refines the camera calibration
parameters. As a result, a sparse point cloud and a set of camera positions
are formed. The sparse point cloud represents the results of photo alignment
and is not directly used in the further 3D model construction procedure
(except for the sparse point cloud based reconstruction method). However,
it can be exported for further use in external programs; for instance, the
sparse point cloud model can be used in a 3D editor as a reference. In
contrast, the set of camera positions is required for further 3D model
reconstruction by PhotoScan.
2. Building Dense Cloud.</p>
        <p>The next stage is building the dense point cloud. Based on the estimated camera
positions and the pictures themselves, a dense point cloud is built by PhotoScan.
The dense point cloud may be edited and classified prior to export or before proceeding
to 3D mesh model generation.
3. The next steps depend on the type of model the user wants to obtain and have their
own specifics.</p>
        <p>
          In general, PhotoScan reconstructs a 3D model representing the object
surface based on the dense or sparse cloud, according to the user's choice. There are
two algorithmic methods available in PhotoScan that can be applied to 3D
mesh generation: Height Field (for planar type surfaces) and Arbitrary (for
any kind of object) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>In this study we focus on obtaining the orthomosaic; therefore,
the following steps are present in the calculation process:</p>
      </sec>
      <sec id="sec-1-2">
        <title>Orthomosaic Processing Steps</title>
        <p>1. Match Photos.
2. Align Photos.
3. Build Dense Cloud.
4. Build DEM (Digital Elevation Model).
5. Build Orthomosaic.</p>
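        <p>The steps above are strictly sequential; a minimal sketch of such a stage runner (the stage names are copied from the list, with placeholder work standing in for real PhotoScan processing, which is not shown) is:</p>
        <preformat>
```python
# Sketch of the orthomosaic pipeline as strictly sequential stages.
# The stage names mirror the list above; run() is a placeholder for the
# real per-stage work, not a PhotoScan API call.

STAGES = [
    "Match Photos",
    "Align Photos",
    "Build Dense Cloud",
    "Build DEM",
    "Build Orthomosaic",
]

def run_pipeline(stages, run):
    """Execute each stage in order; a stage starts only after the previous one."""
    results = {}
    for stage in stages:
        results[stage] = run(stage)
    return results

completed = run_pipeline(STAGES, lambda stage: f"{stage}: done")
assert list(completed) == STAGES  # insertion order preserves the stage order
```
        </preformat>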
        <p>
          From all stages of processing, building the dense cloud is the most resource
demanding [
          <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
          ]. The other steps do not affect the overall time as much.
        </p>
        <p>Known PhotoScan performance studies have been conducted for the single-machine use
case, in which all processing is executed on the same machine where the user
works. This is unsuitable for big projects because of the inability to process large
sets of data.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Hybrid Supercomputer "Polytechnic"</title>
      <p>
        Supercomputer "Polytechnic" is a hybrid complex with a peak performance of more
than 1 PFlops. The supercomputer was developed by the Russian company RSC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
aims of the supercomputer (one of the most modern in Russia) are to improve
the efficiency of fundamental and applied scientific research at SPbPU; to train
engineers with a high level of competence in the use of supercomputer
technology for developing high-tech products; to set up a university-based regional center
of competence in the field of supercomputer technology in
knowledge-intensive sectors of the economy (power plant engineering, aircraft engineering,
bioengineering, radioelectronics); etc.
      </p>
      <p>
        The hybrid supercomputer consists of three clusters: Tornado [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Numascale [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and PetaStream [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (Fig. 1). In our study we deploy PhotoScan on Tornado and
Numascale clusters.
      </p>
      <p>Tornado specs: 668 nodes, each with two Intel Xeon E5-2697 v3 CPUs, 64 GB RAM and
InfiniBand FDR; 56 of the nodes also have two NVIDIA Tesla K40 GPGPUs each. Overall: 1336
CPUs; 18704 x86 cores; 112 GPGPUs; 42752 GB RAM.</p>
      <p>
        Numascale specs: 64 nodes, each with three AMD Opteron 6380 CPUs, 192 GB
RAM and cache-coherent non-uniform memory access (CC-NUMA) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; InfiniBand FDR with a
3-dimensional torus topology. Overall: 192 CPUs; 3072 x86 cores; 12288 GB
RAM. Nodes can be combined into groups of four or more nodes with shared memory
access between the nodes.
      </p>
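      <p>The per-node figures above can be cross-checked against the quoted cluster totals with simple arithmetic (a sanity check only; the cores-per-CPU counts are taken from the processor models named above):</p>
      <preformat>
```python
# Cross-check the quoted cluster totals against the per-node specs.

def totals(nodes, cpus_per_node, cores_per_cpu, ram_gb_per_node):
    return {
        "cpus": nodes * cpus_per_node,
        "cores": nodes * cpus_per_node * cores_per_cpu,
        "ram_gb": nodes * ram_gb_per_node,
    }

# Tornado: 668 nodes, 2x Intel Xeon E5-2697 v3 (14 cores each), 64 GB RAM.
tornado = totals(668, 2, 14, 64)
assert tornado == {"cpus": 1336, "cores": 18704, "ram_gb": 42752}

# Numascale: 64 nodes, 3x AMD Opteron 6380 (16 cores each), 192 GB RAM.
numascale = totals(64, 3, 16, 192)
assert numascale == {"cpus": 192, "cores": 3072, "ram_gb": 12288}
```
      </preformat>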
      <p>The Data Storage System consists of two parts: a parallel Lustre Data Storage
System (DSS) with 1 PB capacity and a modular DSS for the cloud with 0.5 PB
capacity.</p>
    </sec>
    <sec id="sec-3">
      <title>Performance Analysis and Deployment</title>
      <p>The suggested approach is based on the ability of the supercomputer to connect
clusters through InfiniBand. By dividing the tasks of analyzing and
processing data into several chunks, we can combine supercomputer resources in a more
efficient way, which results in faster processing.</p>
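      <p>A minimal sketch of the chunking idea, assuming the input photo set can simply be partitioned into consecutive batches (the file names and batch size are illustrative, not taken from the experiments):</p>
      <preformat>
```python
# Split a photo set into consecutive chunks so the chunks can be
# distributed across cluster resources. File names and the batch size
# are illustrative.

def chunk(items, size):
    """Return consecutive slices of `items`, each at most `size` long."""
    return [items[i:i + size] for i in range(0, len(items), size)]

photos = [f"IMG_{i:05d}.jpg" for i in range(10000)]
batches = chunk(photos, 2500)

assert len(batches) == 4                              # 10000 / 2500 chunks
assert sum(len(b) for b in batches) == len(photos)    # nothing lost
```
      </preformat>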
      <p>The application of this approach to the Agisoft PhotoScan software is shown
below, with different deployment configurations of the supercomputer clusters. The size of the
input data used for the experiments varies from 10000 photos (100 GB) to 100000
photos (1 TB).</p>
      <sec id="sec-3-1">
        <title>10000-50000 Photos Processing Results</title>
      </sec>
      <sec id="sec-3-2">
        <title>Configurations</title>
        <p>The following configurations were used during this experiment:
1. 8 Numascale nodes.
2. 8 Tornado nodes.
3. 8 Tornado nodes with GPGPU (Tornado-k40).</p>
        <p>Table 1 shows the processing times for each step of the 10000 photos
project with the different configurations. The time is shown in minutes.</p>
        <p>
          Tornado and Tornado-k40 differ only in the Build Dense Cloud step, because
the GPGPU is used only at this step [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Using GPGPUs greatly increases
performance at this step and thus overall performance. The final processing time is nearly
halved, because building the dense cloud is the heaviest step in the pipeline.
        </p>
        <p>Despite the fact that Numascale nodes have three times more RAM, this does
not boost performance: Numascale nodes are 1.7-4 times slower than Tornado
nodes.</p>
        <p>This trend holds for projects with sizes from 10000 to 50000 photos. The obtained
performance evaluation will later be used for proper deployment.</p>
        <sec id="sec-3-2-1">
          <title>Processing More Than 50000 Photos</title>
          <p>Some problems can be encountered when processing large projects, which contain
more than 50000 photos. These issues appear because of the lack of RAM on
Tornado nodes, where only 64 GB is available.</p>
          <p>
            Paper [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] shows that the steps of the PhotoScan process depend on each
other and thus cannot be processed in parallel. This must be taken into account
in resource management. During the experiments, it was observed that the
processing steps consist of subtasks that are not equal. Some of these subtasks
require sequential processing on only one node. In most cases, such a subtask
merges the results from all processing nodes or requires
synchronization with previous pieces of work.
          </p>
          <p>It was found that during these subtasks Tornado nodes were suspended.
Log analysis revealed the cause of the suspension: it was happening
due to memory overflow.</p>
          <p>The problematic step and subtask were also determined: the problems were
occurring in the Align Photos stage. Several experiments were successfully held to
confirm this theory. To overcome the issue, an alteration in the cluster
configuration was made: a Numascale node was added to the processing nodes, because it
has three times as much memory and somewhat comparable performance.
The Numascale node was included in the node list with top priority, so that the server
which distributes the tasks will always address the problematic subtasks to the
Numascale node.</p>
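          <p>The priority mechanism can be sketched as follows: nodes are tried in priority order and the first one whose memory can hold the subtask wins, so the Numascale node (listed first) absorbs the memory-heavy subtasks. Node names and RAM figures below are illustrative, not PhotoScan's actual dispatch logic:</p>
          <preformat>
```python
# Sketch of priority-based dispatch: nodes are tried in list order, so the
# Numascale node (192 GB, listed first) receives the memory-heavy subtasks.
# Node names and RAM figures are illustrative.

NODES = [
    {"name": "numascale-0", "ram_gb": 192},  # top priority
    {"name": "tornado-0", "ram_gb": 64},
    {"name": "tornado-1", "ram_gb": 64},
]

def assign(subtask_ram_gb, nodes=NODES):
    for node in nodes:
        if node["ram_gb"] >= subtask_ram_gb:
            return node["name"]
    raise MemoryError("no single node can hold this subtask")

assert assign(120) == "numascale-0"  # too big for any Tornado node
```
          </preformat>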
          <p>On the plus side, after adding the Numascale node it became possible to
process projects of up to 75000 photos.</p>
          <p>
            On the other side, performance slightly dropped, because the Numascale node is
slower than Tornado and Tornado-k40. Keeping in mind Amdahl's law [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ],
the inclusion of the Numascale node reduces the overall performance of the system, as it
is slower than the others and the overall computing time cannot be smaller than
the computing time of the slowest node.
          </p>
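          <p>This bound can be made concrete: when a step's work is split evenly across heterogeneous nodes and every node must finish before the next step starts, the step time is dictated by the slowest node. A small illustration with made-up relative speeds, not measured figures:</p>
          <preformat>
```python
# Step time when work is split evenly across heterogeneous nodes and the
# step ends only when the slowest node finishes. Speeds are made-up
# relative units, not measured figures.

def step_time(work, node_speeds):
    share = work / len(node_speeds)           # equal share per node
    return max(share / s for s in node_speeds)

fast_only = step_time(4000, [4.0] * 40)           # 40 equally fast nodes
with_slow = step_time(4000, [4.0] * 40 + [1.0])   # plus one 4x-slower node

assert with_slow > fast_only  # the slow node dominates the step time
```
          </preformat>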
          <p>However, a single Numascale node was not enough to process the 100000 photos project,
so four Numascale nodes were unified into a group with a total memory of
768 GB. The project was completed successfully, but it was noted that the
performance of the Numascale group is lower than that of a single Numascale node, reducing the
overall performance even more. The results of the experiments can be seen in Table 2. The time
is shown in minutes.</p>
          <p>Memory requirements and processing times can be seen in Fig. 2. From
left to right (the blue column is processing time in hours, the orange one is peak memory
consumption in GB):</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Experiment Configurations for Large Projects</title>
        <p>1. 50000 photos project; 40 Tornado-k40 nodes.
2. 75000 photos project; 40 Tornado-k40 nodes and one Numascale node.
3. 100000 photos project; 40 Tornado-k40 nodes and one Numascale node; no
time is presented due to a failure: there was not enough memory to finish the process.
4. 100000 photos project; 40 Tornado-k40 nodes and four Numascale nodes
unified into a group with shared memory.</p>
        <p>The distribution of memory consumption can be seen in Fig. 3 (data acquired
from a Numascale node). Usually memory consumption is low enough for the task to run on
Tornado nodes, but there are peaks that require much more memory, as
in Fig. 3, where consumption hits the mark of 120 GB.</p>
        <sec id="sec-3-3-1">
          <title>Processing Several Projects Simultaneously</title>
          <p>Overall performance can be raised by processing several projects at once.
Based on the conducted study, the following heuristics were derived: top-priority
projects should be placed on Tornado-k40, while other projects should run on Tornado.
To enhance stability and to be able to process large projects, it is better to
include one Numascale node for each project, so that the most memory-demanding
subtasks will run on it.</p>
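          <p>These heuristics can be summarized in a small placement function (the cluster labels and project names are illustrative; this is a sketch of the policy, not a scheduler implementation):</p>
          <preformat>
```python
# Placement heuristic from the study: top-priority projects go to
# Tornado-k40, the rest to Tornado, and every project also gets one
# Numascale node for the memory-heavy subtasks. Labels are illustrative.

def place(projects):
    plan = {}
    for name, top_priority in projects:
        cluster = "tornado-k40" if top_priority else "tornado"
        plan[name] = {"cluster": cluster, "extra_nodes": ["numascale"]}
    return plan

plan = place([("survey-A", True), ("survey-B", False)])
assert plan["survey-A"]["cluster"] == "tornado-k40"
assert plan["survey-B"]["cluster"] == "tornado"
```
          </preformat>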
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, an approach was suggested for managing the resources of a hybrid
supercomputer to maximize its efficiency and reduce the overall
processing time. The results of applying the approach are presented on the Agisoft
PhotoScan software. PhotoScan performance was analyzed on different configurations
and projects. An effective configuration was determined, which can process large
projects in a reasonable time. It was noted that the same project runs faster
on the Tornado cluster than on Numascale. However, because it is impossible to process
large projects using only Tornado nodes, a combined configuration
with a Numascale node was suggested. Using a shared-memory cluster allows
overcoming memory overflow issues at the cost of reduced performance of the
system. Without node grouping the losses are modest, only about 10%, but
with grouped nodes performance drops significantly further. Performance
can also be boosted by processing several projects simultaneously. The overall optimal
configuration is to process several projects at the same time, where each project
is computed on Tornado nodes plus one Numascale node to avoid memory
consumption issues.</p>
      <p>Acknowledgments. This work was financially supported by the
Ministry of Education and Science of the Russian Federation in the framework of the
Federal Targeted Programme for Research and Development in Priority Areas
of Advancement of the Russian Scientific and Technological Complex for
2014-2020 (project No. 14.584.21.0022, ID RFMEFI58417X0022) and in the
framework of the state assignment No. 2.9517.2017/8.9 (the project theme "Methods
and technologies for verification and development of software for modeling and
calculations using HPC platform with extramassive parallelism").</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Agisoft PhotoScan. http://www.agisoft.com</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Ian Cutress. Scientific and Synthetic Benchmarks: 2D to 3D Rendering, Agisoft PhotoScan. http://www.anandtech.com/show/7852/intel-xeon-e52697-v2-and-xeon-e52687w-v2-review-12-and-8-cores/4</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Agisoft PhotoScan User Manual. http://www.agisoft.com/pdf/photoscan-pro_1_2_en.pdf</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Matt Bach. Agisoft PhotoScan Multi Core Performance. https://www.pugetsystems.com/labs/articles/Agisoft-PhotoScan-Multi-Core-Performance-709/</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Matt Bach. Agisoft PhotoScan GPU Acceleration. https://www.pugetsystems.com/labs/articles/Agisoft-PhotoScan-GPU-Acceleration-710/</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. SPbSTU HPC Center Open Day. http://www.spbstu.ru/media/news/nauka_i_innovatsii/spbspu-open-day-supercomputer-center-polytechnic/</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Creating the "Polytechnic RSC Tornado" supercomputer for St. Petersburg State Polytechnical University. http://www.rscgroup.ru/ru/our-projects/240-sozdanie-superkompyutera-politehnik-rsk-tornado-dlya-spbpu</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Einar Rustad. NumaConnect White Paper: A high-level technical overview of the NumaConnect technology and products. https://www.numascale.com/numa_pdfs/numaconnect-white-paper.pdf</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Creating the "Polytechnic RSC PetaStream" supercomputer for St. Petersburg State Polytechnical University. http://www.rscgroup.ru/ru/our-projects/242-sozdanie-superkompyutera-politehnik-rsc-petastream-dlya-spbpu</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Amdahl, Gene M. (1967). "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". AFIPS Conference Proceedings (30): 483-485. doi:10.1145/1465482.1465560</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>