=Paper= {{Paper |id=Vol-3762/531 |storemode=property |title=3D reconstruction methods in industrial settings: a comparative study for COLMAP, NeRF and 3D Gaussian Splatting |pdfUrl=https://ceur-ws.org/Vol-3762/531.pdf |volume=Vol-3762 |authors=Zeno Sambugaro,Lorenzo Orlandi,Nicola Conci |dblpUrl=https://dblp.org/rec/conf/ital-ia/SambugaroOC24 }} ==3D reconstruction methods in industrial settings: a comparative study for COLMAP, NeRF and 3D Gaussian Splatting== https://ceur-ws.org/Vol-3762/531.pdf
                                3D reconstruction methods in industrial settings: a
                                comparative study for COLMAP, NeRF and 3D Gaussian
                                Splatting
                                Zeno Sambugaro1,† , Lorenzo Orlandi2,*,† and Nicola Conci3
                                DISI, University of Trento, via Sommarive, 5, Povo, 38123, Italy


                                                Abstract
                                                3D rendering techniques have undergone a rapid evolution with the emergence of novel and advanced methodologies,
                                                redefining the boundaries of realism and computational efficiency. This study explores recent advancements in the field,
                                                comparing established approaches like photogrammetry with software such as COLMAP against the new frontiers opened
                                                by emerging view synthesis approaches like Neural Radiance Fields (NeRF), and 3D Gaussian Splatting. In this paper, we
                                                present a comprehensive comparison of the described methods tailored for industrial applications, where the data acquisition
                                                is generally conducted by human operators employing handheld devices.

                                                Keywords
                                                Photogrammetry, NeRF, Gaussian Splatting, 3D Reconstruction



1. Introduction

In recent years, the advancement of 3D reconstruction technologies has opened new avenues in the documentation and analysis of urban landscapes, such as working, industrial and archaeological sites. Among these, photogrammetry has long been established as the baseline for precise, high-resolution mapping and modeling. However, the recent advent of Artificial Intelligence (AI) in the 3D field, thanks to the introduction of Neural Radiance Fields (NeRF) [1] and, more recently, 3D Gaussian Splatting [2], presents a novel paradigm, potentially overcoming some of the inherent limitations faced by traditional methods. This paper aims to provide a comprehensive comparison between these cutting-edge techniques, focusing on their application in the industrial context of excavation sites. Excavation sites present unique challenges for 3D reconstruction due to their dynamic nature and intricate details. Operators' data collection methods must adapt to ensure fidelity in reconstructing occluded regions. Integrating geo-spatial data with 3D reconstructions aids utility companies in locating subsurface infrastructure accurately. This enhances worksite planning and management, and reduces the risk of accidental damage during future excavation.

Moreover, the use of geo-referenced data in the 3D reconstruction enables the development of augmented reality (AR) technologies, thus offering an additional layer of information, enhancing operational safety and efficiency. By overlaying digital models onto the physical world, operators can gain real-time insights, further preventing the accidental severing of critical infrastructure.

This paper investigates the strengths and limitations of photogrammetry, NeRF, and 3D Gaussian Splatting in excavations, where geographical positioning data is essential. Utilizing datasets of images and precise coordinates, it aims to measure each method's efficacy, particularly where traditional photogrammetry is not accurate. Implicit methods like NeRF show promise in rendering complex scenes with diverse surface properties, while 3D Gaussian Splatting provides very accurate estimation of surfaces and fine structures, areas that are challenging for conventional methods.

NeRF-based methods have recently proven to be a valuable alternative to traditional photogrammetry in the field of image-based 3D reconstruction. This innovation is especially significant for the challenging scenarios of excavation sites, where the accuracy and detail of 3D models are crucial. This research is motivated by the potential of NeRF to enhance the precision and reliability of reconstructions in such complex scenarios. By comparing NeRF with traditional photogrammetry across varied scenes, this study aims to comprehensively assess their performance in capturing intricate details, surface textures, and overall geometric accuracy. Our goal is to evaluate the suitability of NeRF techniques for adoption in real-world applications, with a particular focus on excavation sites.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
Email: zeno.sambugaro@unitn.it (Z. Sambugaro); lorenzo.orlandi@unitn.it (L. Orlandi); nicola.conci@unitn.it (N. Conci)
ORCID: 0000-0002-4541-4528 (Z. Sambugaro); 0000-0002-8376-043X (L. Orlandi); 0000-0002-7858-0928 (N. Conci)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




[Figure 1 diagram. Stage labels: Geo-referenced image acquisition; Image position refinement (COLMAP sparse); Photogrammetry, Dense point cloud, Mesh generation; nerfstudio, Neural radiance field, Point cloud generation, Mesh generation; 3DGS, Gaussian scene representation, Mesh generation, Point cloud generation; Georeferencing; Comparison of the methods.]

Figure 1: Overview of the proposed methodology



2. Background

3D reconstruction is crucial in fields like construction, excavation, and worksite management. Employing multi-view reconstruction techniques, scenes are captured from various angles using 2D images. This enables detailed monitoring of the project progress and provides the ability to virtually navigate sites, both during and after completion, utilizing geo-referencing and virtual reality. Among the most common photogrammetric solutions for 3D view reconstruction, we focus on COLMAP [3] for its open-access policy and continual improvements. COLMAP enables the conversion of 2D images into comprehensive 3D models, including point clouds and textured meshes, enabling advanced spatial analyses. However, the application of photogrammetric reconstruction encounters several challenges, particularly when dealing with objects characterized by complex optical properties such as high absorbency, reflectivity, or scattering. These methods can also suffer from variance in lighting conditions, including shadows, glare, or inconsistent illumination, as well as from surfaces with uniform or repetitive textures and complex shapes or geometries.

NeRF-based technologies offer cutting-edge solutions to overcome limitations in scene representation by modeling the scene as particles characterized by density and color. This study compares two neural radiance-based techniques, Nerfacto (a variation of InstantNGP [4] in Nerfstudio [5]) and SuGaR [6] (a variation of 3D Gaussian Splatting [2]), against traditional photogrammetry methods.

Neural Radiance Fields. Neural Radiance Fields (NeRF) have emerged as a significant advancement in the field of 3D scene reconstruction. The scene is represented with a novel 5D function. This function correlates each spatial point $(x, y, z)$ with the radiance emitted in any direction, defined by azimuthal and polar angles $(\theta, \varphi)$. The output consists of a volume density $\sigma$ and an RGB color $\mathbf{c}$, the latter varying with the viewing direction. This relation is formulated through the Multi-Layer Perceptron (MLP) $F_\theta$, expressed as:

$$F_\theta : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma) \qquad (1)$$

where $\mathbf{x} = (x, y, z)$ denotes the coordinates within the scene, and $\mathbf{d}$, parameterized by $(\theta, \varphi)$, is the 3D Cartesian unit vector indicating the viewing direction. The color $\mathbf{c} = (r, g, b)$ shifts with the viewing angle, while $\sigma$, denoting volume density, remains invariant. The usage of neural volume rendering pipelines, over traditional point clouds or meshes, enables the modeling of variations in color and illumination. InstantNGP [4], short for Instant Neural Graphics Primitives, is a variant that enhances NeRF's framework to expedite scene reconstruction significantly. By refining the neural network's architecture and computations, InstantNGP reaches high-quality results more quickly, positioning it as a viable option for real-time applications.

Nerfstudio [5] introduces a platform, built around the Nerfacto model, to streamline NeRF-based model creation and manipulation. Nerfacto integrates insights from very recent research, including MipNeRF-360 [8], Instant-NGP [4], and Ref-NeRF [7], focusing on optimizing camera views and sampling processes.

3D Gaussian Splatting for Real-Time Radiance Field Rendering. 3D Gaussian Splatting [2], a novel approach to scene representation, contrasts with neural fields by optimizing an explicit point-based scene model. Each point in this representation is associated with various attributes: a position $p \in \mathbb{R}^3$, opacity $o \in [0, 1]$, third-degree spherical harmonics (SH) coefficients $k \in \mathbb{R}^{16}$, 3D scale $s \in \mathbb{R}^3$, and 3D rotation $R \in SO(3)$ represented by 4D quaternions $q \in \mathbb{R}^4$. Rendering to the image plane involves accumulating the color $c_{GS}$ from correctly-sorted points using the equation:

$$c_{GS} = \sum_{j=1}^{N_p} c_j \alpha_j \tau_j, \quad \text{where } \tau_j = \prod_{i=1}^{j-1} (1 - \alpha_i) \qquad (2)$$
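The accumulation in Eq. (2) can be illustrated with a minimal Python sketch. It assumes the points are already sorted front to back and that each point carries an RGB color and an opacity value in [0, 1]. The function name `composite_color` and the sample values are hypothetical, chosen only for illustration; this is not the 3D Gaussian Splatting implementation.

```python
def composite_color(colors, alphas):
    """Accumulate the colors of depth-sorted points as in Eq. (2).

    colors: list of (r, g, b) tuples, sorted front to back (assumed).
    alphas: list of per-point opacities in [0, 1], same order.
    """
    result = [0.0, 0.0, 0.0]
    transmittance = 1.0  # tau_1 is the empty product, i.e. 1
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance  # alpha_j * tau_j
        for channel in range(3):
            result[channel] += weight * color[channel]
        # tau_{j+1} = tau_j * (1 - alpha_j)
        transmittance *= 1.0 - alpha
    return result


# Hypothetical example: a half-transparent red point in front of a
# half-transparent blue point.
pixel = composite_color([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 0.5])
print(pixel)
```

In this example the front point contributes with weight 0.5, and the point behind it with weight 0.5 × (1 − 0.5) = 0.25, which shows how the transmittance term attenuates points that are occluded by closer ones.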
In Eq. (2), $c_j$ is determined by the SH coefficients $k$, and $\alpha_j$ is calculated from the projected 2D Gaussian with covariance $\Sigma' = J M \Sigma M^T J^T$, incorporating the per-point opacity $o$, the viewing transformation $M$, and the Jacobian $J$ of the affine approximation of the projective transformation. The 3D covariance matrix $\Sigma$ is kept positive semi-definite by building it from the scale matrix $S = \mathrm{diag}(s_1, s_2, s_3)$ and the rotation $R$, following $\Sigma = R S S^T R^T$.

Building upon the principles of 3D Gaussian Splatting, Surface Gaussian Approximation for Rendering (SuGaR) [6] leverages Gaussian functions to model object surfaces within a scene, achieving precision in handling occlusions and detailed surface texturing through Gaussian "splats" projected onto a volume grid. Each splat influences the volume's density and color, based on its spatial location and Gaussian distribution, described mathematically as:

$$G(\mathbf{x}; \mu, \Sigma) = \frac{1}{(2\pi)^{3/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu)\right) \qquad (3)$$

In Eq. (3), $\mathbf{x}$ denotes a point in space, $\mu$ the mean location (center of the splat), and $\Sigma$ the covariance matrix shaping the Gaussian distribution. SuGaR's method for accumulating multiple splats across a scene constructs a volumetric representation capturing density and color information, enabling precise shading and depth rendering.

3. Methodology

This study aims to evaluate the effectiveness and potential benefits of Neural Radiance Fields (NeRF) against traditional image-based reconstruction techniques, particularly photogrammetry, in the context of augmented/virtual reality applications. Our focus is on challenging outdoor scenarios. We include excavation sites and playground objects, which are characterized by unbounded environments and non-Lambertian surfaces. To facilitate a direct comparison, the same dataset of images, captured with geo-referencing, is utilized across all reconstruction methods. This standardized approach ensures that differences in reconstruction quality and efficiency can be attributed solely to the methodologies rather than to misalignment.

3.1. Dataset acquisition

The datasets are collected using a system comprised of two devices: a smartphone and an RTK-GNSS device, spatially calibrated as described in [9]. These devices ensure highly accurate pose information for all the collected scenes. The study aims to analyze industrial applications; therefore, as scenarios, we have selected some simple playground objects mixed with real excavation scenarios where the reconstruction is more challenging. Our datasets consist of 7 playground scenarios and 3 excavation scenarios.

Acquisition Process. The dataset has been acquired following the standard procedure that an operator would follow when working in a given site. The trajectory reflects a rotation around the object, maintaining the capture at eye level. During acquisition, the frame rate is set at 5 frames per second with a resolution of 1280 x 720. The error of the geopose data is always less than 3 cm in translation and less than 1 degree in rotation for each acquired image. We maintain a uniform velocity during acquisition, so that the number of images for each scenario depends on the length of the trajectory. The playground dataset comprises approximately 200 images, while the excavation dataset contains around 500 images, which reflects longer trajectories.

3.2. Methodologies Employed

Three distinct reconstruction methodologies were applied to the captured datasets; an overview is shown in Figure 1:

1. Photogrammetry: The classical photogrammetric procedure involves estimating camera orientation parameters for sparse point cloud construction and then generating a dense point cloud; mesh creation and texture extraction complete the reconstruction process. For this purpose we used COLMAP, with all phases conducted in high-quality mode to ensure maximum detail and accuracy.
2. NeRF-Based Reconstruction: The training of a Neural Radiance Field reconstruction requires known camera poses as input. We use nerfstudio [5], and in particular "nerfacto", a model strongly based on InstantNGP [4], chosen for its fast training and inference. We then extract the dense point clouds and textured mesh through nerfstudio's API; in particular, for mesh extraction we used Poisson reconstruction.
3. Gaussian Splatting (SuGaR): Similarly to NeRF, this method requires known camera poses as input. This explicit model is then trained to approximate the radiance field of the scene. The training of SuGaR involves more than one step: it starts with 7k iterations of standard 3D Gaussian Splatting, followed by 7k iterations of SuGaR fine-tuning to extract a more precise geometry.

The acquisition of our dataset incorporated geo-referencing, so as to simplify the alignment process for the reconstructions. The only exception is NeRF, as an
[Figure 2 image. Panels: Generated meshes (Colmap, nerfstudio, SuGaR); Rendering results (NeRF, SuGaR); Cloud to Cloud distance (Colmap - Nerf, Colmap - SuGaR), color scale from 0 cm to 10 cm.]

Figure 2: Comparison of the mesh obtained with the proposed methodologies on the playgrounds dataset. Other scenes can be found at: https://zenos4mbu.github.io/photogrammetry_nerf.github.io/



implicit framework, this model normalizes its coordinates between -1 and 1. This aspect of NeRF requires an additional calibration step, which incorporates the scale and translation derived from the geo-referenced input to ensure accurate alignment. For the dataset to be used in training, we first need to estimate the camera parameters from the input images. This estimation is necessary because the neural network requires knowledge of both the camera positions and the corresponding images to accurately generate the scene representation. To achieve this, we utilized COLMAP, a software package known for its application of Structure from Motion (SfM) techniques [3], which estimate three-dimensional structures from two-dimensional image sequences.

To facilitate comparison, given that outputs from photogrammetry are not directly comparable with those from neural fields or Gaussian splatting, we incorporate an additional conversion phase. Nerfstudio provides functionality to convert NeRF outputs into point clouds and meshes. The point clouds are easily exported since the neural representation can be queried at any 3D point. For the meshes, this conversion employs the marching cubes algorithm and the Poisson surface reconstruction method. In the SuGaR framework, the mesh extraction phase is also done through marching cubes or Poisson surface reconstruction. In this case the reconstruction is enhanced thanks to the precise estimation of the normals of the sampled points. To obtain an accuracy metric, we derive a cloud-to-cloud comparison using the CloudCompare software.

3.3. Comparative Analysis Framework

The comparative analysis between these methods focuses on the following key metrics: (i) Accuracy and Detail Resolution, to evaluate the fidelity of the reconstructed models to the original scenes, and (ii) Processing Time, to assess the efficiency of each methodology in terms
[Figure 3 image. Panels: Generated meshes (Colmap, nerfstudio, SuGaR); Rendering results (NeRF, SuGaR); Cloud to Cloud distance (Colmap - Nerf, Colmap - SuGaR), color scale from 0 cm to 10 cm.]

Figure 3: Comparison of the cloud to cloud distance of the proposed methodologies on the excavation sites dataset. Other scenes can be found at: https://zenos4mbu.github.io/photogrammetry_nerf.github.io/



of computational resources and time required for reconstruction. To compare the level of fidelity of the reconstructed models, we propose using the point clouds generated by the studied methods. In this way we can obtain a quantitative metric. More specifically, we measure the cloud-to-cloud deviation of the radiance-field-based methods with respect to the reconstruction obtained with classical photogrammetry. This measure is an absolute value, which does not tell which method is performing better; it only quantifies the deviation of one reconstruction from the other. Therefore, we also show the rendering results in Figure 3, in order to assess the performance in graphical terms. In addition to this quantitative result, we also propose a qualitative comparison of the resulting meshes, comparing the proposed methodologies in Figure 2.

4. Discussion

We show a comparison of NeRF-based techniques against traditional photogrammetry utilizing COLMAP. All models are trained on an NVIDIA RTX 3090 GPU. The assessment focuses on their effectiveness in view synthesis and 3D reconstruction, particularly in expansive, unbounded environments. The results of our analysis highlight that the three methodologies produce high-quality point clouds, with very close results especially in the fine structures of the 3D scene, as illustrated in Figure 2. Notably, NeRF's output shows a denser point cloud around high-frequency scene features but has gaps in smoother regions. The radiance field rendering results show that the quality of the reconstructed views is very high, and it is difficult to say whether nerfstudio or SuGaR gives the better result. However, the comparison illustrated in Figure 2 highlights a failure case of nerfstudio, with a red area within the scene's object of interest indicating
a high cloud-to-cloud distance. This issue not only produces a discrepancy in the point cloud representation but also results in blurring within the targeted region of the neural reconstruction.

Considering the extensive usage of meshes in VR and AR applications, for their simplicity and low memory footprint, we present a comparison of the meshes produced with the three methodologies. In Figure 2 we show the obtained meshes, together with a detail of the reconstruction in the region of the 3D scene with finer details. As depicted in Figure 2, there is a noticeable variance in detail and texture among the outputs. The COLMAP mesh, while consistent, falls short in representing thin structures. In contrast, the NeRF mesh shows greater detail but presents some holes. The SuGaR mesh stands out for its superior detail, accurately capturing complex structures where others falter, thanks to its precise normal calculations. Another point to consider is the difference in accuracy between the two scenarios we have examined. The playground scene is easier and, in fact, yields better results than the excavations. The complexity of the excavation scenario reduces the reconstruction performance, especially with the SuGaR and NeRF methods. Figure 3 shows many artifacts on the road surface in the cloud-to-cloud distance analysis, especially in the case of SuGaR, as well as many holes, especially at the excavation bottom. Finally, we analyze the processing time for each method. In this respect, there is no difference between SuGaR and COLMAP. Instead, the best performance is observed with InstantNGP, which takes about a quarter of the time compared to the other methods. Additional materials regarding our analysis can be accessed at https://zenos4mbu.github.io/photogrammetry_nerf.github.io/

5. Conclusions

In this paper we provide a comparative analysis of reconstruction methods based on neural radiance fields and classical photogrammetry for unbounded scenarios. We show results on playgrounds and excavation sites, to assess the performance in easy and complex scenarios. In our set-up, photogrammetry has provided superior reliability in complex scenes, especially on the excavation sites. It has also given better results in modeling completely flat areas, where the NeRF methods present some artifacts. Although training/reconstruction times are generally not the main concern in the reconstruction of working areas, some applications might benefit from fast reconstruction times. In this respect, nerfstudio provided the best reconstruction speed, requiring just 15 minutes for the training of a scene. An important aspect that needs to be analyzed is the reliance of the current rendering pipelines for virtual and augmented reality on mesh representations. This favors classical photogrammetry, since its final goal is to obtain a mesh representation. In contrast, neural rendering technologies focus primarily on view synthesis, offering an alternative that eliminates the need for mesh generation. SuGaR and, more generally, 3D Gaussian Splatting techniques produce an explicit representation that allows for the splatting of Gaussians in the same way traditional methods splat triangles. This feature enables SuGaR to render the scene in real time, making it possible to use it in existing pipelines. In the future, we see 3D Gaussian Splatting as a potential replacement for mesh representations, especially in scenarios requiring the realistic reconstruction of complex environments.

This research is supported by the project DIMOTY, funded by the Autonomous Province of Trento under the LP6/99 framework.

References

[1] Mildenhall, Ben, et al. "Nerf: Representing scenes as neural radiance fields for view synthesis." Communications of the ACM 65.1 (2021): 99-106.
[2] Kerbl, Bernhard, et al. "3d gaussian splatting for real-time radiance field rendering." ACM Transactions on Graphics 42.4 (2023): 1-14.
[3] Schonberger et al. "Structure-from-motion revisited." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[4] Müller et al. "Instant neural graphics primitives with a multiresolution hash encoding." ACM transactions on graphics (TOG) 41.4 (2022): 1-15.
[5] Tancik et al. "Nerfstudio: A Modular Framework for Neural Radiance Field Development." ACM SIGGRAPH 2023.
[6] Chen et al. "SuGaR: Pre-training 3D Visual Representations for Robotics." arXiv preprint arXiv:2404.01491, 2024.
[7] Verbin et al. "Ref-nerf: Structured view-dependent appearance for neural radiance fields." 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Barron et al. "Mip-nerf 360: Unbounded anti-aliased neural radiance fields." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5470-5479, 2022.
[9] Lorenzo O., Kevin D., et al. "Spatial-Temporal Calibration for Outdoor Location-Based Augmented Reality." IEEE Sensor Journal (2024), accepted for publication.