=Paper=
{{Paper
|id=Vol-3762/531
|storemode=property
|title=3D reconstruction methods in industrial settings: a comparative study for COLMAP, NeRF and 3D Gaussian Splatting
|pdfUrl=https://ceur-ws.org/Vol-3762/531.pdf
|volume=Vol-3762
|authors=Zeno Sambugaro,Lorenzo Orlandi,Nicola Conci
|dblpUrl=https://dblp.org/rec/conf/ital-ia/SambugaroOC24
}}
==3D reconstruction methods in industrial settings: a comparative study for COLMAP, NeRF and 3D Gaussian Splatting==
Zeno Sambugaro1,† , Lorenzo Orlandi2,*,† and Nicola Conci3
DISI, University of Trento, via Sommarive, 5, Povo, 38123, Italy
Abstract
3D rendering techniques have undergone a rapid evolution with the emergence of novel and advanced methodologies,
redefining the boundaries of realism and computational efficiency. This study explores recent advancements in the field,
comparing established approaches like photogrammetry with software such as COLMAP against the new frontiers opened
by emerging view synthesis approaches like Neural Radiance Fields (NeRF), and 3D Gaussian Splatting. In this paper, we
present a comprehensive comparison of the described methods tailored for industrial applications, where the data acquisition
is generally conducted by human operators employing handheld devices.
Keywords
Photogrammetry, NeRF, Gaussian Splatting, 3D Reconstruction
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
zeno.sambugaro@unitn.it (Z. Sambugaro); lorenzo.orlandi@unitn.it (L. Orlandi); nicola.conci@unitn.it (N. Conci)
ORCID: 0000-0002-4541-4528 (Z. Sambugaro); 0000-0002-8376-043X (L. Orlandi); 0000-0002-7858-0928 (N. Conci)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
In recent years, the advancement of 3D reconstruction technologies has opened new avenues in the documentation and analysis of urban landscapes, such as work, industrial and archaeological sites. Among these, photogrammetry has long been established as the baseline for precise, high-resolution mapping and modeling. However, the recent advent of Artificial Intelligence (AI) in the 3D field, with the introduction of Neural Radiance Fields (NeRF) [1] and, more recently, 3D Gaussian Splatting [2], presents a novel paradigm that may overcome some of the inherent limitations of traditional methods. This paper aims to provide a comprehensive comparison between these cutting-edge techniques, focusing on their application in the industrial context of excavation sites. Excavation sites present unique challenges for 3D reconstruction due to their dynamic nature and intricate details. Operators' data collection methods must adapt to ensure fidelity in reconstructing occluded regions. Integrating geo-spatial data with 3D reconstructions helps utility companies locate subsurface infrastructure accurately. This improves worksite planning and management, and reduces the risk of accidental damage during future excavation.
Moreover, the use of geo-referenced data in the 3D reconstruction enables the development of augmented reality (AR) technologies, offering an additional layer of information that enhances operational safety and efficiency. By overlaying digital models onto the physical world, operators can gain real-time insights, further preventing the accidental severing of critical infrastructure.
This paper investigates the strengths and limitations of photogrammetry, NeRF, and 3D Gaussian Splatting in excavations, where geographical positioning data is essential. Using datasets of images and precise coordinates, it aims to measure each method's efficacy, particularly where traditional photogrammetry is not accurate. Implicit methods like NeRF show promise in rendering complex scenes with diverse surface properties, while 3D Gaussian Splatting provides very accurate estimation of surfaces and fine structures, areas that are challenging for conventional methods.
NeRF-based methods have recently proven to be a valuable alternative to traditional photogrammetry in the field of image-based 3D reconstruction. This innovation is especially significant for the challenging scenarios of excavation sites, where the accuracy and detail of 3D models are crucial. This research is motivated by the potential of NeRF to enhance the precision and reliability of reconstructions in such complex scenarios. By comparing NeRF with traditional photogrammetry across varied scenes, this study aims to comprehensively assess their performance in capturing intricate details, surface textures, and overall geometric accuracy. Our goal is to evaluate the suitability of NeRF techniques for adoption in real-world applications, with a particular focus on excavation sites.
Figure 1: Overview of the proposed methodology. Stages shown in the diagram: geo-referenced image acquisition; COLMAP sparse reconstruction and image position refinement; photogrammetry (dense point cloud, mesh generation); nerfstudio (neural radiance field, point cloud generation, georeferencing, mesh generation); 3DGS (Gaussian scene representation, point cloud generation, mesh generation); comparison of the methods.
2. Background
3D reconstruction is crucial in fields like construction, excavation, and worksite management. With multi-view reconstruction techniques, scenes are captured from various angles using 2D images. This enables detailed monitoring of project progress and makes it possible to navigate sites virtually, both during and after completion, using geo-referencing and virtual reality.
Among the most common photogrammetric solutions for 3D reconstruction, we focus on COLMAP [3] for its open-access policy and continual improvements. COLMAP converts 2D images into comprehensive 3D models, including point clouds and textured meshes, enabling advanced spatial analyses. However, photogrammetric reconstruction encounters several challenges, particularly with objects characterized by complex optical properties such as high absorbency, reflectivity, or scattering. These methods can also be affected by variations in lighting conditions, including shadows, glare, or inconsistent illumination, as well as by surfaces with uniform or repetitive textures and complex shapes or geometries.
NeRF-based technologies offer cutting-edge solutions to overcome limitations in scene representation by representing the scene with particles characterized by density and color. This study compares two radiance-field-based techniques, Nerfacto (a variation of InstantNGP [4] in Nerfstudio [5]) and SuGaR [6] (a variation of 3D Gaussian Splatting [2]), against traditional photogrammetry methods.

Neural Radiance Fields. Neural Radiance Fields (NeRF) have emerged as a significant advancement in the field of 3D scene reconstruction. The scene is represented with a novel 5D function. This function relates each spatial point $(x, y, z)$ to the radiance emitted in any direction, defined by the azimuthal and polar angles $(\theta, \varphi)$. The output, a volume density $\sigma$ and an RGB color $\mathbf{c}$, depends on the viewing direction. This relation is formulated through the Multi-Layer Perceptron (MLP) $F_\theta$, expressed as:

$$F_\theta : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma) \tag{1}$$

where $\mathbf{x} = (x, y, z)$ denotes the coordinates within the scene, and $\mathbf{d}$, parameterized by $(\theta, \varphi)$, is the 3D Cartesian unit vector of the viewing direction. The color $\mathbf{c} = (r, g, b)$ changes with the viewing angle, while the volume density $\sigma$ remains invariant. Using neural volume rendering pipelines instead of traditional point clouds or meshes enables the modeling of variations in color and illumination. InstantNGP [4], short for Instant Neural Graphics Primitives, is a variant that enhances NeRF's framework to speed up scene reconstruction significantly. By refining the neural network's architecture and computations, InstantNGP reaches high-quality results more quickly, positioning it as a viable option for real-time applications.
Nerfstudio [5] introduces a platform, built around the Nerfacto model, that streamlines the creation and manipulation of NeRF-based models. Nerfacto integrates insights from very recent research, including MipNeRF-360 [8], Instant-NGP [4], and Ref-NeRF [7], focusing on optimizing camera views and sampling processes.

3D Gaussian Splatting for Real-Time Radiance Field Rendering. 3D Gaussian Splatting [2], a novel approach to scene representation, contrasts with neural fields by optimizing an explicit point-based scene model. Each point in this representation is associated with various attributes: a position $p \in \mathbb{R}^3$, opacity $o \in [0, 1]$, third-degree spherical harmonics (SH) coefficients $k \in \mathbb{R}^{16}$, 3D scale $s \in \mathbb{R}^3$, and 3D rotation $R \in SO(3)$ represented by 4D quaternions $q \in \mathbb{R}^4$. Rendering to the image plane involves accumulating the color $c_{GS}$ from correctly sorted points using the equation:

$$c_{GS} = \sum_{j=1}^{N_p} c_j \alpha_j \prod_{i=1}^{j-1} \tau_i \quad \text{where } \tau_i = (1 - \alpha_i) \tag{2}$$
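The accumulation in Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the renderer used in the study: the function name `composite_front_to_back` and the toy inputs are hypothetical, the points are assumed to be already sorted front to back, and the per-point colors and opacities are taken as given rather than derived from SH coefficients or projected Gaussians.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def composite_front_to_back(colors: Sequence[Color], alphas: Sequence[float]) -> Color:
    """Accumulate c_GS = sum_j c_j * alpha_j * prod_{i<j} (1 - alpha_i).

    Hypothetical helper for illustration: `colors` and `alphas` must be
    ordered from the nearest point to the farthest one.
    """
    if len(colors) != len(alphas):
        raise ValueError("colors and alphas must have the same length")

    accumulated = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_i) over points already visited
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        for channel in range(3):
            accumulated[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
    return (accumulated[0], accumulated[1], accumulated[2])


# A half-transparent red point in front of a half-transparent green point.
pixel = composite_front_to_back([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.5, 0.5])
print(pixel)  # (0.5, 0.25, 0.0)
```

Keeping a running transmittance avoids recomputing the product for every point, and it makes the effect of sorting visible: once an opaque point is reached, the transmittance drops to zero and everything behind it contributes nothing.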
In Eq. (2), $c_j$ is determined by the SH coefficients $k$, and $\alpha_j$ is calculated from the projected 2D Gaussian with covariance $\Sigma' = J M \Sigma M^T J^T$, incorporating the per-point opacity $o$, the viewing transformation $M$, and the Jacobian $J$ of the affine approximation of the projective transformation. The 3D covariance matrix $\Sigma$ is kept positive semi-definite by building it from the scale matrix $S = \mathrm{diag}(s_1, s_2, s_3)$ and the rotation $R$, following $\Sigma = R S S^T R^T$.
Building upon the principles of 3D Gaussian Splatting, Surface Gaussian Approximation for Rendering (SuGaR) [6] leverages Gaussian functions to model object surfaces within a scene, achieving precision in handling occlusions and detailed surface texturing through Gaussian "splats" projected onto a volume grid. Each splat influences the volume's density and color, based on its spatial location and Gaussian distribution, described mathematically as:

$$G(\mathbf{x}; \mu, \Sigma) = \frac{1}{(2\pi)^{3/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu)\right) \tag{3}$$

In Eq. (3), $\mathbf{x}$ denotes a point in space, $\mu$ the mean location (center of the splat), and $\Sigma$ the covariance matrix shaping the Gaussian distribution. SuGaR's method for accumulating multiple splats across a scene constructs a volumetric representation capturing density and color information, enabling precise shading and depth rendering.

3. Methodology
This study aims to evaluate the effectiveness and potential benefits of Neural Radiance Fields (NeRF) against traditional image-based reconstruction techniques, particularly photogrammetry, in the context of augmented/virtual reality applications. Our focus is on challenging outdoor scenarios. We include excavation sites and playground objects, which are characterized by unbounded environments and non-Lambertian surfaces. To facilitate a direct comparison, the same dataset of images, captured with geo-referencing, is used across all reconstruction methods. This standardized approach ensures that differences in reconstruction quality and efficiency can be attributed solely to the methodologies rather than to misalignment.

3.1. Dataset acquisition
The datasets are collected using a system composed of two devices, a smartphone and an RTK-GNSS device, spatially calibrated as described in [9]. These devices ensure highly accurate pose information for all the collected scenes. Since the study targets industrial applications, we selected simple playground equipment together with real excavation scenarios, where the reconstruction is more challenging. Our datasets consist of 7 playground scenarios and 3 excavation scenarios.

Acquisition Process. The dataset has been acquired following the standard procedure that an operator would follow when working at a given site. The trajectory is a rotation around the object, with the capture kept at eye level. During acquisition, the frame rate is set at 5 frames per second with a resolution of 1280 x 720. The error of the geopose data is always below 3 cm in translation and below 1 degree in rotation for each acquired image. We maintain a uniform velocity during acquisition, so that the number of images for each scenario depends on the length of the trajectory. The playground dataset comprises approximately 200 images, while the excavation dataset contains around 500 images, which reflects longer trajectories.

3.2. Methodologies Employed
Three distinct reconstruction methodologies were applied to the captured datasets; an overview is shown in Figure 1:
1. Photogrammetry: The classical photogrammetric procedure estimates the camera orientation parameters to build a sparse point cloud and then generates a dense point cloud; mesh creation and texture extraction complete the reconstruction process. For this purpose we used COLMAP, with all phases conducted in high-quality mode to ensure maximum detail and accuracy.
2. NeRF-Based Reconstruction: Training a Neural Radiance Field requires known camera poses as input. We use nerfstudio [5], and in particular "nerfacto", a model strongly based on InstantNGP [4], chosen for its fast training and inference. We then extract the dense point clouds and textured mesh through nerfstudio's API; for mesh extraction we used Poisson reconstruction.
3. Gaussian Splatting (SuGaR): Similarly to NeRF, this method requires known camera poses as input. This explicit model is then trained to approximate the radiance field of the scene. Training SuGaR involves more than one step: it starts with 7k iterations of standard 3D Gaussian Splatting, followed by 7k iterations of SuGaR fine-tuning to extract a more precise geometry.

The acquisition of our dataset incorporated geo-referencing, so as to simplify the alignment process for the reconstructions. The only exception is NeRF: as an
Figure 2: Comparison of the mesh obtained with the proposed methodologies on the playgrounds dataset. Panels: generated meshes (COLMAP, nerfstudio, SuGaR); rendering results (NeRF, SuGaR); cloud-to-cloud distance on a 0 cm to 10 cm color scale (COLMAP vs. NeRF, COLMAP vs. SuGaR). Other scenes can be found at: https://zenos4mbu.github.io/photogrammetry_nerf.github.io/
implicit framework, this model normalizes its coordinates between -1 and 1. This aspect of NeRF requires an additional calibration step that incorporates the scale and translation derived from the geo-referenced input to ensure accurate alignment. For the dataset to be used in training, we first need to estimate the camera parameters from the input images. This estimation is necessary because the neural network requires both the camera positions and the corresponding images to generate the scene representation accurately. To achieve this, we used COLMAP, a software package known for its Structure from Motion (SfM) techniques [3], which estimate three-dimensional structure from two-dimensional image sequences.
To facilitate comparison, given that the outputs of photogrammetry are not directly comparable with those of neural fields or Gaussian splatting, we incorporate an additional conversion phase. Nerfstudio provides functionality to convert NeRF outputs into point clouds and meshes. The point clouds are easily exported since the neural representation can be queried at any 3D point. For the meshes, this conversion employs the marching cubes algorithm and the Poisson surface reconstruction method. In the SuGaR framework, the mesh extraction phase is also done through marching cubes or Poisson surface reconstruction. In this case the reconstruction is enhanced thanks to the precise estimation of the normals of the sampled points. To obtain an accuracy metric, we derive a cloud-to-cloud comparison using the CloudCompare software.

3.3. Comparative Analysis Framework
The comparative analysis between these methods focuses on the following key metrics: (i) Accuracy and Detail Resolution, to evaluate the fidelity of the reconstructed models to the original scenes, and (ii) Processing Time, to assess the efficiency of each methodology in terms
Figure 3: Comparison of the cloud to cloud distance of the proposed methodologies on the excavation sites dataset. Panels: generated meshes (COLMAP, nerfstudio, SuGaR); rendering results (NeRF, SuGaR); cloud-to-cloud distance on a 0 cm to 10 cm color scale (COLMAP vs. NeRF, COLMAP vs. SuGaR). Other scenes can be found at: https://zenos4mbu.github.io/photogrammetry_nerf.github.io/
of computational resources and time required for reconstruction. To compare the fidelity of the reconstructed models, we use the point clouds generated by the studied methods, which yields a quantitative metric. Specifically, we measure the cloud-to-cloud deviation of the radiance-field-based methods with respect to the reconstruction obtained with classical photogrammetry. This measure is an absolute value: it does not tell which method performs better, only how much one reconstruction deviates from the other. Therefore, we also show the rendering results in Figure 3 to assess performance in graphical terms. In addition to this quantitative result, we propose a qualitative comparison of the resulting meshes in Figure 2.

4. Discussion
We compare NeRF-based techniques against traditional photogrammetry with COLMAP. All models are trained on an NVIDIA RTX 3090 GPU. The assessment focuses on their effectiveness in view synthesis and 3D reconstruction, particularly in expansive, unbounded environments. The results of our analysis highlight that the three methodologies produce high-quality point clouds, with very close results especially in the fine structures of the 3D scene, as illustrated in Figure 2. Notably, NeRF's output shows a denser point cloud around high-frequency scene features but has gaps in smoother regions. The radiance field rendering results show that the quality of the reconstructed views is very high, and it is difficult to say whether nerfstudio or SuGaR gives the best result. However, the comparison illustrated in Figure 2 highlights a failure case of nerfstudio, with a red area within the scene's object of interest indicating
a high cloud-to-cloud distance. This issue not only produces a discrepancy in the point cloud representation but also results in blurring within the targeted region of the neural reconstruction.
Considering the extensive usage of meshes in VR and AR applications, for their simplicity and low memory footprint, we present a comparison of the meshes produced with the three methodologies. In Figure 2 we show the obtained meshes, together with a detail of the reconstruction in the region of the 3D scene with finer details. As depicted in Figure 2, there is a noticeable variance in detail and texture among the outputs. The COLMAP mesh, while consistent, falls short in representing thin structures. In contrast, the NeRF mesh shows greater detail but presents some holes. The SuGaR mesh stands out for its superior detail, accurately capturing complex structures where others falter, thanks to its precise normal calculations. Another point to consider is the difference in accuracy between the two scenarios we have examined. The playground scene is easier and, in fact, gives better results than the excavation case. The complexity of the excavation scenario reduces reconstruction performance, especially for the SuGaR and NeRF methods. Figure 3 shows many artifacts on the road surface in the cloud-to-cloud distance analysis, especially in the case of SuGaR, and many holes, especially at the bottom of the excavation. Finally, we analyze the processing time for each method. In this respect, there is no difference between SuGaR and COLMAP. The best performance is instead observed with InstantNGP, which takes about a quarter of the time required by the other methods. Additional materials regarding our analysis can be accessed at https://zenos4mbu.github.io/photogrammetry_nerf.github.io/

5. Conclusions
In this paper we provide a comparative analysis of reconstruction methods based on neural radiance fields and classical photogrammetry for unbounded scenarios. We show results on playgrounds and excavation sites, to assess the performance in easy and complex scenarios. In our set-up, photogrammetry provided superior reliability in complex scenes, especially on the excavation sites. It also gave better results in modeling completely flat areas, where the NeRF methods present some artifacts. Although training/reconstruction times are generally not the main concern in the reconstruction of working areas, some applications might benefit from fast reconstruction times. In this respect nerfstudio provided the best speed, requiring just 15 minutes to train a scene. An important aspect that needs to be analyzed is the reliance of current rendering pipelines for virtual and augmented reality on mesh representations. This favors classical photogrammetry, since its final goal is to obtain a mesh representation. In contrast, neural rendering technologies focus primarily on view synthesis, offering an alternative that eliminates the need for mesh generation. SuGaR, and more generally 3D Gaussian Splatting techniques, produce an explicit representation that allows Gaussians to be splatted in the same way traditional methods rasterize triangles. This feature enables SuGaR to render the scene in real time, making it possible to integrate it into existing pipelines. In the future, we see 3D Gaussian Splatting as a potential replacement for mesh representations, especially in scenarios requiring the realistic reconstruction of complex environments.

This research is supported by the project DIMOTY, funded by the Autonomous Province of Trento under the LP6/99 framework.

References
[1] Mildenhall, Ben, et al. "NeRF: Representing scenes as neural radiance fields for view synthesis." Communications of the ACM 65.1 (2021): 99-106.
[2] Kerbl, Bernhard, et al. "3D Gaussian splatting for real-time radiance field rendering." ACM Transactions on Graphics 42.4 (2023): 1-14.
[3] Schonberger et al. "Structure-from-motion revisited." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[4] Müller et al. "Instant neural graphics primitives with a multiresolution hash encoding." ACM Transactions on Graphics (TOG) 41.4 (2022): 1-15.
[5] Tancik et al. "Nerfstudio: A Modular Framework for Neural Radiance Field Development." ACM SIGGRAPH 2023.
[6] Chen et al. "SuGaR: Pre-training 3D Visual Representations for Robotics." arXiv preprint arXiv:2404.01491, 2024.
[7] Verbin et al. "Ref-NeRF: Structured view-dependent appearance for neural radiance fields." 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Barron et al. "Mip-NeRF 360: Unbounded anti-aliased neural radiance fields." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5470-5479, 2022.
[9] Lorenzo O., Kevin D., et al. "Spatial-Temporal Calibration for Outdoor Location-Based Augmented Reality." IEEE Sensors Journal (2024): accepted for publication.