<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Ital-IA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>3D reconstruction methods in industrial settings: a comparative study for COLMAP, NeRF and 3D Gaussian Splatting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zeno Sambugaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Orlandi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Conci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DISI, University of Trento</institution>
          ,
          <addr-line>via Sommarive, 5, Povo, 38123</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>4</volume>
      <fpage>29</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>3D rendering techniques have undergone a rapid evolution with the emergence of novel and advanced methodologies, redefining the boundaries of realism and computational efficiency. This study explores recent advancements in the field, comparing established photogrammetry approaches, using software such as COLMAP, against the new frontiers opened by emerging view synthesis approaches such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting. In this paper, we present a comprehensive comparison of these methods tailored to industrial applications, where data acquisition is generally conducted by human operators employing handheld devices.</p>
      </abstract>
      <kwd-group>
        <kwd>Photogrammetry</kwd>
        <kwd>NeRF</kwd>
        <kwd>Gaussian Splatting</kwd>
        <kwd>3D Reconstruction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Figure 1: overview of the compared reconstruction pipelines. Recoverable block labels: geo-referenced image acquisition; image position refinement; COLMAP sparse photogrammetry; dense point cloud; nerfstudio (neural radiance field); 3DGS (Gaussian scene representation); point cloud generation; mesh generation; georeferencing; comparison of the methods.]</p>
      <p>3D reconstruction is crucial in fields like construction, excavation, and worksite management. Employing multi-view reconstruction techniques, scenes are captured from various angles using 2D images. This enables detailed monitoring of project progress and provides the ability to navigate sites virtually, both during and after completion, through geo-referencing and virtual reality. Among the most common photogrammetric solutions for 3D reconstruction, we focus on COLMAP [<xref ref-type="bibr" rid="ref3">3</xref>] for its open-access policy and continual improvements. COLMAP converts 2D images into comprehensive 3D models, including point clouds and textured meshes, enabling advanced spatial analyses. However, photogrammetric reconstruction encounters several challenges, particularly with objects characterized by complex optical properties such as high absorbency, reflectivity, or scattering. These methods can also suffer from variations in lighting conditions, including shadows, glare, or inconsistent illumination, as well as from surfaces with uniform or repetitive textures and from complex shapes or geometries.</p>
      <p>NeRF-based technologies offer cutting-edge solutions to overcome these limitations in scene representation by modeling the scene as particles characterized by density and color. This study compares two radiance-field-based techniques, Nerfacto (a variation of InstantNGP [<xref ref-type="bibr" rid="ref4">4</xref>] in Nerfstudio [<xref ref-type="bibr" rid="ref5">5</xref>]) and SuGaR [<xref ref-type="bibr" rid="ref6">6</xref>] (a variation of 3D Gaussian Splatting [<xref ref-type="bibr" rid="ref2">2</xref>]), against traditional photogrammetry methods.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p><bold>Neural Radiance Fields.</bold> Neural Radiance Fields (NeRF) have emerged as a significant advancement in 3D scene reconstruction. The scene is represented by a 5D function that correlates each spatial point (x, y, z) with the radiance emitted in any direction, defined by the azimuthal and polar angles (θ, φ). The outcome, characterized by a volume density σ and RGB color values c, varies with the viewing direction. This relation is formulated through a Multi-Layer Perceptron (MLP) F<sub>Θ</sub>, expressed as:</p>
      <disp-formula id="eq1"><label>(1)</label><tex-math>F_{\Theta} : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma)</tex-math></disp-formula>
      <p>where x = (x, y, z) denotes the coordinates within the scene, and d = d(θ, φ) is the 3D Cartesian unit vector indicating the viewing direction. The color c = (r, g, b) changes with the viewing angle, while the volume density σ remains invariant. Compared with traditional point clouds or meshes, neural volume rendering pipelines enable the modeling of variations in color and illumination. InstantNGP [<xref ref-type="bibr" rid="ref4">4</xref>], short for Instant Neural Graphics Primitives, is a variant that extends the NeRF framework to speed up scene reconstruction significantly. By refining the network's architecture and computations, InstantNGP reaches high-quality results much faster, positioning it as a viable option for real-time applications.</p>
      <p>NeRFStudio introduces a platform, built around the Nerfacto model, that streamlines the creation and manipulation of NeRF-based models. Nerfacto integrates insights from recent research, including MipNeRF 360 [<xref ref-type="bibr" rid="ref8">8</xref>], Instant-NGP [<xref ref-type="bibr" rid="ref4">4</xref>], and Ref-NeRF [<xref ref-type="bibr" rid="ref7">7</xref>], focusing on optimizing camera views and sampling processes.</p>
      <p><bold>3D Gaussian Splatting for Real-Time Radiance Field Rendering.</bold> 3D Gaussian Splatting [<xref ref-type="bibr" rid="ref2">2</xref>], a novel approach to scene representation, contrasts with neural fields by optimizing an explicit point-based scene model. Each point in this representation is associated with several attributes: a position p ∈ ℝ<sup>3</sup>, an opacity α ∈ [0, 1], third-degree spherical harmonics (SH) coefficients k ∈ ℝ<sup>16</sup>, a 3D scale s ∈ ℝ<sup>3</sup>, and a 3D rotation R ∈ SO(3) represented by a 4D quaternion q ∈ ℝ<sup>4</sup>. Rendering to the image plane accumulates the color C of a pixel from the N points that overlap it, correctly sorted by depth, using the equation:</p>
      <disp-formula id="eq2"><label>(2)</label><tex-math>C = \sum_{i=1}^{N} c_i \alpha_i T_i, \qquad T_i = \prod_{j=1}^{i-1} \left(1 - \alpha_j\right)</tex-math></disp-formula>
      <p>where c<sub>i</sub> is the color of the i-th point, determined by its SH coefficients k.</p>
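      <p>Both the volume rendering used by NeRF and Eq. (2) reduce to front-to-back alpha compositing over samples sorted from near to far. The Python sketch below illustrates this under stated assumptions. For NeRF it applies the standard quadrature α<sub>i</sub> = 1 − exp(−σ<sub>i</sub>δ<sub>i</sub>), with δ<sub>i</sub> the spacing between consecutive ray samples. For 3D Gaussian Splatting it assumes each splat has already been reduced to a depth, a color and an opacity at the pixel. The function names composite and nerf_alphas and all numeric values are hypothetical, and the code is not taken from nerfstudio, SuGaR or the reference 3DGS implementation.</p>

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def composite(colors: Sequence[Color], alphas: Sequence[float]) -> Tuple[Color, float]:
    """Front-to-back alpha compositing as in Eq. (2): C = sum_i c_i * alpha_i * T_i.

    Samples must already be ordered from near to far. Returns the pixel color
    and the accumulated opacity (1 minus the final transmittance).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_1 = 1 (empty product)
    for color, alpha in zip(colors, alphas):
        for k in range(3):
            rgb[k] += color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha  # T_{i+1} = T_i * (1 - alpha_i)
    return (rgb[0], rgb[1], rgb[2]), 1.0 - transmittance


def nerf_alphas(sigmas: Sequence[float], deltas: Sequence[float]) -> List[float]:
    """NeRF quadrature: a ray segment of length delta_i and density sigma_i
    has opacity alpha_i = 1 - exp(-sigma_i * delta_i)."""
    return [1.0 - math.exp(-s * d) for s, d in zip(sigmas, deltas)]


# NeRF: four equally spaced samples on one ray. The third lies in dense red
# matter, so the dense blue sample behind it is almost completely hidden.
sigmas = [0.0, 0.0, 50.0, 50.0]
colors = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
ray_color, ray_opacity = composite(colors, nerf_alphas(sigmas, [0.25] * 4))

# 3D Gaussian Splatting: (depth, color, alpha) of the splats covering a pixel.
splats = [(2.0, (0.0, 0.0, 1.0), 0.5), (1.0, (1.0, 0.0, 0.0), 0.5)]
splats.sort(key=lambda s: s[0])  # near to far
pixel, _ = composite([c for _, c, _ in splats], [a for _, _, a in splats])
print(pixel)  # (0.5, 0.0, 0.25)
```

      <p>With the blue splat placed nearer instead, the same two splats give (0.25, 0.0, 0.5). This is why the points must be correctly sorted before Eq. (2) is applied.</p>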
      <p>The term α<sub>i</sub> is calculated from the projected 2D Gaussian with covariance <inline-formula><tex-math>\Sigma' = J W \Sigma W^{T} J^{T}</tex-math></inline-formula>, incorporating the per-point opacity α, the viewing transformation W, and the Jacobian J of the affine approximation of the projective transformation. The 3D covariance matrix Σ is kept positive semi-definite through the scale matrix S = diag(s<sub>1</sub>, s<sub>2</sub>, s<sub>3</sub>) and the rotation R, following <inline-formula><tex-math>\Sigma = R S S^{T} R^{T}</tex-math></inline-formula>.</p>
      <p><bold>Surface-Aligned Gaussian Splatting (SuGaR).</bold> Building upon the principles of 3D Gaussian Splatting, SuGaR [<xref ref-type="bibr" rid="ref6">6</xref>] leverages Gaussian functions to model object surfaces within a scene. It handles occlusions precisely and achieves detailed surface texturing through Gaussian "splats" projected onto a volume grid. Each splat influences the volume's density and color according to its spatial location and Gaussian distribution, described mathematically as:</p>
      <disp-formula id="eq3"><label>(3)</label><tex-math>G(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)</tex-math></disp-formula>
      <p>where μ is the center of the splat and Σ its covariance.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This study aims to evaluate the effectiveness and potential benefits of Neural Radiance Fields (NeRF) against traditional image-based reconstruction techniques, particularly photogrammetry, in the context of augmented and virtual reality applications. Our focus is on challenging outdoor scenarios: we include excavation sites and playground objects, which are characterized by unbounded environments and non-Lambertian surfaces. To allow a direct comparison, the same dataset of geo-referenced images is used across all reconstruction methods. This standardized approach ensures that differences in reconstruction quality and efficiency can be attributed solely to the methodologies rather than to a poor alignment.</p>
      <sec id="sec-3-1-1">
        <title>3.1. Dataset acquisition</title>
        <p>The datasets are collected using a system comprising two devices, a smartphone and an RTK-GNSS receiver, spatially calibrated as described in [<xref ref-type="bibr" rid="ref9">9</xref>]. These devices ensure highly accurate pose information for all the collected scenes. The study aims to analyze industrial applications. Therefore, as scenarios, we have selected some simple playground games mixed with real excavation scenarios, where the reconstruction is more challenging. Our datasets consist of 7 playground scenarios and 3 excavation scenarios.</p>
        <p><bold>Acquisition Process.</bold> The dataset has been acquired following the standard procedure that an operator would follow when working on a given site. The trajectory is a rotation around the object, with the capture kept at eye level. During acquisition, the frame rate is set to 5 frames per second, with a resolution of 1280 × 720. The geopose error is always below 3 cm in translation and below 1 degree in rotation for each acquired image. We maintain a uniform velocity during acquisition, so that the number of images for each scenario depends on the length of the trajectory. The playground dataset comprises approximately 200 images, while the excavation dataset contains around 500 images, reflecting its longer trajectories.</p>
      </sec>
      <sec id="sec-3-1">
        <title>3.2. Methodologies Employed</title>
        <p>Three distinct reconstruction methodologies were applied to the captured datasets; an overview is shown in Figure 1:</p>
        <p>1. Photogrammetry: the classical photogrammetric procedure estimates the camera orientation parameters to construct a sparse point cloud and then generates a dense point cloud. Mesh creation and texture extraction complete the reconstruction process. For this purpose we used COLMAP, with all phases conducted in high-quality mode to ensure maximum detail and accuracy.</p>
        <p>2. NeRF-Based Reconstruction: training a Neural Radiance Field requires known camera poses as input. We use nerfstudio [<xref ref-type="bibr" rid="ref5">5</xref>], and in particular "nerfacto", a model strongly based on InstantNGP [<xref ref-type="bibr" rid="ref4">4</xref>], chosen for its fast training and inference. We then extract the dense point clouds and textured meshes through nerfstudio's API. For mesh extraction in particular, we exploited Poisson reconstruction.</p>
        <p>3. Gaussian Splatting (SuGaR): similarly to NeRF, this method requires known camera poses as input. This explicit model is then trained to approximate the radiance field of the scene. The training of SuGaR involves more than one step: it starts with 7k iterations of standard 3D Gaussian Splatting, followed by 7k iterations of SuGaR fine-tuning to extract a more precise geometry.</p>
        <sec id="sec-3-1-7">
          <p>The acquisition of our dataset incorporated geo-referencing, so as to simplify the alignment process for
the reconstructions. The only exception is NeRF: as an implicit framework, this model normalizes its coordinates between -1 and 1. This aspect of NeRF requires an additional calibration step that incorporates the scale and translation derived from the geo-referenced input, to ensure accurate alignment. Before the dataset can be used for training, the camera parameters must be estimated from the input images. This estimation is necessary because the neural network requires knowledge of both the camera positions and the corresponding images to accurately generate the scene representation. To achieve this, we used COLMAP, a software known for its application of Structure from Motion (SfM) techniques [<xref ref-type="bibr" rid="ref3">3</xref>] to estimate three-dimensional structures from two-dimensional image sequences.</p>
          <p>To facilitate comparison, given that the outputs of photogrammetry are not directly comparable with those of neural fields or Gaussian splatting, we incorporate an additional conversion phase. NeRFstudio provides functionality to convert NeRF outputs into point clouds and meshes. The point clouds are easily exported, since the neural representation can be inspected at any 3D point. For the meshes, this conversion employs the marching cubes algorithm and the Poisson surface reconstruction method. In the SuGaR framework, the mesh extraction phase is also done through marching cubes or Poisson surface reconstruction. In this case, the reconstruction is enhanced by the precise estimation of the normals of the sampled points. To obtain an accuracy metric, we derive a cloud-to-cloud comparison using the CloudCompare software.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparative Analysis Framework</title>
        <sec id="sec-3-3-1">
          <p>The comparative analysis between these methods focuses on the following key metrics: (i) Accuracy and Detail Resolution, to evaluate the fidelity of the reconstructed models to the original scenes, and (ii) Processing Time, to assess the efficiency of each methodology in terms
of computational resources and time required for reconstruction. To compare the level of fidelity of the reconstructed models, we propose using the point clouds generated by the studied methods, which provides a quantitative metric. More specifically, we measure the cloud-to-cloud deviation of the radiance-field-based methods with respect to the reconstruction obtained with classical photogrammetry. This measure is an absolute value: it does not tell which method performs better, but only how far one reconstruction deviates from the other. Therefore, we also show the rendering results in Figure 3, to compare the performance in graphical terms. In addition to this quantitative result, we propose a qualitative comparison of the resulting meshes produced by the different methodologies in Figure 2.</p>
          <p>[Figure: rendering results for COLMAP, nerfstudio and SuGaR, and cloud-to-cloud distance maps for NeRF and SuGaR, with a color scale from 0 cm to 10 cm.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-discussion">
      <title>4. Discussion</title>
      <p>We compare NeRF-based techniques against traditional photogrammetry using COLMAP. All models are trained on an NVIDIA RTX 3090 GPU. The assessment focuses on their effectiveness in view synthesis and 3D reconstruction, particularly in expansive, unbounded environments. Our analysis shows that the three methodologies produce high-quality point clouds, with very close results, especially in the fine structures of the 3D scene, as illustrated in Figure 2. Notably, NeRF's output shows a denser point cloud around high-frequency scene features but has gaps in smoother regions. The radiance field rendering results show that the quality of the reconstructed views is very high, and it is very difficult to say whether nerfstudio or SuGaR gives the better result. However, the comparison illustrated in Figure 2 highlights a failure case of nerfstudio: a red area within the scene's object of interest indicates a high cloud-to-cloud distance. This issue not only produces a discrepancy in the point cloud representation but also results in blurring within the targeted region of the neural reconstruction.</p>
      <p>Considering the extensive use of meshes in VR and AR applications, owing to their simplicity and low memory footprint, we present a comparison of the meshes produced with the three methodologies. Figure 2 shows the obtained meshes, together with a detail of the reconstruction in the region of the 3D scene with finer details. As depicted in Figure 2, there is a noticeable variance in detail and texture among the outputs. The COLMAP mesh, while consistent, falls short in representing thin structures. In contrast, the NeRF mesh shows greater detail but presents some holes. The SuGaR mesh stands out for its superior detail, accurately capturing complex structures where the others falter, thanks to its precise normal calculations.</p>
      <p>Another point to consider is the difference in accuracy between the two scenarios we have examined. The playground scene is easier and, in fact, yields better results than the excavations. The complexity of the excavation scenario reduces the reconstruction performance, especially with the SuGaR and NeRF methods. In Figure 3, the cloud-to-cloud distance analysis reveals many artifacts on the road surface, especially in the case of SuGaR, as well as many holes, especially at the bottom of the excavation. Finally, we analyze the processing time of each method. In this respect, there is no difference between SuGaR and COLMAP, while the best performance is observed with InstantNGP, which takes about a quarter of the time of the other methods. Additional materials regarding our analysis can be accessed through this link<sup>1</sup>.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>In this paper we provide a comparative analysis of Neural Radiance Field-based reconstruction methods and classical photogrammetry for unbounded scenarios. We show results on playgrounds and excavation sites, to assess the performance in easy and complex scenarios. In our set-up, photogrammetry has provided superior reliability in complex scenes, especially on the excavation sites. It has also proved better at modeling completely flat areas, where the NeRF methods present some artifacts. Although training and reconstruction times are generally not the main concern when reconstructing working areas, some applications might benefit from fast reconstruction. In this respect, nerfstudio provided the best speed, requiring just 15 minutes to train a scene.</p>
      <p>An important aspect that needs to be analyzed is the reliance of current rendering pipelines for virtual and augmented reality on mesh representations. This favors classical photogrammetry, since its final goal is to obtain a mesh representation. In contrast, neural rendering technologies focus primarily on view synthesis, offering an alternative that eliminates the need for mesh generation. SuGaR, and 3D Gaussian Splatting techniques in general, produce an explicit representation that allows Gaussians to be splatted in the same way as traditional methods splat triangles. This feature enables SuGaR to render the scene in real time, making it possible to use it in existing pipelines. In the future, we see 3D Gaussian Splatting as a potential replacement for mesh representations, especially in scenarios requiring the realistic reconstruction of complex environments.</p>
      <sec id="sec-ack">
        <title>Acknowledgments</title>
        <p>This research is supported by the project DIMOTY, funded by the Autonomous Province of Trento under the LP6/99 framework.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Mildenhall</surname>,
            <given-names>Ben</given-names>
          </string-name>
          , et al.
          <article-title>"NeRF: Representing scenes as neural radiance fields for view synthesis."</article-title>
          <source>Communications of the ACM 65.1</source>
          (
          <year>2021</year>
          ):
          <fpage>99</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Kerbl</surname>,
            <given-names>Bernhard</given-names>
          </string-name>
          , et al.
          <article-title>"3D Gaussian splatting for real-time radiance field rendering."</article-title>
          <source>ACM Transactions on Graphics 42.4</source>
          (
          <year>2023</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Schönberger</surname>
          </string-name>
          et al.
          <article-title>"Structure-from-motion revisited."</article-title>
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Müller</surname>
          </string-name>
          et al.
          <article-title>"Instant neural graphics primitives with a multiresolution hash encoding."</article-title>
          <source>ACM Transactions on Graphics (TOG) 41.4</source>
          (
          <year>2022</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Tancik</surname>
          </string-name>
          et al.
          <article-title>"Nerfstudio: A Modular Framework for Neural Radiance Field Development."</article-title>
          <source>ACM SIGGRAPH</source>
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
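          [6]
          <string-name>
            <surname>Guédon</surname>,
            <given-names>Antoine</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Lepetit</surname>
          </string-name>
          .
          <article-title>"SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering."</article-title>
          <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2024</year>
          .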
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Verbin</surname>
          </string-name>
          et al.
          <article-title>"Ref-NeRF: Structured view-dependent appearance for neural radiance fields."</article-title>
          <source>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Barron</surname>
          </string-name>
          et al.
          <article-title>"Mip-NeRF 360: Unbounded anti-aliased neural radiance fields."</article-title>
          <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          , pp.
          <fpage>5470</fpage>
          -
          <lpage>5479</lpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Lorenzo</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kevin</surname>
            <given-names>D.</given-names>
          </string-name>
          , et al.
          <article-title>"Spatial-Temporal Calibration for Outdoor Location-Based Augmented Reality."</article-title>
          <source>IEEE Sensors Journal</source>
          (
          <year>2024</year>
          ), accepted for publication.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>