<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improvement of Precise Vehicle Location in Urban Areas Using Video-based Photogrammetry</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>André Pinhal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Gonçalves</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CIIMAR, Interdisciplinary Centre of Marine and Environmental Research</institution>
          ,
          <addr-line>4450-208 Matosinhos</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Porto</institution>
          ,
          <addr-line>Observatório Astronómico, 4430-146 Vila Nova de Gaia</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents a system under development at the University of Porto, which integrates a Septentrio Mosaic GO GNSS receiver, with a helical antenna, associated with a GoPro action camera. Video frames can be processed by a Structure from Motion (SfM) approach to align sequential frames, derive the relative positions of the camera projection centres, and eventually create point clouds of the surrounding environment. The system being developed has the camera and the antenna mounted in a block, which is placed on a rear-view mirror of a car. The camera acquires video in 4K mode, at 60 frames per second. Frames are extracted from the video at a frequency between 2 Hz and 10 Hz, depending on the car speed. The camera collects GNSS data with its own navigation receiver, allowing all extracted video frames to be tagged with GPS time and position. Due to the low accuracy of the camera receiver, its positions are discarded and only GPS time is kept. This time is used to synchronize with the much more accurate positions obtained by the RTK receiver, connected to a CORS network. Obstacles present in urban areas often prevent high-accuracy positioning, so a trajectory made within an urban area will have some frames with very precise positions and others with much larger errors. All frames are photogrammetrically processed. Standard deviations of camera positions are considered in the bundle adjustment, allowing for the improvement of the camera positions of lower accuracy. The SfM processing also generates a point cloud of the surrounding objects, which can be densified. Objects identified on this point cloud can be used to assess the location accuracy of the process. Several experiments carried out with the system in the city of Porto confirmed that a positional accuracy of 10 cm can be achieved.</p>
      </abstract>
      <kwd-group>
<kwd>RTK</kwd>
        <kwd>ambiguity fix</kwd>
        <kwd>action camera</kwd>
        <kwd>MMS</kwd>
        <kwd>structure from motion</kwd>
        <kwd>point cloud</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Precise positioning by GNSS RTK (Real-Time Kinematic) is now very common in surveying and in GIS
data collection for 3D city models. There are many surveying solutions for use in mobile mapping
systems (MMS), integrating GNSS positioning, inertial navigation systems (INS) and data acquisition
sensors, such as cameras or laser scanners [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Popular brands, such as Riegl, Leica Geosystems or Trimble,
among others, provide solutions that allow for the generation of very dense and accurate point clouds
in urban environments, but at costs of several hundred thousand euros, and whose use is restricted to
professional markets. Many users will be interested in cheaper systems, making use of the small,
low-cost GNSS receivers now available, which can perform well in urban areas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. There
are several development devices, costing less than 1000 euros, which track three frequencies and have
the full capacity for high-precision differential positioning, in real time or in post-processing. Regardless of
the additional sensors associated with the GNSS receiver, it is currently possible to have high-precision
kinematic positioning in a motor vehicle with this type of low-cost receiver.
      </p>
      <p>As inevitably happens with GNSS positioning in urban environments, or generally in situations
of major obstruction to signal propagation, RTK positioning with ambiguity fixing (“fix” solution),
with errors of a few centimetres, will not be possible in significant parts of trajectories made in this
type of environment. “Float” solutions will often result, with precision on the order of a few
decimetres, or even “single point” solutions (SPP), with errors of a few metres. This limitation is
normally overcome with inertial navigation systems, significantly increasing the cost of an MMS. The
system under development aims to solve this problem through the use of sequences of images acquired
by a small video camera. Photogrammetric techniques that determine the relative orientation of the
images can be applied to the image sequences, thus contributing to a more complete positioning
solution.</p>
      <p>
        The system is intended to use cameras classified as “action cameras”. These cameras are compact,
rugged, and versatile enough to capture both video and still images in outdoor conditions. They are very
popular among adventure enthusiasts, who are often also interested in geolocating their images and videos.
For this reason, several models incorporate a GPS navigation receiver, allowing photographs, as well as
video, to be geotagged. That is the case of GoPro cameras, since the launch of the
Hero 5 model. These cameras acquire video in MP4 format, which can accommodate GPS data, to be
later extracted using specific software [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] provided by the camera manufacturer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this way, the GPS position and GPS time of every individual video frame can be obtained.
      </p>
      <p>
        Photogrammetry has seen major developments in recent years, through the incorporation of
methodologies derived from computer vision. It was mainly the SIFT (Scale-Invariant Feature
Transform) algorithm developed by David Lowe [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that made the automatic extraction of common
points between images much easier, even for images with variations in orientation and scale. This
availability of conjugate points is combined with bundle adjustment methods that can now handle
large image blocks. The process is also applicable to cameras for which
only approximate internal orientation parameters are known, by integrating a self-calibration
process in the bundle adjustment. This largely automated methodology of image orientation came to be
designated in some communities as “Structure from Motion” (SfM), allowing the processing of aerial or
terrestrial images.
      </p>
      <p>In the present system a video is acquired, and frames are extracted and oriented (“aligned”, in the
terminology associated with SfM) in relative terms. Provided that the coordinates of the projection
centres are known for some of the images, all images in the block will have their positions determined.
In this way it is possible to complete or correct the camera trajectory.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Description of the hardware implemented</title>
      <p>The system described in this paper incorporates a Septentrio Mosaic GO X5 GNSS receiver, which has a
compact and lightweight design, suitable for integration into platforms such as drones and handheld
devices. It supports multiple satellite constellations, including GPS, GLONASS, Galileo and BeiDou,
and performs phase measurements on three different frequencies. The receiver is mounted in a box, which is
attached to the back of the camera, and a helical antenna is placed on top of the box. A small power
bank is mounted next to the receiver, which, once turned on, starts recording a file in the Septentrio
Binary Format (SBF). When connected to a CORS (Continuously Operating Reference Station) network,
the receiver provides reliable real-time kinematic (RTK) positioning. The receiver was used at 10 Hz,
providing positions with enough density to properly model the curved trajectories of a car moving in a city.</p>
      <p>The operation of the system requires the use of a smartphone connected to the internet.
An application runs on the smartphone, which allows the user to control the receiver, receive NTRIP
corrections from a CORS network, carry out the differential correction and monitor the receiver’s
performance. In the present case an Android smartphone was used, running the SW-Maps application
(http://softwel.com.np/mobile_products), which is free software and includes all those capabilities.
Although positions may be recorded in the app, they are retrieved from the receiver memory, in the
form of the SBF file and, for the RTK positions, in NMEA (National Marine Electronics Association) format.</p>
      <p>The camera has a fixing piece so that it can be mounted, together with the receiver, on the side rear-view mirror
of a car. Once the camera and the receiver are turned on, both can be controlled from inside the vehicle
with the smartphone. Figure 1 shows the block mounted on the right rear-view mirror of a car. The
camera optical axis is horizontal, rotated by 15 degrees to the right of the vehicle axis.</p>
      <p>The camera has a GNSS receiver, which works at a frequency of 18 Hz. After the camera is turned
on, the user should check that a GPS position is available and only then start the video recording. The
receiver must be connected beforehand and should, preferably, have an RTK fixed position when the
camera starts the video. Note that there are no electronics establishing a connection between the two
devices. Synchronization between images and RTK positions is done through the GPS time
recorded by the two devices.</p>
      <p>The camera acquires video in 4K resolution, with a dimension of 3840 by 2160 pixels, at a rate of
60 fps (frames per second). Action cameras are known for having a large field of view, at the cost
of significant radial distortion. Although images can be used in this form, preference was given
to the image mode called “linear”, which consists of the application of a general correction model to
produce images with a regular central projection. The resulting image has an equivalent focal distance
of approximately 1800 pixels, still a very wide angle. Small additional corrections to the focal distance,
principal point position and radial distortion coefficients, characteristic of each camera unit, may be
necessary, but these are handled in the processing step, through a self-calibration incorporated in
the SfM bundle adjustment.</p>
      <p>Videos taken from the car are processed to extract navigation information from the camera, which
is carried out by programs from the GPMF library [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Information is extracted in groups of 60 frames,
that is, every 1.001 seconds of video, and includes the video time of each group, position, speed and
UTC time, in seconds of the day. Table 1 shows an example of this information. Positions result from
the camera navigation receiver and are, in fact, discarded. The main information is the video time,
which is transformed into a frame number, and UTC.</p>
      <p>Frames are extracted from the video, in JPEG format, at a cadence chosen so that the overlaps
between consecutive images are adequate for the alignment process to succeed. For the conditions
under which the system was operated, a 4 Hz cadence proved to be adequate. At higher speeds,
and especially when there is rotation, the cadence may be increased. In cases where the vehicle
is stopped, for example at traffic lights, the corresponding frames may be discarded if the GPMF data
reports a speed lower than a tolerance, for example 0.5 m/s.</p>
      <p>
        The positions collected by the GNSS receiver are retrieved in the NMEA format, essentially through
the GGA and GST messages [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which provide UTC time, position and the quality of the position (Q), with
values of 1 for SPP, 4 for FIX and 5 for FLOAT. Standard deviations are also obtained, which will be of a
few centimetres when Q=4. Surveys were carried out with a connection to the Portuguese permanent
station network ReNEP (“Rede Nacional de Estações Permanentes”). Although post-processing can
be done on the SBF data, in general the RTK positioning could be obtained wherever the
obstacles allowed for it. Table 2 shows a sample of data extracted from the NMEA files, with a loss of
ambiguity fix and a sudden increase in the estimated precision.
      </p>
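      <p>For illustration, the fields used from the two sentences can be read as in the following sketch; checksum verification and talker-ID handling are deliberately omitted.</p>
      <preformat><![CDATA[
# Hedged sketch of extracting UTC, position, quality (Q) and standard
# deviations from GGA and GST sentences (checksums ignored).
def parse_gga(f):
    t = f[1]                                   # hhmmss.ss
    utc = int(t[0:2]) * 3600 + int(t[2:4]) * 60 + float(t[4:])
    lat = int(f[2][:2]) + float(f[2][2:]) / 60.0
    lon = int(f[4][:3]) + float(f[4][3:]) / 60.0
    if f[3] == 'S':
        lat = -lat
    if f[5] == 'W':
        lon = -lon
    q = int(f[6])                              # 1 = SPP, 4 = FIX, 5 = FLOAT
    height = float(f[9])                       # orthometric height, metres
    return utc, lat, lon, height, q

def parse_gst(f):
    # standard deviations of latitude, longitude and height, in metres
    return float(f[6]), float(f[7]), float(f[8])

for line in open('session.nmea'):              # file name is illustrative
    fields = line.strip().split('*')[0].split(',')
    if fields[0].endswith('GGA'):
        utc, lat, lon, height, q = parse_gga(fields)
    elif fields[0].endswith('GST'):
        sd_lat, sd_lon, sd_h = parse_gst(fields)
]]></preformat>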
      <p>At this point, a linear interpolation is carried out in order to calculate the camera position. This
requires a previous calibration of the relation between frame number and UTC, which is described in the next
section. There is also a small offset between the antenna phase centre and the centre of the camera,
of approximately 4 cm in the horizontal component and 4 cm in the vertical. In a first approximation
this is not corrected, since it is smaller than what is initially expected for the system accuracy.
The camera GPS receiver has a frequency of 18 Hz, i.e., 3.3 times smaller than the video frame rate.
Assuming an error of 1.5 frames in the synchronization, i.e., 0.025 seconds, for a car moving at a speed of
30 km/h the error corresponds to a distance of about 20 cm. For that reason, we put the expectations at the
level of 1 or 2 decimetres. The procedure for correction of the small lever arm is nonetheless described in
the calibration section.</p>
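      <p>The interpolation itself is straightforward; a minimal sketch with NumPy is given below, assuming the RTK epochs and frame times are already expressed on the same UTC seconds-of-day scale, and that the array names are placeholders.</p>
      <preformat><![CDATA[
# Sketch: interpolate 10 Hz RTK positions to the UTC of each frame,
# using only ambiguity-fixed (Q=4) epochs. Array names are assumptions.
import numpy as np

def camera_positions(frame_utc, rtk_utc, rtk_enh, rtk_q):
    """frame_utc: (n,) UTC per frame; rtk_enh: (m, 3) E, N, h; rtk_q: (m,)."""
    fixed = rtk_q == 4
    t, enh = rtk_utc[fixed], rtk_enh[fixed]
    pos = np.column_stack([np.interp(frame_utc, t, enh[:, k])
                           for k in range(3)])
    # keep only frames within the span of fixed epochs; a stricter check
    # would also reject frames far from the nearest fixed epoch
    valid = (frame_utc >= t[0]) & (frame_utc <= t[-1])
    return pos, valid
]]></preformat>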
      <p>
        Finally, all the selected frames were aligned in photogrammetric software that applies the SfM
concept, in our case Agisoft Metashape [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Interpolated positions are provided only for
those images that had Q=4. A standard deviation of 0.2 m was considered for the least-squares adjustment.
As a result of the bundle adjustment, the positions of all images are obtained. They are then analysed in an
independent manner.
      </p>
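      <p>For reference, this step can be scripted with Metashape’s Python API along the following lines. The 0.2 m accuracy and the restriction to Q=4 positions follow the text; the variable names (frame list, per-camera references) and the coordinate system choice are assumptions of this sketch, not the authors’ actual script.</p>
      <preformat><![CDATA[
# Sketch of the alignment with camera positions as weighted observations,
# using the Metashape Python API. frame_list and references are assumed
# to exist (list of JPEG paths; per-frame (lon, lat, h, is_fix) tuples).
import Metashape

doc = Metashape.Document()
chunk = doc.addChunk()
chunk.addPhotos(frame_list)
chunk.crs = Metashape.CoordinateSystem("EPSG::4326")  # assumed CRS

for camera, (lon, lat, h, is_fix) in zip(chunk.cameras, references):
    if is_fix:                            # only Q=4 interpolated positions
        camera.reference.location = Metashape.Vector([lon, lat, h])
        camera.reference.accuracy = Metashape.Vector([0.2, 0.2, 0.2])
        camera.reference.enabled = True

chunk.matchPhotos(generic_preselection=True, reference_preselection=True)
chunk.alignCameras()
chunk.optimizeCameras()   # bundle adjustment with self-calibration
doc.save("project.psx")   # project name is illustrative
]]></preformat>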
    </sec>
    <sec id="sec-3">
      <title>3. System calibration</title>
      <p>In order to process the image data and RTK positions collected, some calibration steps are necessary,
especially regarding the times to be assigned to the frames extracted from the video.</p>
      <sec id="sec-3-1">
        <title>3.1. Time calibration</title>
        <p>As seen in Table 2, the UTC times do not correspond to the exact moments of the first frame of each
block of 60 frames, since they are not equally spaced. A linear relationship was therefore established between
the frame number, N<sub>f</sub>, and UTC time, through
<disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[\mathrm{UTC} = A_0 + A_1 N_f,]]></tex-math></disp-formula>
where A<sub>0</sub> is the time of the first frame (counting from frame zero) and A<sub>1</sub> is the frame
period (the inverse of the frame rate). This allows a more reliable value of the UTC time of the first frame
to be determined.</p>
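        <p>The two coefficients can be estimated by a simple least-squares line fit over all (N<sub>f</sub>, UTC) pairs, as in this minimal sketch.</p>
        <preformat><![CDATA[
# Minimal sketch: estimate A0 (UTC of frame zero) and A1 (frame period)
# by least squares from the per-group (frame number, UTC) pairs.
import numpy as np

def calibrate_time(frame_numbers, utc_seconds):
    A1, A0 = np.polyfit(frame_numbers, utc_seconds, 1)  # slope, intercept
    return A0, A1

def frame_utc(n_f, A0, A1):
    return A0 + A1 * n_f   # equation (1)
]]></preformat>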
        <p>An independent validation of this assessment of the time of the first frame was done with a small
flashlight, connected to the GNSS receiver, fired in front of the camera. A precise time of the flash event
is recorded on the GNSS receiver, and with a few flashes along a video, the calibration can also be done
with the same formula. Very similar results were obtained, with differences in the time of the first frame
of around 20 ms, i.e., only slightly more than one frame. Figure 2 shows an image of the flash in a frame.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Lever arm between receiver and camera</title>
        <p>As referred to before, there is a lever arm between the antenna and the camera, which is of only a few
centimetres and, in a first approach, is not considered. There is no attitude determination, so
the coordinate transfer must be done using the orientation of the trajectory. As the vehicle
moves approximately in a horizontal plane, the azimuth of the trajectory can be estimated and the
corresponding rotation applied to the vector between antenna and camera. In the vertical component
there is only the need to subtract the height difference between the phase centre and the camera. The
actual projection centre position is not known, but since the focal distance of the lens is only 3 mm, it
was assumed to be at the centre of the lens, outside the camera body.</p>
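        <p>A minimal sketch of this reduction is given below; the azimuth is taken from consecutive trajectory points, and the signs of the offset components depend on the actual mounting geometry, which is assumed here.</p>
        <preformat><![CDATA[
# Sketch of the lever-arm reduction: rotate the antenna-to-camera offset
# by the trajectory azimuth and subtract the height difference. The
# along/cross-track signs are assumptions about the mounting geometry.
import numpy as np

ALONG, CROSS = 0.04, 0.0   # horizontal offset components, metres (~4 cm)
DELTA_H = 0.04             # antenna phase centre above camera, metres

def camera_from_antenna(e, n, h, e_next, n_next):
    az = np.arctan2(e_next - e, n_next - n)   # azimuth from the motion
    de = ALONG * np.sin(az) + CROSS * np.cos(az)
    dn = ALONG * np.cos(az) - CROSS * np.sin(az)
    return e + de, n + dn, h - DELTA_H
]]></preformat>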
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Assessment of system performance</title>
      <p>The system was tested in some urban environments, in a first approach in areas without extreme
signal-capture difficulties. A route was taken in an urban residential area of the city of
Porto, without tall buildings, but with trees along the streets. The route began and ended at the same
point, included some crossings, and repeated some sections; it had a total length of 3.2 km. Images
were extracted at a rate of 4 Hz, for a total of 2464 images. Figure 3
shows, on the left side, the path followed, over the Google Maps image base, and on the right, two
examples of frames from the video, one in a more unobstructed area and another with more tree cover.</p>
      <sec id="sec-4-1">
        <title>4.1. Image alignment</title>
        <p>After interpolation of RTK positions for all images, it was observed that 72% were FIX-type positions.
The images were loaded into the Agisoft Metashape program and a first alignment was made, at this
step without coordinates of the projection centres. All were successfully aligned. The coordinates of
the images that had FIX were then inserted, with an a priori accuracy value of 0.2 metres in the three
coordinates. The bundle adjustment was reprocessed, resulting in adjusted coordinates for all images.
The effect of the trajectory quality improvement is observed in places where the positions were not FIX.
Figure 4 shows, on the left, an area where there is an interruption in the FIX positions (blue dots), over a
total of nearly 40 images. After aligning the images, the positions were regularized, resulting in a much
smoother trajectory that was in line with expectations.</p>
        <p>Although there is a visible qualitative improvement in the trajectory, and a very small change in
the sections where there was FIX, this assessment is somewhat subjective, so a numerical evaluation
of the error is preferable.</p>
        <p>The simplest way to make this assessment is through checkpoints, whose coordinates can be
determined photogrammetrically from the images. A total of 14 well-defined points were selected
and identified in the final photogrammetric project, on the images where they are
observed with the highest quality. This yields the coordinates of these points. Subsequently, the
points were surveyed on the terrain, with GNSS, and the corresponding three-dimensional errors were
evaluated. Figure 5 shows the location of the checkpoints and the location of two of them in the images.</p>
        <p>Errors were calculated in the three coordinates (e<sub>x</sub>, e<sub>y</sub>, e<sub>z</sub>), in centimetres, and the corresponding
statistics are presented in Table 3. They are: the average error (AVG), the root mean square error (RMSE)
and the maximum absolute error (MAX), according to
<disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[\mathrm{AVG} = \frac{\sum e_i}{n}, \qquad \mathrm{RMSE} = \sqrt{\frac{\sum e_i^2}{n}}, \qquad \mathrm{MAX} = \max_i |e_i|, \qquad i = 1, \ldots, n.]]></tex-math></disp-formula></p>
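        <p>As a worked counterpart to equation (2), the three statistics for one error component can be computed as in this short sketch (the individual checkpoint errors themselves are not reproduced here).</p>
        <preformat><![CDATA[
# Statistics of equation (2) for one error component (e.g. ex).
import numpy as np

def error_stats(e):
    e = np.asarray(e, dtype=float)
    avg = e.mean()                    # AVG
    rmse = np.sqrt((e ** 2).mean())   # RMSE
    emax = np.abs(e).max()            # MAX
    return avg, rmse, emax
]]></preformat>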
        <p>Average errors are small, not evidencing systematic trends. The RMSEs are of the order of one
decimetre, in agreement with the initial expectation. These first tests are quite promising.
More performance assessments of the system, under diverse conditions, will be carried out in the near future.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and future work</title>
      <p>A positioning system for the precise location of vehicles in urban environments is under
development at the University of Porto. The system integrates video imagery processed by SfM in
order to contribute to an integrated trajectory solution. This makes it possible to fill the positioning gaps
that occur in the urban environment. Initial tests point to a possible accuracy at the decimetre level. More tests
will be carried out in order to assess system performance in a diversity of environments with strong
obstructions, both in urban areas and in forested environments.</p>
      <p>Several improvements to the system will be developed soon, namely in the temporal
synchronization process, for example using new models of action cameras with higher frame rates. It is
also intended to correctly reduce the GNSS position to the camera projection centre.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was developed within project 4Map4Health (CHIST-ERA/0006/2019), financed by the
Portuguese Foundation for Science and Technology, under the ERA-NET CHIST-ERA programme.</p>
      <p>The GNSS RTK positioning was done with the ReNEP permanent stations of the Directorate General
for Territorial Development (DGT).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Elhashash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Albanwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <article-title>A Review of Mobile Mapping Systems: From Sensors to Applications</article-title>
          ,
          <source>Sensors</source>
          <volume>22</volume>
          (
          <year>2022</year>
          )
          <fpage>4262</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hamza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stopar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sterle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pavlovčič-Prešeren</surname>
          </string-name>
          ,
          <article-title>Low-Cost Dual-Frequency GNSS Receivers and Antennas for Surveying in Urban Areas</article-title>
          ,
          <source>Sensors</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          <fpage>2861</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Petroskey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Tibavinsky</surname>
          </string-name>
          ,
          <article-title>Validation of Telemetry Data Acquisition Using GoPro Cameras</article-title>
          ,
          <source>Technical Report, SAE Technical Paper</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] GoPro, Metadata Format - GPMF, processing software available at https://github.com/gopro/gpmf-parser
          <source>(v2.2.1)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          ,
          <source>International Journal of Computer Vision 60</source>
          (
          <year>2004</year>
          )
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ardalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Awange</surname>
          </string-name>
          ,
          <article-title>Compatibility of NMEA GGA with GPS receivers implementation</article-title>
          ,
          <source>GPS Solutions 3</source>
          (
          <year>2000</year>
          )
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] Agisoft,
          <article-title>Agisoft Metashape User Manual, Professional Edition, Version 2.1</article-title>
          , available at https://www.agisoft.com/pdf/metashape-pro_2_1_en.pdf,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>