<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Object Geolocalization Using Consumer-Grade Devices: Algorithms and Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harald Lampesberger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eckehard Hermann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Zeilinger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Department Secure Information Systems, University of Applied Sciences Upper Austria</institution>
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this article, we present advancements in object geolocalization, originally intended for humanitarian aid, specifically for the localization of remnants of war, but also applicable in ecology, e.g., for estimating geolocations of observed wildlife. The research utilizes consumer-grade devices such as smartphones and drones, aiming to localize objects geometrically without laser range finders. The main contributions are three geolocalization algorithms operating in a global coordinate system - Ray Marching, Ray Intersection, and Gradient Descent - each suited for different scenarios depending on the availability and nature of measurement data. We also discuss the limitations and sources of error, such as GNSS accuracy and magnetic disturbances, and suggest a method for error correction. The implemented software prototypes were tested using the Parrot ANAFI drone and an Apple iPhone 14 smartphone, with future work focusing on improving error correction and incorporating object detectors for enhanced real-time geolocalization capabilities.</p>
      </abstract>
      <kwd-group>
<kwd>UAV</kwd>
        <kwd>drone</kwd>
        <kwd>smartphone</kwd>
        <kwd>geolocalization</kwd>
        <kwd>algorithms</kwd>
        <kwd>prototype</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Object geolocalization is crucial in various scenarios such as humanitarian aid and the support of
civilian and military organizations in identifying and geolocalizing landmines, unexploded ordnance
(UXOs), and unexploded bombs (UXBs). These are ecologically problematic, as areas contaminated
with these remnants of war pose a significant risk for people and animals, making agricultural
production in these areas hazardous. As described in previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
], our efforts are focused on
developing tools for identifying and locating suspicious objects.
      </p>
      <p>
        So far, we resorted to variants of YOLO [
        <xref ref-type="bibr" rid="ref2">2</xref>
] neural network-based object detection, e.g.,
TPH-YOLOv5 [
        <xref ref-type="bibr" rid="ref3">3</xref>
], for identifying suspicious objects in sensor data. We have accumulated experience
in industrial, institutional, and military contexts with 2D camera and 3D LIDAR systems and in building
and labeling our own 2D and 3D LIDAR datasets. This paper focuses, however, on geolocalizing objects that are
detected by the aforementioned techniques. We have already outlined our initial considerations for this in
previous work [
previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
]: Determining the geolocation of a suspicious object should be possible from a safe
distance, using a consumer-grade device, and a geolocation should be provided in three dimensions
(latitude, longitude, altitude) to include placement on the ground, in the air, in or on buildings, or in
trees. Our goal is to enable consumer-grade devices, like smartphones or drones, to localize objects
through optical and geometrical means alone, i.e., without the use of special sensors such as laser
rangefinders. In addition, we assume a changing environment, e.g., caused by an earthquake, heavy
flood, or a large-scale bombing campaign, which hinders visual geolocalization techniques based on
prerecorded image datasets. In this paper, we lay the groundwork for the first supported devices in this
endeavor based on the Parrot ANAFI drone and recent Apple iPhones.
      </p>
      <p>
        Since the publication of [
        <xref ref-type="bibr" rid="ref1">1</xref>
], there have been several enquiries regarding the use of our software
prototypes beyond acquiring locations of mines and UXOs/UXBs. Given that the implementation
supports real-time geolocalization of objects in a video feed, potential applications also include counting
livestock and wildlife. Another envisioned use case is to track the flight route and speed of bird flocks
for preventing collisions in offshore wind parks.
      </p>
<p>Until now, the dual-use character resulting from these different application areas is the reason why
the software has only been made public in demos, presentations, and videos, but not yet published as
such. This paper summarizes the technical background of our work on geolocalization so far.</p>
      <sec id="sec-1-1">
        <title>1.1. Problem Definition</title>
        <p>Figure 1 illustrates the problem setting addressed in this paper. Our considered devices possess onboard
sensors to estimate the position and attitude in space. Geographic positions are commonly acquired
from global navigation satellite systems (GNSS), e.g., expressed as latitude and longitude, relative to a
global reference frame. This study adopts the WGS84 reference frame of the Global Positioning System
(GPS) because of its widespread use and its availability on our considered devices. Regarding altitude,
options include height above ground level, mean sea level, or height above ellipsoid (HAE). In our
research, we focus on HAE for calculations to stay consistent with WGS84.</p>
        <p>
Attitude signifies the device's orientation within a local reference frame. It is typically determined
by an inertial measurement unit (IMU) using magnetometers, accelerometers, and Earth's gravitational
field. The north-east-down (NED) local reference frame is widely used in aerospace applications and is also
employed in this work because of the drone context [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Additionally, the intrinsic camera parameters
are well known, encompassing optical lens attributes and the orientation of the camera axis to the
device body in the NED frame.
        </p>
        <p>
          The line of sight to an object within the device’s camera field of view can therefore be fully
characterized using a position vector, originating at the global device location (origin) and extending in the
specified direction. Consequently, computing the geolocation of an object boils down to estimating the
magnitude of this position vector, which is the distance between the device and the target object. Ideally,
this should be possible in real time for practical use in the field. Various concepts for geolocalization
are presented and discussed in the literature; however, no public prototypes suitable for our intended
use cases were available to us for verification. Commercial products, such as DJI PinPoint [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and
SmartCam3D [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], typically incorporate an active range finding apparatus (i.e., laser range finder) to
measure this distance. Laser range finders are typically found in higher-priced drone models, and due to
a limited financial budget, these commercial solutions were not considered for evaluation in this paper.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Research Questions</title>
        <p>To address the problem of geolocalization, we state the following research questions:
• How can geolocalization of an object within the field of view be implemented without resorting
to a laser range finder?
• What are the potential sources of error and what magnitude of error can be expected by such an
approach?</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Methodology</title>
        <p>
          Utilizing a design science methodology [7], we address the posed questions as follows. The analysis of
related work in Section 2 reveals that existing solutions for the problem can be categorized into two
main approaches: 1.) geometric methods aligning with the problem description as illustrated in Figure 1
and 2.) machine learning-based techniques that match camera perspectives to a georeferenced image
dataset. Considering that our motivation stems from humanitarian demining, potentially in conflict
zones, where environmental changes may render prerecorded image datasets obsolete (due to factors
such as vegetation growth), the machine learning approach becomes less applicable. Consequently,
this paper focuses on the geometric approach that can complement existing object detectors based on
neural networks such as TPH-YOLOv5 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
The reference frames and coordinate systems, necessary for the geolocalization algorithms, are
in line with the works of Johnston [8] and Koks [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and presented in Section 3. For the design of our
prototypes, we consider three use cases depending on the number of different positions and perspectives
the camera device (i.e., drone or smartphone) needs to take. All cases require a direct line of sight from
the camera to the object in question. The main contributions of this paper are three algorithms that have
been implemented in software and enable real-time geolocalization. The algorithms are explained in
detail in Section 4, and Section 5 summarizes prototype implementation details for the Parrot ANAFI
drone and Apple iPhone smartphone. The distance between true location and estimated location is
the fundamental metric for error in the proposed geolocalization setting. To evaluate our prototypes,
Section 6 discusses preliminary results and investigates the sources and magnitudes of error in the
respective implementations.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In recent years, several projects from varying application domains that combine machine learning
object detectors with geographic localization have been published. These include cattle counting [9],
vehicle tracking [10], and surveillance for situational awareness [11].</p>
      <p>The foundations for estimating an object’s geographic location in the field of view of an unmanned
aerial vehicle (UAV) were established by Johnston [8]. A local line-of-sight vector is transformed
into a global Cartesian reference frame using rotations, facilitating further computations. The author
argues that assuming the ground levels beneath the UAV and the target object are identical simplifies
computation but only holds true on relatively flat terrain. Otherwise, estimation error occurs in uneven
landscapes [10, 11]. Johnston [8] furthermore emphasizes that a digital elevation model (DEM) can
significantly reduce the estimation error when the object is on ground level. This concept underpins
the ray marching algorithm presented in Section 4.</p>
      <p>For high precision, errors in UAV sensors must be minimized as they directly impact geolocalization
accuracy. Therefore, filtering of UAV sensor data, as studied by Barber et al. [12], is essential. Our
prototypes target the Parrot ANAFI drone and a recent Apple iPhone, which already incorporate
state-of-the-art filtering techniques [13, 14]; this aspect was not further explored in our work.</p>
<p>Visual object geolocalization represents a different approach to solving localization problems.
According to Wilson et al. [15], identifying an object's location can be reformulated as either a regression or
classification problem, addressable by machine learning and deep learning models, requiring a dataset
of georeferenced images. Due to this data dependency, visual geolocalization is not considered in this
study.</p>
      <p>Visual Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SfM) are
techniques for three-dimensional environment estimation, particularly relevant in robotics and augmented
reality applications. For further insights into these methods, we refer to the survey by Saputra, Markham,
and Trigoni [16]. Although geolocalization is not directly mentioned, its connection is obvious; the ray
intersection and gradient descent algorithms in Section 4 for estimating geopositions of objects, not
necessarily on ground level, are inspired by triangulation in motion segmentation [16].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Reference Frames and Coordinate Systems</title>
      <p>
        Regarding reference frames, the presented work builds upon the results published by Koks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We
assume that our considered devices, e.g., drones, have GNSS capabilities to sense the global position
in terms of latitude (φ), longitude (λ), and HAE altitude (h) in the global reference frame WGS84, and
we expect a geolocalization algorithm to return its findings in the same format. Earth is modelled by
a reference ellipsoid in WGS84, as shown in Figure 2a. Computations prefer a metric space, and we
therefore resort to the Earth-centered, Earth-fixed (ECEF) coordinate system. The origin is situated in the
center of the reference ellipsoid, the z axis extends along the rotational axis, the x axis passes through the
equator and prime meridian (φ = λ = 0), and y is orthogonal to x and z. The algorithms for transformation
between a geodetic point (φ, λ, h) in WGS84 and a geocentric point (x, y, z) are well established and
sufficiently accurate [
        <xref ref-type="bibr" rid="ref4">4</xref>
]. Points near Earth's surface entail large numbers in ECEF, and we therefore
resort to the constants and optimizations developed by Osen [17] to overcome accuracy issues with
floating-point numbers.
      </p>
      <p>
Note that if we express north in the ECEF coordinate system, the direction of the vector is different
for every point on Earth. Attitude sensors in our considered devices measure the orientation relative to
a local reference frame, NED in our case, which also contains the line-of-sight vectors. The geolocalization
algorithms in Section 4 require a transformation of local vectors into the global coordinate system,
and according to Koks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], this can be achieved by constructing a NED basis in the ECEF coordinate
system. A NED basis {n⃗, e⃗, d⃗} is defined by three ECEF vectors, representing north, east, and down, as
shown in Figure 2a. Intuitively, vectors n⃗ and e⃗ span a plane, tangential to the ellipsoid, at a particular
latitude and longitude. Suppose a device is at a particular latitude φ and longitude λ. First, the NED
basis {n⃗, e⃗, d⃗} is defined at φ = λ = 0 by setting n⃗ = [0, 0, 1] parallel to the z axis and e⃗ = [0, 1, 0] parallel to the
y axis. Vector e⃗ is then rotated around n⃗ by λ degrees, n⃗ is rotated around the new e⃗ by −φ
degrees, and d⃗ is finally the cross product of the rotated vectors n⃗ × e⃗. This basis is different for every point
on Earth, but due to Earth's size, the changes of the basis are negligible within the flying range of a
commercial drone.
      </p>
      <p>We also need a model for the camera. In our work, a simple pinhole camera model, depicted in
Figure 2b, represents the viewport of a device and has a local camera reference frame, where the optical
axis is x, axis y is along the width of the viewport, and z points in the downward direction of the
viewport height. By directing z downward, the reference frame stays compatible with NED in later
steps. The considered intrinsic lens parameters, such as horizontal and vertical field of view (hfov, vfov),
are known, and a virtual projection plane spans in front of the origin representing the current image.
There are two use cases for a camera model: A geographic point in the world within the camera’s field
of view is mapped to a point/pixel in the image for rendering; and selecting a point/pixel at image
coordinates (, ) is considered a geolocalization measurement and translated to a line-of-sight unit
vector based on hfov and vfov.</p>
      <p>Finally, the relation between optical axis and local reference frame needs to be established. The
orientation of a device in the local reference frame does not necessarily reflect the direction of the
optical axis. While the optical axis of a smartphone camera is static with respect to the device body,
the drone camera of our Parrot ANAFI is mounted on a gimbal that smooths out movements and
enables view directions independent from the drone body orientation. From device attitude and gimbal
sensor data, the combined direction of device and optical axis can be described as a single rotation
from north n⃗ to the camera axis x. Suppose we have performed a measurement and selected a
point in the viewport's image; the corresponding line-of-sight unit vector in the local camera frame is
then transformed into the local NED reference frame by the aforementioned rotation and subsequently
transformed by a change of basis into the global ECEF coordinate system. Conversely, a known
point in ECEF is mapped to a point/pixel in the viewport by reversing this process.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Geolocalization Algorithms</title>
<p>In the previous section, a measurement of an object was already introduced by selecting a point/pixel in
the viewport. More precisely, a measurement is a position vector (o, v⃗), where o is the origin (location
of the device) and v⃗ is a unit vector in the direction of an object, both in the ECEF coordinate system. A
measurement set is then a set of measurements M = {(o1, v⃗1), . . . , (on, v⃗n)}, all referring to the same
object from different device positions.</p>
<p>An initial measurement creates a new measurement set, and the number of measurements in M
dictates the choice of a suitable geolocalization algorithm. The selection process is illustrated in Figure 3.
A single measurement is only suitable for ground objects and necessitates a digital elevation model of
the terrain. An additional measurement is only considered when its origin is unique and v⃗ is not
parallel to existing measurements. There are two cases when parallel measurements can occur: 1.) the
same measurement is recorded twice by accident or 2.) the unlikely case where sensor disturbances
cause parallel vectors despite different origins. For two measurements, the geolocation can be computed
by solving a linear equation system of two skew rays. More than two measurements overdetermine this
system, and we resort to gradient descent. The iterative refinement of a solution using gradient descent
simplifies the computational process in our application scenario significantly because a measurement
set is considered dynamic, where new measurements are added and old measurements can expire over
time. Instead of repeatedly solving a closed form when the measurement set changes, gradient descent
only needs a few iterations to update the previous solution.</p>
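      <p>The selection flow boils down to dispatching on the size of the measurement set, assuming measurements with identical origins or parallel vectors have already been rejected. A minimal sketch (the enum and function names are ours):</p>

```c
#include <stddef.h>

typedef enum {
    ALG_NONE,
    ALG_RAY_MARCHING,     /* one measurement, requires a DEM */
    ALG_RAY_INTERSECTION, /* two measurements from distinct origins */
    ALG_GRADIENT_DESCENT  /* three or more measurements */
} algorithm_t;

/* Choose a geolocalization algorithm based on the number of accepted
 * measurements in the set, mirroring the selection process of Figure 3. */
algorithm_t select_algorithm(size_t n_measurements)
{
    if (n_measurements == 1) return ALG_RAY_MARCHING;
    if (n_measurements == 2) return ALG_RAY_INTERSECTION;
    if (n_measurements >= 3) return ALG_GRADIENT_DESCENT;
    return ALG_NONE;
}
```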
      <sec id="sec-4-1">
        <title>4.1. One Measurement: Ray Marching</title>
        <p>Ray marching extends the line-of-sight unit vector until it intersects with an elevation model of the
terrain. A digital elevation model (DEM) is typically generated from airborne laser scanning and
captures terrain altitude data. Thanks to services like COPERNICUS [18], this data is accessible for
research. The data resolution in our experiments was 10m and 25m for the Austrian region. A DEM is
typically provided as GeoTIFF [19] and can be queried with tools such as GDAL [20]; for a particular
latitude and longitude, an interpolated altitude is returned. Algorithm 1 sketches the loop that extends
the unit vector until the altitude of a computed point matches the DEM. The algorithm requires bounds
for the acceptable error and maximum distance. The unit vector is extended by a static step size as long
as the computed point stays above the DEM surface. When the computed point is beneath the DEM
surface, the step is not taken and the step size is halved for the next iteration to achieve a similar effect
as in binary search algorithms. Setting the step size too large could lead to false results when the ray,
e.g., passes through a hill in a single step and continues marching. As a heuristic, choosing a step size
smaller than the DEM resolution is a good starting point, e.g., 24 m for 25 m resolution. This algorithm was
reliable in experiments but has limitations: the object needs to be on ground level, up-to-date DEM data
must be available, and the localization error increases in shallow perspectives, i.e., when the height difference
between device and object is small. The increased error sensitivity due to shallow perspectives has
already been described by Barber et al. [12].</p>
<p>Algorithm 1: Ray marching algorithm</p>
<p>Data: Measurement (o, v⃗), where o is above ground level and v⃗ is a unit vector; GDAL interface
to a DEM; accuracy target and maximum distance
Result: ECEF geolocation p and distance t
t ← 0 and s ← 24;
do
p ← o + v⃗ · (t + s);
h_p is gathered from transforming p into the WGS84 reference frame;
h_dem is gathered from querying the DEM with the current estimated point p;
δ ← h_p − h_dem;
if δ &gt; 0 (computed point is still above terrain) then
t ← t + s;
else
s ← s/2;
end
while |δ| &gt; accuracy target and t &lt; maximum distance;</p>
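        <p>To make the loop concrete, the following sketch implements the ray marching idea in a simplified local metric frame: instead of transforming each candidate point to WGS84 and querying GDAL, a callback returns the terrain height at a horizontal position. The loop structure follows Algorithm 1; the frame simplification and all names are ours:</p>

```c
#include <math.h>

/* Terrain height (m) at a horizontal position; stands in for the DEM query. */
typedef double (*dem_fn)(double x, double y);

/* March along the ray from origin o in unit direction v (z points up in
 * this simplified frame) until the ray meets the terrain. Returns the
 * marched distance, or -1.0 if max_dist is exceeded. */
double ray_march(const double o[3], const double v[3], dem_fn dem,
                 double accuracy, double max_dist)
{
    double t = 0.0, step = 24.0; /* step below a 25 m DEM resolution */
    double delta;
    do {
        /* candidate point one step further along the ray */
        double p[3] = { o[0] + v[0] * (t + step),
                        o[1] + v[1] * (t + step),
                        o[2] + v[2] * (t + step) };
        delta = p[2] - dem(p[0], p[1]); /* height above terrain */
        if (delta > 0.0)
            t += step;   /* still above ground: commit the step */
        else
            step *= 0.5; /* overshoot: halve the step, binary-search style */
    } while (fabs(delta) > accuracy && t < max_dist);
    return t < max_dist ? t : -1.0;
}
```

        <p>With a flat terrain at height 0 and a camera 100 m above ground looking down at 45°, the marched distance converges to roughly 100 · √2 ≈ 141.4 m.</p>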
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Two Measurements: Ray Intersection</title>
<p>For cases when an object is not on ground level, the geolocation can be estimated by two measurements
from different positions, i.e., by triangulation, a well-known concept in geometry [16]. A challenge in our
setting in three-dimensional space is that two rays are extremely unlikely to intersect; two rays are
typically skew to each other. Intuitively, the point in question would be in the middle, where both rays
are nearest. More formally, we solve for a line, orthogonal to both rays, and we take the middle of this
line as the geolocalization estimate. Suppose we have two measurements {(o1, v⃗1), (o2, v⃗2)} in the ECEF
coordinate system; we compute v⃗o = v⃗1 × v⃗2 orthogonal to both rays. By introducing scale factors t1, s,
and t2, the following linear equation holds:</p>
<p>o1 + t1 · v⃗1 + s · v⃗o = o2 + t2 · v⃗2.</p>
<p>By rephrasing the equation such that the three unknown scale factors are on the left side, we end up
with a linear equation system that can be solved by standard Gauss-Jordan elimination:
t1 · v⃗1 + s · v⃗o − t2 · v⃗2 = o2 − o1.</p>
<p>The ray intersection algorithm is straightforward: the equation system is solved for the three unknowns
and the estimated ECEF geoposition p is returned:
p = o1 + t1 · v⃗1 + (s/2) · v⃗o.</p>
        <p>This method can be applied when objects are not on ground level, e.g., on a rooftop or in the air,
but it requires two measurements and movement between two positions, ideally such that the rays are
orthogonal to each other.</p>
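        <p>A compact implementation of this computation can be sketched as follows: the 3×3 system is solved with Gauss-Jordan elimination (with partial pivoting), and the midpoint of the connecting line is returned. Function names are ours; a negative return value flags (near) parallel rays, which the selection process rejects anyway:</p>

```c
#include <math.h>

static void cross3(const double a[3], const double b[3], double r[3])
{
    r[0] = a[1] * b[2] - a[2] * b[1];
    r[1] = a[2] * b[0] - a[0] * b[2];
    r[2] = a[0] * b[1] - a[1] * b[0];
}

/* Solve t1*v1 + s*vo - t2*v2 = o2 - o1 with vo = v1 x v2 and return
 * p = o1 + t1*v1 + (s/2)*vo. Returns 0 on success, -1 for (near)
 * parallel rays. */
int ray_intersect(const double o1[3], const double v1[3],
                  const double o2[3], const double v2[3], double p[3])
{
    double vo[3];
    cross3(v1, v2, vo);

    /* augmented matrix [v1 | vo | -v2 | o2 - o1], unknowns (t1, s, t2) */
    double m[3][4];
    for (int i = 0; i < 3; i++) {
        m[i][0] = v1[i];
        m[i][1] = vo[i];
        m[i][2] = -v2[i];
        m[i][3] = o2[i] - o1[i];
    }
    /* Gauss-Jordan elimination with partial pivoting */
    for (int col = 0; col < 3; col++) {
        int piv = col;
        for (int r = col + 1; r < 3; r++)
            if (fabs(m[r][col]) > fabs(m[piv][col])) piv = r;
        if (fabs(m[piv][col]) < 1e-12) return -1; /* degenerate system */
        for (int c = 0; c < 4; c++) {
            double tmp = m[col][c]; m[col][c] = m[piv][c]; m[piv][c] = tmp;
        }
        for (int r = 0; r < 3; r++) {
            if (r == col) continue;
            double f = m[r][col] / m[col][col];
            for (int c = 0; c < 4; c++) m[r][c] -= f * m[col][c];
        }
    }
    double t1 = m[0][3] / m[0][0];
    double s  = m[1][3] / m[1][1];
    for (int i = 0; i < 3; i++)
        p[i] = o1[i] + t1 * v1[i] + 0.5 * s * vo[i];
    return 0;
}
```

        <p>Choosing the two measurement positions such that the rays are close to orthogonal keeps this system well conditioned.</p>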
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Three or more Measurements: Gradient Descent</title>
<p>Suppose a drone overpasses terrain and an object detector creates a stream of measurements of the
same object seen from different origins at different moments in time. As in the previous case, we
assume that the rays are skew to each other. The previous algorithm is not applicable in this case
because the equation system would be overdetermined by more than two measurements. By building
upon the results of Traa [21], we can rephrase geolocalization as a minimization problem for a set
of measurements, numerically solvable by gradient descent. First, we define the squared orthogonal
distance D between a position vector (o, v⃗) and a point p in accordance with Traa [21, Eq. 8]:
D(o, v⃗; p) = (o − p)ᵀ · (I − v⃗ · v⃗ᵀ) · (o − p).</p>
<p>Suppose that M = {(o1, v⃗1), . . . , (on, v⃗n)} is a set of measurements. In line with Traa [21, Eq. 10], the
sum of squared distances, or loss in machine-learning terms, of point p is then:
L(M; p) = ∑_(o,v⃗)∈M D(o, v⃗; p) = ∑_(o,v⃗)∈M (o − p)ᵀ · (I − v⃗ · v⃗ᵀ) · (o − p).</p>
<p>Finding the best estimate for point p then becomes a minimization problem p̂ = argmin_p L(M; p).
This can be computed by gradient descent when the derivative of L with respect to p is known. The
partial derivative is defined in accordance with Traa [21, Eq. 12]:
∂L/∂p = ∑_(o,v⃗)∈M −2 · (I − v⃗ · v⃗ᵀ) · (o − p).</p>
<p>Algorithm 2 sketches the optimization loop that continues until the estimated point p̂ stabilizes. The
learning rate can be configured, and 0.065 was found to converge sufficiently fast in real time given the
magnitudes of values in the ECEF coordinate system. The inner for loop can be significantly optimized
by precomputing the 3x3 matrix (I − v⃗ · v⃗ᵀ) for every measurement in M.</p>
<p>To summarize, the gradient descent algorithm is applicable when multiple measurements of the same
object but from different origins are available, e.g., from an object detector during device movement.
Geolocalization is not restricted to ground level; however, the object must not move while measurements
are collected from a single device.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Implementations</title>
      <p>The concepts and ideas in this work have been implemented in one library and two applications.</p>
<p>Algorithm 2: Gradient descent algorithm</p>
<p>Data: Measurement set M = {(o1, v⃗1), . . . , (on, v⃗n)}
Result: ECEF geoposition p̂
Learning rate η ← 0.065;
Current and previous estimate p̂ ← p̂_prev ← initial estimate;
do
Partial derivative g⃗ ← [0, 0, 0]ᵀ;
for (o, v⃗) ∈ M do
g⃗ ← g⃗ − 2 · (I − v⃗ · v⃗ᵀ) · (o − p̂);
end
p̂ ← p̂ − η · g⃗;
δ⃗ ← p̂ − p̂_prev;
p̂_prev ← p̂;
while the estimated point still changes sufficiently: |δ⃗| &gt; 10^−7;</p>
      <sec id="sec-5-1">
        <title>5.1. libgeo2pixel</title>
<p>The reference frames, coordinate systems, and geolocalization algorithms are implemented in the
libgeo2pixel library. A function is provided for continuously updating the state of the world using
device telemetry data. To simplify computations, the NED basis for translating a local direction
into the global ECEF frame is only recomputed when the device has moved more than 2 kilometers
from its initial location. The library is written in the C language with portability and speed as core
goals. The code has been optimized, e.g., rotations are expressed solely in quaternion operations, so
it can be integrated into real-time applications. Furthermore, the library is dependency-free with two
exceptions: the C standard library and, for accessing DEM information, an interface to the gdalinfo [20]
tool. Everything else is self-contained for portability. Consequently, in our implementations,</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. PDrAW for Parrot ANAFI (video-analyzer)</title>
        <p>The Parrot ANAFI drone was chosen for experiments because of its software design and open-source
policy. Telemetry data is already processed by a Kalman filter onboard and embedded as metadata track
in recorded video streams and live video feeds [22]. Parrot developers also provide an open-source
command line tool for video viewing named PDrAW [23] that seamlessly integrates the live video
stream on a computer through the drone controller. For integrating libgeo2pixel, we created a fork
of PDrAW, named video-analyzer, with a reduced feature set but an enriched user interface, i.e., mouse
clicks for taking or deleting measurements and augmented reality overlays. For example, DEM data
can be visualized as a grid overlay in the live view as shown in Figure 4. The grid overlay has a practical
purpose in error correction. Early on, experiments had shown that magnetic disturbances are the
most significant error source in the drone model by introducing an angular deviation between the
sensed north axis and the local NED reference frame. The video-analyzer app integrates a simple
form of error correction by allowing manual alignment between DEM and true terrain in terms of an
error-compensating quaternion that is considered in libgeo2pixel computations.</p>
        <p>Measurements in video-analyzer are always annotated by a number representing the line-of-sight
distance in meters to the respective point, similar to a range finder. This can be seen in the screenshots
in Figures 8a and 8b. Video-analyzer also enables automatic import and export of geographic points
via CSV files. Figure 8c is a screenshot of the mapping software QGIS [24] that reads the exported
measurements and displays them on a map. Our fork of PDrAW is implemented in C/C++ with a custom
cmake build system, and it can be compiled on Linux, MacOS, and Windows WSL.</p>
<p>Furthermore, video-analyzer has been ported to the Nvidia Jetson Nano platform that runs an
embedded version of Linux. This platform is energy efficient and has artificial intelligence capabilities
that we used for video decoding. Integrating and accelerating object detectors on this platform is
planned for future work. Figure 5 shows a prototypical setup using the battery-powered Jetson platform
in field use, where the live feed of the drone, including its telemetry data, is captured and visualized by
video-analyzer instead of the smartphone controller app.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Apple iPhone 14 (GeoMatApp)</title>
        <p>The proof-of-concept implementation of a smartphone app was carried out with the goal of giving
a wide audience the ability to capture, identify, and geolocalize objects from a safe distance. We see
potential use case scenarios in natural disaster management, in the search for missing persons, and for
farmers working in areas potentially contaminated by remnants of war, as in former Yugoslavia
or currently in Eastern Ukraine. The app itself was implemented in Swift 5.10, compiled for iOS 17, and
tested on an iPhone 14 Plus.</p>
<p>The app consists mainly of three views: a CameraView for taking pictures and a MapView, as shown
in Figure 6a, for displaying the geolocations of image origins on a map using camera icons. By selecting
a camera icon on the map, an ImageView, shown in Figure 6b, is opened that displays the image taken
at that location along with the data captured during photography, which are primarily GPS data as well
as roll, pitch, and yaw of the device. Libgeo2pixel was integrated as an external service.</p>
        <p>Figure 6: (a) The camera icon indicates the location where a picture was taken. (b) The captured
image with the device Euler angles and the GNSS data when the image was taken.</p>
        <p>Unlike drones, the acquisition of sensor data does not happen automatically in one shot along with
the query of image sensor data, i.e., there is no global shutter across all sensor readings. Initially, this led
to significant inaccuracies when calculating the geolocation of objects identified in the image later
on. The cause of this was that even small movements of the camera in the time window between the
query of the image sensor, GPS sensor, and camera orientation via DeviceMotionManager resulted
in massive inaccuracies in the computed geolocation for distant objects. A solution for achieving
acceptable results involved setting the DeviceMotionUpdateInterval of the DeviceMotionManager to
0.01 seconds. Another notable error source was the difference between the iPhone's local reference frame
and our NED frame. The CMAttitudeReferenceFrame of the DeviceMotionManager needs to be set to
xTrueNorthZVertical (the Z axis is vertical and the X axis points to the geographic north pole) [25] and
an additional transformation needs to be performed in libgeo2pixel before geolocalization calculations.</p>
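        <p>The required transformation can be sketched as follows (an illustrative C++ snippet with a hypothetical function name, not the libgeo2pixel API): assuming the xTrueNorthZVertical frame is right-handed with X pointing true north and Z pointing up, its Y axis points west, so mapping a vector into our NED frame (X north, Y east, Z down) amounts to a 180-degree rotation about the shared X axis.</p>
        <p>
```cpp
#include <array>

// Hypothetical helper (not the libgeo2pixel API): map a vector from
// Core Motion's xTrueNorthZVertical frame, assumed right-handed with
// X = true north, Y = west, Z = up, into the NED frame with
// X = north, Y = east, Z = down. The frames share the X axis, so the
// change of basis is a 180-degree rotation about X: (x, y, z) -> (x, -y, -z).
std::array<double, 3> coreMotionToNed(const std::array<double, 3>& v) {
    return { v[0], -v[1], -v[2] };
}
```
        </p>
        <p>For example, a unit vector pointing east reads (0, -1, 0) in the Core Motion frame and maps to (0, 1, 0) in NED.</p>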
        <p>Despite successful functional tests and initial experiences, the GeoMatApp was still considered a
proof of concept and was not yet ready for the evaluation discussed in the next section.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary Evaluation and Discussion</title>
      <p>The video-analyzer prototype was evaluated in the years 2022–2023 by conducting drone flights with
two different Parrot ANAFI drones in the greater area of Upper Austria and Tirol. Figures 7 and 8 show
screenshots demonstrating functional tests of the video-analyzer app (Figure 7: (a) manual clicks on horse
positions in an overflight; (b) corresponding QGIS live view on Google Maps). Figure 7a is a video still of
an overflight recording, where each horse was manually clicked once for estimating the geolocation
by ray marching. The corresponding live view in QGIS in Figure 7b was updated accordingly in real
time. Figures 8a and 8b are video stills from a ray intersection measurement by two clicks on a small
reference object. Figure 8c shows the corresponding QGIS live view.</p>
      <p>
        As mentioned in our previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we randomly chose landmarks that are identifiable on maps
for geolocalization and estimated the error in meters between computed and true location. As ground
truth we used the official maps provided by DORIS [26]. Our findings can be summarized as follows:
• We distinguish between translational and rotational errors. Translational errors are a consequence
of the dilution of precision in GNSS. The estimated geolocation of an object inherits the horizontal
and vertical dilution of precision of the drone or smartphone, independent of the distance to
the object. The state-of-the-art approach to increase GNSS precision is real-time kinematics (RTK),
which is not available for our devices.
• Rotational errors with respect to the local NED reference frame originate from deviations in IMU
and gimbal sensor readings. As already recognized by Nuske et al. [27], the biggest source of
error is deviation in magnetic heading. We observed errors of up to 10 degrees with respect to true
north. When a rotational error is present, the estimation error unfortunately grows linearly with
increasing line-of-sight distance between device and object. Switching to a different drone
model that utilizes differential GNSS with multiple antennas for sensing the attitude would
reduce rotational errors.
• We integrated a simple manual error correction method by aligning the DEM grid overlay with
the visible terrain. An error-correcting quaternion captures the manual adjustments. Without
error correction, the geolocalization error was up to 10% of the line-of-sight distance,
e.g., up to 1 m at 10 m distance. Aligning the DEM manually reduced the error to 1–2% of
the line-of-sight distance.
• For drones, ray marching was surprisingly accurate in open terrain, but unreliable next to or in
forests. Analyzing the DEM data highlighted that airborne laser scanning in principle aims to
capture the ground level; however, dense trees can introduce height deviations that directly
translate into bad estimates. The worst case observed in experiments occurred at a low flight
altitude next to a forest, where the DEM elevation was greater than the drone altitude. Ray
marching failed in this case.
      </p>
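      <p>The linear growth of the rotational error can be made concrete with a back-of-the-envelope sketch (our own illustration, not part of the prototypes): a heading error of theta displaces the estimated geolocation sideways by roughly d * sin(theta) at line-of-sight distance d.</p>
      <p>
```cpp
#include <cmath>

// Illustration only (not part of video-analyzer): approximate sideways
// displacement of an estimated geolocation caused by a heading error,
// for a given line-of-sight distance. The displacement grows linearly
// with distance for a fixed angular error.
double crossRangeError(double distanceMeters, double headingErrorDeg) {
    const double kPi = 3.14159265358979323846;
    return distanceMeters * std::sin(headingErrorDeg * kPi / 180.0);
}
```
      </p>
      <p>At 100 m line-of-sight distance, the observed worst-case heading error of 10 degrees already corresponds to roughly 17 m of sideways displacement, which illustrates why heading deviation dominated the error budget.</p>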
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>To answer our research questions, this paper shows that under certain conditions it is possible to
geolocalize an object within the field of view of a camera given that GNSS and attitude sensors are
sufficiently accurate and intrinsic camera parameters are accounted for. We have conceptualized three
geolocalization algorithms and implemented them in a library and two software prototypes, for the
Parrot ANAFI drone and Apple iPhone 14 respectively, to gain insight. A preliminary evaluation has
indicated that magnetic heading was the primary source of error in experiments and we therefore
integrated a simple error correction method in video-analyzer by manually aligning a DEM grid overlay
with the terrain.</p>
      <p>There are several aspects that can be improved in future work. Automating the manual error
correction is the top priority. Integrating object detectors and object trackers to automate the gradient
descent algorithm for geolocalization during flight would be a further step. We are currently investigating
camera calibration methods because the intrinsic model in our prototypes only considers hfov and
vfov in a simplified pinhole model. So far, this has not been an issue because our drone and smartphone
perform various image enhancement procedures onboard, but for increased accuracy the optical lens
properties must be considered. Another research direction that emerged from the gradient descent
algorithm is to synchronize distributed cameras so that a measurement becomes instantaneous. An
application already discussed in this context is tracking the flight route and altitude of bird flocks near
offshore wind parks.</p>
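      <p>To illustrate the limitation of the current intrinsic model, the simplified hfov/vfov pinhole mapping from a pixel to a viewing ray can be sketched as follows (function name and camera-frame convention are our own assumptions; lens distortion is exactly what this model ignores):</p>
      <p>
```cpp
#include <array>
#include <cmath>

// Sketch of a simplified hfov/vfov pinhole model (our own naming):
// map a pixel (u, v) in a width x height image to a unit viewing
// direction in a camera frame with X forward, Y right, Z down.
// No lens distortion is modeled, which limits accuracy.
std::array<double, 3> pixelToRay(double u, double v,
                                 double width, double height,
                                 double hfovRad, double vfovRad) {
    // Normalized offsets from the image center, in [-0.5, 0.5].
    double nx = u / width - 0.5;
    double ny = v / height - 0.5;
    // Coordinates on the image plane at focal length 1.
    double y = 2.0 * nx * std::tan(hfovRad / 2.0);  // to the right
    double z = 2.0 * ny * std::tan(vfovRad / 2.0);  // downwards
    double norm = std::sqrt(1.0 + y * y + z * z);
    return { 1.0 / norm, y / norm, z / norm };
}
```
      </p>
      <p>The center pixel maps to the forward direction, and a pixel at the horizontal image border is seen at exactly half the horizontal field of view, as expected for a pinhole camera.</p>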
      <p>Releasing the presented tools under an open-source license is not planned yet, but it is currently
under discussion. Given the project's roots in humanitarian and military demining, the dual-use character
is undeniable, and more research is needed before making the source code accessible.</p>
      <p>[7] A. R. Hevner, S. T. March, J. Park, S. Ram, Design science in information systems research, MIS Q. 28 (2004) 75–105.</p>
      <p>[8] M. G. Johnston, Ground object geo-location using UAV video camera, in: 2006 IEEE/AIAA 25th Digital Avionics Systems Conference, Portland, OR, USA, 2006, pp. 1–7. doi:10.1109/DASC.2006.313770.</p>
      <p>[9] V. Soares, M. Ponti, R. Gonçalves, R. Campello, Cattle counting in the wild with geolocated aerial images in large pasture areas, Computers and Electronics in Agriculture 189 (2021) 106354. doi:10.1016/j.compag.2021.106354.</p>
      <p>[10] X. Zhao, F. Pu, Z. Wang, H. Chen, Z. Xu, Detection, tracking, and geolocation of moving vehicle from UAV using monocular camera, IEEE Access 7 (2019) 101160–101170. doi:10.1109/ACCESS.2019.2929760.</p>
      <p>[11] R. Geraldes, A. Gonçalves, T. Lai, M. Villerabel, W. Deng, A. Salta, K. Nakayama, Y. Matsuo, H. Prendinger, UAV-based situational awareness system using deep learning, IEEE Access 7 (2019) 122583–122594. doi:10.1109/ACCESS.2019.2938249.</p>
      <p>[12] D. B. Barber, J. D. Redding, T. W. McLain, R. W. Beard, C. N. Taylor, Vision-based target geo-location using a fixed-wing miniature air vehicle, Journal of Intelligent and Robotic Systems: Theory and Applications 47 (2006) 361–382. doi:10.1007/s10846-006-9088-7.</p>
      <p>[13] Parrot, Parrot ANAFI Technical Specifications, https://www.parrot.com/assets/s3fs-public/2020-07/white-paper_anafi-v1.4-en.pdf, 2020. Last access: 2024-07-25.</p>
      <p>[14] Apple Developer, Core Motion, https://developer.apple.com/documentation/coremotion, 2024. Last access: 2024-07-25.</p>
      <p>[15] D. Wilson, X. Zhang, W. Sultani, S. Wshah, Visual and object geo-localization: A comprehensive survey, CoRR abs/2112.15202 (2021). arXiv:2112.15202.</p>
      <p>[16] M. R. U. Saputra, A. Markham, N. Trigoni, Visual SLAM and structure from motion in dynamic environments: A survey, ACM Comput. Surv. 51 (2018). doi:10.1145/3177853.</p>
      <p>[17] K. Osen, Accurate Conversion of Earth-Fixed Earth-Centered Coordinates to Geodetic Coordinates, Technical Report, Norwegian University of Science and Technology, 2017. URL: https://hal.science/hal-01704943v2, last access: 2024-07-22.</p>
      <p>[18] The European Space Agency, Copernicus DEM - Global and European Digital Elevation Model (COP-DEM), https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model, 2022. Last access: 2024-07-24.</p>
      <p>[19] E. Devys, T. Habermann, C. Heazel, R. Lott, E. Rouault, OGC GeoTIFF Standard, https://docs.ogc.org/is/19-008r4/19-008r4.html, 2019. Last access: 2024-07-26.</p>
      <p>[20] GDAL, GDAL documentation, https://gdal.org, 2024. Last access: 2024-07-21.</p>
      <p>[21] J. Traa, Least-Squares Intersection of Lines, Technical Report, University of Illinois at Urbana-Champaign, 2013. URL: https://silo.tips/download/least-squares-intersection-of-lines, last access: 2024-07-21.</p>
      <p>[22] Parrot Developers, libvideo-metadata - Parrot Drones video metadata library, https://github.com/Parrot-Developers/libvideo-metadata, 2024. Last access: 2024-07-21.</p>
      <p>[23] Parrot Developers, PDrAW - Parrot Drones Awesome Video Viewer, https://github.com/Parrot-Developers/pdraw, 2024. Last access: 2024-07-21.</p>
      <p>[24] QGIS, QGIS documentation, https://www.qgis.org/resources/hub/, 2024. Last access: 2024-07-26.</p>
      <p>[25] Apple Developer, xTrueNorthZVertical, https://developer.apple.com/documentation/coremotion/cmattitudereferenceframe/1616019-xtruenorthzvertical, 2024. Last access: 2024-07-26.</p>
      <p>[26] Digitales Oberösterreichisches Raum-Informations-System, DORIS, https://www.doris.at/, 2024. Last access: 2024-07-25.</p>
      <p>[27] S. Nuske, M. Dille, B. Grocholsky, S. Singh, Representing substantial heading uncertainty for accurate geolocation by small UAVs, in: AIAA Guidance, Navigation, and Control Conference, Toronto, Ontario, Canada, 2010, pp. 1–13. doi:10.2514/6.2010-7886.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lampesberger</surname>
          </string-name>
          , E. Hermann,
          <article-title>Real-time geo-localization of unexploded ordnance</article-title>
          ,
          <source>in: 19th International Symposium Mine Action</source>
          <year>2023</year>
          , Vodice, Croatia,
          <year>2023</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Farhadi,</surname>
          </string-name>
          <article-title>Yolov3: An incremental improvement</article-title>
          , https://pjreddie.com/darknet/yolo/,
          <year>2018</year>
          . Last access:
          <fpage>2024</fpage>
          -07-25.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , TPH-YOLOv5:
          <article-title>Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV) Workshops, Montreal, BC, Canada,
          <year>2021</year>
          , pp.
          <fpage>2778</fpage>
          -
          <lpage>2788</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICCVW54120.
          <year>2021</year>
          .
          <volume>00312</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Koks</surname>
          </string-name>
          ,
          <article-title>Using Rotations to Build Aerospace Coordinate Systems</article-title>
          ,
          <source>Technical Report DSTO-TN-0640, Electronic Warfare and Radar Division Systems Sciences Laboratory</source>
          ,
          <year>2008</year>
          . URL: https://apps.dtic.mil/sti/pdfs/ADA484864.pdf, last access:
          <fpage>2024</fpage>
          -07-21.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>DJI</surname>
          </string-name>
          , Zenmuse H20 Series, https://www.dji.com/at/zenmuse-h20-series,
          <year>2024</year>
          . Last access:
          <fpage>2024</fpage>
          -07- 21.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Rapid</given-names>
            <surname>Imaging</surname>
          </string-name>
          , SmartCam3D, https://www.rapidimagingtech.com/smartcam3d/,
          <year>2024</year>
          . Last access:
          <fpage>2024</fpage>
          -07-21.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>