Extended Abstract on: Minimal Structure from
Motion Representation for Image Geocoding

Nuno Amorim and Jorge Gustavo Rocha


Abstract In this extended abstract we present our early work on structure from
motion data compression for image geocoding. We discuss the advantages of image
geocoding over standard trilateration solutions and identify the desired
characteristics of an image geocoding system: accuracy, speed and scalability. We
hypothesize that scalability impacts both speed and accuracy and should be further
researched. Hence, in this thesis we ask what the minimal representation of
structure from motion is that still allows the location and orientation of new
photographs to be computed efficiently.


1 Introduction

The recent explosion of images on social media has led Computer Vision researchers
to an increased interest in image processing algorithms. Among these, structure from
motion (SFM) has gained relevance due to its applications. Given a few photographs
of the same scene, this algorithm retrieves the 3D structure (point clouds) by
treating the motion from photograph to photograph as stereo vision. Provided that
the correct focal lengths and distortion parameters are available, it can achieve
impressive precision when building point clouds. Moreover, new photographs can be
added at any time, which allows models to be updated as the world changes.
   The applications of SFM range from simple visualization to more complex tasks
such as 3D modeling and geographical location recognition. In the latter,
photographs with unknown location are compared with a geocoded database to compute
their GPS coordinates.

Nuno Amorim
Algoritmi Research Centre, University of Minho, 4710-057 Braga, Portugal, e-mail:
ntma90@gmail.com
Jorge Gustavo Rocha
Algoritmi Research Centre, University of Minho, 4710-057 Braga, Portugal, e-mail:
jgr@di.uminho.pt
Copyright (c) by the paper’s authors. Copying permitted for private and academic purposes. In:
A. Comber, B. Bucher, S. Ivanovic (eds.): Proceedings of the 3rd AGILE Phd School Champs sur
Marne, France, 15-17-September-2015, published at http://ceur.ws.org.


2 State of the Art

Early work on image geocoding relied on methods based on a database of geocoded
photographs [1, 2, 3]. Image features are extracted from new photographs, and
feature matching is performed to retrieve similar photographs from the database.
Two-view geometry is then applied to compute the pose of the query photographs.
   Since two-view geometry is often computationally expensive, as it usually relies
on RANSAC-based methods to compute the relative pose of images, interest grew in
using structure from motion to support the geocoding process. Image features are
still extracted from new photographs, but they are compared directly to 3D point
clouds. Rather than performing two-view geometry, a projection matrix is computed
that validates the matches between 2D points (query photograph) and 3D points
(database point clouds). Techniques such as nearest-neighbor feature matching [4]
and data structures such as vocabulary trees [5, 6, 7] are often used in
state-of-the-art work to greatly reduce the number of matches that must be
evaluated. Computation can be sped up further through parallel processing on CPUs
and GPUs, as shown in [6], which achieved real-time image geocoding with a GPU
implementation of a vocabulary tree, provided that new photographs successfully
matched the first document retrieved by their vocabulary tree queries.
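
As an illustration only, the way a projection matrix can validate 2D-to-3D matches
may be sketched as follows: each matched 3D point is projected into the query
photograph and the match is kept when the projection lands close to the matched 2D
feature. The function names, the 4-pixel threshold and the convention that points
in front of the camera have positive depth are assumptions made for this sketch;
they are not taken from the cited systems.

```python
import math


def project(P, point_3d):
    """Project a 3D point with a 3x4 projection matrix P (a list of 3 rows).

    Assumes P is scaled so that points in front of the camera have positive
    depth; returns None for points at or behind the camera plane.
    """
    X, Y, Z = point_3d
    homogeneous = (X, Y, Z, 1.0)
    u, v, w = (sum(p * x for p, x in zip(row, homogeneous)) for row in P)
    if w <= 0:
        return None
    return (u / w, v / w)


def inlier_matches(P, matches, threshold_px=4.0):
    """Keep the (point_2d, point_3d) matches that agree with P.

    A match is an inlier when its reprojection error, in pixels, does not
    exceed threshold_px (a hypothetical default chosen for this example).
    """
    inliers = []
    for point_2d, point_3d in matches:
        projected = project(P, point_3d)
        if projected is None:
            continue
        if math.dist(projected, point_2d) <= threshold_px:
            inliers.append((point_2d, point_3d))
    return inliers
```

In a RANSAC-style loop, a candidate matrix would typically be scored by the number
of inliers returned by such a check, and the candidate with the most inliers kept.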


3 Motivation

The advantages of image geocoding are clear when compared to other geocoding
systems. First, image geocoding does not rely on a trilateration process, which
means it can compute coordinates in indoor environments as long as a WiFi
connection is available to issue the geocoding request. Second, image geocoding
recovers the heading of the queried photographs, so the direction in which they
were taken can be computed in a single query. This reinforces the utility of image
geocoding for supporting guidance systems. Lastly, the only (and ideal) device
required for image geocoding is a smartphone, since it has both a camera and a WiFi
connection. As smartphones are now ubiquitous, the cost of deploying an image
geocoding solution is greatly reduced.

4 Problem to be Solved

For image geocoding to replace standard trilateration solutions, three
characteristics are desired: accuracy, speed and scalability. Starting with
accuracy, a good pose estimate is attained when the database contains data related
to the queried photograph. Feature matching is used to determine which data to use
when computing the pose. In the best case scenario, where the focal length and
distortion parameters are known, impressive precision can be achieved with a single
photograph.
    Speed is defined by how fast we can find the correct database data to geocode
the queried photograph. Image processing requires expensive matrix operations, as
every image pixel is relevant when computing visual features. Additionally, high
resolution images deliver better image features but also increase the time needed
to extract and match them. However, the constant evolution of hardware and the
parallelization of image processing algorithms are progressively breaking the
barrier of real-time processing.
    Being scalable means that neither speed nor accuracy is hindered by the growth
of the geocoded database, which is not quite the case today. Assuming that each SFM
model contains millions of points, and that each point is related to at least two
image descriptors and their associated 2D data, a massive amount of visual data is
required to support an image geocoding system. Consequently, querying the geocoded
database gets slower. Moreover, more information means a higher number of similar
features, which may mislead the image geocoding system into geocoding photographs
miles away from their correct location.
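
To give a rough sense of scale, the storage argument above can be turned into a
back-of-the-envelope estimate. The sketch below is only indicative: the 128-byte
descriptor size, the 12 bytes per 3D position and the 8 bytes per 2D observation
are assumed values chosen for illustration, and the function name is hypothetical.

```python
def estimate_model_bytes(
    num_points,
    descriptors_per_point=2,      # lower bound stated in the text
    descriptor_bytes=128,         # assumption: one byte per descriptor dimension
    position_bytes=12,            # assumption: three 32-bit floats (x, y, z)
    observation_bytes=8,          # assumption: two 32-bit floats (image x, y)
):
    """Estimate the raw size of one SFM model, ignoring indexes and metadata."""
    per_descriptor = descriptor_bytes + observation_bytes
    per_point = position_bytes + descriptors_per_point * per_descriptor
    return num_points * per_point


# One million points with two descriptors each.
one_model = estimate_model_bytes(1_000_000)
print(one_model / 1e6, "MB")  # 284.0 MB under the assumptions above
```

Under these assumptions a single model with one million points already needs
about 284 MB, and the figure grows linearly with the number of points,
descriptors per point and models in the database.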


5 Research Question and Future Work

Facing the scalability problem described in the previous section, in this thesis
we ask what the minimal scene representation of structure from motion is that
still allows a good geocoding rate.
    Our main objective is to study and benchmark different state-of-the-art
SFM-based geocoding systems, in order to enhance existing SFM compression strategies
or to develop alternative compression methods. We want compression rates that
maintain the geocoding speed and rate while allowing the geocoding system to scale
to wider areas. Also, rather than aiming for a perfect 100% geocoding rate, we are
only interested in ensuring that aggressive compression does not degrade this rate.
    We are aware that there is state-of-the-art research on SFM compression
[4, 6, 8], but rather than tailoring the compression to a single geocoding engine, we
will generalize it to currently available engines. Since all image geocoding methods
work under the same assumptions (image features and 3D pose estimation), we believe
that this generalization is achievable.
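
One way to make the notion of a minimal representation concrete is to treat
compression as a covering problem: keep a small subset of 3D points such that every
database image still observes at least K of the kept points. The greedy sketch
below is an illustrative assumption on our part, not the method of any specific
cited engine; the function name, the input layout and the tie-breaking rule are
hypothetical.

```python
def greedy_k_cover(point_visibility, k):
    """Select points until every image sees at least k selected points.

    point_visibility maps a point id to the set of image ids observing it.
    Images that are observed by fewer than k points are covered as far as
    possible. Ties are broken by the smallest point id.
    """
    # Number of additional selected points each image still needs.
    remaining = {
        image: k for images in point_visibility.values() for image in images
    }

    def gain(point):
        # How many still-uncovered images this point would help.
        return sum(1 for image in point_visibility[point] if remaining[image] > 0)

    selected = []
    candidates = set(point_visibility)
    while candidates:
        best = max(sorted(candidates), key=gain)
        if gain(best) == 0:
            break  # every image is covered, or no candidate can help
        selected.append(best)
        candidates.remove(best)
        for image in point_visibility[best]:
            if remaining[image] > 0:
                remaining[image] -= 1
    return selected


visibility = {"a": {1, 2, 3}, "b": {1, 2}, "c": {3}, "d": {1}}
print(greedy_k_cover(visibility, 2))  # ['a', 'b', 'c']
```

In this toy example point "d" is dropped because images 1, 2 and 3 are already
seen by two kept points each. Larger values of K trade a lower compression rate
for more redundancy when matching new photographs.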

References

    1. Robertson D, Cipolla R (2004) An Image-Based System for Urban Navigation. Proceedings of
       the British Machine Vision Conference (BMVC), doi: 10.5244/C.18.84
    2. Werner M, Kessel M, Marouane C (2011) Indoor positioning using smartphone camera.
       International Conference on Indoor Positioning and Indoor Navigation (IPIN), doi:
       10.1109/IPIN.2011.6071954
    3. Wang E, Yan W (2013) iNavigation: an image based indoor navigation system. Multimedia
       Tools and Applications, doi: 10.1007/s11042-013-1656-9
    4. Li Y, Snavely N, Huttenlocher DP (2010) Location Recognition using Prioritized Feature
       Matching. ECCV’10: Proceedings of the 11th European Conference on Computer Vision, doi:
       10.1007/978-3-642-15552-9_57
    5. Schindler G, Brown M, Szeliski R (2007) City-Scale Location Recognition. IEEE Conference
       on Computer Vision and Pattern Recognition (CVPR), doi: 10.1109/CVPR.2007.383150
    6. Irschara A, Zach C, Frahm J, Bischof H (2009) From structure-from-motion point clouds to
       fast location recognition. IEEE Conference on Computer Vision and Pattern Recognition
       (CVPR), doi: 10.1109/CVPR.2009.5206587
    7. Huitl R, Schroth G, Hilsenbeck S, Schweiger F, Steinbach E (2012) TUMindoor: An
       extensive image and point cloud dataset for visual indoor localization and mapping. 19th
       IEEE International Conference on Image Processing, doi: 10.1109/ICIP.2012.6467224
    8. Cao S, Snavely N (2014) Minimal Scene Descriptions from Structure from Motion
       Models. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), doi:
       10.1109/CVPR.2014.66