<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Semantics of Context: The Role of Interpretation and Belief in Visual Localization for Robots</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Stephanie Lowry Centre for Applied Autonomous Sensor Systems O</institution>
        </aff>
      </contrib-group>
      <fpage>22</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Visual localization has improved dramatically in recent years thanks to the development of increasingly robust representations of locations. This paper considers a different aspect of the problem: the belief generation process. Belief generation is the conversion of a measure of similarity between two location representations into a measure of sameness (that is, were these two representations captured at the same location?). It can be affected by the level of perceptual aliasing within the environment as well as the level of perceptual change. While probabilistic formulations of visual localization address these issues, environmental context can critically affect the performance of these belief generation methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>know exactly where you are. Even if long stretches of the road appear superficially similar, you need to reject
the possibility that they are actually the same place.</p>
      <p>The second scenario is perceptual change: places in the environment do not look the same as they did on
previous occasions. Perceptual change is particularly challenging when it happens uniformly over a region –
for example, if day turns to night or snow falls – as all places in that region will become difficult to recognize.
However, the spatial relationship between places does not change, so as long as some similarity in appearance
remains, a weak location hypothesis can be formed. By observing multiple nearby locations and the spatial
relationship between them, a system can gradually build up confidence in its location belief.</p>
      <p>There is an inherent conflict between resolving perceptual aliasing and perceptual change: perceptual aliasing
requires adhering to a strict matching strategy where places must be both highly similar and highly distinctive,
while perceptual change requires a permissive matching strategy where places may be matched together even
when they do not appear similar at all. Thus an important consideration for a localization system is context – is
the system in a situation when it should demand highly rigorous matching expectations or is a more permissive
strategy necessary?</p>
    </sec>
    <sec id="sec-2">
      <title>Belief Generation</title>
      <p>As discussed in [LSN+16], there are a number of methods for determining whether two location representations
were captured at the same location. These include voting methods [SBS07] and, for techniques inspired by
text-based document analysis, term frequency–inverse document frequency (TF-IDF) weighting [NG12],
which measures the mutual information between representations. Probabilistic formulations are also common.
Using a probabilistic framework has advantages: it provides a mechanism for managing uncertainty introduced
from various sources and it naturally outputs a measure that expresses the degree of confidence in the current
location belief. One well-known probabilistic framework for visual localization is FAB-MAP [CN08]. FAB-MAP
uses a Naïve Bayes or a Chow-Liu [CL68] approximation to simplify the complex joint probability between the
visual words in its model, and introduces a hidden variable to reduce the probabilities to quantities that can be
calculated from training data.</p>
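      <p>To make the Naïve Bayes simplification concrete, the following is a minimal sketch rather than FAB-MAP itself: it omits the hidden variable and the Chow-Liu tree, and assumes each visual word is an independent binary event given the location. The function name and the per-word probabilities are hypothetical and chosen only for illustration.</p>
      <preformatted><![CDATA[
```python
import math

def naive_bayes_log_likelihood(observation, word_probs):
    """Log p(Z | L) under a Naive Bayes assumption.

    observation: list of 0/1 flags, one per visual word in the vocabulary.
    word_probs:  list of p(word observed | location L), one per word.
    """
    log_lik = 0.0
    for seen, p in zip(observation, word_probs):
        # Words are treated as conditionally independent given the location,
        # so the joint likelihood is a product (a sum in log space).
        log_lik += math.log(p if seen else 1.0 - p)
    return log_lik

# Hypothetical per-word probabilities for two locations (illustrative only).
location_a = [0.9, 0.8, 0.1, 0.2]
location_b = [0.2, 0.1, 0.9, 0.7]
observation = [1, 1, 0, 0]

log_a = naive_bayes_log_likelihood(observation, location_a)
log_b = naive_bayes_log_likelihood(observation, location_b)
```
]]></preformatted>
      <p>In this sketch the observation is far better explained by the first location than the second, because the words it contains are the ones that location is expected to produce. Working in log space avoids numerical underflow when the vocabulary is large.</p>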
      <p>An important aspect of location matching is geometric consistency – elements within the environment should
stay in the same physical position relative to each other. Geometric verification tests can eliminate false positive
matches using spectral clustering [Ols09] or RANSAC [FB81]. Furthermore, not only will elements within a
location remain geometrically consistent, but the spatial relationship of the locations themselves will remain
constant. These spatial relationships can be integrated into the localization belief probabilistically [BHK12]
or via other methods such as network flows [NSBS14]. The spatial relationship between locations is extremely
important information – in an extreme case, impressive localization results can still be achieved when appearance
information is ignored and only odometry information is used [BGU16].</p>
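      <p>The idea behind a RANSAC-style geometric verification test can be sketched as follows. This is a deliberately simplified illustration: it assumes the two views differ only by a 2D image translation, whereas a practical system would typically fit a richer geometric model. The function name, tolerance, and keypoint coordinates are hypothetical.</p>
      <preformatted><![CDATA[
```python
import random

def ransac_translation(matches, tolerance=2.0, iterations=50, seed=0):
    """Return the largest set of matches that agree on one 2D translation.

    matches: list of ((x1, y1), (x2, y2)) keypoint correspondences.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by a single correspondence.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [
            m for m in matches
            if abs((m[1][0] - m[0][0]) - dx) <= tolerance
            and abs((m[1][1] - m[0][1]) - dy) <= tolerance
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Four matches consistent with a shift of about (10, 5), plus two outliers.
matches = [
    ((0, 0), (10, 5)), ((4, 1), (14, 6)), ((2, 7), (12, 12)), ((9, 3), (19, 9)),
    ((5, 5), (40, -3)), ((1, 8), (-20, 30)),
]
inliers = ransac_translation(matches)
```
]]></preformatted>
      <p>A candidate location match could then be rejected as a false positive when the number of geometrically consistent inliers falls below some threshold.</p>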
      <p>A number of trade-offs between competing priorities have been observed. A localization system becomes
increasingly dependent on spatial information when there is extreme perceptual change [MW12, NSBS14, HB14],
thus increasing sensitivity to motion uncertainty and reducing viewpoint flexibility. It has also been shown
that methods that perform more effectively in perceptually changing environments do not provide as accurate
localization [SMT+18]. These trade-offs suggest a choice must be made as to the requirements of the application
and the operating environment.</p>
    </sec>
    <sec id="sec-3">
      <title>Probabilistic Belief Generation</title>
      <p>This section presents the formalism behind a probabilistic formulation of visual localization. Using a probabilistic
framework has many advantages: it provides a mechanism for managing uncertainty introduced from various
sources and naturally outputs a measure that expresses the degree of confidence in the current location belief.
Probabilistic localization also naturally integrates some spatial environmental context; if there is a great deal of
perceptual aliasing in the environment, the system’s confidence in its location belief will be low.</p>
      <p>The probabilistic framework is applied recursively over time as the system moves through the world: the
prior belief is updated based on the system’s motion model. The uncertainty in the system’s location belief is
continuously increased by the error in the motion model, and would grow in an unbounded manner if it were
not constrained and corrected by the external observations of the world.</p>
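      <p>The effect of the motion model on the location belief can be illustrated with a small sketch. It assumes places are ordered along a one-dimensional route and that the motion model is a discrete distribution over the number of places advanced per time step; both the function name and the kernel values are hypothetical.</p>
      <preformatted><![CDATA[
```python
def predict(belief, motion_kernel):
    """Propagate a belief over places along a route through a motion model.

    belief:        list of p(L_i) for places ordered along the route.
    motion_kernel: dict mapping a step (in places) to its probability.
    """
    n = len(belief)
    predicted = [0.0] * n
    for i, p in enumerate(belief):
        for step, step_prob in motion_kernel.items():
            j = min(max(i + step, 0), n - 1)  # clamp at the ends of the route
            predicted[j] += p * step_prob
    return predicted

# Hypothetical motion model: the robot most likely advances one place,
# but may stay put or advance two.
motion_kernel = {0: 0.2, 1: 0.6, 2: 0.2}

belief = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # initially certain of place 1
after_one = predict(belief, motion_kernel)
after_two = predict(after_one, motion_kernel)
```
]]></preformatted>
      <p>With no observations to correct it, the belief spreads further along the route at every step, which is the growth in uncertainty described above.</p>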
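      <p>Formally, visual localization is probabilistically defined as follows (using the same notation as [CN08]): at time
step <italic>k</italic>, the system has made a series of observations <inline-formula><tex-math>\mathcal{Z}^k = \{Z_0, Z_1, \ldots, Z_k\}</tex-math></inline-formula>, and has previously visited locations
<inline-formula><tex-math>\{L_0, L_1, \ldots, L_{k-1}\}</tex-math></inline-formula>. The likelihood of the system being in location <inline-formula><tex-math>L_i</tex-math></inline-formula> at time <italic>k</italic> given the observations <inline-formula><tex-math>\mathcal{Z}^k</tex-math></inline-formula>
is
<disp-formula id="eq-1"><label>(1)</label><tex-math>p(L_i \mid \mathcal{Z}^k) = \frac{p(Z_k \mid L_i, \mathcal{Z}^{k-1})\, p(L_i \mid \mathcal{Z}^{k-1})}{p(Z_k \mid \mathcal{Z}^{k-1})}</tex-math></disp-formula></p>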
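      <p>A minimal sketch of applying Equation 1 over a discrete set of known places is given below. It makes two simplifying assumptions that are not part of the formulation above: the current observation is taken to depend only on the current location, and the denominator is summed over known places only, ignoring the possibility of a previously unvisited place. The function name and all numerical values are hypothetical.</p>
      <preformatted><![CDATA[
```python
def bayes_update(prior, likelihoods):
    """Apply Equation 1 over a discrete set of locations.

    prior:       list of p(L_i | previous observations).
    likelihoods: list of p(current observation | L_i).
    """
    unnormalized = [l * p for l, p in zip(likelihoods, prior)]
    evidence = sum(unnormalized)  # the denominator, summed over known places
    return [u / evidence for u in unnormalized]

prior = [0.25, 0.25, 0.25, 0.25]      # no initial preference between places
likelihoods = [0.1, 0.6, 0.2, 0.1]    # weak evidence in favour of place 1

posterior = bayes_update(prior, likelihoods)
# The posterior becomes the prior for the next, similar observation.
posterior = bayes_update(posterior, likelihoods)
```
]]></preformatted>
      <p>Two individually weak observations that agree are enough to concentrate most of the belief on one place, which illustrates how confidence builds up recursively.</p>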
      <p>The second term in the numerator, <inline-formula><tex-math>p(L_i \mid \mathcal{Z}^{k-1})</tex-math></inline-formula>, is the location prior: the system’s prior belief about the
location before making the current observation <inline-formula><tex-math>Z_k</tex-math></inline-formula>. It allows the localization to be updated recursively over
multiple time steps: at time step <italic>k</italic> + 1 the output of Equation 1 becomes the new location prior and Equation 1
can be applied again.</p>
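      <p>The other two terms are the observation likelihood models. The first term in the numerator, <inline-formula><tex-math>p(Z_k \mid L_i, \mathcal{Z}^{k-1})</tex-math></inline-formula>,
is the likelihood that the robot would make observation <inline-formula><tex-math>Z_k</tex-math></inline-formula> if it is indeed at location <inline-formula><tex-math>L_i</tex-math></inline-formula>, and the
denominator <inline-formula><tex-math>p(Z_k \mid \mathcal{Z}^{k-1})</tex-math></inline-formula> is a normalizing factor that determines the likelihood that the robot would make the
observation <inline-formula><tex-math>Z_k</tex-math></inline-formula> anywhere within the environment, thereby introducing further spatial context into the localization
calculation.</p>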
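      <p>The role of the normalizing denominator can be seen in a small worked sketch. It assumes a discrete set of five known places with a uniform prior and expands the denominator by the law of total probability over those places only; the function name and the likelihood values are hypothetical.</p>
      <preformatted><![CDATA[
```python
def posterior_for_place(index, prior, likelihoods):
    """Posterior of one place, with the denominator written out explicitly."""
    # Law of total probability: how likely is this observation anywhere?
    evidence = sum(l * p for l, p in zip(likelihoods, prior))
    return likelihoods[index] * prior[index] / evidence

prior = [0.2] * 5

# Distinctive environment: only place 0 explains the observation well.
distinctive = [0.8, 0.05, 0.05, 0.05, 0.05]
# Aliased environment: every place explains the observation equally well.
aliased = [0.8, 0.8, 0.8, 0.8, 0.8]

confident = posterior_for_place(0, prior, distinctive)
uncertain = posterior_for_place(0, prior, aliased)
```
]]></preformatted>
      <p>The numerator is identical in both cases. Only the denominator differs, and under perceptual aliasing it leaves the posterior no better than the prior.</p>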
    </sec>
    <sec id="sec-4">
      <title>Observation Likelihood Models</title>
      <p>The observation likelihood models are the mechanisms by which context is introduced into the localization
calculation. The system’s observations naturally have a degree of uncertainty and have to be interpreted within
the context of the environment: for example, should the system have a strict or a permissive matching strategy?
Furthermore, the probabilistic framework does not implicitly ensure temporal environmental context (that is,
perceptual change) is included.</p>
      <p>The observation likelihood models themselves need to be learned or at least embody some data-driven
assumptions about the environment. For example, FAB-MAP learns its likelihood models from training data
[CN08]. Many visual localization systems employ learning techniques to improve the performance of their location
representations, including using state-of-the-art deep learning to learn about appearance change [LGMR18] or
viewpoint change [GSM18], and learning about the observation likelihoods is closely related to the chosen image
representation.</p>
      <p>The performance of the observation likelihood model can be assessed independently of the performance of the
image representations. The likelihood models depend on prior beliefs about the likelihood of appearances, which
naturally vary due both to the environment itself and to the conditions under which the environment is observed.</p>
      <p>The performance of a system can depend critically on the correctness of the likelihood model. In [Low14],
a probabilistic visual localization system was evaluated on the St Lucia dataset [GMMW10]. Two observation
likelihood models (M<sub>1</sub> and M<sub>2</sub>) were trained on the data: M<sub>1</sub> was trained using the data from the same time
of day as the test data and M<sub>2</sub> was trained using data from a different time of day. The performance of the
system increased from correctly localizing in 10% of places using M<sub>1</sub> to correctly localizing in over 70% of places
using M<sub>2</sub>, with all other aspects of the system remaining unchanged.</p>
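      <p>The sensitivity of belief generation to the likelihood model can be illustrated with a toy sketch. It is not the model used in [Low14]: it assumes that similarity scores for matching and non-matching places are each Gaussian, fits them from a handful of hypothetical training scores, and compares a model fitted under stable conditions with one fitted after perceptual change. All names and numbers are illustrative.</p>
      <preformatted><![CDATA[
```python
import math
import statistics

def fit_gaussian(samples):
    """Fit a mean and sample standard deviation to training scores."""
    return statistics.mean(samples), statistics.stdev(samples)

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def match_probability(score, model, prior_match=0.5):
    """p(same place | similarity score) under a two-class Gaussian model."""
    (m_mean, m_std), (n_mean, n_std) = model
    match = gaussian_pdf(score, m_mean, m_std) * prior_match
    non_match = gaussian_pdf(score, n_mean, n_std) * (1.0 - prior_match)
    return match / (match + non_match)

# Hypothetical similarity scores (illustrative values, not from [Low14]).
# Each model is (match distribution, non-match distribution).
# Under stable conditions true matches score high...
stable = (fit_gaussian([0.85, 0.90, 0.80, 0.88]),
          fit_gaussian([0.30, 0.35, 0.25, 0.32]))
# ...but after perceptual change, true matches score much lower.
changed = (fit_gaussian([0.45, 0.50, 0.40, 0.48]),
           fit_gaussian([0.20, 0.25, 0.15, 0.22]))

score = 0.47  # a true match observed after perceptual change
p_stable_model = match_probability(score, stable)
p_changed_model = match_probability(score, changed)
```
]]></preformatted>
      <p>The same similarity score is rejected almost outright by the model fitted under stable conditions and accepted with high confidence by the model fitted under the changed conditions, even though the image representation itself is identical in both cases.</p>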
      <p>Since the data used to train M<sub>1</sub> was only captured a few hours away from the data used to train M<sub>2</sub>, these
results suggest that not only is the correct context necessary for generating a correct location belief, but that the
system must be flexible to change as the appearance of the environment varies. In fact, in some circumstances a
dynamically generated likelihood model approximated online using current environment data can outperform a
pre-trained likelihood model that was calculated using exact ground truth data captured only a few hours earlier
[Low14]. However, a model that is approximated online also contains assumptions about the environmental
context [LM15]. If these assumptions are incorrect, they can also negatively affect the localization performance.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Visual localization has made transformative progress in recent years, with existing image description methods able
to perform robustly in impressively challenging scenarios of perceptual change and differing viewpoints. These
robust image description methods can be used to evaluate similarity between different location representations,
which a belief generation system can convert into a likelihood or confidence metric. However, belief generation
methods are sensitive to the environmental context in which the system is operating. Thus training of the
models for a belief generation system must be appropriate for the current environmental conditions, and if a
system is operating in a dynamic, perceptually varying environment the belief generation models must reflect
this variation.</p>
      <p><bold>Acknowledgement</bold></p>
      <p>This work was supported by the Swedish Research Council (grant no. 2018-03807).</p>
      <p>[AGT+18] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic. NetVLAD: CNN architecture for
weakly supervised place recognition. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 40(6):1437–1451, June 2018.</p>
      <p>[BGU16] M. Brubaker, A. Geiger, and R. Urtasun. Map-based probabilistic visual self-localization. IEEE
Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2016.</p>
      <p>[BHK12] H. Badino, D. Huber, and T. Kanade. Real-time topometric localization. In 2012 IEEE
International Conference on Robotics and Automation (ICRA), pages 1635–1642, May 2012.</p>
      <p>[CL68] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE
Transactions on Information Theory, 14(3):462–467, May 1968.</p>
      <p>[CN08] M. Cummins and P. Newman. FAB-MAP: Probabilistic localization and mapping in the space of
appearance. The International Journal of Robotics Research, 27(6):647–665, 2008.</p>
      <p>[FB81] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with
applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395,
June 1981.</p>
      <p>[GMMW10] A. Glover, W. Maddern, M. Milford, and G. Wyeth. FAB-MAP + RatSLAM: Appearance-based
SLAM for multiple times of day. In 2010 IEEE International Conference on Robotics and
Automation (ICRA), pages 3507–3512, May 2010.</p>
      <p>A. Khaliq, S. Ehsan, Z. Chen, M. Milford, and K. McDonald-Maier. A Holistic Visual Place
Recognition Approach using Lightweight CNNs for Severe ViewPoint and Appearance Changes.
arXiv e-prints, Nov 2018.</p>
      <p>Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, May 2015.</p>
      <p>[LGMR18] Y. Latif, R. Garg, M. Milford, and I. Reid. Addressing challenging place recognition tasks using
generative adversarial networks. In 2018 IEEE International Conference on Robotics and Automation
(ICRA), pages 2349–2355, May 2018.</p>
      <p>[LM15] S. Lowry and M. Milford. Building beliefs: Unsupervised generation of observation likelihoods for
probabilistic localization in changing environments. In 2015 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), pages 3071–3078, Sept 2015.</p>
      <p>[Low14] S. Lowry. Visual place recognition for persistent robot navigation in changing environments. PhD
thesis, Queensland University of Technology, 2014.</p>
      <p>[LSN+16] S. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford. Visual
place recognition: A survey. IEEE Transactions on Robotics, 32(1):1–19, Feb 2016.</p>
      <p>[MW12] M. Milford and G. Wyeth. SeqSLAM: Visual route-based navigation for sunny summer days and
stormy winter nights. In 2012 IEEE International Conference on Robotics and Automation (ICRA),
pages 1643–1649, May 2012.</p>
      <p>[NG12] T. Nicosevici and R. Garcia. Automatic visual bag-of-words for online robot navigation and
mapping. IEEE Transactions on Robotics, 28(4):886–898, Aug 2012.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>[NSBS14]</label>
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Naseer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Spinello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Burgard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Stachniss</surname>
          </string-name>
          .
          <article-title>Robust visual robot localization across seasons using network flows</article-title>
          .
          <source>In Proc. of the National Conference on Artificial Intelligence (AAAI)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <label>[Ols09]</label>
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Olson</surname>
          </string-name>
          .
          <article-title>Recognizing places using spectrally clustered local matches</article-title>
          .
          <source>Robotics and Autonomous Systems</source>
          ,
          <volume>57</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1157</fpage>
          -
          <lpage>1172</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <label>[SBS07]</label>
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brown</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Szeliski</surname>
          </string-name>
          .
          <article-title>City-scale location recognition</article-title>
          . pages
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>June 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <label>[SMT+18]</label>
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Maddern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Toft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hammarstrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stenborg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Safari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okutomi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pollefeys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kahl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Pajdla</surname>
          </string-name>
          .
          <article-title>Benchmarking 6DOF outdoor visual localization in changing conditions</article-title>
          .
          <source>In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <label>[SSD+15]</label>
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Sünderhauf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shirazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dayoub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Upcroft</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Milford</surname>
          </string-name>
          .
          <article-title>On the performance of ConvNet features for place recognition</article-title>
          .
          <source>In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          , pages
          <fpage>4297</fpage>
          -
          <lpage>4304</lpage>
          ,
          <year>Sept 2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>