<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Geolocating Orientational Descriptions of Landmark Configurations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>James Pustejovsky</string-name>
          <email>jamesp@cs.brandeis.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Verhagen</string-name>
          <email>marc@cs.brandeis.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Stefanidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Caixia Wang</string-name>
          <email>cwangg@gmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Geospatial Intelligence, George Mason University</institution>
          ,
          <addr-line>Fairfax, VA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Brandeis University</institution>
          ,
          <addr-line>Boston, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we outline how to translate verbal subjective descriptions of spatial relations into metrically meaningful positional information, and extend this capability to spatiotemporal monitoring. Document collections, transcriptions, cables, and narratives routinely make reference to objects moving through space over time. Integrating such information derived from textual sources into a geosensor data system can enhance the overall spatiotemporal representation in changing and evolving situations, such as when tracking objects through space with limited image data. We focus on landmark identification, since it proves to be a more tractable problem than open-domain image recognition.</p>
      </abstract>
      <kwd-group>
        <kwd>Spatial language</kwd>
        <kwd>geolocating</kwd>
        <kwd>spatial configurations</kwd>
        <kwd>landmarks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The relation between language and space has long been an area of active research.
Human languages impose particular linguistic constructions of space, of
spatially anchored events, and of spatial configurations that relate in complex ways to the
spatial situations in which they are used. Establishing tighter formal specifications of
this relationship has proved a considerable challenge and has so far eluded general
solutions. One reason for this is that the complexity of spatial language has often been
ignored. In much earlier and ongoing work, language is assumed to offer a relatively
simple inventory of terms for which spatial interpretations can be directly stated.
Examples of this can be found not only in accounts that focus on formalizations of
particular tasks, such as path and scene descriptions, navigation and way-finding, but
also in foundational work on the formal ontology of space, on qualitative spatial
calculi, and on cognitive approaches.</p>
      <p>
        Visual information in human experience is frequently accompanied by a linguistic
description of the image or scene. Consider, for example, the image in Figure 1. If the
goal is to identify the region of the image where one should look for the lost keys, one
first must identify the correct tree. If this image is automatically segmented using a
stock library of images for trees and entrances
        <xref ref-type="bibr" rid="ref11 ref6">(Millet et al., 2005; Hollink et al.,
2004)</xref>
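        , several candidate regions for “tree” and “entrance” will be identified. Each
candidate region may then be ranked with respect to how likely it is to correspond to a
tree or an entrance, producing two ranked lists of candidate regions,
<inline-formula><tex-math>T = (T_1, T_2, \ldots)</tex-math></inline-formula> and
<inline-formula><tex-math>E = (E_1, E_2, \ldots)</tex-math></inline-formula>, where the
<inline-formula><tex-math>T_i</tex-math></inline-formula> are the candidate regions for “tree”,
<inline-formula><tex-math>T_i</tex-math></inline-formula> ranks higher than
<inline-formula><tex-math>T_{i+1}</tex-math></inline-formula>, and the
<inline-formula><tex-math>E_j</tex-math></inline-formula> are the candidate regions for “entrance”. The associated verbal
description invokes the “left of” relation, thereby restricting the search for the
appropriate pair of candidate regions by imposing the corresponding spatial
constraint <inline-formula><tex-math>\mathrm{LEFT\_OF}(T_i, E_j)</tex-math></inline-formula>. The
<inline-formula><tex-math>(T_i, E_j)</tex-math></inline-formula> pairs that do not satisfy the specified spatial
relation are given a lower ranking, thus increasing the likelihood of correctly
identifying the relevant region in the image.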
      </p>
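      <p>The re-ranking step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The region representation (a detector score plus a horizontal centroid), the names Region, left_of, and rank_pairs, the centroid-based test for the “left of” relation, the product-of-scores combination, and the penalty factor are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Region:
    """A candidate image region (hypothetical representation)."""
    name: str
    score: float  # assumed detector confidence in [0, 1]
    cx: float     # horizontal centroid in image coordinates (x grows rightward)


def left_of(a: Region, b: Region) -> bool:
    """LEFT_OF(a, b), approximated here by comparing horizontal centroids."""
    return a.cx < b.cx


def rank_pairs(trees, entrances, penalty=0.1):
    """Rank (tree, entrance) pairs; pairs violating LEFT_OF are down-weighted."""
    scored = []
    for t, e in product(trees, entrances):
        s = t.score * e.score
        if not left_of(t, e):
            s *= penalty  # give a lower ranking instead of discarding the pair
        scored.append((s, t.name, e.name))
    return sorted(scored, reverse=True)


# Toy data: T1 is the stronger "tree" candidate but lies right of the entrance.
trees = [Region("T1", 0.9, 400.0), Region("T2", 0.7, 100.0)]
entrances = [Region("E1", 0.8, 250.0)]
ranking = rank_pairs(trees, entrances)
print(ranking[0][1:])  # names of the best pair after applying the constraint
```
]]></preformat>
      <p>In this toy example the spatial constraint promotes the weaker tree candidate, because it is the only one that lies to the left of the entrance candidate.</p>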
      <p>
        Over the past decade, image annotation has been the focus of attention within
several research areas, in particular, in the context of content-based image retrieval
(CBIR). Some research, including the work done within the TRIPOD project at
Sheffield, examines the different ways that geo-referenced images can be described
        <xref ref-type="bibr" rid="ref3">(Edwardes et al., 2007)</xref>
        , though other approaches, such as the ESP Game, can also
be used to address this problem. Much of the work on text-based image retrieval has
relied on extracting information about the image from image captions, as well as the
surrounding text and related metadata, such as filenames and anchor text extracted
from the referring web pages, as for example, in Yahoo!’s Image Search. Another
kind of image annotation data has become available with the rise of “citizen
geography”. User-annotated geo-referenced digital photo collections allowing for
image content labeling and annotation are being generated in distributed
environments, such as Flickr and GoogleEarth. Images are indexed with user-supplied
labels that typically form a particular language subset
        <xref ref-type="bibr" rid="ref5">(Grefenstette, 2008)</xref>
        . Under
such schemes, however, detailed image content annotation is not provided. A notable
exception is the “Flickr notes” feature that allows users to annotate regions within
images. This and other adaptations of the Fotonotes image annotation standard and
the associated software provide an opportunity for detailed annotation of images with
both captions and extended free text associated with each annotated image region.
While such efforts as those discussed above are useful metadata encodings over
images, there remain significant problems with unconstrained object recognition.
Hence, in this paper, we will focus on linguistic descriptions of landmark
configurations. Landmarks are visually identifiable objects with fixed spatial
locations, which carry semantic meaning for large groups of individuals. They are
typically large man-made or physical structures (e.g. buildings, communication
antennas, hills) and play an important role in navigation and wayfinding decisions
        <xref ref-type="bibr" rid="ref10">(see e.g. Werner et al., 1998; Steck and Mallot, 2000)</xref>
        . For example, routes can be
expressed as sequences of landmarks (Duckham et al., 2010) and paths connecting
them. The saliency of different landmarks can be expressed in terms of their
perceptual, cognitive, and contextual value (Caduff and Timpf, 2008). In this section
we address the role of landmarks in geolocating an observer who describes their relative
orientational properties as they appear in his/her view of a scene.
      </p>
      <p>Let us consider the scene depicted in Fig. 2, taken from Google StreetView, of the
intersection of Huntington and Mass. Avenues in Boston. In it we can identify
reference landmarks, namely three buildings: Horticultural Hall (HC), Prudential
Center (PC), and the Christian Science Monitor building (CSM). The scene also contains
various other objects, for example a white van, a black truck, and a red car. Our
interest is in geolocating the observer of this scene by using orientational descriptions
of the relative appearance of the landmarks contained in it.</p>
      <p>Assuming that a narrator is familiar with these three landmarks, he/she could describe
the scene as follows:</p>
      <p>I see the SW and SE sides of Horticultural Hall, and
to the right of it I see the SW and SE sides of the Prudential Center, and
to the right of it I see the Christian Science Monitor Building.</p>
      <p>In this situation the narrator has described the scene through three types of statements:
        <list list-type="bullet">
          <list-item><p>explicit reference to specific landmarks, positioning the scene in their vicinity,</p></list-item>
          <list-item><p>explicit description of orientational properties expressing the relative
positions of these landmarks in an observer-centric system<sup>1</sup>, and</p></list-item>
          <list-item><p>implicit visibility declarations, whereby the narrator indicates that he/she can observe
specific façades of landmark buildings.</p></list-item>
        </list>
      </p>
      <p>
        The orientational properties are modeled using ISO-Space
        <xref ref-type="bibr" rid="ref12">(Pustejovsky et al., 2011)</xref>
        .
ISO-Space distinguishes two major types of elements: entities and relations. Entities
include location, spatial entity, motion, event (or spatial state), and path. The two
main relations between these entities are the distance relation and the qualitative
spatial relation, which can be either a topological or a relative spatial relation.
      </p>
      <p>
        Relations such as “to the right of” are annotated as a relative spatial relation
between two elements, the figure and the ground, and the viewer perspective is
accounted for by two further attributes on the link tag: rframe, with values absolute,
relative and intrinsic, and viewer, which contains a variable indexed to the viewer
        <xref ref-type="bibr" rid="ref10 ref9">(Levinson, 2003; Freksa, 1992; Ligozat, 1998)</xref>
        . Using the three kinds of information
above (landmarks, relative positions, and visibility declarations), we can identify the
three landmarks in a GIS (Fig. 3) and proceed to estimate the location of the observer
through a series of viewshed analyses and visibility polygon overlays, as we describe
below.
      </p>
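      <p>As an illustration of how such relative spatial relations might be held in a program, the following Python sketch encodes the two “to the right of” statements from the narration above. It is not ISO-Space syntax. The attribute names figure, ground, rframe, and viewer and the three rframe values come from the description above, while the class name RelativeSpatialLink, the relation label "right_of", the viewer identifier "v1", and the helper left_to_right are hypothetical.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass

# Reference-frame values named in the text.
RFRAME_VALUES = {"absolute", "relative", "intrinsic"}


@dataclass(frozen=True)
class RelativeSpatialLink:
    """Hypothetical in-memory form of a relative spatial relation."""
    figure: str    # the located element
    ground: str    # the reference element
    relation: str  # hypothetical label, e.g. "right_of"
    rframe: str    # one of RFRAME_VALUES
    viewer: str    # variable indexed to the viewer

    def __post_init__(self):
        if self.rframe not in RFRAME_VALUES:
            raise ValueError(f"unknown rframe: {self.rframe!r}")


def left_to_right(links):
    """Order landmarks left to right from a single chain of right_of links."""
    nxt = {l.ground: l.figure for l in links if l.relation == "right_of"}
    figures = set(nxt.values())
    starts = [g for g in nxt if g not in figures]
    if len(starts) != 1:
        raise ValueError("links do not form a single chain")
    order = [starts[0]]
    # The length guard stops the walk if the links loop back on themselves.
    while order[-1] in nxt and len(order) <= len(nxt):
        order.append(nxt[order[-1]])
    return order


# "to the right of it [HC] I see ... the Prudential Center, and
#  to the right of it I see the Christian Science Monitor Building."
links = [
    RelativeSpatialLink("PC", "HC", "right_of", "relative", "v1"),
    RelativeSpatialLink("CSM", "PC", "right_of", "relative", "v1"),
]
order = left_to_right(links)
print(order)
```
]]></preformat>
      <p>The recovered left-to-right ordering of the landmarks, as seen by the viewer, is the kind of constraint that the subsequent visibility analysis can use.</p>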
      <p>
        For every visibility statement we can identify a visibility zone through viewshed
analysis, using the local GIS information
        <xref ref-type="bibr" rid="ref8">(Kim et al., 2004)</xref>
        . The 2D visibility zone of
a specific façade (or any other object in space) is the locus of all points from which at
least a part of this façade is visible. For example, in Fig. 4, the visibility zone of façade
F1 is shown as the gray-shaded area. From any point outside this area it would be
impossible to see façade F1.
        <fn id="fn1">
          <label>1</label>
          <p>An alternative would be to use the intrinsic orientation of the landmark, in which case "to the
right" would be interpreted relative to the landmark and not relative to the observer. Clearly,
both options would need to be explored downstream.</p>
        </fn>
      </p>
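      <p>A necessary condition for a façade to be visible can be sketched in a few lines of Python: the observer must stand on the outward side of the façade. The sketch below checks only this condition and ignores occlusion by other buildings and terrain, which a real viewshed analysis accounts for. The function names, the counter-clockwise footprint convention, and the square footprint are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def facade_is_facing(observer, a, b):
    """Return True if `observer` lies strictly on the outward side of facade a->b.

    Assumes footprint vertices are listed counter-clockwise, so the building
    interior is to the left of each directed edge and the outside to the right.
    """
    cross = (b[0] - a[0]) * (observer[1] - a[1]) - (b[1] - a[1]) * (observer[0] - a[0])
    return cross < 0


def facing_facades(observer, footprint):
    """Indices of footprint edges whose outward side contains the observer."""
    n = len(footprint)
    return [i for i in range(n)
            if facade_is_facing(observer, footprint[i], footprint[(i + 1) % n])]


# Hypothetical square footprint, counter-clockwise:
# edge 0 = south, 1 = east, 2 = north, 3 = west.
footprint = [(0, 0), (2, 0), (2, 2), (0, 2)]
visible = facing_facades((5, -3), footprint)
print(visible)
```
]]></preformat>
      <p>An observer to the south-east of this footprint faces its south and east façades, which mirrors a statement of the form “I see the SW and SE sides” of a landmark.</p>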
      <p>Each additional visibility statement contributes an additional visibility zone,
and the location of the narrator can eventually be determined by intersecting the
corresponding visibility zones using polygon clipping techniques, such as the
Weiler-Atherton algorithm (Weiler and Atherton, 1977). Fig. 5 shows the implementation of this
process for the scene of Fig. 2, through a progressive assessment of visibility
conditions for HC, CSM, and PC.</p>
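      <p>The progressive intersection of visibility zones can be sketched as follows. For brevity the sketch uses Sutherland-Hodgman clipping, which is simpler than the Weiler-Atherton algorithm mentioned above but requires every clipping polygon to be convex. Real visibility zones are generally not convex, so this is an illustration under that simplifying assumption, not the procedure used here. The rectangular zones and all function names are hypothetical.</p>
      <preformat><![CDATA[
```python
from functools import reduce


def clip(subject, clipper):
    """Intersect `subject` with convex polygon `clipper` (both counter-clockwise)."""

    def inside(p, a, b):
        # p is on the left of, or on, the directed edge a->b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p->q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break  # empty intersection
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output


def area(poly):
    """Polygon area by the shoelace formula."""
    return abs(sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                   for i in range(len(poly)))) / 2.0


# Three hypothetical convex visibility zones (axis-aligned rectangles).
z1 = [(0, 0), (4, 0), (4, 4), (0, 4)]
z2 = [(2, 2), (6, 2), (6, 6), (2, 6)]
z3 = [(3, 0), (5, 0), (5, 5), (3, 5)]

# Each further zone narrows the sub-region: (z1 ∩ z2) ∩ z3.
sub_region = reduce(clip, [z1, z2, z3])
print(area(sub_region))
```
]]></preformat>
      <p>Each call to clip narrows the candidate region for the narrator, in the same spirit as the successive sub-regions of Fig. 5.</p>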
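      <p>Fig. 5. The progressive visibility intersection process. Panel labels: visibility zones
(<inline-formula><tex-math>Z_1</tex-math></inline-formula> in red,
<inline-formula><tex-math>Z_2</tex-math></inline-formula> in blue) of two visible facades of
Horticultural Hall, with sub-region
<inline-formula><tex-math>Z_1 \cap Z_2</tex-math></inline-formula>; visibility zones
<inline-formula><tex-math>Z_3</tex-math></inline-formula> and
<inline-formula><tex-math>Z_4</tex-math></inline-formula>, each of one visible facade of GSN, with sub-region
<inline-formula><tex-math>Z_3 \cap (Z_1 \cap Z_2)</tex-math></inline-formula>; visibility zones
<inline-formula><tex-math>Z_5</tex-math></inline-formula> and
<inline-formula><tex-math>Z_6</tex-math></inline-formula>, each of one visible facade of
Prudential Center, with sub-region
<inline-formula><tex-math>Z_5 \cap (Z_3 \cap (Z_1 \cap Z_2))</tex-math></inline-formula>.</p>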
      <p>The narrator position estimated through the process visualized in Fig. 5 is shown
on the local map in Fig. 6, marked as a red triangle. The triangle corresponds to all
positions from which the narrator would have a view of our scene that would be
comparable to the one depicted in Fig. 2 in terms of the orientational relationships of
the three depicted landmarks.</p>
      <p>In this paper, we discussed the integration of multi-source data analysis for spatial
knowledge extraction from images. In particular, we focused on the specific
contribution of verbal subjective descriptions of spatial relations involving
orientation, and how these can be translated into metrically interpretable positional
statements within a GIS environment. We concentrated on the more tractable
subproblem of landmark identification. Orientational information in language was
modeled with ISO-Space annotation, providing both qualitative spatial relations and,
once geolocation has been performed, anchored GPS values.</p>
      <p>This work is ongoing research aimed at integrating information
available from different sources, to better address the evolving needs of the
geoinformatics community. Our preliminary results suggest that scene content
information provided by verbal description can be mapped faithfully to metrically
grounded information. As this is preliminary work, there are clearly many details to
be worked out. For example, we have not yet precisely defined how orientational
relations are used to identify a landmark in a GIS, especially with anonymous
landmarks, a problem exacerbated when an ambiguity between intrinsic and
observer-based relative orientation cannot be easily resolved.</p>
      <p>One of the ultimate goals of this research is the development of algorithms that take
an image and accompanying verbal utterances and map these to a partition of a 2D
grid. This application would be tuned to deal with more natural utterances than the
somewhat stilted verbal descriptions given with Fig. 2 above.</p>
    </sec>
    <sec id="sec-2">
      <title>4 Acknowledgements</title>
      <p>This research was funded under the NURI grant HM1582-08-1-0018 by the National
Geospatial-Intelligence Agency.</p>
      <p>Steck S., Mallot H.: The Role of Global and Local Landmarks in Virtual Environment
Navigation. Presence, 9(1), 69-83 (2000).</p>
      <p>Weiler K., Atherton P.: Hidden Surface Removal using Polygon Area Sorting. ACM
SIGGRAPH Computer Graphics, 11(2), 214-222 (1977).</p>
      <p>Werner S., Krieg-Brueckner B., Mallot H., Schweizer K., Freksa C.: Spatial Cognition: The
Role of Landmark, Route, and Survey Knowledge in Human and Robot Navigation. In: Jarke
M., Pasedach K., Pohl K. (eds.) Informatik ’97, pp. 41-50, Springer Verlag (1997).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>Cognitive Processing</source>
          ,
          <volume>9</volume>
          (
          <issue>4</issue>
          ),
          <fpage>249</fpage>
          -
          <lpage>267</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>Location Based Services</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>28</fpage>
          -
          <lpage>52</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Edwardes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Purves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Birche</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Matyas</surname>
          </string-name>
          .
          <article-title>Deliverable 1.4: Concept ontology experimental report</article-title>
          .
          <source>Technical report, TRIPOD Project</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Campari</surname>
          </string-name>
          , and U. Formentini, eds,
          <source>Theories and methods of spatiotemporal reasoning in geographic space</source>
          , pages
          <fpage>162</fpage>
          -
          <lpage>178</lpage>
          . Springer, Berlin, (
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Comparing the Language Used in Flickr, general Web Pages, Yahoo Images and Wikipedia</article-title>
          .
          <source>In OntoImage</source>
          <year>2008</year>
          , LREC, pages
          <fpage>6</fpage>
          -
          <lpage>11</lpage>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , G. Nguyen, G. Schreiber,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wielinga</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>Adding spatial semantics to image annotations</article-title>
          .
          <source>In Proceedings of 4th International Workshop on Knowledge Markup and Semantic Annotation</source>
          , 3rd International Semantic Web Conference.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kim Y.-H.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Rana</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wise</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>Exploring Multiple Viewshed Analysis using Terrain Features and Optimisation Techniques</article-title>
          .
          <source>Computers &amp; Geosciences</source>
          ,
          <volume>30</volume>
          (
          <issue>9-10</issue>
          ), pp.
          <fpage>1019</fpage>
          -
          <lpage>1032</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Levinson</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          <article-title>Space in Language and Cognition</article-title>
          , Cambridge University Press, (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Ligozat</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Reasoning about cardinal directions</article-title>
          .
          <source>Journal of Visual Languages and Computing</source>
          ,
          <volume>9</volume>
          :
          <fpage>23</fpage>
          -
          <lpage>44</lpage>
          . (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Millet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , I. Bloch,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hede</surname>
          </string-name>
          , and PA Moellic.
          <article-title>Using relative spatial relationships to improve individual region recognition</article-title>
          .
          <source>In Proc. 2nd Eur. Workshop Integration Knowledge, Semantics and Digital Media Technology</source>
          , pages
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
          . (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Pustejovsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moszkowicz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Verhagen</surname>
          </string-name>
          .
          <article-title>ISO-Space: The Annotation of Spatial Information in Language</article-title>
          ,
          <source>in Proceedings of ISA-6: ACL-ISO</source>
          , Oxford, England, (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>