<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Component-wise Annotation and Analysis of Informal Place Descriptions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Igor Tytyk</string-name>
          <email>ihor.tytyk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Baldwin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing and Information Systems, The University of Melbourne</institution>,
          <addr-line>Melbourne VIC 3010</addr-line>,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <fpage>7</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>We analyse the strategies used in formulating situated informal location descriptions, by identifying the geospatial expressions they contain and annotating each for properties such as geospatial granularity and identifiability. Analysis of the annotations leads to insights such as the predominance of suburb-level expressions, and the prevalence of vernacular expressions.</p>
      </abstract>
      <kwd-group>
        <kwd>Informal place description</kwd>
        <kwd>geospatial expression</kwd>
        <kwd>named entity</kwd>
        <kwd>vernacular geography</kwd>
        <kwd>computational linguistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>When informally describing their whereabouts or giving directions, people make
heavy use of place descriptions, in which they relate their location
to surrounding objects, or landmarks [1]. For the description to be
interpretable by the recipient, the provider should use familiar
landmarks and relate the location to them appropriately. A human recipient
can usually interpret such a description with little effort. Computational systems,
however, cannot easily interpret place descriptions expressed in natural language,
or generate natural-sounding route or place descriptions.</p>
      <p>Additionally, humans frequently use vernacular place descriptions,
or refer to landmarks using non-standard renderings of their 'official' names.
This makes the description hard to understand, both for computers and for
humans unfamiliar with the locality being described. Wu and Winter state that
placenames and spatial relations are the main components of place descriptions, and
that a description can only be interpreted if its components are interpretable [2].</p>
      <p>In this study we focus on analysing placenames in the context of informal
place descriptions, that is, placenames that are elicited naturally and in situ,
without any constraints or guidance. We manually identify geospatial expressions
in a dataset of placename descriptions, and further annotate the granularity level,
identifiability and normalised name of each such expression.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>Winter et al. collected situated place descriptions from players of the <italic>Tell us
where</italic> location-based mobile game [3].<sup>1</sup> In the game, smartphone users submitted
textual descriptions of their location, along with their GPS
location. We chose this data for several reasons. First, the data
was collected across a broad sample of users, ensuring the heterogeneity of the
data and reducing sample bias. Second, the participants were asked to submit
textual descriptions of their location from anywhere in the state of Victoria,
Australia. This led to a diversity of locations, but within a restricted area
familiar to our annotators, and with the expectation of consistency in the
strategies used by the participants to describe their location. Third, the users
were given no guidelines for writing the descriptions, meaning that the data is
rich in vernacular placename descriptions and the strategies used by users to
describe their location are varied. Lastly, the participants were using their
mobile phones and basing the placename descriptions on their actual location.
As a result, the descriptions are situated, spontaneous, and as natural as we
could hope for.</p>
      <p>A total of 2221 place descriptions were collected through the <italic>Tell us where</italic>
game. However, the data contained duplicates. Since we are interested in
qualitative rather than quantitative data, we eliminated all duplicates
from the corpus, leaving a final total of 1858 descriptions.</p>
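      <p>As a minimal illustration of the duplicate-removal step, the Python sketch below keeps the first occurrence of each description. The paper does not state how duplicates were defined, so the matching rule used here (case-insensitive, whitespace-normalised) and the function name <monospace>deduplicate</monospace> are assumptions made purely for illustration.</p>
      <preformat><![CDATA[
```python
from typing import Iterable, List


def deduplicate(descriptions: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each description.

    Assumption: two descriptions count as duplicates when they are equal
    after lower-casing and collapsing runs of whitespace.
    """
    seen = set()
    unique = []
    for text in descriptions:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Made-up sample input, not taken from the corpus.
sample = ["At uni", "at  uni", "in bed"]
unique_sample = deduplicate(sample)
```
]]></preformat>
      <p>A stricter rule (exact string match) or a looser one (ignoring punctuation) would change the number of descriptions retained, so the rule should be fixed before any counts are reported.</p>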
      <p><bold>Annotation.</bold> We manually annotated the placename descriptions for geospatial expressions, in
the form of: (1) geospatial named entities (<italic>Federation Square</italic>, <italic>Swanston Street</italic>);
and (2) geospatial noun phrases (<italic>school</italic>, <italic>a leafy park</italic>). Named entities are proper
names, and are generally subclassified according to the semantic class of the
referent, e.g. into persons, locations and organisations. For the purposes
of this research, however, we restrict our attention to geospatial named entities.
      </p>
      <p>One of the broader goals of this work is the compositional semantic
interpretation of place descriptions. We therefore aimed for maximum
segmentation granularity in our annotation, while avoiding nested annotations.
For example, if the place description were an address such as <italic>Melbourne
University Bookshop, in Parkville near the library</italic>, we would segment it into the
geospatial named entities <italic>Melbourne University Bookshop</italic> and <italic>Parkville</italic>, and the
geospatial noun phrase <italic>the library</italic>. Note that we would not also identify
<italic>Melbourne University</italic> as a geospatial named entity, as it is nested within <italic>Melbourne
University Bookshop</italic>.</p>
      <p>We expected many of the geospatial expressions in the dataset to be noun
chunks. For example, <italic>Queen Victoria Market</italic> is a single noun chunk geospatial
named entity, while <italic>a tall building</italic> is a single noun chunk geospatial noun phrase
referring to a construction, which can be used as a reference point when
describing a location. In the interests of expediting annotation, we first chunk-parsed
the place descriptions, using the Stanford CoreNLP tools.</p>
      <sec id="sec-2-1">
        <title>1 http://telluswhere.net/</title>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Granularity level classification (Richter et al., 2012); all examples are taken from the actual dataset, and are presented using the original orthography.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Granularity level</th>
                <th>Description</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>(1) Furniture</td>
                <td>Location within a room, referring to furniture (by my computer, in bed)</td>
              </tr>
              <tr>
                <td>(2) Room</td>
                <td>Location within a building, or parts belonging to it (in my room, third floor), or medium-sized vehicles (car, train)</td>
              </tr>
              <tr>
                <td>(3) Building</td>
                <td>Location of a building, street no. or building name (geomatics dpt, street corner/intersection)</td>
              </tr>
              <tr>
                <td>(4) Street</td>
                <td>Institution, public space or street level, larger than building and/or vaguer boundaries than building. For example, transport infrastructure (railway, tramline, Ave, Circuit), a public space (school, cemetery, mall), or a natural landmark (lake, park)</td>
              </tr>
              <tr>
                <td>(5) District</td>
                <td>Suburb, rural district or locality, or post code area (carlton, North Melbourne, CBD)</td>
              </tr>
              <tr>
                <td>(6) City</td>
                <td>Town or city level, and metropolitan areas (Canberra, near Geelong)</td>
              </tr>
              <tr>
                <td>(7) Country</td>
                <td>Everything beyond city level, including highways, freeways (Princes Hwy), islands (French Island), rivers (Murray river) and states (WA)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The annotation scheme we used comprises several layers. The first
annotation layer records whether a given segment is a geospatial
named entity (NE NP) or a geospatial noun phrase (NP NP). The remaining
layers apply to each geospatial expression.</p>
        <p>The second layer of annotation is the granularity level, which captures the
"zoom level" of each geospatial expression. The granularity level is judged on
a scale from 1 to 7, based on the classification of Richter et al. [4] as detailed in
Table 1. In some instances, we diverge from this classification: when a named
entity is too big or too small for the bounding box of its default
zoom level, we override the default with the zoom level that best matches
the size of the bounding box. For example, <italic>Mountain Highway</italic> passes through only a few
suburbs of Melbourne, so we override the Country granularity level for highways
and assign it the City zoom level to better reflect its size. Similarly,
we shifted small towns
that do not have suburbs (e.g. <italic>Warragul</italic> and <italic>Pakenham</italic>) from City to District.</p>
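        <p>One way to picture the granularity scale and its per-name overrides is as an enumeration plus a small lookup table. The Python sketch below is illustrative only: the feature types and their default levels in <monospace>DEFAULT_LEVEL</monospace> are hypothetical simplifications of Table 1, and the only overrides listed are the three examples mentioned in the text. In the study itself these judgements were made manually by annotators.</p>
        <preformat><![CDATA[
```python
from enum import IntEnum


class Granularity(IntEnum):
    """The seven-level scale of Table 1."""
    FURNITURE = 1
    ROOM = 2
    BUILDING = 3
    STREET = 4
    DISTRICT = 5
    CITY = 6
    COUNTRY = 7


# Hypothetical feature types with their default levels (illustrative only).
DEFAULT_LEVEL = {
    "street": Granularity.STREET,
    "suburb": Granularity.DISTRICT,
    "town": Granularity.CITY,
    "highway": Granularity.COUNTRY,
}

# Per-name overrides, taken from the examples given in the text.
OVERRIDES = {
    "mountain highway": Granularity.CITY,
    "warragul": Granularity.DISTRICT,
    "pakenham": Granularity.DISTRICT,
}


def granularity_level(name: str, feature_type: str) -> Granularity:
    """Return the override for a name if one exists, else the type default."""
    default = DEFAULT_LEVEL[feature_type]
    return OVERRIDES.get(name.strip().lower(), default)
```
]]></preformat>
        <p>Keeping the overrides in a separate table makes every departure from the default classification explicit and easy to review.</p>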
        <p>The third layer of annotation is identifiability. This captures whether a
geospatial expression is unique within Victoria or has multiple instances.
There are three possible values for identifiability: non-identifiable,
identifiable ambiguous, and identifiable non-ambiguous. All geospatial noun phrases
(e.g. <italic>school</italic>, <italic>park</italic>, <italic>monument</italic>) are non-identifiable, since the set of these
objects within Victoria is very large and it is not possible to geocode them
without disambiguating information. Some geospatial named entities are also considered
non-identifiable, because they are ubiquitous and standard
gazetteers do not exhaustively list every instance within Victoria (e.g.
<italic>McDonalds</italic>, <italic>7-eleven</italic>). By contrast, a geospatial named entity can refer to a
small set of places which are enumerated in a gazetteer, in which case
it is considered identifiable ambiguous. For example, there are four
instances of <italic>Canning Street</italic> in Victoria, so every <italic>Canning Street</italic> in the corpus is
annotated as identifiable ambiguous. <italic>Flemington Road</italic>, on the other hand,
is identifiable non-ambiguous, as there is only one instance in Victoria.</p>
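        <p>The three identifiability values can be read as a simple decision rule over the expression type and the number of gazetteer matches. The following Python sketch is an assumption-laden restatement of that rule, not the annotation procedure itself (which was manual): the function name, the <monospace>exhaustively_listed</monospace> flag and the treatment of zero matches as non-identifiable are all introduced here for illustration.</p>
        <preformat><![CDATA[
```python
def identifiability(expression_type: str,
                    gazetteer_matches: int,
                    exhaustively_listed: bool = True) -> str:
    """Map an expression to one of the three identifiability values.

    expression_type     -- "NE" (named entity) or "NP" (noun phrase)
    gazetteer_matches   -- number of instances found within Victoria
    exhaustively_listed -- False for ubiquitous names (e.g. chain stores)
                           whose instances are not all in the gazetteer
    """
    if expression_type == "NP":
        # Noun phrases such as "school" cannot be geocoded on their own.
        return "non-identifiable"
    if not exhaustively_listed or gazetteer_matches == 0:
        # Assumption: a name with no gazetteer match is non-identifiable.
        return "non-identifiable"
    if gazetteer_matches == 1:
        return "identifiable non-ambiguous"
    return "identifiable ambiguous"
```
]]></preformat>
        <p>With the counts quoted in the text, <monospace>identifiability("NE", 4)</monospace> corresponds to Canning Street and <monospace>identifiability("NE", 1)</monospace> to Flemington Road.</p>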
        <p>As with granularity, the determination of identifiability is inevitably
subjective. To reduce the effects of subjectivity as much as possible, we base the
judgement on two online gazetteers: OpenStreetMap<sup>2</sup> and Google Maps.<sup>3</sup> Google
Maps contains an extensive listing of named entities, but has poor coverage of
non-standard or vernacular equivalents of less well-known named entities. Thus,
while <italic>melb uni</italic> (standard = The University of Melbourne) and <italic>fed square</italic>
(standard = Federation Square) can be found in Google Maps, it does not contain local
vernacular such as <italic>broady</italic> (standard = Broadmeadows) or non-standard
abbreviations such as <italic>pi</italic> for Phillip Island or <italic>fg</italic> for Ferntree Gully. In such cases, we enlisted
the help of locals and the Google search engine to interpret the geospatial
expression.</p>
        <p>Names of cafes, restaurants, and other small businesses were the most difficult
NEs to judge identifiability for. Even though OpenStreetMap lists a vast number
of buildings, eating places and shops, many such businesses were missing.</p>
        <p>The fourth and final layer of annotation is placename normalisation.
Since the place descriptions were submitted by mobile phone, the dataset
contains many abbreviations, misspellings and vernacular names. The canonical
name/spelling was provided in all such instances. For example, <italic>melb uni</italic> would
be normalised to <italic>The University of Melbourne</italic>. There is an inevitable
dependency between identifiability and placename normalisation for geospatial named
entities: if a geospatial named entity cannot be identified, it is not possible to
determine its normalised rendering.</p>
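        <p>In its simplest form, the normalisation layer can be thought of as a lookup from the written form to the canonical name. The Python sketch below uses only the vernacular forms quoted in this paper; the table name, the function name and the choice to return <monospace>None</monospace> for unknown forms are assumptions for illustration. The actual normalisation was done by hand.</p>
        <preformat><![CDATA[
```python
from typing import Optional

# Vernacular or abbreviated forms quoted in the paper, with their
# canonical names. This is a toy table, not the annotated resource.
VERNACULAR_TO_CANONICAL = {
    "melb uni": "The University of Melbourne",
    "fed square": "Federation Square",
    "broady": "Broadmeadows",
    "pi": "Phillip Island",
    "fg": "Ferntree Gully",
}


def normalise(expression: str) -> Optional[str]:
    """Return the canonical name, or None if the form is not in the table.

    Returning None mirrors the dependency noted in the text: an
    expression that cannot be identified has no normalised rendering.
    """
    key = " ".join(expression.lower().split())
    return VERNACULAR_TO_CANONICAL.get(key)
```
]]></preformat>
        <p>A fuller version would also map canonical names to themselves and handle misspellings, which this table does not attempt.</p>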
        <p>Some of the submitted place descriptions do not contain any information
about the location (e.g. <italic>this will be an everlasting love</italic>) or refer to locations outside
Victoria (e.g. <italic>in Wagga Wagga</italic>). All such descriptions were marked as irrelevant
at the message level, using the IRREL label.</p>
        <p>For the annotation we used brat,<sup>4</sup> a highly configurable, easy-to-use
web-based text annotation tool.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Analysis and Discussion</title>
      <p>Having annotated the dataset, we extracted a feature vector for every annotated
geospatial expression (excluding the irrelevant descriptions). Each feature vector
contained a set of values: id, geospatial expression type, granularity level,
identifiability, original spelling, and canonical (normalised) spelling. All the vectors
were then collated into a table and fed into the R statistical package<sup>5</sup> for analysis.
In total, 3061 geospatial expressions were extracted, 2139 (70%) of which were
geospatial named entities. That is, without any constraint on the description,
roughly seven in ten geospatial expressions contained in place descriptions can
potentially be found in gazetteers.</p>
      <sec id="sec-3-1">
        <title>2 http://www.openstreetmap.org</title>
      </sec>
      <sec id="sec-3-2">
        <title>3 http://maps.google.com.au/</title>
      </sec>
      <sec id="sec-3-3">
        <title>4 http://brat.nlplab.org/</title>
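        <p>The feature vector described at the start of this section, and the kind of summary statistics reported here, can be sketched as follows. The analysis in the paper was carried out in R; this Python version is only an illustrative equivalent, the class and function names are hypothetical, and the four sample rows are made up from examples in the text rather than drawn from the annotated corpus.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional


@dataclass
class GeoExpression:
    id: int
    expression_type: str        # "NE" or "NP"
    granularity: int            # 1 (Furniture) to 7 (Country)
    identifiability: str
    original: str
    canonical: Optional[str]    # None when no normalised form exists


def summarise(rows: List[GeoExpression]) -> dict:
    """Compute the share of named entities and granularity statistics."""
    levels = [row.granularity for row in rows]
    named = sum(row.expression_type == "NE" for row in rows)
    return {
        "ne_share": named / len(rows),
        "mean_granularity": mean(levels),
        "sd_granularity": stdev(levels),   # sample standard deviation
        "level_counts": Counter(levels),
    }


# Made-up rows for illustration only.
SAMPLE = [
    GeoExpression(1, "NE", 4, "identifiable non-ambiguous",
                  "Flemington Road", "Flemington Road"),
    GeoExpression(2, "NE", 4, "identifiable ambiguous",
                  "Canning Street", "Canning Street"),
    GeoExpression(3, "NP", 4, "non-identifiable", "park", None),
    GeoExpression(4, "NE", 5, "identifiable non-ambiguous",
                  "broady", "Broadmeadows"),
]
summary = summarise(SAMPLE)
```
]]></preformat>
        <p>Applied to the full table, the same computation would yield the named-entity share (2139 of 3061) and the mean and standard deviation of the granularity level.</p>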
        <p>Figure 1 presents the distribution of geospatial expressions across zoom levels,
broken down by identifiability. The mean granularity value is 4.01, with a
standard deviation of 1.05. The most common granularity level is 4 (Street), which
accounts for about 45% of all geospatial expressions. This means that when writing place
descriptions, users make heavy use of streets, parks, squares, universities
and hospitals. Of the remainder, almost a quarter (24%) of the referents are
of the Building granularity level (level 3), and about 18% are of the Suburb
granularity level (level 5, District in Table 1).</p>
        <p>The correlation between the granularity level and the fraction of
non-identifiable placenames is not very surprising: the bigger the spatial feature, the
more likely it is to be identifiable. The appreciable drop in
non-identifiability at the Suburb level, in turn, is evidence of the salience and unambiguity
of the placenames at this level. After dividing all the geospatial expressions
by identifiability, and filtering out from the non-identifiable ones the names of
chain stores and eating places (e.g. <italic>McDonald's</italic>, <italic>Subway</italic>, <italic>Coles</italic>), it is possible
to calculate how many of the named entities are not in the gazetteers
(OpenStreetMap and Google Maps). Out of 2139 named entities, 51 (2.4%) are not
contained in the gazetteers. As a rule, these are names of
restaurants, apartment blocks, and other small-scale companies (e.g. <italic>Pilkington
Glass</italic>, <italic>Ching Chong Food</italic>, <italic>Yarra Crest Appartments</italic>).</p>
      </sec>
      <sec id="sec-3-4">
        <title>5 http://www.r-project.org/</title>
        <p>Another important category of geospatial expression is vernacular
descriptions. We found a considerable number of entrenched vernacular equivalents
of salient Victorian placenames, and common strategies for forming vernacular
placenames. Some are formed by simply dropping one of the constituent
words (<italic>Narre Warren</italic> → <italic>narre</italic>), some by "clipping" the word (<italic>Yackandandah</italic>
→ <italic>yack</italic>, <italic>Dandenong</italic> → <italic>dande</italic>), and some are acronyms (<italic>Phillip Island</italic> → <italic>pi</italic>,
<italic>Ferntree Gully</italic> → <italic>fg</italic>). However, the most productive pattern was "embellished
clipping": shortening the expression to the first syllable and adding a diminutive
suffix <italic>-y</italic> or <italic>-ie</italic> (e.g. <italic>Richmond</italic> → <italic>richy</italic>, <italic>Beaconsfield</italic> → <italic>beacy</italic>, <italic>South Gippsland
Highway</italic> → <italic>south gippy</italic>). This pattern is particularly characteristic of Australian
English. The collected informal NEs suggest that only salient and
unambiguous placenames undergo vernacularisation. Since suburb
names in Victoria are unique and widely used for describing locations, they are
the ones most commonly replaced by informal equivalents.</p>
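        <p>The four formation strategies above can be told apart mechanically when both the canonical name and its vernacular form are known. The Python sketch below is a rough heuristic written for illustration, not a method used in the study: it aligns tokens by position, tests prefixes rather than true syllables, and the function names and the fallback label <monospace>"other"</monospace> are assumptions.</p>
        <preformat><![CDATA[
```python
SUFFIXES = ("ie", "y")  # diminutive suffixes named in the text


def _is_embellished(token: str, word: str) -> bool:
    """True if token is a prefix of word plus a diminutive suffix."""
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem and word.startswith(stem):
            return True
    return False


def classify_strategy(canonical: str, vernacular: str) -> str:
    """Guess how a vernacular form was derived from its canonical name."""
    canon = canonical.lower().split()
    vern = vernacular.lower().split()
    if not vern or not canon:
        return "other"
    # Acronym: a single token made of the initials of a multi-word name.
    if len(vern) == 1 and len(canon) > 1:
        if vern[0] == "".join(word[0] for word in canon):
            return "acronym"
    kinds = set()
    # Simplification: compare tokens position by position.
    for token, word in zip(vern, canon):
        if token == word:
            kinds.add("kept")
        elif word.startswith(token):
            kinds.add("clipped")
        elif _is_embellished(token, word):
            kinds.add("embellished")
        else:
            return "other"
    if "embellished" in kinds:
        return "embellished clipping"
    if "clipped" in kinds:
        return "clipping"
    if len(vern) < len(canon):
        return "word dropping"
    return "other"
```
]]></preformat>
        <p>Because tokens are aligned by position, forms that reorder or skip leading words (such as <italic>melb uni</italic> for The University of Melbourne) fall into the <monospace>"other"</monospace> class and would need a more flexible alignment.</p>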
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>In this paper, we have performed a detailed component-wise analysis of informal
place descriptions. From this study, we can conclude the following: (a) most
geospatial expressions are street names, parks, buildings and suburbs; (b) the
presence of a suburb-level placename in the description increases its
identifiability; (c) vernacular place descriptions are commonly used, based on a small
number of strategies; and (d) the geospatial named entities most likely
to be missing from gazetteers are names of pubs, cafes, and small businesses.</p>
      <p>This paper has considered placenames independently of the message-level
interpretation. A logical next step is a compositional analysis of the place
description based on the annotations we have produced, and an investigation of how spatial
relational semantics (e.g. prepositions such as <italic>near</italic>, <italic>at</italic>, <italic>in</italic>) affects message
interpretability and the properties of its constituent placenames.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Klippel</surname>
          </string-name>
          .
          <article-title>Wayfinding Choremes: Conceptualizing Wayfinding and Route Direction Elements</article-title>
          .
          <source>PhD thesis</source>
          , Universitaet Bremen,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Winter</surname>
          </string-name>
          .
          <article-title>Interpreting Destination Descriptions in a Cognitive Way</article-title>
          . Schloss Dagstuhl, Dagstuhl,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-F.</given-names>
            <surname>Richter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cavedon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Stirling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Duckham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kealy</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajabifard</surname>
          </string-name>
          .
          <article-title>Location-based mobile games for spatial knowledge acquisition</article-title>
          . In Janowicz et al., editors,
          <source>Cognitive Engineering for Mobile GIS</source>
          , Belfast, Maine, USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Richter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vasardani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Stirling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-F.</given-names>
            <surname>Richter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Winter</surname>
          </string-name>
          .
          <article-title>Zooming in zooming out hierarchies in place descriptions</article-title>
          .
          <source>Unpublished manuscript</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>