<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Probabilistic Approach to Modelling Spatial Language with Its Application To Sensor Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jamie Frost</string-name>
          <email>jamie.frost@clg.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alastair Harrison</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen Pulman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Newman</string-name>
          <email>pnewman@robots.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Oxford, Computational Linguistics Group</institution>
          ,
          <addr-line>OX1 3QD</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford, Mobile Robots Group</institution>
          ,
          <addr-line>OX1 3PJ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We examine why a probabilistic approach to modelling the various components of spatial language is the most practical for the spatial algorithms in which they can be employed, and examine such models for prepositions such as 'between' and 'by'. We provide an example of such a probabilistic treatment by exploring a novel application of spatial models: inducing the occupancy of an object in space given a description of it.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Space occupies a privileged place in language and in our cognitive systems, given
the necessity to conceptualise various semantic domains. Spatial language can
broadly be divided into two categories [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: functions which map a region to some
part of it, e.g. ‘the corner of the park’, and functions (in the form of spatial
prepositions) which map a region to an adjacent region, projection or axis,
e.g. ‘the car between the two trees’. Approaches to implementing spatial models
have likewise fallen into two categories. The first is logic-based: [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], for example,
uses a set of predicates on objects, together with binary or ternary relations that connect
objects, to generate a description of an object that distinguishes it from others. The
second approach is numerical: given some reference object or objects
and another ‘located’ object<sup>1</sup> or point, it assigns a value based on some notion of
‘satisfaction’ of the spatial relation in question. How this
assigned value is conceptualised varies considerably. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses a ‘Potential Field Model’,
characterised by potential fields which decrease away from object boundaries.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] uses a linear function to model topological prepositions such as
‘near’, producing a value in the range [0,1] depending on whether some point
is directly by the object in question or on/beyond the horizon.
      </p>
      <p>However, we argue that a conceptually more rigorous probabilistic approach
is needed for all aspects of spatial language, in which the validity of some spatial or
semantic proposition is determined by the likelihood that a human within the context
of the expression would deem it to be true. We motivate this by the following
reasons:<fn id="fn1"><label>1</label><p>We use the term ‘locative expression’ to refer to any expression whose intention is
to identify the location of an object or objects (such as ‘a chair by the table’). The
‘located object’ refers to the object in question, and the ‘reference’ object(s) are
others that can be used to determine the location of the located object (the table in
the preceding example).</p></fn></p>
      <list list-type="order">
        <list-item><p>It provides a uniform treatment of confidence across both spatial and
non-spatial domains; in the latter, uncertainty may arise from, for example,
variants of descriptive attributes (such as names). As a result,
these models can be used in a variety of spatial algorithms, such as searching for
or describing objects and inferring the occupancy in space of an object.</p></list-item>
        <list-item><p>In the last of the above applications (which will be explored in detail), as
well as in other independent systems or frameworks, a probabilistic
representation is often required.</p></list-item>
        <list-item><p>Combining multiple spatial observations becomes more transparent. While
any monotonically increasing or decreasing function is sufficient to establish
a relative measure of applicability across candidate points or objects, the
lack of consideration of the function’s ‘absolute’ value becomes problematic
when combining data from different spatial models, for example if we were
to say ‘The chair is by the table and between the cat and the rug’.</p></list-item>
      </list>
      <p>
        Such an approach of assessing the ‘acceptability’ of regions given a spatial
relation is based on a concept called ‘Spatial Templates’ established by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], but
a probabilistic approach puts more emphasis on absolute value. What precisely
then do we mean by ‘human confidence’? One might think we can measure it by
the probability that a given human would consider a (spatial) proposition to be
true. But such a notion neglects a concept in philosophy known as subjectivism,
in which rational agents can have degrees-of-belief in a proposition (rather than being
restricted to boolean answers of ‘agree’ or ‘disagree’), and probabilities can be
interpreted as the measure of such a belief. With such an assumption it is
therefore sufficient to construct our models based on the ‘average degree-of-belief’
across people in some sample. Generically, this confidence can be defined as
p(φ | ψ), where φ represents the proposition and ψ represents the context. For a
particular spatial model, one might use p(in_front(obj1, obj2) | x<sub>t</sub>), where x<sub>t</sub> is
the current position of the observer. We use φ<sub>x</sub> as a convenience to indicate that
the location (say its centre of mass) of the located object in φ is at position x.
      </p>
      <p>In the next section we present the models we have developed for the
prepositions ‘between’ and ‘by’, and then present a possible novel approach by which we
might induce the occupancy of an object in space given a spatial description.</p>
    </sec>
    <sec id="sec-2">
      <title>Spatial Models for ‘between’ and ‘by’</title>
      <p>
        We carried out an online experiment in which users assessed the validity of various
locative expressions given a variety of scenes. For each category of spatial
relation, e.g. ‘by’ and ‘between’ (and a number of other prepositions not presented
here), the user was asked to rate the extent to which they agreed with the given
statement, on a scale of 1 (representing ‘no’) to 7 (representing ‘yes’), each
question being accompanied by a picture<sup>2</sup>. To produce the ‘average degree-of-belief’ we
scaled the average answer to [0,1]. Our models are based on the Proximal Model
as described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. That is, features are based on the nearest point to the
reference object, thus incorporating the shape of the object. This is in contrast to
the Centre-of-Mass Model (as used in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for example), which treats all objects as
points. This latter approach is computationally simpler and requires less data,
although it can be problematic for larger objects; if for example we were to assess
the acceptability of ‘you are near the park’, we would expect such a judgement
to be based on proximity to the edge of the park rather than the centre.
      </p>
      <p>[Figure 2: regions of (1) full validity, (2) partial validity and (3) no validity around the two reference objects, showing the connecting lines conn, the vertices v1 and v2, the point q, the distances |q − v1|, |v1 − v2| and |x − q|, and the located object (centred at x).]</p>
      <sec id="sec-1-2">
        <title>Between</title>
      </sec>
      <p>
        The model we present below determines the acceptability of a proposition φ =
between(a, b, c), where a is the located object, b and c are the reference objects,
and the position of a is x. We found that any point within the convex hull
of the two reference objects (excluding the area of the objects themselves) was
deemed to be fully valid. Outside this area, certainty degraded in proportion
to the centrality of the object. Our model below quantifies these findings:
        <disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[
p(\phi_x \mid x_t) = p(\phi_x) =
\begin{cases}
\max\left(0,\; 1 - \dfrac{|x - q|}{\mathit{tol}}\right) & \text{if } x \notin \mathrm{Hull}(\mathit{ref}_1 \cup \mathit{ref}_2) \\
1 & \text{otherwise}
\end{cases}
\\[1ex]
\begin{aligned}
\text{s.t.}\quad q &= \arg\min_{q'} \{\, |x - q'| \;:\; q' \text{ on line } l \,\}, \\
l : (v_1, v_2) &= \arg\min_{l'} \{\, |x - q'| \;:\; q' \text{ on line } l',\ l' \in \mathit{conn} \,\}, \\
\mathit{tol} &= |v_1 - v_2|\, k_1 \,(\ldots), \\
\mathit{conn} &= \{\, (v_1, v_2) \mid v_1 \in \mathit{ref}_1,\ v_2 \in \mathit{ref}_2,\ (v_1, v_2) \in \mathrm{edges}(\mathrm{Hull}(\mathit{ref}_1 \cup \mathit{ref}_2)) \,\}
\end{aligned}
]]></tex-math></disp-formula>
        <fn id="fn2"><label>2</label><p>The experiment was restricted to native English speakers only, due to cross-linguistic
variations in spatial coding, such as a lack of distinction between different frames of
reference (that is, distinguishing between, say, the deictic interpretation of ‘in front
of the tree’ based on the position of the observer, and the intrinsic interpretation
based on the salient side of an object, as in ‘in front of the shop’) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].</p></fn>
      </p>
      <p>[Figure 3: plot with axes ‘Confidence’ and ‘Distance to convex hull’.]</p>
      <p>
        Here x is the central point of the located object in question, ref1 and ref2 are
the sets of vertices of the two reference objects, Hull(V) gives the convex hull of the
set of vertices V (thus q is the nearest point on the convex hull to x), tol gives
the maximum allowed distance from the convex hull before the confidence score
reaches 0, conn is the set of two edges of the convex hull which connect the shapes
corresponding to ref1 and ref2 (that is, the straight dotted lines in Fig. 2), and the
function edges gives the edges of a polygon. k1 controls the maximum tolerance
permitted, as a specified proportion of the distance between the two objects, and
k2 controls the curvature of this ambiguous region. Via model fitting (minimising the
sum of squared differences) we found that values of k1 = 0.55 and k2 = 2.5
yielded the best results (see Figure 3).
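As an illustrative sketch only, the following Python fragment evaluates a confidence of the form used in the ‘between’ model: 1 inside the convex hull of the two reference objects, and max(0, 1 − |x − q| / tol) outside it. It makes several simplifying assumptions that are not part of the model above: tol is passed in as a fixed number rather than derived from k1 and k2, q is taken as the nearest point on any edge of the hull rather than only on the connecting lines in conn, and the interiors of the reference objects are not excluded. All function names are hypothetical.
<preformat><![CDATA[
```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def dist_to_segment(x, a, b):
    """Euclidean distance from point x to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((x[0] - a[0]) * dx + (x[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # stay on the segment
    return math.hypot(x[0] - (a[0] + t * dx), x[1] - (a[1] + t * dy))


def inside_convex(x, hull):
    """True if x lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0] - a[0]) * (x[1] - a[1]) - (b[1] - a[1]) * (x[0] - a[0]) < 0:
            return False
    return True


def between_confidence(x, ref1, ref2, tol):
    """Confidence that x is 'between' the objects with vertex lists ref1 and ref2.

    Assumption: tol is a constant supplied by the caller.
    """
    hull = convex_hull(list(ref1) + list(ref2))
    if inside_convex(x, hull):
        return 1.0
    n = len(hull)
    d = min(dist_to_segment(x, hull[i], hull[(i + 1) % n]) for i in range(n))
    return max(0.0, 1.0 - d / tol)
```
]]></preformat>
For two unit squares placed side by side, a call such as between_confidence((2, 2), ref1, ref2, tol=2.0) gives a value that falls linearly from 1 at the hull boundary to 0 at distance tol.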
      </p>
      <p>
        For the preposition ‘by’, there are three main variables that can influence the
magnitude of the confidence score: the base width (w) and height (h) of the reference
object, and the distance (d) from the reference object. For polygonal objects,
users were given eight different reference objects in their scenarios, with a variety of
widths and heights. It was found that although the confidence score
for a given distance relative to the width of the object (i.e. d/w) was a good
starting point (see Fig. 4(a)), greater heights led to a small increase in
probability. Assuming a linear relationship with height (again relative to the object
width), we therefore divide by h/w + k<sub>h</sub> for some constant k<sub>h</sub> (given that flat
objects such as lakes still yield a non-zero confidence score). Additionally, smaller
objects tended to have a larger tolerance of distance with respect to this width,
although this effect became less prominent as the width of the object became
very large. Thus we multiply the distance by log(w + k<sub>w</sub>) for some constant
k<sub>w</sub> ≥ 1, since for very small objects we still expect some tolerance of distance.
Combining these relationships and simplifying, we suggest the following model:
        <disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[
p(\phi_{i,j} \mid x_t) = \mathrm{clamp}\left( k_c - k_m \, \frac{d \, \log(w + k_w)}{h + k_h w} \right)
]]></tex-math></disp-formula>
where k<sub>m</sub> and k<sub>c</sub> are the coefficients of a line that maps the
adjusted distance to the confidence, and clamp restricts the overall value to the range [0, 1]. Fig.
4(b) shows the effect of these transformations, using k<sub>w</sub> = 14 and
k<sub>h</sub> = 2, resulting in values for k<sub>m</sub> and k<sub>c</sub> of 1.38 and 1.15 respectively. Ultimately
it is impossible to base any model of ‘by’ on physical metrics alone; the ‘use case’
of an object, i.e. the set of contexts in which it is used, is likely to have
an effect. In Fig. 4(b), for the case where the reference object was a chair, it is
apparent that confidence deteriorated with distance much faster than expected. But
if one considers that a chair is intended ‘to be sat on’, and therefore adjusts the
recorded height h to the more salient ‘seat level’, we obtain confidence values
very close to the model for this example.
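A minimal sketch of how the ‘by’ model might be evaluated is given below. It assumes that log denotes the natural logarithm and that d, w and h are expressed in the same units, neither of which is stated above; the default constants are the fitted values quoted in the text, and the function names are hypothetical.
<preformat><![CDATA[
```python
import math


def clamp(value, lo=0.0, hi=1.0):
    """Restrict value to the range [lo, hi]."""
    return max(lo, min(hi, value))


def by_confidence(d, w, h, k_w=14.0, k_h=2.0, k_m=1.38, k_c=1.15):
    """Confidence that a point at distance d is 'by' an object of width w and height h.

    Assumption: natural logarithm; d, w and h share the same units.
    """
    # Distance relative to width, scaled by log(w + k_w) and divided by (h/w + k_h),
    # simplified to a single fraction.
    adjusted = d * math.log(w + k_w) / (h + k_h * w)
    # Linear map from adjusted distance to confidence, clamped to [0, 1].
    return clamp(k_c - k_m * adjusted)
```
]]></preformat>
Under these assumptions the confidence is 1 at zero distance, decreases linearly with the adjusted distance, and is slightly higher for taller reference objects at the same distance.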
      </p>
      <p>[Figure 4: panel (a), x-axis ‘Distance (/w)’; panel (b), x-axis ‘Adjusted distance’.]</p>
      <p>
We now propose a method to infer the occupancy in space of a particular object
given an observation in the form of a spatial description made about it. This is
strongly predicated on the probabilistic treatment of our spatial models discussed
earlier. Occupancy Grid Mapping is a technique employed in robotics to generate
maps of an environment from noisy sensor measurements. The occupancy grid map
is useful because it can subsequently be fused with other maps obtained from,
say, physical sensors. The aim is to produce a posterior p(m | z<sub>1:t</sub>, x<sub>1:t</sub>), where
m = {m<sub>i</sub>} is a partitioning of space into a finite grid of cells m<sub>i</sub>, z<sub>1:t</sub> are the
observations made up to time t, and x<sub>1:t</sub> are the poses of the robot at each
observation. m<sub>i</sub> is the event that cell i is occupied, thus p(m<sub>i</sub>) describes the
probability that cell i is occupied. In the scope of this paper, we focus on how
the ‘inverse sensor model’ p(m<sub>i</sub> | z<sub>t</sub>, x<sub>t</sub>) can be computed; a more detailed
description of Occupancy Grid Mapping can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Our aim is to compute this inverse sensor model in terms of the
p(φ | x<sub>t</sub>) probabilities calculated in the previous section. An important
simplifying assumption we make is that locative expressions refer to a specific point
in space, within the boundaries of the object in question. This seems
intuitive; were we to describe a town as being ‘10km away’, it would clearly be
fallacious to assume that the entirety of the town is precisely 10km away. We
define a probability p(r<sub>i,j</sub> | x<sub>t</sub>, z<sub>t</sub>), where r<sub>i,j</sub> represents the event that the
observer was referring to the point (i, j) in their observation, and z<sub>t</sub> is the
locative expression in which the position of the located object is not specified, say
φ (since we consider such a position in r<sub>i,j</sub>). We can then calculate the
desired probability simply by normalising our confidence function across the
space:
        <disp-formula><tex-math><![CDATA[
p(r_{i,j} \mid x_t, z_t = \phi) = \frac{p(\phi_{i,j} \mid x_t)}{\sum_{i',j'} p(\phi_{i',j'} \mid x_t)}
]]></tex-math></disp-formula>
Before we determine how to calculate
p(m<sub>i,j</sub> | x<sub>t</sub>, z<sub>t</sub>), we analyse the conceptual parallelism between traditional
Occupancy Grid Mapping and that employed in our linguistic context.</p>
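      <p>The normalisation step described above can be sketched as follows. This is an illustration under the assumption that the confidence function has already been evaluated on a finite grid (a list of rows); the function name normalise_confidence is hypothetical.</p>
      <preformat><![CDATA[
```python
def normalise_confidence(confidence):
    """Turn a grid of confidence scores into a distribution over referred-to cells.

    confidence[j][i] holds the confidence score for cell (i, j); the result holds
    the corresponding probability that the observer referred to that cell.
    """
    total = sum(sum(row) for row in confidence)
    if total == 0:
        raise ValueError("confidence is zero everywhere; nothing to normalise")
    return [[c / total for c in row] for row in confidence]
```
]]></preformat>
      <p>After this step the grid sums to 1, so the values can be read as a distribution over the point the observer had in mind rather than as per-cell confidence scores.</p>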
      <sec id="sec-2-1">
        <title>A Comparison of Sensor Models</title>
        <p>On a cursory inspection there are some clear initial similarities that can be
drawn between the traditional occupancy grid map and our linguistic variant.
Both involve the pose of some observer x<sub>t</sub> (although, depending on the spatial
model, this is sometimes irrelevant) and some manifestation of an observation z<sub>t</sub>:
a physical sensor reading in the traditional approach and a
locative expression in the linguistic approach. Upon closer analysis more similarities
can be drawn. With a physical sensor, we expect a measurement of the distance to
a point being sensed to be noisy, and thus maintain a probability distribution
over the precise position of this point. This corresponds to our
distribution p(r<sub>i,j</sub> | x<sub>t</sub>, z<sub>t</sub>). For a locative expression describing a town as being 10km away,
human error or rounding is likely to lead to uncertainty in the judged distance,
and additionally the direction of the town is unspecified, leading to a ‘blurred
doughnut’ type distribution.</p>
        <p>There are however a number of conceptual differences. With traditional
Occupancy Grid Mapping the posterior for a cell is only updated if the cell was within
the sensor range (i.e. we make no assumption with regards to space outside the
limited range of our sensor). With locative expressions, however, we can infer
information beyond that explicitly conveyed. Suppose for our town example that the town
was 1km in diameter, and that the distance judgement of 10km (to some point
within the town) was entirely accurate. If the centre of the town was actually
10.5km away, our observation would still hold, but a point any further away could not
possibly be occupied by ‘town’.</p>
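        <p>The ‘blurred doughnut’ mentioned earlier in this section can be illustrated with a short sketch. The Gaussian fall-off around the stated distance, the parameter sigma and the function name are assumptions made purely for illustration; the text does not commit to a particular noise model.</p>
        <preformat><![CDATA[
```python
import math


def doughnut_distribution(observer, distance, sigma, size, cell=1.0):
    """Distribution over grid cells for 'the object is `distance` away'.

    Assumption: the judged distance has Gaussian noise with standard deviation
    sigma, and every direction is equally likely because none was given.
    Returns a size x size grid indexed as grid[j][i] that sums to 1.
    """
    ox, oy = observer
    weights = []
    for j in range(size):
        row = []
        for i in range(size):
            r = math.hypot(i * cell - ox, j * cell - oy)
            row.append(math.exp(-((r - distance) ** 2) / (2 * sigma ** 2)))
        weights.append(row)
    total = sum(map(sum, weights))
    return [[wt / total for wt in row] for row in weights]
```
]]></preformat>
        <p>With the observer at the centre of the grid, cells on the ring at the stated distance receive the most mass, while cells at the observer's own position and far beyond the ring receive almost none.</p>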
      </sec>
      <sec id="sec-2-3">
        <title>Computing the Inverse Sensor Model</title>
        <p>We can use the above fact to compute p(m<sub>i,j</sub> | x<sub>t</sub>, z<sub>t</sub>) from our previously
calculated values of p(r<sub>i,j</sub> | x<sub>t</sub>, z<sub>t</sub>). Let Q be the set of possible ‘poses’ for the located
object such that the point (i, j) is within the object’s boundary, where a pose is the
position and orientation of the object. Given our assumption that the observer
referred to a point within the confines of their perceived position of the object,
Q represents all valid poses of the object given such a point. It follows that
p(m<sub>i,j</sub> | x<sub>t</sub>, z<sub>t</sub>) = ∫<sub>q∈Q</sub> p(q | x<sub>t</sub>, z<sub>t</sub>) dq. For each pose q ∈ Q there is an associated
frame (i<sub>q</sub>, j<sub>q</sub>, θ<sub>q</sub>), where (i<sub>q</sub>, j<sub>q</sub>) is the nominal centre of the shape in pose q (say
the centre of mass) and θ<sub>q</sub> is the rotation of the object about this point. It is
then possible to use p(r<sub>i<sub>q</sub>,j<sub>q</sub></sub> | x<sub>t</sub>, z<sub>t</sub>) to refer to the probability of the object being
positioned at (i<sub>q</sub>, j<sub>q</sub>) (see Fig. 5). The pose also has a probability p(θ<sub>q</sub>) associated
with its orientation; for simplicity we assume this is independent of x<sub>t</sub> and z<sub>t</sub>
(although the use of p(θ<sub>q</sub> | z<sub>t</sub>) would allow us to model, for example, observations
such as ‘The boat is in front of you, facing East’). Putting this together
gives us the following equation to compute the occupancy probability:
          <disp-formula id="eq3"><label>(3)</label><tex-math><![CDATA[
p(m_{i,j} \mid x_t, z_t) = \int_{q \in Q} p(r_{i_q,j_q} \mid x_t, z_t)\, p(\theta_q)\, dq
\quad \text{s.t.} \quad Q = \{\, q \mid (i,j) \in R(q) \,\}
]]></tex-math></disp-formula>
where R(q) is the region of the located object in pose q. Considering the pose
of the located object has useful consequences; it allows us to model, for example,
that vehicles are aligned to the direction of a road. Given a lack of prior shape
information with regards to the located object, and given that the above integral is
somewhat intractable, a suitable approximation is to use the approximate width
W of the object (which can be obtained via knowledge of the class of the located
object, say the usual width of a town). If we infer as little about the shape as
possible, the resulting approximation of the shape is a circle of diameter W.
Equation 3 then reduces to the following:
          <disp-formula id="eq4"><label>(4)</label><tex-math><![CDATA[
p(m_{i,j} \mid x_t, z_t) = \int_{(i',j') \in R(\frac{W}{2}, i, j)} p(r_{i',j'} \mid x_t, z_t)\, di'\, dj'
]]></tex-math></disp-formula>
where R(W/2, i, j) is the set of points in a circular region of centre (i, j) and radius W/2.</p>
        <p>[Figure 5: the known shape of the object (presuming availability of this data) in some pose q; the nominal centre (i<sub>q</sub>*, j<sub>q</sub>*) of the object in this pose; the axis/rotation θ<sub>q</sub> of the object in this pose; the point (i, j) under consideration.]</p>
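        <p>On a discrete grid, the circular approximation above can be sketched as a sum of the referred-point probabilities over all cells within W/2 of the cell in question. The grid representation, the cell size parameter and the function name below are assumptions for illustration; the sum stands in for the integral.</p>
        <preformat><![CDATA[
```python
def occupancy_from_reference(p_r, width, cell=1.0):
    """Approximate occupancy for each cell from the referred-point distribution.

    p_r[j][i] is the probability that the observer referred to cell (i, j).
    Each cell's occupancy is the total probability of referred points lying
    within a circle of diameter `width` centred on that cell.
    """
    radius = width / 2.0
    rows, cols = len(p_r), len(p_r[0])
    reach = int(radius / cell)  # how many cells the circle can extend
    occupancy = [[0.0] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            total = 0.0
            for dj in range(-reach, reach + 1):
                for di in range(-reach, reach + 1):
                    jj, ii = j + dj, i + di
                    in_grid = 0 <= jj < rows and 0 <= ii < cols
                    in_circle = (di * cell) ** 2 + (dj * cell) ** 2 <= radius ** 2
                    if in_grid and in_circle:
                        total += p_r[jj][ii]
            occupancy[j][i] = min(1.0, total)  # guard against rounding above 1
    return occupancy
```
]]></preformat>
        <p>If all of the referred-point probability sits in a single cell, the result is a disc of occupied cells of diameter W around it, which matches the intuition that the object must cover the point being described.</p>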
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions &amp; Future Work</title>
      <p>
        In this paper we motivated a probabilistic approach to modelling spatial language that can be used in a number of algorithms, and provided an example of such an algorithm, which induces a sense of ‘the space that an object occupies’ via the use of occupancy grid maps. We also presented models for the prepositions
‘by’ and ‘between’ based on the results of an online experiment. Future work is
predominantly focused on further development of our dialogue manager language
that interacts with these spatial models, as well as on developing further algorithms
which make use of such models. For example, we have developed an algorithm that
combines semantic and spatial models to provide confidence scores for arbitrarily
complex locative expressions (including those based on current bounded
trajectories, such as ‘the second left’). We are also investigating a measure of ‘relevance’
(one of the Gricean maxims [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) for locative expressions, a consideration that is
particularly key in generating descriptions of objects or locations.
      </p>
      <p>This work has been supported by the European Commission under grant agreement number FP7-231888-EUROPA.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Herskovits</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1986</year>
          )
          <source>Language and Spatial Cognition</source>,
          Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Haddock</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>1991</year>
          )
          In
          <source>Proceedings of the fifth conference on European chapter of the Association for Computational Linguistics</source>
          , Morristown, NJ, USA:
          <publisher-name>Association for Computational Linguistics</publisher-name>
          . pp.
          <fpage>161</fpage>–<lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Olivier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tsujii</surname>
            ,
            <given-names>J.-I.</given-names>
          </string-name>
          (
          <year>2004</year>
          )
          <source>Artificial Intelligence Review</source>
          <volume>8</volume>
          ,
          <fpage>147</fpage>–<lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kelleher</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Costello</surname>
            ,
            <given-names>F. J.</given-names>
          </string-name>
          (
          <year>2009</year>
          )
          <source>Comput. Linguist.</source>
          <volume>35</volume>
          (
          <issue>2</issue>
          ),
          <fpage>271</fpage>–<lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Logan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sadler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>A computational analysis of the apprehension of spatial relations</article-title>
          . In
          <source>Language and Space</source>
          , pp.
          <fpage>493</fpage>–<lpage>529</lpage>.
          MIT Press (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Levinson</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wilkins</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          <source>Grammars of Space: Explorations in Cognitive Diversity</source>
          , chapter 1, pp. 45. Cambridge University Press (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Regier</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Carlson</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          (
          <year>2001</year>
          )
          <source>J. Exp. Psychol. Gen.</source>
          <volume>130</volume>
          (
          <issue>2</issue>
          ),
          <fpage>273</fpage>–<lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Thrun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2003</year>
          )
          <source>Auton. Robots</source>
          <volume>15</volume>
          (
          <issue>2</issue>
          ),
          <fpage>111</fpage>–<lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Grice</surname>
            ,
            <given-names>H. P.</given-names>
          </string-name>
          (
          <year>1975</year>
          )
          <article-title>Logic and conversation</article-title>
          . In Peter Cole and Jerry L. Morgan (eds.),
          <source>Syntax and Semantics 3: Speech Acts</source>
          , volume
          <volume>3</volume>
          , pp.
          <fpage>41</fpage>–<lpage>58</lpage>.
          New York: Academic Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>