<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conversational Natural Language interaction for Place-related Knowledge Acquisition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Srinivasan Janarthanam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oliver Lemon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingkun Liu</string-name>
          <email>x.liu@hw.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Phil Bartie</string-name>
          <email>philbartie@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William Mackaness</string-name>
          <email>william.mackaness@ed.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiphaine Dalmas</string-name>
          <email>tiphaine.dalmas@aethys.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jana Goetze</string-name>
          <email>jagoetze@kth.se</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Interaction Lab, Heriot-Watt University</institution>
          ,
          <addr-line>Edinburgh</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KTH Royal Institute of Technology</institution>
          ,
          <addr-line>Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of GeoSciences, University of Edinburgh</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>School of Informatics, University of Edinburgh</institution>
        </aff>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>38</lpage>
      <abstract>
        <p>We focus on the problems of using Natural Language interaction to support pedestrians in their place-related knowledge acquisition. Our case study for this discussion is a smartphone-based Natural Language interface that allows users to acquire spatial and cultural knowledge of a city. The framework consists of a spoken dialogue-based information system and a smartphone client. The system is novel in combining geographic information system (GIS) modules such as a visibility engine with a question-answering (QA) system. Users can use the smartphone client to engage in a variety of interleaved conversations such as navigating from A to B, using the QA functionality to learn more about points of interest (PoI) nearby, and searching for amenities and tourist attractions. This system explores a variety of research questions involving Natural Language interaction for acquisition of knowledge about space and place.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Although spatial information is available to users through graphical/visual
smartphone apps, there are still important problems such as how such information is
delivered safely and proactively, and without cognitively overloading the user.
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] suggested that cognitive load of information presented in textual and
speechbased interfaces is medium and low respectively when compared to more
complicated visual interfaces. An interesting research objective, therefore, is to build
a hands-free and eyes-free system that engages pedestrian users by presenting
all information and receiving user requests through Natural Language.
      </p>
      <p>In addition, and in contrast to other mobile applications, we focus on
conversational interaction, meaning that the system accumulates information over
time, and that the interface plans its utterances to achieve long-term
communicative goals. Note that recent speech interfaces such as Siri use Natural Language
input but they are not truly conversational, because they mostly handle single
user inputs one at a time, without maintaining and updating models of the
dialogue context.</p>
      <p>
        Our Natural Language interface integrates with a city model and a visibility
engine [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to identify points of interest and visible landmarks for presentation, a
pedestrian tracker to improve the GPS positioning of the user, and a
question-answering (QA) system to enable users to explore information about the city
more freely than with a graphical interface.
      </p>
      <p>Table 1 presents an example dialogue interaction with the interface showing
the use of visibility information and Question-Answering.</p>
      <p>User: I want to go to a museum.</p>
      <p>System: How about the National Museum of Scotland? It is a five-star rated attraction.
User: Okay.</p>
      <p>System: Okay. Directions to the National Museum of Scotland.</p>
      <p>System: Turn right on to South Bridge and walk towards the tower in front of you.
...</p>
      <p>User: I am looking for an Indian restaurant.</p>
      <p>System: Okay. There is an Indian restaurant named Kushis Diner on West Nicholson Street.
User: Okay. Take me there.
...</p>
      <p>System: Near you is the famous statue of David Hume.</p>
      <p>User: Tell me more about David Hume.</p>
      <p>System: David Hume is a Scottish philosopher who ....</p>
      <p>There are several mobile apps such as Triposo, Tripwolf, and Guidepal that
provide point-of-interest information, and apps such as Google Navigation that
provide navigation instructions to users. However, they demand the user's visual
attention because they predominantly present information on the small screen of a
mobile device. In contrast, we are developing a speech-only interface in order to
keep the user's cognitive load low and to avoid users being distracted (perhaps
dangerously so) from their primary task.</p>
      <p>
        Previously, generating navigation instructions in the real world for
pedestrians has been an interesting research problem in both computational linguistics
and geo-informatics [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. For example, CORAL is an NLG system that
generates navigation instructions incrementally by keeping track of the user's location,
but the user has to ask for the next instruction when he reaches a junction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
DeepMap is a system that interacts with the user to improve positioning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It
asks users whether they can see certain landmarks, and based on their answers
improves the user's GPS position estimate. However, in many such current
systems, interactions happen through the use of GUI elements such as drop-down
lists and buttons, and not by using speech interaction. The Edinburgh Augmented
Reality System (EARS) was a prototype system that presented point of interest
information to users based on visibility [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In contrast to these earlier systems, our objective is to present navigational,
point-of-interest, and amenity information in an integrated way using Natural
Language dialogue, with users interacting eyes-free and hands-free through a
headset connected to a smartphone.</p>
    </sec>
    <sec id="sec-2">
      <title>Architecture</title>
      <p>
        The architecture of the current system is shown in gure 1. The server side
consists of a dialogue interface (parser, interaction manager, and generator), a
City Model, a Visibility Engine, a QA server and a Pedestrian tracker.
The dialogue interface consists of a speech recogniser, an utterance parser, an
Interaction Manager and an utterance generator. The speech recognition module
recognises the user's utterance from the user's speech input. The utterance parser
translates user utterances in to meaning representations called dialogue acts.
The Interaction Manager is the central component of this architecture, which
provides the user navigational instructions and interesting PoI information. It
receives the user's input in the form of a dialogue act and the user's location
in the form of latitude and longitude information. Based on these inputs and
the dialogue context, it responds with system output dialogue act (DA), based
on a dialogue policy. The utterance generator is a natural language generation
module that translates the system DA into surface text, using the OpenCCG
toolkit [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
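      <p>A minimal sketch of this parse-manage-generate loop is given below. The hand-written rules, function names, and dialogue-act slots are illustrative assumptions only, standing in for the real parser, dialogue policy, and OpenCCG-based generator:</p>
      <preformat><![CDATA[
```python
# Toy sketch of the dialogue-act pipeline: parser -> Interaction Manager -> generator.
# All rules and names are illustrative assumptions, not the deployed system.

def parse_utterance(text):
    """Map a user utterance to a dialogue act (an act type plus slots)."""
    text = text.lower()
    if "take me to" in text or "go to" in text:
        return {"act": "request_route", "destination": text.split("to ", 1)[-1]}
    if "tell me more about" in text:
        return {"act": "request_info", "entity": text.split("about ", 1)[-1]}
    return {"act": "unknown"}

def interaction_manager(user_da, context):
    """Choose a system dialogue act from the user act and the dialogue context."""
    if user_da["act"] == "request_route":
        context["destination"] = user_da["destination"]  # context persists across turns
        return {"act": "confirm_route", "destination": user_da["destination"]}
    if user_da["act"] == "request_info":
        return {"act": "give_info", "entity": user_da["entity"]}
    return {"act": "clarify"}

def generate(system_da):
    """Realise the system dialogue act as surface text."""
    if system_da["act"] == "confirm_route":
        return "Okay. Directions to %s." % system_da["destination"].title()
    if system_da["act"] == "give_info":
        return "Here is what I know about %s." % system_da["entity"].title()
    return "Could you rephrase that?"
```
]]></preformat>
      <p>Because the Interaction Manager writes into a persistent context rather than answering each input in isolation, the interface can pursue the long-term communicative goals discussed above.</p>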
      <sec id="sec-2-1">
        <title>Pedestrian tracker</title>
        <p>
          Global Navigation Satellite Systems (GNSS) (e.g. GPS, GLONASS) provide a
useful positioning solution with minimal user side setup costs, for location aware
applications. However urban environments can be challenging with limited sky
views, and hence limited line of sight to the satellites, in deep urban corridors.
There is therefore significant uncertainty about the user's true location as reported
by GNSS sensors on smartphones [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This module improves on the reported user
position by combining smartphone sensor data (e.g. accelerometer) with map
matching techniques, to determine the most likely location of the pedestrian [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The output includes a robust street centreline location, and a candidate space
showing the probability of the user's more exact position (e.g. pavement
location). This module ensures that any GNSS-reported location placing the user at
a rooftop is corrected to the most likely ground-level location,
taking into consideration user trajectory history and map matching techniques.
User orientation is inferred from their trajectory.
</p>
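        <p>The core of the map matching step can be sketched as a nearest-segment projection; this is a simplified illustration under assumed planar coordinates, not the module's actual algorithm:</p>
        <preformat><![CDATA[
```python
# Map-matching sketch (illustrative only): project a noisy GNSS fix onto the
# nearest street-centreline segment, standing in for the tracker described above.

def snap_to_centreline(point, segments):
    """Return the closest point on any centreline segment to the GNSS fix.
    Coordinates are assumed planar (e.g. projected metres) for brevity."""
    px, py = point
    best, best_d2 = None, float("inf")
    for (ax, ay), (bx, by) in segments:
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Parameter of the perpendicular foot, clamped to the segment ends.
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        qx, qy = ax + t * dx, ay + t * dy
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        if d2 < best_d2:
            best, best_d2 = (qx, qy), d2
    return best
```
]]></preformat>
        <p>In the real module this candidate position is further constrained by accelerometer data and the user's trajectory history, as described above.</p>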
      </sec>
      <sec id="sec-2-2">
        <title>City Model</title>
        <p>The City Model is a spatial database containing information about thousands of
entities in the city of Edinburgh. These data have been collected from a variety of
existing resources such as Ordnance Survey, OpenStreetMap and the Gazetteer
for Scotland. It includes the location, use class, name, street address, and where
relevant other properties such as build date. The model also includes a pedestrian
network (streets, pavements, tracks, steps, open spaces) which can be used to
calculate minimal cost routes, such as the shortest path.
</p>
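        <p>Minimal-cost routing over such a pedestrian network is classically computed with Dijkstra's algorithm; the sketch below is an illustration over an assumed adjacency-list representation, not the City Model's actual implementation:</p>
        <preformat><![CDATA[
```python
import heapq

# Routing sketch (illustrative): Dijkstra's algorithm over a pedestrian network
# given as an adjacency list {node: [(neighbour, edge_cost), ...]}.

def shortest_path(graph, start, goal):
    """Return (total_cost, node_path) for the cheapest route, or (inf, [])."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []
```
]]></preformat>
        <p>With edge costs other than plain length (e.g. penalising steps or busy crossings), the same search yields routes minimising other criteria than distance.</p>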
      </sec>
      <sec id="sec-2-3">
        <title>Visibility Engine</title>
        <p>
          This module identi es the entities that are in the user's vista space [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. To do this
it accesses a digital surface model, sourced from LiDAR, which is a 2.5D
representation of the city including buildings, vegetation, and land surface elevation.
The visibility engine uses this dataset to o er a number of services, such as
determining the line of sight from the observer to nominated points (e.g. which
junctions are visible), and determining which entities within the city model are
visible. A range of visual metrics are available to describe the visibility of
entities, such as the eld of view occupied, vertical extent visible, and the facade
area in view. These metrics can be then used by the interaction manager to
generate e ective Natural Language navigation instructions. E.g. \Walk towards
the castle", \Can you see the tower in front of you?", \Turn left after the large
building on your left after the junction" and so on.
3.5
        </p>
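        <p>A basic line-of-sight test over a 2.5D surface model can be sketched by sampling the sight line against surface heights; this is a simplified grid-based illustration under assumed data structures, not the engine's actual implementation:</p>
        <preformat><![CDATA[
```python
# Visibility sketch (illustrative): test line of sight over a 2.5D surface model
# stored as a dict mapping (x, y) grid cells to elevations in metres.

def line_of_sight(surface, observer, target, eye_height=1.6):
    """Return True if no intermediate cell blocks the observer-target sight line."""
    (ox, oy), (tx, ty) = observer, target
    oz = surface[(ox, oy)] + eye_height   # observer eye level
    tz = surface[(tx, ty)]                # target surface height
    steps = max(abs(tx - ox), abs(ty - oy))
    for i in range(1, steps):
        f = i / steps
        x = round(ox + f * (tx - ox))
        y = round(oy + f * (ty - oy))
        ray_z = oz + f * (tz - oz)        # height of the sight line at this cell
        if surface.get((x, y), 0.0) > ray_z:
            return False                  # an intermediate cell blocks the view
    return True
```
]]></preformat>
        <p>Repeating such tests over an entity's facade cells yields the visual metrics mentioned above, such as the visible vertical extent and facade area in view.</p>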
      </sec>
      <sec id="sec-2-4">
        <title>Question-Answering server</title>
        <p>
          The QA server currently answers a range of Natural Language de nition
questions. E.g., \Tell me more about the Scottish Parliament", \Who was David
Hume?", etc. QA identi es the entity focused on in the question using
machinelearning techniques [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and then proceeds to a textual search on texts from
the Gazetteer of Scotland and Wikipedia, and de nitions from WordNet glosses.
Candidates are reranked using a trained con dence score with the top candidate
used as the nal answer. These are usually long, descriptive answers and are
provided in spoken output as a ow of sentence chunks that the user can
interrupt. This information can also be o ered by the system when a salient entity
appears in the user's viewshed.
4
        </p>
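        <p>The reranking and chunking steps can be sketched as follows; the confidence scores here are assumed inputs standing in for the trained reranker, and focus detection is not modelled:</p>
        <preformat><![CDATA[
```python
# QA sketch (illustrative): pick the candidate answer with the highest
# confidence score and split it into sentence chunks so that spoken
# delivery can be interrupted between chunks.

def answer_question(candidates):
    """candidates: list of (answer_text, confidence). Returns (best, chunks)."""
    if not candidates:
        return None, []
    best = max(candidates, key=lambda c: c[1])[0]
    chunks = [s.strip() + "." for s in best.split(".") if s.strip()]
    return best, chunks
```
]]></preformat>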
      </sec>
    </sec>
    <sec id="sec-3">
      <title>User interface</title>
      <p>Users can interact with the system using a smartphone client that communicates
with the system via the 3G network. The client is an Android app running on
the user's mobile phone. It consists of two parts: the user's position tracker
and the interaction module. The position tracker module senses the user's position
(latitude and longitude) and accelerometer readings. This information is sent to
the system. The interaction module captures the user's speech input and relays
it to the system. It also receives the system's utterances, which are then converted
into speech using the Android text-to-speech service.</p>
      <p>
        We also built a web-based user interface to support the development of the
system modules. It allows web-users to interact with our system from their
desktops. It uses Google Street View to allow users to simulate pedestrian walking.
An interaction panel lets the user interact with the system using Natural
Language text or speech input. The system's utterances are synthesized using the
Cereproc text-to-speech engine and presented to the user. For a detailed
description of this component, please refer to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A demonstration of this system will
be presented at [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Future work</title>
      <p>There are many remaining challenges in this research area for discussion, for
instance:
- interleaving question-answering and navigation dialogue in a coherent
manner;
- optimising the action selection of the dialogue interface (i.e. what to say next
in the conversation), using machine learning techniques similar to [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13-15</xref>
        ];
- robustly handling the uncertainty generated by GPS sensors, speech
recognition, and the ambiguity of Natural Language interaction itself;
- generating useful referring expressions (e.g. the church on your left with the
spire) which combine spatial and visual information;
- evaluating this system with real pedestrian users (this phase of the project
is imminent).</p>
      <p>The research has received funding from the European Community's Seventh
Framework Programme (FP7/2007-2013) under grant agreement no. 270019
(SPACEBOOK project, http://www.spacebook-project.eu/).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kray</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laakso</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elting</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coors</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Presenting route instructions on mobile devices</article-title>
          .
          <source>In: Proceedings of IUI 03</source>
          , Florida.
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bartie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackaness</surname>
          </string-name>
          , W.:
          <article-title>D3.4 Pedestrian position tracker</article-title>
          .
          <source>Technical report, The SPACEBOOK Project (FP7/2011-2014 grant agreement no. 270019)</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geldof</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prost</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>CORAL : Using Natural Language Generation for Navigational Assistance</article-title>
          .
          <source>In: Proceedings of ACSC2003</source>
          , South Australia. (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Richter</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duckham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Simplest instructions: Finding easy-to-describe routes for navigation</article-title>
          .
          <source>In: Proceedings of the 5th Intl. Conference on Geographic Information Science</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Malaka</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zipf</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep Map - challenging IT research in the framework of a tourist information system</article-title>
          .
          <source>In: Information and Communication Technologies in Tourism 2000</source>
          , Springer (
          <year>2000</year>
          )
          <fpage>15</fpage>
          -
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bartie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackaness</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Development of a speech-based augmented reality system to support exploration of cityscape</article-title>
          .
          <source>Transactions in GIS</source>
          <volume>10</volume>
          (
          <year>2006</year>
          )
          <fpage>63</fpage>
          -
          <lpage>86</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajkumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards Broad Coverage Surface Realization with CCG</article-title>
          .
          <source>In: Proc. of the UCNLG+MT workshop</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zandbergen</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbeau</surname>
            ,
            <given-names>S.J.:</given-names>
          </string-name>
          <article-title>Positional accuracy of assisted gps data from high-sensitivity gps-enabled mobile phones</article-title>
          .
          <source>Journal of Navigation</source>
          <volume>64</volume>
          (
          <issue>3</issue>
          ) (
          <year>2011</year>
          )
          <fpage>381</fpage>
          -
          <lpage>399</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Montello</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Scale and multiple psychologies of space</article-title>
          . In Frank, A.U.,
          <string-name>
            <surname>Campari</surname>
          </string-name>
          , I., eds.:
          <source>Spatial information theory: A theoretical basis for GIS</source>
          . (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikhailian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalmas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinchuk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Learning foci for question answering over topic maps</article-title>
          .
          <source>In: Proceedings of ACL 2009</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Janarthanam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemon</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.:</given-names>
          </string-name>
          <article-title>A web-based evaluation framework for spatial instruction-giving systems</article-title>
          .
          <source>In: Proc. of ACL</source>
          <year>2012</year>
          , South Korea. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Janarthanam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemon</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bartie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackaness</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalmas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goetze</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Integrating location, visibility, and question-answering in a spoken dialogue system for pedestrian city exploration</article-title>
          .
          <source>In: Proc. of SIGDIAL</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Janarthanam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemon</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Learning Adaptive Referring Expression Generation Policies for Spoken Dialogue Systems</article-title>
          .
          <source>In: Empirical Methods in Natural Language Generation</source>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Janarthanam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemon</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Learning to adapt to unknown users: referring expression generation in spoken dialogue systems</article-title>
          . In:
          <source>Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics (
          <year>2010</year>
          )
          <fpage>69</fpage>
          -
          <lpage>78</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Rieser</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemon</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Reinforcement Learning for Adaptive Dialogue Systems: a Data-driven Methodology for Dialogue Management and Natural Language Generation</article-title>
          .
          <source>Theory and Applications of Natural Language Processing</source>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>