<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Talking Robots</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ph.D. student at the Department of Civil Engineering and Computer Science Engineering, Tor Vergata University of Rome, Italy; research associate at the Department of Computer, Control and Management Engineering, Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, robotic platforms have appeared in many research and everyday-life contexts, and an easy way of interacting with them has become a necessity. Human Robot Interaction is the research field that studies how robots can interact with humans in the most natural way. In this work we present the preliminary studies we have carried out in this direction, focusing on Natural Language based interaction, with particular attention to the grounding problem. In particular, we study how Statistical Machine Learning techniques can be applied to Natural Language as it is used to interact with robots. Moreover, we also investigate how this approach can be integrated into such complex systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Robots are slowly becoming part of everyday life, as they are being marketed
for commercial applications (viz. telepresence, cleaning or entertainment). As a
consequence, the ability to interact with a non-expert user is becoming a key
requirement. The Human Robot Interaction (HRI) field aims at realizing robotic
systems that offer a level of interaction as natural as possible. This means
providing robots with sensory systems capable of understanding and replicating
human languages, such as speech, gestures, voice intonation, pragmatic
interpretation, and any other non-verbal interaction. The ultimate goal of this research
area is to provide robots with the ability to resolve human language references
in the application context they belong to, such as the real world (e.g. assigning the
right coordinates to the phrase the kitchen) or an abstract world (e.g. resolving
anaphoric references in the domain of the discourse). This cognitive process, which
is natural and implicit among human beings, is commonly called grounding.</p>
      <p>Our research will investigate the problems related to the natural language
analysis involved in the design of HRI systems. To this aim, we will explore
the possibility of reusing approaches that have been widely applied in different
Natural Language Understanding (NLU) tasks, testing their applicability
in the HRI field. In particular, we will focus on finding a bridge between the
linguistic knowledge expressed in spoken commands and the robot's representation
of the world as a support for the grounding process.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivations and Background Works</title>
      <p>Among the different kinds of interaction treated by HRI, we will focus on those
aspects involving natural language. User utterances can be recognized and
transcribed by Automatic Speech Recognition (ASR) systems, which in recent years
have become increasingly accessible and powerful. The main issue is that in
order to translate user utterances into robotic actions, we need to understand
their meaning. For instance, from the sentence “take the bottle on the table”, we
need to provide the command corresponding to the action of taking. Moreover,
we need to identify the relation holding between the bottle and the table. This
semantic information can be crucial as well to ground linguistic expressions into
objects as they are represented in the robot's set of beliefs (i.e. robot knowledge
and perceptions).</p>
      <p>
        To fill the gap between the robot's world representation and the linguistic
knowledge expressed in user utterances, we need to extract the meaning of
a sentence and represent it in a suitable form. Grammar-based ASR systems
often offer the possibility to attach semantic primitives to each grammar rule.
The meaning representation is obtained as the composition of all the primitives
explored during the decoding. This approach has been widely adopted in the
robotic field, as in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Grammars, however, have the limitation of covering only a
fragment of the language. If we want to realize more general HRI systems, and thus
to cover a wider range of linguistic phenomena, we need to rely on free-form
speech recognition engines. Unfortunately, these systems do not provide any
additional information besides the plain transcription of utterances. A
representation of their meaning can only be obtained by an external semantic parsing
process. Natural Language Processing (NLP) approaches based on formal
languages have found wide application in the HRI field, e.g. semantic parsing with
Combinatory Categorial Grammars (CCG), as in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where a way of obtaining
a meaning representation based on Discourse Representation Structures [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
directly from the speech recognition is used. Similarly, in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], CCGs are used to
produce a representation in terms of Hybrid Logics Dependency Semantics [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
logic form. However, the overall attention has recently shifted towards the
application of Statistical Learning techniques, reflecting the desire to design more
general solutions. Several fields of research have shown a growing interest in HRI,
giving the chance to apply these techniques in this area. Experts with different
backgrounds, mainly from Robotics, Computational Linguistics and Cognitive
Science, have proposed their own approaches.
      </p>
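      <p>As a toy illustration of the grammar-based approach just described, the following sketch attaches a semantic primitive to each grammar rule and composes the meaning of an utterance from the primitives matched during decoding. The grammar, words and primitive names are our own illustrative assumptions, not taken from the cited systems.</p>

```python
# Toy grammar-based semantic composition: each rule carries a
# semantic primitive; the meaning of an utterance is the sequence of
# primitives matched during decoding. The grammar below is invented
# for illustration only.
GRAMMAR = {
    "take":   ("VERB", {"action": "TAKE"}),
    "bottle": ("NOUN", {"object": "BOTTLE"}),
    "table":  ("NOUN", {"object": "TABLE"}),
    "on":     ("PREP", {"relation": "ON"}),
    "the":    ("DET",  {}),
}

def parse(utterance):
    """Compose the meaning as the list of matched primitives.
    Returns None for out-of-grammar utterances, illustrating the
    coverage limit discussed above."""
    meaning = []
    for word in utterance.split():
        if word not in GRAMMAR:
            return None
        _, primitive = GRAMMAR[word]
        if primitive:
            meaning.append(primitive)
    return meaning

print(parse("take the bottle on the table"))
```

An out-of-grammar word (e.g. "grab") makes `parse` fail, which is exactly the coverage limit that motivates free-form recognition plus external semantic parsing.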
      <p>
        The problem of grounding natural language symbols into robot
representations of the world has mostly been explored in developing systems for tasks such as
Human Augmented Mapping or following route instructions. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], a
simulated robot system called MARCO, able to follow route instructions in a virtual
environment, is presented. Here spoken commands are parsed using compound
action specifications to model which actions to take under which conditions. These
structures capture the commands in route instructions by modeling the surface
meaning of a sentence as a verb-argument structure, and are obtained after a
natural language processing chain. This work has been continued and extended
in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where Statistical Learning has been applied to learn how to map
commands into the corresponding logical-form-like structures. This represents the robot
instruction that can be directly executed and implicitly resolves the grounding of
all the entities. The work in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] proposes a system that learns to follow
navigational natural language directions by apprenticeship, from routes through a map
paired with English descriptions. A reinforcement learning algorithm is applied to
determine which portions of the language describe which aspects of the route.
      </p>
      <p>
        Other works have been inspired by novel spatial semantic theories. In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] the
problem of executing natural language directions is formalized through Spatial
Description Clauses (SDCs), a structure that can hold the result of the spatial
semantic parsing in terms of spatial roles. The same representation has been
exploited in a subsequent work [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], where probabilistic graphical model
theory is applied to parse each instruction into a structure called a Generalized
Grounding Graph (G3). Here the SDCs are used as a base component of the
more general G3 structure, which represents both the semantic and syntactic
information of the spoken sentence. In some cases, the construction of the
representation is taken into account, as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where the robot learns the features
of the environment through the use of narrated guided tours. In this work, the
robot builds both the metrical and topological representations of the
environment during the tour. Spatial and semantic information is then associated with
the evolving situations through the labeling of events that occur during a tour,
and is later attached to the nodes of a topological graph.
      </p>
      <p>
        However, the approaches proposed so far have taken into account only
single aspects (e.g. a deep analysis for resolving spatial relations [
        <xref ref-type="bibr" rid="ref14 ref21">14, 21</xref>
        ]) of the overall
linguistic analysis necessary to realize a complete grounding process. The full
complexity of the problem is well described in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Here, it is stated
that a complete natural language HRI system should be able to: (i) react in the
same time frame as a human; (ii) process all stages of language processing in a
concurrent way; (iii) possess the capability of understanding spoken language; (iv)
decode multi-modal cues, such as linguistic expressions accompanied by gestures;
(v) share the perspective on the world and on events with its interlocutors; (vi)
start interactions to support bidirectional communication. All these features can
constrain the possible interpretations of the language, biasing the grounding process.
The level of natural language analysis needed is therefore high and complex,
as different pieces of information, corresponding to different levels of semantics,
need to be extracted and provided to the system. Moreover, this results in a
sophisticated interaction schema among the system modules (e.g. NLU processors,
inference engines over knowledge bases, perception systems).
      </p>
      <p>The need to re-elaborate the problem from this point of view is being
perceived by the community. Complex architectures have already been realized for
tasks such as Question Answering, where the cooperation of structured NLP
modules and other processors is fundamental. In order to maximize
replicability and adaptability, we argue that similar approaches should be followed
in the implementation of HRI interfaces. One of our purposes is to study the
applicability of robust NLP techniques that have already been adopted for other
tasks.</p>
      <p>Following this direction, as a basic step of our research we contributed to
the development of a prototype robot for Human-Augmented Mapping that is
being used for experimental purposes. The data gathered during the experiments
will be used in this research. Meanwhile, a corpus of spoken commands
is being collected using a web interface. It contains audio files paired with the
corresponding transcriptions. Each transcription is annotated according to
different semantic formalisms, describing the linguistic knowledge we want to
capture. This corpus should become a useful resource for several tasks, e.g. training
specific learning algorithms.</p>
    </sec>
    <sec id="sec-3">
      <title>Theories and Methods</title>
      <p>A hypothetical NLU processing chain of an HRI system has to deal with audio
processing and transcription, meaning understanding, and dialogue management.
The first module consists of the ASR engine. To improve the grounding process,
the module can be extended with the capability of detecting the source of the
speech. This could in fact assist the identification of the reference point of certain
spatial expressions (e.g. “the door on my right”). Morphological analysis and
syntactic parsing are performed during the second step, as they can add crucial
information for further semantic processing. The latter is the core of the NLU
chain. Different semantic parsers can be used in parallel or in cascade, as the
information generated by one such parser can be useful for others. During this
step, the modules can also require interaction with external resources, such
as linguistic thesauri or knowledge bases. This might be useful in discarding
unlikely interpretations and consequently leading the system to consider other
hypotheses from the ASR. Finally, the utterances should be enriched with all the
meaning representations needed to correctly ground them in the robot's set of beliefs.
Dialogical interaction can be managed by a dialogue system that interacts with
each step of the process.</p>
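      <p>The processing chain just outlined can be sketched as a simple pipeline. Every module name, interface and output below is an illustrative placeholder for the real components (ASR engine, syntactic parser, trained semantic parsers), not an actual implementation.</p>

```python
# A minimal sketch of the hypothetical NLU chain described above.
# Every module here is a stand-in with canned outputs.

def asr(audio):
    """Stand-in ASR: returns an n-best list of transcription hypotheses."""
    return ["take the bottle on the table",
            "take the bottle on a table"]

def morpho_syntactic_analysis(transcription):
    """Stand-in morpho-syntactic step: tokenize and tag."""
    return [(token, "WORD") for token in transcription.split()]

def semantic_analysis(tagged):
    """Stand-in for several semantic parsers run in parallel, each
    contributing one layer of meaning over the same utterance."""
    return {
        "frames": {"Taking": {"Theme": "the bottle on the table"}},
        "spatial": {"Trajector": "bottle", "Landmark": "table",
                    "SpatialIndicator": "on"},
    }

def nlu_chain(audio):
    # 1. transcription: keep the alternatives, so later stages can
    #    fall back on other ASR hypotheses
    hypotheses = asr(audio)
    # 2. morphological analysis and syntactic parsing
    tagged = morpho_syntactic_analysis(hypotheses[0])
    # 3. layered semantic analysis (frames, spatial roles, ...)
    return semantic_analysis(tagged)

meaning = nlu_chain(audio=None)
```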
      <p>
        In our research, we will mainly focus on the semantic analysis part of the
chain. In fact, while robust tools for ASR (e.g. Microsoft Speech Platform, Google
Speech API or CMUSphinx [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]) and for morpho-syntactic analysis (e.g. Stanford
CoreNLP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) are available, semantic parsing must be designed from scratch.
Although semantic processors exist, they are not always free and, more
importantly, they offer just one level of semantic analysis. As stated in Section 2, we
need different levels of information; consequently, our HRI system should rely
on several semantic parsers. We then need to define which aspects of
the world we want to model through semantic analysis. First, in order to be
useful, a robot is expected to perform the actions corresponding to the received
commands; second, these actions take place in a physical environment.
Looking back at linguistic theories that studied how these two aspects are conveyed
through linguistic knowledge, we found that Frame Semantics [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and Holistic
Spatial Semantics [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] offer models of interpretation suitable for our purpose.
The first generalizes actions or, more generally, experiences, representing them
as Semantic Frames. Each frame describes a scene or the general concept behind
an action, enriched by a set of semantic arguments that play specific roles with
respect to the frame. Robot actions can then be linked with the semantic frame
corresponding to each action. For example, in the sentence “take the book on the
table”, the semantic frame related to the action of Taking is evoked by the verb
take. The semantic role Theme (i.e. the entity taken during the Taking action) is
here expressed and represented by the book on the table. Similarly, Holistic
Spatial Semantics explains the spatial referring expressions contained in sentences
in terms of spatial relations composed of spatial roles. Considering the previous
example, the words book and table are related through the preposition on, which
expresses the spatial relation and plays the role of Spatial Indicator, while the
other two are respectively the Trajector and the Landmark. These two
representations can collaborate to model the sentence meaning in a complete way.
One more issue to be addressed is that these representations are not designed
to work together, so further research on a formalism that could act as a
general-purpose semantic container representation needs to be done.
      </p>
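      <p>The interplay of the two layers on “take the book on the table” can be made concrete with simple data structures. The class and field names below are our own illustrative assumptions, not an established formalism; in particular, the final dictionary only hints at what a general-purpose semantic container might pair together.</p>

```python
# Illustrative data structures for the two semantic layers discussed
# above, applied to "take the book on the table".
from dataclasses import dataclass

@dataclass
class SemanticFrame:
    name: str           # e.g. Taking, evoked by the verb "take"
    lexical_unit: str   # the word evoking the frame
    roles: dict         # frame elements, e.g. Theme

@dataclass
class SpatialRelation:
    trajector: str          # the located entity
    landmark: str           # the reference entity
    spatial_indicator: str  # the word expressing the relation

frame = SemanticFrame(
    name="Taking",
    lexical_unit="take",
    roles={"Theme": "the book on the table"},
)
relation = SpatialRelation(
    trajector="book", landmark="table", spatial_indicator="on",
)

# A "semantic container" would pair both layers for one utterance:
container = {"frames": [frame], "spatial_relations": [relation]}
```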
      <p>
        In the first step of this research, many of the aspects reported so far have
been individually examined, and solutions based on novel NLP techniques have
been proposed. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] we propose a re-ranking approach to obtain the best speech
transcription from the set of different hypotheses produced by an ASR
system. The ranking function is learned through a Support Vector Machine (SVM)
exploiting a combination of different kernels capturing syntactic and semantic
aspects of the utterances (e.g. Smoothed Partial Tree Kernels [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Moreover, the
linguistic problem of extracting semantic representations from natural language
expressions has been addressed in tasks such as Semantic Role Labeling (SRL) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and
Spatial Role Labeling (SpRL) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. We developed SRL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and SpRL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] systems
that model the problem as a sequential labeling task, exploiting specific
formulations of SVMs, such as SVMmulticlass [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and SVMhmm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These systems have
not yet been used together. Their application in an HRI architecture, using the
robotic prototype we developed, deserves further investigation.
      </p>
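      <p>The re-ranking idea can be sketched as follows: a learned scoring function reorders the n-best list produced by the ASR. The linear scorer and hand-written features below are toy placeholders for the kernel-based SVM ranker described in the text, and the example hypotheses and weights are invented.</p>

```python
# Toy discriminative re-ranking of ASR hypotheses: score each
# hypothesis with a (here hand-set, normally learned) weight vector
# and keep the highest-scoring one.
def features(hypothesis):
    """Trivial syntactic/semantic cues; a real ranker would use
    tree kernels over parses instead of a flat feature dict."""
    tokens = hypothesis.split()
    return {"length": len(tokens),
            "has_verb_take": float("take" in tokens)}

def score(hypothesis, weights):
    return sum(weights.get(name, 0.0) * value
               for name, value in features(hypothesis).items())

def rerank(nbest, weights):
    """Return the hypothesis the scoring function prefers."""
    return max(nbest, key=lambda h: score(h, weights))

nbest = ["take the bottle on the table",
         "fake the bottle undo the table"]
weights = {"has_verb_take": 1.0, "length": 0.01}
best = rerank(nbest, weights)  # → "take the bottle on the table"
```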
      <p>Another aspect our research wants to explore is the use of dialogical
mechanisms to improve the grounding process. Getting the meaning of a sentence
may be insufficient to correctly ground the linguistic references. In fact, these
can refer to objects or positions in the real world as well as entities in the
abstract domain of the discourse (e.g. anaphoric references). Providing the robot
with a more complex level of interaction, such as the ability to ask for
clarifications about ambiguous expressions, can improve the grounding capability of
the robot. Similarly, the system could use dialogue to learn user-specific
linguistic references, such as new terms or particular ways of calling objects, or new
syntactic forms. This dialogue would be exploited to update the general
knowledge of the robot, adding new concepts in a knowledge base or feeding the ASR
grammars with new rules. In our robotic platform, we started modeling the
dialogue with Petri-Net Plans. They can drive the overall behavior of the robot, by
managing the interaction among all the modules, including the NLU chain. The
integrated representation of dialogue and robot actions is another issue that we
intend to address in our research.</p>
      <p>We are aware that this proposal is a starting point for an analysis that will
be wider and longer. Among all those aspects that should be examined and
modeled in a HRI system, we took into account only the two (i.e. actions and
spatial references) we considered fundamental to our ends. Future research
might investigate the study of temporal relations expressed in natural language.
In parallel, we want to investigate and foster the reuse of robust NLP
solutions in a field where single aspects of the problem have been explored, without
converging to a common point.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Altun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsochantaridis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Hidden Markov support vector machines</article-title>
          .
          <source>In: Proceedings of the ICML</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastianelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellucci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perera</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Kernel-based discriminative re-ranking for spoken command understanding in hri</article-title>
          .
          <source>In: Proceedings of Ai*iA '13</source>
          . p. to appear (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bastianelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basili</surname>
          </string-name>
          , R.:
          <article-title>Unitor-hmm-tk: Structured kernelbased learning for spatial role labeling</article-title>
          .
          <source>In: Proceedings of SemEval-2013</source>
          . Atlanta, Georgia, USA (
          <year>June 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bos</surname>
          </string-name>
          , J.:
          <article-title>Compilation of unification grammars with compositional semantics to speech recognition packages</article-title>
          .
          <source>In: COLING</source>
          (
          <year>2002</year>
          ), http://dblp.uni-trier.de/db/conf/coling/coling2002.html#Bos02
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oka</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A spoken language interface with a mobile robot</article-title>
          .
          <source>Artificial Life and Robotics</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>42</fpage>
          -
          <lpage>47</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
          </string-name>
          , R.J.:
          <article-title>Learning to interpret natural language navigation instructions from observations</article-title>
          .
          <source>In: Proceedings of AAAI '11</source>
          . pp.
          <fpage>859</fpage>
          -
          <lpage>865</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellucci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastianelli</surname>
          </string-name>
          , E.:
          <article-title>Structured learning for semantic role labeling</article-title>
          .
          <source>Intelligenza Artificiale</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>163</fpage>
          -
          <lpage>176</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basili</surname>
          </string-name>
          , R.:
          <article-title>Structured lexical similarity via convolution kernels on dependency trees</article-title>
          .
          <source>In: EMNLP</source>
          . pp.
          <fpage>1034</fpage>
          -
          <lpage>1046</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Curran</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bos</surname>
          </string-name>
          , J.:
          <article-title>Linguistically motivated large-scale nlp with c&amp;c and boxer</article-title>
          .
          <source>In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions</source>
          . pp.
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
          . Association for Computational Linguistics, Prague, Czech Republic (
          <year>June 2007</year>
          ), http://www.aclweb.org/anthology/P07-2009
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>Frames and the semantics of understanding</article-title>
          .
          <source>Quaderni di Semantica</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>222</fpage>
          -
          <lpage>254</lpage>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hemachandra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kollar</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teller</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Following and interpreting narrated guided tours</article-title>
          .
          <source>In: Proceedings of ICRA '11</source>
          . Shanghai, China (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finley</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.N.</given-names>
          </string-name>
          :
          <article-title>Cutting-plane training of structural SVMs</article-title>
          .
          <source>Machine Learning</source>
          <volume>77</volume>
          (
          <issue>1</issue>
          ),
          <fpage>27</fpage>
          -
          <lpage>59</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>Accurate unlexicalized parsing</article-title>
          .
          <source>In: Proceedings of ACL'03</source>
          . pp.
          <fpage>423</fpage>
          -
          <lpage>430</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kollar</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tellex</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
          </string-name>
          , N.:
          <article-title>Toward understanding natural language directions</article-title>
          .
          <source>In: Proceedings of the 5th ACM/IEEE</source>
          . pp.
          <fpage>259</fpage>
          -
          <lpage>266</lpage>
          . HRI '10, IEEE Press, Piscataway, NJ, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kordjamshidi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Otterlo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          :
          <article-title>Spatial role labeling: Towards extraction of spatial relations from natural language</article-title>
          .
          <source>ACM Trans. Speech Lang. Process</source>
          .
          <volume>8</volume>
          (
          <issue>3</issue>
          ), 4:
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          :
          <fpage>36</fpage>
          (Dec
          <year>2011</year>
          ), http://doi.acm.org/10.1145/2050104.2050105
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kruijff</surname>
            ,
            <given-names>G.J.M.</given-names>
          </string-name>
          :
          <article-title>A Categorial-Modal Logical Architecture of Informativity: Dependency Grammar Logic &amp; Information Structure</article-title>
          .
          <source>Ph.D. thesis, Faculty of Mathematics and Physics</source>
          , Charles University, Prague, Czech Republic (
          <year>April 2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kruijff</surname>
            ,
            <given-names>G.J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zender</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensfelt</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christensen</surname>
            ,
            <given-names>H.I.</given-names>
          </string-name>
          :
          <article-title>Situated dialogue and spatial organization: What, where... and why?</article-title>
          .
          <source>International Journal of Advanced Robotic Systems</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>MacMahon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stankiewicz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuipers</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Walk the talk: connecting language, knowledge, and action in route instructions</article-title>
          .
          <source>In: Proceedings of AAAI '06</source>
          . pp.
          <fpage>1475</fpage>
          -
          <lpage>1482</lpage>
          . AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gildea</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Semantic Role Labeling</article-title>
          .
          <source>Synthesis Lectures on Human Language Technologies</source>
          , Morgan &amp; Claypool Publishers (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Scheutz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cantrell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schermerhorn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Toward humanlike task-based dialogue processing for human robot interaction</article-title>
          .
          <source>AI Magazine</source>
          <volume>34</volume>
          (
          <issue>4</issue>
          ),
          <fpage>64</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Tellex</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kollar</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dickerson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walter</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teller</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Approaching the symbol grounding problem with probabilistic graphical models</article-title>
          .
          <source>AI Magazine</source>
          <volume>34</volume>
          (
          <issue>4</issue>
          ),
          <fpage>64</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Vogel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Learning to follow navigational directions</article-title>
          .
          <source>In: Proceedings of ACL '10</source>
          . pp.
          <fpage>806</fpage>
          -
          <lpage>814</lpage>
          . Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamere</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwok</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raj</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gouvea</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woelfel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Sphinx-4: A flexible open source framework for speech recognition</article-title>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zlatev</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Spatial semantics</article-title>
          .
          <source>Handbook of Cognitive Linguistics</source>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>350</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>