<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking Speech Understanding in Service Robotics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Vanzo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Iocchi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Nardi</string-name>
          <email>nardi@dis.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphael Memmesheimer</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dietrich Paulus</string-name>
          <email>paulus@uni-koblenz.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Ivanovska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerhard Kraetzschmar</string-name>
          <email>gerhard.kraetzschmar@h-brs.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bonn-Rhein-Sieg University of Applied Sciences</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza University of Rome</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Koblenz-Landau</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Speech understanding is a fundamental feature for many applications focused on human-robot interaction. Although many techniques and several services for speech recognition and natural language understanding have been developed in recent years, specific implementation and validation on domestic service robots have not been performed. In this paper, we describe the implementation and the results of a functional benchmark for speech understanding in service robotics that has been developed and tested in the context of different robot competitions: RoboCup@Home, RoCKIn@Home and the European Robotics League on Service Robots. The different approaches used by the teams in the competitions are presented, and the evaluation results obtained in the competitions are discussed.</p>
      </abstract>
      <kwd-group>
        <kwd>speech recognition</kwd>
        <kwd>speech understanding</kwd>
        <kwd>service robots</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Robots are expected to support human activities in everyday scenarios by
interacting with different kinds of users. In particular, domestic robots (i.e., robots
operating in our homes) have already entered the market. Examples are
cleaning robots, telepresence robots and assistive robots for elderly care. In these
contexts, interaction with the user plays a key role. For this reason,
enabling untrained users to interact with personal robots has become
increasingly important. The goal of research in Human-Robot Interaction (HRI) is to
realize robotic systems that exhibit a natural and effective interaction with users.
Robots should therefore be provided with sensory systems able to understand
and replicate human communication, such as speech, gestures, voice intonation,
pragmatic interpretation, and any other non-verbal interaction. Interaction, by
definition, requires communication. Humans usually communicate by means of
natural language, which can be considered one of the most effective vehicles
of interaction. In this respect, the aim of Human-Robot Interaction in Natural
Language is to develop robots that are able to resolve human language references
in the application context in which they operate. Competition is how humans improve
themselves and push forward their abilities. Recently, robotics research has been
facing the problem of sharing a common platform and common methodologies
to quantitatively compare different approaches to a particular task.</p>
      <p>This paper addresses the problem of benchmarking robot speech
understanding through scientific competitions. In particular, we focus on service
robots operating in a home environment and on service robot competitions, i.e.,
RoboCup@Home<sup>4</sup> and others derived from it. The goal of such a benchmark
is to measure and evaluate the performance of the speech understanding
capability of a general robotic platform, as well as to create a common ground
for discussion on the topic. More specifically, we describe the Functional
Benchmark on Speech Understanding (FBM3), performed (in different forms)
during the RoCKIn, RoboCup@Home, and European Robotics League Service
Robots (ERL-SR) competitions, along with the results and configurations of the
recent benchmark that took place at the ERL-SR Local Tournament<sup>5</sup> in Peccioli
in January 2017. Three teams participated: (i) the SPQReL team, a joint team
of Sapienza University of Rome (Italy) and the University of Lincoln (UK), (ii) the
b-it-bots@home team of Bonn-Rhein-Sieg University of Applied Sciences
(Germany), and (iii) the homer@uniKoblenz team of the University of Koblenz and Landau
(Germany).</p>
      <p>The paper is structured as follows. Section 2 introduces the FBM3,
along with the performance metrics used. Section 3 focuses on the different
solutions compared in the competition and on the corresponding results. Finally,
Section 4 analyzes these results and draws some conclusions.</p>
      <p><sup>4</sup> http://www.robocupathome.org/</p>
      <p><sup>5</sup> https://sites.google.com/a/dis.uniroma1.it/erl-sr-peccioli/</p>
    </sec>
    <sec>
      <title>A Functional Benchmark for Speech Understanding in Service Robotics</title>
      <p>
        This functional benchmark aims at evaluating the ability of a robot to
understand speech commands that a user gives in a home environment. A list of
commands is selected from the set of predefined recognizable commands,
i.e., commands that the robot should be able to recognize within the tasks of
the competition or in similar situations. For this competition, the audio files
have been randomly extracted from the HuRIC corpus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a resource that
contains speaker utterances in the home robotic domain. Only commands that meet
the requirements of the task have been chosen. Each implemented system is
expected to interpret the provided audio files, producing an output according to
a suitably defined representation. This representation, inspired by Frame
Semantics [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], has to respect a command/arguments structure, where each
argument is instantiated according to the arguments of the verb that evokes the command.
It is referred to as the Command Frame Representation (CFR); e.g., "go to the living
room" corresponds to MOTION(goal: "living room").
      </p>
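      <p>The command/arguments structure of the CFR can be illustrated with a small sketch. The class name CommandFrame, its fields, and the exact punctuation of the rendered string are assumptions made for illustration; the benchmark rulebook, not this sketch, defines the official output format.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CommandFrame:
    """One command frame: an action name plus its named arguments (hypothetical type)."""
    action: str
    arguments: Dict[str, str] = field(default_factory=dict)

    def to_cfr(self) -> str:
        # Render as ACTION(arg:"value",...); this punctuation is an assumption.
        args = ",".join(f'{name}:"{value}"' for name, value in self.arguments.items())
        return f"{self.action.upper()}({args})"


# "go to the living room" evokes a Motion frame whose goal is the living room.
frame = CommandFrame("Motion", {"goal": "living room"})
print(frame.to_cfr())  # MOTION(goal:"living room")
```
]]></preformat>
      <p>Keeping the action and its arguments separate until the final rendering step makes it straightforward to compare a system output with a gold standard frame field by field.</p>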
    </sec>
    <sec id="sec-2">
      <title>Input provided</title>
      <p>
        For the generation of the output, teams are provided with a knowledge base
(Frame Knowledge Base, FKB) containing a set of semantic frames, in line
with [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Each frame corresponds to an action that the robot is supposed to
perform or, in general, to a robot command. The FKB contains a description of
each frame in terms of its allowed arguments (e.g., the destination of a Motion
command), their names, and additional information on how to model the activated
frame in the CFR. The list of frames and related arguments is the following:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>Motion: The action performed by the robot itself of moving from one position
to another, occasionally specifying a specific path followed during the motion.
The starting point is always taken as the current position of the robot.</p>
          <list list-type="bullet">
            <list-item>
              <p>Goal: The final position in space to be occupied at the end of the
motion action.</p>
            </list-item>
            <list-item>
              <p>Path: The trajectory followed while performing the motion towards the
Goal.</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>Searching: The action of inspecting an environment or a general location,
with the aim of finding a specific entity.</p>
          <list list-type="bullet">
            <list-item>
              <p>Theme: The entity (most of the time an object) to be searched for during
the searching action.</p>
            </list-item>
            <list-item>
              <p>Ground: The environment or the general location in space where
to search for the Theme.</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>Taking: The action of removing an entity from one place, so that the entity
is in the robot's possession.</p>
          <list list-type="bullet">
            <list-item>
              <p>Theme: The entity (typically an object) taken through the action.</p>
            </list-item>
            <list-item>
              <p>Source: The location occupied by the Theme before the action is
performed and from which the Theme is removed.</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>Bringing: The action of changing the position of an entity in space from
one location to another.</p>
          <list list-type="bullet">
            <list-item>
              <p>Theme: The entity (typically an object) being carried during the
bringing action.</p>
            </list-item>
            <list-item>
              <p>Goal: The endpoint of the path along which the carrier (e.g., the robot),
and thus the Theme, travels.</p>
            </list-item>
            <list-item>
              <p>Source: The beginning of the path along which the carrier (e.g., the
robot), and thus the Theme, travels.</p>
            </list-item>
          </list>
        </list-item>
      </list>
      <p>Composition of actions is also possible in the CFR, corresponding to more
complex actions such as pick and place, which is represented by a Taking
frame followed by a Bringing frame (e.g., for the command "take the box and bring
it to the kitchen").</p>
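      <p>The frames and arguments listed above can be encoded as a small lookup table against which candidate frames are checked. The sketch below is only one possible encoding: the names FRAME_KNOWLEDGE_BASE and validate, the lowercase argument names, and the argument fillers chosen for the pick and place example are assumptions, not the format of the FKB distributed to the teams.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# Frames and their allowed arguments, as listed in this section.
FRAME_KNOWLEDGE_BASE: Dict[str, Tuple[str, ...]] = {
    "Motion": ("goal", "path"),
    "Searching": ("theme", "ground"),
    "Taking": ("theme", "source"),
    "Bringing": ("theme", "goal", "source"),
}

Frame = Tuple[str, Dict[str, str]]  # (frame name, argument name -> filler)


def validate(frame: Frame) -> List[str]:
    """Return a list of problems; an empty list means the frame is admissible."""
    name, arguments = frame
    if name not in FRAME_KNOWLEDGE_BASE:
        return [f"unknown frame: {name}"]
    allowed = FRAME_KNOWLEDGE_BASE[name]
    return [f"{name} does not allow argument: {arg}" for arg in arguments if arg not in allowed]


# "take the box and bring it to the kitchen": a Taking frame followed by a Bringing frame.
# The fillers below are illustrative guesses at how the arguments would be instantiated.
pick_and_place: List[Frame] = [
    ("Taking", {"theme": "the box"}),
    ("Bringing", {"theme": "it", "goal": "to the kitchen"}),
]
problems = [p for frame in pick_and_place for p in validate(frame)]
print(problems)  # []
```
]]></preformat>
      <p>Representing a composed command as an ordered list of frames keeps the sequence of actions explicit, which is what the pick and place example requires.</p>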
    </sec>
    <sec id="sec-3">
      <title>Scoring</title>
      <p>During the functional benchmark, different aspects of the speech understanding
process are assessed:</p>
      <list list-type="order">
        <list-item>
          <p>The Word Error Rate (WER) and the Speech Recognition Accuracy on the
transcription of the user utterances, in order to evaluate the performance of the speech
recognition process. While the former counts the word-level errors made in transcribing
the speech signal, the latter focuses on the transcription
of the whole command.</p>
        </list-item>
        <list-item>
          <p>For the generated CFR, the performance of the system is evaluated against
the provided gold standard version of the CFR, which is paired
with the transcription. Two different measures are evaluated at this
step: one measuring the ability of the system to recognize the main action,
called Action Classification (AC), and one related to the recognition of the
full command, called Full Command Recognition (FCR). AC is measured
in terms of Precision, Recall and F-Measure, while FCR is measured through
Accuracy. For AC, these measures are defined as follows:</p>
          <list list-type="bullet">
            <list-item>
              <p>Precision: the percentage of correctly tagged frames among all the frames
tagged by the system;</p>
            </list-item>
            <list-item>
              <p>Recall: the percentage of correctly tagged frames with respect to all the
gold standard frames;</p>
            </list-item>
            <list-item>
              <p>F-Measure: the harmonic mean of Precision and Recall.</p>
            </list-item>
          </list>
          <p>For the FCR, the accuracy is the percentage of correctly interpreted
commands.</p>
        </list-item>
      </list>
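      <p>The measures above can be sketched in a few functions. This is not the official scoring script. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words), that the actions of a command are compared as a multiset of frame names, and that a command counts as fully recognized only when its CFR string equals the gold one exactly. All function names are illustrative.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # `previous` holds the distances between the reference prefix processed
    # so far and every prefix of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1] / len(ref)


def action_classification(
    gold: Sequence[Sequence[str]], predicted: Sequence[Sequence[str]]
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over the frame names of each command."""
    correct = tagged = expected = 0
    for gold_actions, predicted_actions in zip(gold, predicted):
        overlap = Counter(gold_actions) & Counter(predicted_actions)
        correct += sum(overlap.values())
        tagged += len(predicted_actions)
        expected += len(gold_actions)
    precision = correct / tagged if tagged else 0.0
    recall = correct / expected if expected else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_measure


def full_command_accuracy(gold_cfr: Sequence[str], predicted_cfr: Sequence[str]) -> float:
    """Fraction of commands whose whole CFR string equals the gold standard."""
    matches = sum(g == p for g, p in zip(gold_cfr, predicted_cfr))
    return matches / len(gold_cfr)


# One word of a five-word reference is dropped by the recognizer.
print(word_error_rate("go to the living room", "go to living room"))  # 0.2
```
]]></preformat>
      <p>With the gold actions [["Motion"], ["Taking", "Bringing"]] and the predictions [["Motion"], ["Taking"]], the sketch gives a precision of 1.0 and a recall of 2/3, since every predicted frame is correct but one gold frame is missed.</p>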
      <p>The final ranking of the teams is based on the FCR. If this score
is the same for two or more teams, the WER is used as a penalty to
determine the final ranking.</p>
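      <p>The ranking rule can be sketched as a sort with a tie-break. Reading "penalty" as "among teams with equal FCR, the lower WER ranks higher" is our interpretation, and the team names and scores below are made up for the example; they are not competition results.</p>
      <preformat><![CDATA[
```python
from typing import List, NamedTuple


class TeamResult(NamedTuple):
    team: str
    fcr: float  # Full Command Recognition accuracy, higher is better
    wer: float  # Word Error Rate, lower is better


def rank(results: List[TeamResult]) -> List[str]:
    """Order teams by FCR (descending), breaking ties by WER (ascending)."""
    ordered = sorted(results, key=lambda r: (-r.fcr, r.wer))
    return [r.team for r in ordered]


# Made-up scores for three hypothetical teams, not competition results.
results = [
    TeamResult("team_a", fcr=0.50, wer=0.30),
    TeamResult("team_b", fcr=0.75, wer=0.40),
    TeamResult("team_c", fcr=0.50, wer=0.20),
]
print(rank(results))  # ['team_b', 'team_c', 'team_a']
```
]]></preformat>
      <p>Here team_b wins on FCR despite the highest WER, while team_c is placed ahead of team_a only because the two are tied on FCR.</p>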
    </sec>
    <sec>
      <title>Different Approaches</title>
      <p>The task of understanding spoken robotic commands involves two main
subtasks: the speech recognition step, in which a speech audio signal is processed
and transcribed, and the natural language understanding phase, in which the
actual interpretation of the sentence is extracted. Both of them can be carried
out jointly, by following orthogonal approaches or relying on off-the-shelf tools.</p>
    </sec>
    <sec id="sec-4">
      <title>Transcribing Speech</title>
      <p>
        The robustness of Automatic Speech Recognition (ASR) in domain-specific settings has
been addressed in several works. For example, in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a joint model of the speech
recognition process and the language understanding task is proposed. This model
results in a re-ranking framework that aims at modeling aspects of the two tasks
at the same time.
      </p>
      <p>
        Several works exploit the combination of free-form ASR engines
and grammar-based systems. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] two different ASR systems
work together sequentially: the first is grammar-based and is constrained by
the rule definitions, while the second is a free-form ASR that is not subject to
any constraint. Their approach focuses on the acceptance of the results of the
first recognizer. In case of rejection, the second recognizer is activated. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a
robust ASR for robotic applications is proposed. The system aims at exploiting
the combination of a Finite State Grammar (FSG) and an n-gram based ASR
to reduce false positive detections. Specifically, a hypothesis produced by the
FSG-based decoder is accepted whenever it matches some hypothesis within the
n-best list of the n-gram based decoder. A similar approach is the one proposed
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], where a multi-pass decoder is used to overcome the limitations of a single
ASR. The FSG is used to produce the most likely hypothesis. Then, the n-gram
decoder produces an n-best list of transcriptions. Finally, if the best hypothesis
of the FSG decoder matches at least one transcription among the n-best,
the sentence is accepted.
      </p>
      <p>
        Many off-the-shelf ASR tools are also available
on the market, offering good performance in terms of accuracy
and usability. Google Speech API<sup>6</sup> is probably one of the most widespread, owing to
its availability on mobile devices. The Microsoft Bing Speech API<sup>7</sup> offers
voice authentication along with the speech transcription service. API.AI<sup>8</sup>
recognizes the intent of a sentence, which is provided together with the speech
transcription. Nuance VoCon<sup>9</sup> is an offline ASR that allows
the recognition to be customized for the desired domain by defining specific
vocabularies and grammars.
      </p>
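      <p>The acceptance step shared by the FSG and n-gram combinations described above can be sketched as follows. The decoders themselves are not modeled: the function only receives their outputs as strings. Treating a "match" as string equality after lowercasing and whitespace normalization is an assumption, since the cited works may use a different matching criterion, and the function names are illustrative.</p>
      <preformat><![CDATA[
```python
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def accept_fsg_hypothesis(fsg_best: str, ngram_n_best: List[str]) -> Optional[str]:
    """Return the FSG hypothesis if the n-gram n-best list confirms it, else None."""
    candidates = {normalize(h) for h in ngram_n_best}
    return fsg_best if normalize(fsg_best) in candidates else None


n_best = ["go to the living room", "go to the dining room", "goat in the living room"]
print(accept_fsg_hypothesis("Go to the living room", n_best))  # Go to the living room
print(accept_fsg_hypothesis("bring me the book", n_best))      # None
```
]]></preformat>
      <p>A rejected hypothesis (None) is where a sequential setup would hand the utterance over to the free-form recognizer instead of acting on a possibly spurious grammar match.</p>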
    </sec>
    <sec id="sec-5">
      <title>Understanding Robotic Commands</title>
      <p>
        For understanding a robotic command, one of the simplest solutions is to rely
on keywords or templates that aim at capturing the semantic elements of the
targeted task [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For instance, the CFR parser is based on the simple rules
defined by the RoCKIn rulebook<sup>10</sup>. First, a verb corresponding to a predefined
frame is identified. Next, the attributes of the frame, e.g., location, object,
beneficiary, etc., are found, using lists of predefined values for each attribute.
Finally, the results are composed into the required strings and written to a text
file. This approach is simple to implement and to integrate into a robotic
architecture. More sophisticated approaches are based on the definition of
syntactic grammars, which are often augmented through semantic attachments [
        <xref ref-type="bibr" rid="ref4 ref5">4,
5</xref>
        ]. However, such an approach is limited by the grammar and can be difficult to
extend. The solutions that have recently gained the most consensus within the
Natural Language Processing community are based on statistical Machine Learning
and data-driven analysis of the addressed linguistic phenomena [
        <xref ref-type="bibr" rid="ref11 ref6">6, 11</xref>
        ]. Among
them, LU4R<sup>11</sup> [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (adaptive spoken Language Understanding for(4) Robots)
has been developed by the Semantic Analytics Group at the University of Roma
Tor Vergata and the LabRoCoCo Group at Sapienza University of Rome. It
is a publicly available tool for parsing robotic commands in the context of
service robotics. LU4R is based on the model proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
Spoken Language Understanding (SLU) process is driven by Machine Learning
techniques, which make it possible to generalize over the addressed phenomena and improve
robustness against unseen sentences.
      </p>
      <p><sup>6</sup> https://cloud.google.com/speech/</p>
      <p><sup>7</sup> https://www.microsoft.com/cognitive-services/en-us/speech-api</p>
      <p><sup>8</sup> https://api.ai/</p>
      <p><sup>9</sup> http://www.nuance.co.uk/for-business/speech-recognition-solutions/voconhybrid/index.htm</p>
      <p><sup>10</sup> https://www.rockinrobotchallenge.eu/rockin d2.1.3.pdf</p>
      <p><sup>11</sup> http://sag.art.uniroma2.it/lu4r.html</p>
      <p>
        The comparative performance analysis (reported in Table 1) of the different
combinations of techniques for speech understanding reported in this paper makes it possible to
determine a realistic expected performance in understanding typical commands
issued to a domestic service robot.
      </p>
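      <p>As an illustration of the keyword and template approach to command understanding described earlier (find a verb that evokes a predefined frame, then look up each attribute in a list of predefined values, then compose the output string), a minimal sketch is given below. The lexicons, the function name, and the output punctuation are hypothetical and far smaller than what a team would use. Writing the result to a text file is omitted, and only the first recognized verb is handled.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Optional

# Hypothetical lexicons: verbs evoking each frame and predefined attribute values.
VERB_TO_FRAME: Dict[str, str] = {
    "go": "MOTION",
    "find": "SEARCHING",
    "take": "TAKING",
    "bring": "BRINGING",
}
FRAME_ATTRIBUTES: Dict[str, List[str]] = {
    "MOTION": ["goal"],
    "SEARCHING": ["theme"],
    "TAKING": ["theme"],
    "BRINGING": ["theme", "goal"],
}
ATTRIBUTE_VALUES: Dict[str, List[str]] = {
    "goal": ["living room", "kitchen"],
    "theme": ["box", "book"],
}


def parse_command(transcription: str) -> Optional[str]:
    """Return a CFR-like string, or None when no known verb is found."""
    text = " ".join(transcription.lower().split())
    # Step 1: find a verb that evokes a predefined frame.
    frame = next((VERB_TO_FRAME[w] for w in text.split() if w in VERB_TO_FRAME), None)
    if frame is None:
        return None
    # Step 2: look up each attribute of the frame in its list of predefined values.
    found = []
    for attribute in FRAME_ATTRIBUTES[frame]:
        value = next((v for v in ATTRIBUTE_VALUES[attribute] if v in text), None)
        if value is not None:
            found.append(f'{attribute}:"{value}"')
    # Step 3: compose the result into the output string.
    return f"{frame}({','.join(found)})"


print(parse_command("Go to the living room"))         # MOTION(goal:"living room")
print(parse_command("bring the box to the kitchen"))  # BRINGING(theme:"box",goal:"kitchen")
```
]]></preformat>
      <p>The sketch also shows why such parsers are brittle: any verb or value missing from the lists leaves the command unparsed or incomplete.</p>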
      <p>The results have been obtained from spoken audio acquired during robot
competitions, such as RoboCup@Home, from different people and in different
environmental conditions. These data thus contain all the typical noise affecting robot
competitions on service robots and real scenarios. Consequently, the results
provide a realistic assessment of typical performance in this task. The presented
benchmark for speech understanding in service robotics has been a useful tool
for such an evaluation and is available for further comparisons.</p>
      <p>The results presented in this paper suggest two interesting observations.
First, free-form automatic speech recognition systems significantly outperform
grammar-based approaches. This is probably due to the difficulty speech
grammars have in covering all the possible linguistic phenomena, in terms of lexicon and
syntactic rules. In fact, especially when dealing with spoken language, sentence
structure is often unpredictable. Second, machine learning-based methods for
command understanding seem to be more robust and reliable, as they are able
to further generalize the lexicon and to cope with possible minor transcription
errors. More specifically, the combination of Google ASR and LU4R
consistently provided the best results in several different runs, which is encouraging
for its deployment in real situations.</p>
      <p>Although results are generally positive, we are still far from a full
understanding of the commands. Future work will include additional studies
on this topic in order to further improve performance. To this end, we
believe that robot competitions and benchmarks, together with the joint effort of different
research groups, will significantly contribute to achieving this goal.</p>
      <p>We want to thank Nuance Communications for sponsoring the academic licenses
that have been used for the experiments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>The Berkeley FrameNet project</article-title>
          .
          <source>In: Proceedings of ACL and COLING</source>
          . pp.
          <fpage>86</fpage>
          -
          <lpage>90</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bastianelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellucci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>HuRIC: a human robot interaction corpus</article-title>
          .
          <source>In: Proceedings of the 9th edition of the Language Resources and Evaluation Conference</source>
          . pp.
          <fpage>4519</fpage>
          -
          <lpage>4526</lpage>
          . Reykjavik, Iceland (May
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bastianelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A discriminative approach to grounded spoken language understanding in interactive robotics</article-title>
          .
          <source>In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence</source>
          ,
          <source>IJCAI</source>
          <year>2016</year>
          , New York, NY, USA,
          9-15 July 2016
          . pp.
          <fpage>2747</fpage>
          -
          <lpage>2753</lpage>
          (
          <year>2016</year>
          ), http://www.ijcai.org/Abstract/16/390
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Compilation of unification grammars with compositional semantics to speech recognition packages</article-title>
          .
          <source>In: Proceedings of the 19th International Conference on Computational Linguistics - Volume 1</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . COLING '02,
          Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2002</year>
          ), http://dx.doi.org/10.3115/1072228.1072323
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oka</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A spoken language interface with a mobile robot</article-title>
          .
          <source>Artificial Life and Robotics</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>42</fpage>
          -
          <lpage>47</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Learning to interpret natural language navigation instructions from observations</article-title>
          .
          <source>In: Proceedings of the 25th AAAI Conference on AI</source>
          . pp.
          <fpage>859</fpage>
          -
          <lpage>865</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Doostdar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiffer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakemeyer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A Robust Speech Recognition System for Service-Robotics Applications</article-title>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2009</year>
          ), http://dx.doi.org/10.1007/978-3-642-02921-9_1
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>Frames and the semantics of understanding</article-title>
          .
          <source>Quaderni di Semantica</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>222</fpage>
          -
          <lpage>254</lpage>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Heinrich</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wermter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards robust speech recognition for human-robot interaction</article-title>
          .
          <source>In: Proceedings of the IROS2011 Workshop on Cognitive Neuroscience Robotics (CNR)</source>
          . pp.
          <fpage>23</fpage>
          -
          <lpage>28</lpage>
          (September
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Levit</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buntschuh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Garbage modeling with decoys for a sequential recognition scenario</article-title>
          .
          <source>In: ASRU</source>
          . pp.
          <fpage>468</fpage>
          -
          <lpage>473</lpage>
          . IEEE (
          <year>2009</year>
          ), http://dblp.uni-trier.de/db/conf/asru/asru2009.html
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>MacMahon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stankiewicz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuipers</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Walk the talk: connecting language, knowledge, and action in route instructions</article-title>
          .
          <source>In: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2</source>
          . pp.
          <fpage>1475</fpage>
          -
          <lpage>1482</lpage>
          . AAAI'06, AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Morbini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Audhkhasi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artstein</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Segbroeck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagae</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgiou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Traum</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A reranking approach for recognition and classification of speech input in conversational dialogue systems</article-title>
          .
          <source>In: Spoken Language Technology Workshop (SLT)</source>
          ,
          <year>2012</year>
          IEEE. pp.
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          (Dec
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Perera</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>M.M.:</given-names>
          </string-name>
          <article-title>Handling complex commands as service robot task requests</article-title>
          .
          <source>In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <source>IJCAI</source>
          <year>2015</year>
          , Buenos Aires, Argentina, July 25-31,
          <year>2015</year>
          . pp.
          <fpage>1177</fpage>
          -
          <lpage>1183</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>