<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decoupling of Modality Integration and Interaction Design for Multimodal Human-Robot Interfaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mathieu Vallee</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominik Ertl</string-name>
          <email>ertlg@ict.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Technology, Vienna University of Technology</institution>, Vienna,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The development of a multimodal Human-Robot Interface (HRI) involving mixed-initiative and context-awareness is complex and laborious. The integration of individual modalities (e.g., gesture recognition or speech output) and the design of natural human-robot interaction are two different tasks, each requiring its own expertise. In this paper, we consider three different roles that participate in the development of a multimodal User Interface (UI): the interaction designer, the modality integrator, and the multimodal UI designer. We present how tools based on interaction modeling facilitate the decoupling between these roles. Finally, we discuss how this decoupling is beneficial for introducing mixed-initiative and context-awareness in multimodal HRI. Previous work shows that it is common to separate the task of UI generation from that of application development [<xref ref-type="bibr" rid="ref1">1</xref>]. We expect the development of a multimodal UI to be laborious enough to be split into different tasks as well. We therefore identify three different roles that allow a stronger decoupling of the tasks needed in UI generation. Figure 1 depicts these roles and their interactions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <list list-type="bullet">
        <list-item>
          <p>The interaction designer defines the modality-independent interaction between a human and a robot. This role is responsible for defining the UI behaviour.</p>
        </list-item>
        <list-item>
          <p>The modality integrator provides a particular modality (or group of modalities), such as speech input. This requires either adapting and configuring existing toolkits or developing a new modality. Additionally, this role is responsible for defining the physical properties of the integrated modality.</p>
        </list-item>
        <list-item>
          <p>The multimodal UI designer considers the interplay between the modalities. This role takes the physical properties of the integrated modalities into account in order to realize the desired interaction.</p>
        </list-item>
      </list>
      <fig id="fig1">
        <label>Fig. 1.</label>
        <caption>
          <p>Roles in multimodal UI design: the interaction designer ("what we want to do"), the modality integrator ("what we have": GUI, speech, gesture, ...), and the multimodal UI (MMUI) designer.</p>
        </caption>
      </fig>
      <p>In current practice and in existing systems, a single person (or group
of persons) usually performs the tasks of all three roles, with no clear separation of
responsibilities. Often, the design of the UI is directed either towards demonstrating a
particular modality or towards a single interaction description. In the first case,
the risk is to limit interaction to what this modality supports, without
considering usability. In the second case, the risk is to tailor the
multimodal UI so closely to the intended interaction that robustness and reusability suffer. As
a result, it is still difficult to understand the respective advantages of modalities
(when to use a given modality) as well as the generic patterns governing the
interplay between modalities (how to combine modalities for better usability).
Furthermore, the role of the multimodal UI designer is particularly difficult,
since it requires a good understanding of both the interaction design and
the physical properties of particular modalities.</p>
      <p>We propose supportive tools for the design of multimodal UIs. In
particular, there is a strong need for (i) languages to express the desired
interactions, (ii) languages to express the physical properties of individual modalities,
and (iii) tools that facilitate the mapping between desired interaction
descriptions and the physical properties of individual modalities.</p>
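      <p>The three needs listed above can be illustrated with a minimal Python sketch. It assumes hypothetical data structures (CommunicativeAct, Turn, Modality) and a hypothetical check function uncovered_turns; none of these are taken from the platform discussed in this paper. The interaction designer writes a list of turns (i), the modality integrator describes each modality and its physical properties (ii), and the multimodal UI designer supplies a mapping from modality names to the acts each modality can express. A simple tool in the sense of (iii) then reports every turn that no modality of the right direction can express. The intention Question is an assumption; the paper itself only names Informing and getNameOfUser.</p>
      <preformat preformat-type="code">
```python
# Hypothetical data structures for illustration only; not the platform's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class CommunicativeAct:
    """Modality-independent unit: an intention plus a propositional content."""
    intention: str               # e.g. "Informing"
    propositional_content: str   # e.g. "getNameOfUser"


@dataclass(frozen=True)
class Turn:
    """(i) Written by the interaction designer: who sends which act."""
    speaker: str                 # "robot" or "human"
    act: CommunicativeAct


@dataclass(frozen=True)
class Modality:
    """(ii) Written by the modality integrator: physical properties of a modality."""
    name: str
    direction: str               # "output" (robot to human) or "input" (human to robot)
    needs_visibility: bool = False


def uncovered_turns(interaction: list[Turn],
                    modalities: list[Modality],
                    mapping: dict[str, set[CommunicativeAct]]) -> list[Turn]:
    """(iii) Mapping check: turns that no modality of the right direction can express.

    `mapping` is written by the multimodal UI designer and maps a modality
    name to the set of communicative acts that modality can express.
    """
    direction_for = {"robot": "output", "human": "input"}
    uncovered = []
    for turn in interaction:
        wanted = direction_for[turn.speaker]
        if not any(m.direction == wanted and turn.act in mapping.get(m.name, set())
                   for m in modalities):
            uncovered.append(turn)
    return uncovered


if __name__ == "__main__":
    ask = CommunicativeAct("Question", "getNameOfUser")      # "Question" is assumed
    answer = CommunicativeAct("Informing", "getNameOfUser")
    interaction = [Turn("robot", ask), Turn("human", answer)]
    modalities = [
        Modality("speech-output", "output"),
        Modality("speech-input", "input"),
        Modality("gui", "output", needs_visibility=True),
    ]
    # Speech input is integrated, but nobody has mapped it to `answer` yet.
    mapping = {"speech-output": {ask}, "gui": {ask}}
    for turn in uncovered_turns(interaction, modalities, mapping):
        print("not expressible:", turn)
```
      </preformat>
      <p>In this example, the human's Informing turn is reported because no input modality has been mapped to it; adding the answer act to a "speech-input" entry in the mapping would make the report empty. Keeping the three descriptions as separate artifacts mirrors the separation of roles: each can change without editing the others.</p>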
      <p>
        One possible approach is the platform presented in [<xref ref-type="bibr" rid="ref2">2</xref>], which combines a discourse
model [<xref ref-type="bibr" rid="ref3">3</xref>] with a communication platform for semi-automatic multimodal UI
generation.
      </p>
      <p>First, an interaction designer models the desired interaction scenarios as
formal discourses between a human and a computer. The interaction is defined at
a high, modality-independent level, which also supports modeling mixed-initiative.
In parallel, a modality integrator couples a modality, e.g., speech input, to the
platform. For example, a freely available speech toolkit such as Julius
(<ext-link ext-link-type="uri" xlink:href="http://julius.sourceforge.jp">http://julius.sourceforge.jp</ext-link>, visited at the time of this writing) is integrated
manually into the platform so that the platform can use the speech recognition
functionality of the toolkit. Finally, the multimodal UI designer couples the
discourse and the available modalities. The discourse model provides basic units
of communication derived from speech act theory: each unit pairs an intention,
such as an Informing, with a so-called propositional content. The
propositional content is comparable to the meaning of the verb and the object in a
simple English sentence; an example of a propositional content is
getNameOfUser. The multimodal UI designer defines the pairs of intentions
and propositional contents that a given modality can express. For example, the
designer decides whether speech input supports an Informing-getNameOfUser.
This amounts to mapping modality-specific representations (e.g., a speech grammar or
a set of gesture recognition symbols) onto a more generic representation based on
communicative acts. In this way, the multimodal UI designer "programs" the platform,
extended with individual modalities (plugins), in order to realize the desired
interaction.</p>
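      <p>The mapping performed by the multimodal UI designer can be sketched as a lookup table from modality-specific symbols to communicative acts. This Python sketch rests on stated assumptions: the symbol names NAME_ANSWER and WAVE, the propositional content greetRobot, the modality names, and the functions supports and interpret are hypothetical and do not reproduce the actual interfaces of the platform in [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <preformat preformat-type="code">
```python
# Hypothetical sketch of the designer's mapping; all names are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommunicativeAct:
    intention: str               # e.g. "Informing"
    propositional_content: str   # e.g. "getNameOfUser"


# Table maintained by the multimodal UI designer. Keys pair a modality with
# one of its modality-specific symbols (a speech grammar rule, a gesture
# recognition symbol); values are the communicative acts they express.
SYMBOL_TO_ACT: dict[tuple[str, str], CommunicativeAct] = {
    ("speech-input", "NAME_ANSWER"): CommunicativeAct("Informing", "getNameOfUser"),
    ("gesture-input", "WAVE"): CommunicativeAct("Informing", "greetRobot"),
}


def supports(modality: str, act: CommunicativeAct) -> bool:
    """The designer's decision, e.g. whether speech input supports Informing-getNameOfUser."""
    return any(mod == modality and mapped == act
               for (mod, _symbol), mapped in SYMBOL_TO_ACT.items())


def interpret(modality: str, symbol: str) -> Optional[CommunicativeAct]:
    """Translate a recognizer result into a communicative act (None if unmapped)."""
    return SYMBOL_TO_ACT.get((modality, symbol))


if __name__ == "__main__":
    print(interpret("speech-input", "NAME_ANSWER"))
    print(supports("gesture-input", CommunicativeAct("Informing", "getNameOfUser")))
```
      </preformat>
      <p>With such a table, the decision whether speech input supports Informing-getNameOfUser reduces to whether a corresponding entry exists, and the dialogue logic only sees communicative acts rather than grammar rules or gesture symbols.</p>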
    </sec>
    <sec id="sec-2">
      <title>Introducing Mixed-Initiative and Context-Awareness</title>
      <p>The decoupled roles facilitate the definition of more natural multimodal HRI.
In particular, the decoupling enables each role to concentrate on its main task.
While the modality integrator concentrates on issues related to an individual
modality (e.g., the performance of a speech recognition engine), the interaction
designer focuses on making the interaction more natural for users.</p>
      <p>
        For example, the interaction designer has to consider mixed-initiative and
context-awareness when a semi-autonomous service robot performs a task jointly
with humans in a real-world environment. Mixed-initiative allows both the
human and the robot to initiate the interaction. However, this often requires a
way for the robot to attract the attention of the user. Speech and movement of
the robot serve this purpose well, whereas a GUI is limited because it must be
visible to the user. A robot that attracts the attention of its user by moving
towards products in a supermarket is described in [<xref ref-type="bibr" rid="ref4">4</xref>]. With context-awareness,
the robot takes into account both its own context and the context of the user during
the interaction. This is particularly useful when the robot can
choose the appropriate interaction modality depending on the context. Decoupling
the roles makes it possible to defer the selection of a particular modality to runtime,
so that the robot can take the specific context into account.
      </p>
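      <p>Deferring modality selection to runtime can be illustrated with a simple rule-based chooser. In this Python sketch, the context attributes (screen visibility, ambient noise, freedom to move), the 70 dB threshold, and the priority order of the candidates are assumptions chosen for illustration; the paper does not prescribe a selection strategy. The interaction designer only specifies that the robot attracts the user's attention; which output modality realizes it is decided from the current context.</p>
      <preformat preformat-type="code">
```python
# Hypothetical context-aware selection; attributes and thresholds are assumed.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Runtime snapshot of user and robot context (illustrative attributes)."""
    user_sees_screen: bool       # a GUI only helps if it is visible to the user
    ambient_noise_db: float      # noisy surroundings can mask speech output
    robot_can_move: bool         # e.g. the path towards the user is free


@dataclass(frozen=True)
class OutputModality:
    name: str
    needs_visibility: bool = False
    max_noise_db: Optional[float] = None   # None: insensitive to noise
    needs_motion: bool = False


def select_modality(candidates: list[OutputModality],
                    ctx: Context) -> Optional[OutputModality]:
    """Return the first candidate, in designer-given order, usable in ctx."""
    for modality in candidates:
        if modality.needs_visibility and not ctx.user_sees_screen:
            continue
        if modality.max_noise_db is not None and ctx.ambient_noise_db > modality.max_noise_db:
            continue
        if modality.needs_motion and not ctx.robot_can_move:
            continue
        return modality
    return None   # no modality fits; the robot could postpone the act


# Candidates able to express an attention-attracting act, in priority order.
# The 70 dB threshold is an illustrative assumption, not a measured value.
ATTENTION_CANDIDATES = [
    OutputModality("gui", needs_visibility=True),
    OutputModality("speech", max_noise_db=70.0),
    OutputModality("movement", needs_motion=True),
]

if __name__ == "__main__":
    busy_store = Context(user_sees_screen=False, ambient_noise_db=78.0, robot_can_move=True)
    print(select_modality(ATTENTION_CANDIDATES, busy_store).name)
```
      </preformat>
      <p>A first-fit rule over a designer-ordered list keeps the selection easy to inspect; a real system might instead weigh several context factors, which this sketch does not attempt.</p>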
    </sec>
    <sec id="sec-3">
      <title>Discussion and Open Questions</title>
      <p>
        The proposed approach has been applied successfully to the development of
a multimodal UI for a semi-autonomous shopping robot with GUI, speech, and
gesture [<xref ref-type="bibr" rid="ref5">5</xref>]. Previous work also demonstrates interaction scenarios involving
mixed-initiative [<xref ref-type="bibr" rid="ref4">4</xref>].
      </p>
      <p>Despite these initial results, some open questions remain and are subject to
future work. Regarding the interaction designer role, the choice of interaction language
affects the range of potential applications. The proposed discourse model focuses on
process-oriented applications, such as shopping, and may not be suitable for
other types of applications. An evaluation of whether the interaction designer
can really design "good" discourses without in-depth knowledge of modalities
is under way. Regarding the modality integrator role, the effort for integrating a
modality depends on the modality's type. For example, a transformation process
for a GUI requires model-to-model transformations and is far more laborious
than integrating a barcode reader, for which only a few method calls have to be
programmed.</p>
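      <p>To make this contrast concrete, the following minimal Python sketch shows how little code a barcode reader plugin might need, assuming a hypothetical platform that hands each input plugin a callback accepting communicative acts. The class BarcodeReaderPlugin, the callback, the optional value field, and the propositional content scannedProduct are illustrative assumptions, not part of the platform described in this paper.</p>
      <preformat preformat-type="code">
```python
# Hypothetical plugin sketch; the platform interface shown here is assumed.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CommunicativeAct:
    intention: str
    propositional_content: str
    value: str = ""   # hypothetical slot carrying the recognized data


class BarcodeReaderPlugin:
    """Hypothetical input plugin: integration boils down to one callback."""

    def __init__(self, emit: Callable[[CommunicativeAct], None]) -> None:
        # `emit` is assumed to be supplied by the platform when the plugin is loaded.
        self._emit = emit

    def on_scan(self, code: str) -> None:
        # Assumed to be called by the reader's driver for every scanned barcode;
        # forwards the code to the platform as an Informing act.
        self._emit(CommunicativeAct("Informing", "scannedProduct", code))


if __name__ == "__main__":
    received: list[CommunicativeAct] = []
    plugin = BarcodeReaderPlugin(received.append)   # stand-in for the platform
    plugin.on_scan("4006381333931")
    print(received)
```
      </preformat>
      <p>A GUI cannot be integrated this directly, because it requires the model-to-model transformations mentioned above; the sketch covers only the simple case.</p>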
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we introduce a distinction between three roles involved in the
design of multimodal UIs. This distinction allows a better decoupling between tasks
that require different expertise. Tools based on discourse modeling support this
decoupling and facilitate communication between the roles. We discuss
how this decoupling and the accompanying tools facilitate the design of a more
natural human-robot interface, and we point out its relationship to mixed-initiative
and context-awareness. Although future work is necessary to evaluate the
simplicity of interaction design and modality integration, this approach has already
enabled the development of a multimodal UI for a shopping robot.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This research has been carried out in the CommRob project (http://www.commrob.eu) and is partially funded by the EU (contract number IST-045441
under the 6th Framework Programme).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>de Baar</surname>, <given-names>D.J.M.J.</given-names></string-name>,
          <string-name><surname>Foley</surname>, <given-names>J.D.</given-names></string-name>,
          <string-name><surname>Mullet</surname>, <given-names>K.E.</given-names></string-name>:
          <article-title>Coupling application design and user interface design</article-title>.
          In: <source>CHI '92: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>, New York, NY, USA, ACM (<year>1992</year>) <fpage>259</fpage>–<lpage>266</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Ertl</surname>, <given-names>D.</given-names></string-name>:
          <article-title>Semi-automatic generation of multimodal user interfaces</article-title>.
          In: <source>EICS '09: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems</source>, New York, NY, USA, ACM (<year>2009</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Bogdan</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Falb</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Kaindl</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Kavaldjian</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Popp</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Horacek</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Arnautovic</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Szep</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Generating an abstract user interface from a discourse model inspired by human communication</article-title>.
          In: <source>Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS-41)</source>, Piscataway, NJ, USA, IEEE Computer Society Press (January <year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Kaindl</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Falb</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Bogdan</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Multimodal communication involving movements of a robot</article-title>.
          In: <source>CHI '08 Extended Abstracts on Human Factors in Computing Systems</source>, New York, NY, USA, ACM (<year>2008</year>) <fpage>3213</fpage>–<lpage>3218</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Vallee</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Burger</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Ertl</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Lerasle</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Falb</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Improving user interfaces of interactive robots with multimodality</article-title>.
          In: <source>Proceedings of the International Conference on Advanced Robotics (ICAR 2009)</source> (June <year>2009</year>) <fpage>1</fpage>–<lpage>6</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>