Event-based Synchronization of Model-Based Multimodal User Interfaces

Marco Blumendorf, Sebastian Feuerstack, Sahin Albayrak
DAI-Labor, Technische Universität Berlin
Franklinstrasse 28/29, D-10587 Berlin, Germany
[Marco.Blumendorf, Sebastian.Feuerstack, Sahin.Albayrak]@dai-labor.de

ABSTRACT
Smart environments utilize computers as tools supporting the user in his daily life, moving interaction with computers from a single system to a complex, distributed environment. User interfaces available in this environment need to adapt to the specifics of the various available devices and are distributed across several devices at the same time. A problem arising with distributed user interfaces is the required synchronization of the different parts. In this paper we present an approach allowing the event-based synchronization of distributed user interfaces based on a multi-level user interface model. We also describe a runtime system we created, allowing the execution of model-based user interface descriptions and the distribution of user interfaces across various devices and modalities using channels established between the system and the end devices.

Categories and Subject Descriptors
H.5 [Information Interfaces and Presentation]: User Interfaces; D.2.2 [Software Engineering]: Design Tools and Techniques - User Interfaces; H.1.2 [Models and Principles]: User/Machine Systems - Human factors; H.5.2 [Information Interfaces and Presentation]: User Interfaces - graphical user interfaces, interaction styles, input devices and strategies, voice I/O.

General Terms
Design, Human Factors

Keywords
Multimodal interaction, user interface model, distributed user interfaces, synchronization, ubiquitous computing, smart environments

1. INTRODUCTION
The ever increasing processing power of current personal computers supports the development of increasingly complex applications. The focus has moved from interacting with the computer to utilizing the computer as a tool supporting users in solving everyday problems. Computer systems increasingly move to the background and become silent servants ubiquitously available in smart environments. In combination with the emergence of new devices supporting different interaction modalities (pen-based input, voice-, mouse-, touch-, and gesture-based interaction) this offers new interaction possibilities, allowing the user to choose the most feasible device for a specific task. The simultaneous availability of these capabilities also allows the combination of multiple devices and modalities, increasing the available communication bandwidth to interact with the computer system.

However, the dynamic distribution of user interfaces required for this kind of interaction raises several technical problems. The device independent description and the decomposition of user interfaces are currently tackled by several model-based approaches [3][4][10][2], researching new ways to define user interfaces in a device independent manner. Such a system is required to dynamically adapt to changes in the environment to support flexible human-computer interaction, allowing the user to change, add and remove interaction devices according to the executed task. Distributing user interfaces in such a manner requires the coordination of the different presentations and the resulting input from the user. A mechanism to synchronize the views and update the presentation is needed, as well as a mechanism allowing the interpretation of the user input. The system has to ensure that the different views are consistent and provide a usable view of the system.

In this paper we present an approach supporting multimodal human-computer interaction, allowing the user to increase interaction capabilities and expressiveness by dynamically combining multiple modalities. The coordination of the different parts of the user interface takes place via event propagation through a multi-level model-based user interface, as defined in the Cameleon reference architecture proposed in [2]. An implementation of the approach is described, based on our runtime environment for model-based multimodal user interfaces supporting event-based coordination, the Multi-Access Service Platform (MASP).
The remainder of this paper is structured as follows. In section 2 we present the related work in this area. Section 3 describes our approach to multi-level event propagation, allowing the coordination of distributed user interfaces. Afterwards we describe our implementation of the Multi-Access Service Platform, incorporating a first prototype of the approach. We conclude with a summary and outlook in the final section.

2. RELATED WORK
Common authoring approaches rely on model-based mechanisms such as [8][10][6] and use transformations on different levels of abstraction to generate multimodal user interfaces. The basis for most approaches is a task tree notation based on the Concurrent Task Tree notation [7]. Most of the current approaches focus on the definition of models for the creation of user interfaces at design time, but there are also ongoing efforts to realize the user interface generation using a model interpreter at runtime [5][6] to dynamically adapt to the interaction capabilities offered by the connected modalities. Most multimodal approaches we are aware of render the user interface model to a single multimodal final user interface definition like XHTML+Voice, as described in [10] for example. These approaches are limited to single devices and handle the synchronization of input and output modalities internally.

Other frameworks offering comprehensive multimodal user interfaces such as [1] concentrate on specific environments like cockpit control or on multimodal interaction with an avatar [9]. These approaches have a strong focus on specific domains and usually connect the supported modalities closely together, as all participating modalities are known in advance.

The availability of dynamic environments, providing a combination of devices unknown at design time, requires approaches allowing the dynamic derivation of user interfaces at runtime and their distribution and fission based on the available devices [11]. In most cases bridging different technical standards to connect all devices available in smart environments is still a challenge, especially when devices are dynamically selected and a synchronization of the distributed user interface is required.

The various approaches allow the definition of user interfaces that can be delivered to different devices as well as the design of distributed user interfaces. However, it is yet unclear how the dynamic (re-)distribution of user interfaces at runtime and the coordination of the distributed parts can be realized in detail. In the next section, we present our approach to the dynamic coordination of distributed multimodal user interfaces.
3. MULTILEVEL EVENT PROPAGATION
Our approach to the synchronization of distributed user interfaces is based on a messaging mechanism, allowing the propagation of events through a multi-level user interface model. The model our approach is based on incorporates the following levels: conceptual level, abstract UI, concrete UI, and final UI, as proposed by the Cameleon reference framework in [2]. The different levels of the user interface model and the mappings between the levels refine the presentation of the UI when moving from the conceptual model to the final user interface (FUI) and add semantic meaning to the presented elements when moving from the FUI to the conceptual model. Final user interfaces (FUIs) are thereby generated by top-down reification mechanisms, refining the presentation information based on the different abstraction levels of the model.

As we focus on smart environments, we target the combination of multiple interaction devices to simultaneously access one application. The actual view of the system is thus defined by a set of generated FUIs, distributed across multiple devices, with each FUI being adapted to the specific capabilities of its device. This system of distributed FUIs forms a highly dynamic and complex environment that requires the synchronization of the different parts at runtime.

In our approach we realize the required synchronization via the propagation of messages through the defined user interface model. In the same way the reification can be used to derive user interface presentations for specific devices, the abstraction can be used to interpret user input events communicated bottom-up. Input events issued by the FUI are propagated step by step through the user interface levels and semantically enriched, to allow their interpretation on the abstract and conceptual level of the model. In the same way events from the FUI are abstracted, it is also possible to use the reification to derive output messages, updating the specific presentation from abstract events resulting from the interpretation of the user input. The combination of the two mechanisms allows the coordination of the different parts of the distributed user interface based on event propagation. The fact that events are either directly interpreted by the specific layer or propagated to the next layer without directly affecting it avoids event conflicts occurring when different FUIs receive conflicting input. This allows recognizing and handling conflicts at the affected layer before lower layers have been altered. Figure 1 depicts the model responsible for the creation of the distributed FUI, which spans a tree across the different levels of abstraction. This entails that the different final user interfaces share a common root node, allowing the synchronization of the different representations via the propagation of events through this root node.

Figure 1: Hierarchical multilevel event propagation using the Cameleon Reference Architecture

As illustrated in Figure 1, an event fired by user interaction (for instance moving the mouse over a widget) is first processed by the final user interface and mapped to a concrete interaction object (CIO). The platform specific "onmouseover" event could thus be transformed to a more abstract focus event on the concrete UI model (1). This abstraction involves looking up the CIO that has been associated to the widget that fired the "onmouseover" event. In our approach each CIO knows its parent abstract interaction object (AIO) on the next abstraction level (whereas the AIO does not know all its derived CIOs). Before the event is propagated to the abstract UI layer, it is abstracted to a "focus" event and associated to an AIO (2). On the next level, the abstract UI processes the event and relates it to the task model of the user interface (3). A specific task receiving the focus results in a "setfocus" event, propagated along the same path backwards, as all final user interfaces displaying the element now have to be notified about the changed focus. During this top-down event propagation (reification) the "setfocus" event issued from the task level is propagated to the derived abstract UIs (4). On the AUI level, the events are related to the involved AIOs and then further propagated to the CUI level (5). Here the events are again mapped to the associated CIOs and interpreted depending on the targeted output modality. Finally the adapted events are delivered to the FUI level (6.1+6.2), where they result in an update of the specific presentation. In a visual modality, the "setfocus" event could result in the highlighting of a widget, whereas in a voice-based modality, the event could result in a speech output.
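To make the propagation steps (1) to (6.2) more tangible, the following minimal Java sketch traces one such round trip: a platform specific "onmouseover" event is abstracted bottom-up to a focus on a task, and the resulting "setfocus" event is reified top-down to the connected FUIs. All class and method names are hypothetical illustrations rather than the actual MASP code, and the simple EventBus merely stands in for the topic-based message distribution described in section 4.

    import java.util.ArrayList;
    import java.util.List;

    // A propagated interaction event, e.g. "focus" or "setfocus" (names are illustrative).
    class InteractionEvent {
        final String type;
        final String elementId;
        InteractionEvent(String type, String elementId) {
            this.type = type;
            this.elementId = elementId;
        }
    }

    // Stand-in for the topic-based distribution: routes reified events to registered CIOs.
    class EventBus {
        private final List<ConcreteInteractionObject> subscribers = new ArrayList<ConcreteInteractionObject>();
        void subscribe(ConcreteInteractionObject cio) { subscribers.add(cio); }
        void publish(InteractionEvent event) {
            for (ConcreteInteractionObject cio : subscribers) {
                cio.handleOutputEvent(event);        // steps (5) and (6): reification towards the FUIs
            }
        }
    }

    // Task level: a task receiving the focus issues the top-down "setfocus" event, steps (3) and (4).
    class Task {
        private final String id;
        private final EventBus bus;
        Task(String id, EventBus bus) { this.id = id; this.bus = bus; }
        void receiveFocus() {
            bus.publish(new InteractionEvent("setfocus", id));
        }
    }

    // Abstract UI level: relates abstract focus events to the task model, steps (2) and (3).
    class AbstractInteractionObject {
        private final Task task;
        AbstractInteractionObject(Task task) { this.task = task; }
        void handleAbstractEvent(InteractionEvent event) {
            if ("focus".equals(event.type)) {
                task.receiveFocus();
            }
        }
    }

    // Concrete UI level: each CIO knows its parent AIO, but not vice versa, steps (1), (2) and (6).
    class ConcreteInteractionObject {
        private final String widgetId;
        private final String modality;               // e.g. "visual" or "voice"
        private final AbstractInteractionObject parent;
        ConcreteInteractionObject(String widgetId, String modality, AbstractInteractionObject parent) {
            this.widgetId = widgetId;
            this.modality = modality;
            this.parent = parent;
        }
        // Abstraction: a platform specific FUI event becomes an abstract focus event.
        void onPlatformEvent(String platformEvent) {
            if ("onmouseover".equals(platformEvent)) {
                parent.handleAbstractEvent(new InteractionEvent("focus", widgetId));
            }
        }
        // Reification: the same abstract event is interpreted depending on the modality.
        void handleOutputEvent(InteractionEvent event) {
            if ("setfocus".equals(event.type)) {
                System.out.println("voice".equals(modality)
                        ? "speech output: " + widgetId + " received the focus"
                        : "highlight widget " + widgetId);
            }
        }
    }

In this simplified form every subscribed CIO reacts to the "setfocus" event; in the described approach only the final user interfaces actually displaying the affected element would be notified.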
To evaluate the described event propagation mechanism and the classification of the events for the dynamic coordination of distributed user interfaces, we developed a runtime system for model-based user interfaces. Based on a task tree model, the event abstraction and reification mechanisms as well as an event classification are combined to provide multimodal user interfaces, allowing the multimodal usage of web-based applications via multiple interaction channels that can be added and removed independently at runtime. In the following section we describe our implementation of the Multi-Access Service Platform, allowing access to an application described by a user interface model via various channels.

4. THE MULTI-ACCESS SERVICE PLATFORM
The Multi-Access Service Platform (MASP) has been realized as a framework allowing the event-based synchronization of distributed user interfaces based on a hierarchical user interface model. The delivery of final user interfaces and messages is realized via connections to devices supporting two-way client-server communication (Figure 2).

Figure 2: Synchronization via coordination topics of loosely coupled connections in the MASP architecture

Connections established between the MASP and any interaction device accessing the MASP realize event-based, two-way client-server communication by abstracting from the underlying communication mechanism (e.g. HTTP). In our understanding a connection is a way to describe the communication with a specific device, acting as a container combining different communication channels to abstract from the device specifics. A communication channel is part of a connection and responsible for the one-way communication of events. We distinguish between input channels, providing user input events to the system, and output channels, allowing the manipulation of the FUI via output events. Each channel provides eventing capabilities and is connected to different topics, allowing the classification of events.
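The connection and channel abstraction described above could be captured by interfaces along the following lines. This is a sketch under assumed names (DeviceConnection, InputChannel, OutputChannel, InteractionEventListener); the actual MASP types are not specified in this paper and may differ.

    import java.util.List;

    // One-way transport of events; part of exactly one connection.
    interface CommunicationChannel {
        String getTopic();                               // topic used to classify this channel's events
    }

    // Carries focus and input events from the device to the system.
    interface InputChannel extends CommunicationChannel {
        void setListener(InteractionEventListener listener);
    }

    // Delivers output events that manipulate the FUI on the device.
    interface OutputChannel extends CommunicationChannel {
        void send(String eventType, String elementId);   // e.g. ("setfocus", "recipeStep3")
    }

    // Container combining the channels of one device and hiding its specifics.
    interface DeviceConnection {
        String getDeviceId();
        List<InputChannel> getInputChannels();
        List<OutputChannel> getOutputChannels();
    }

    // Callback used by the system to receive focus and input events from an input channel.
    interface InteractionEventListener {
        void onEvent(String eventType, String elementId);
    }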
Interaction with the system takes place via events fired by the user interface through the channels. These events are processed by our system and delivered to the affected parts of the user interface model. To provide a general abstraction layer over the multitude of events that can be fired by the final user interface (e.g. onmouseover, onmouseout, onclick, onblur in HTML), we introduce three types of interaction events: focus, input and output. Focus events have a navigational nature, covering events that do not change the status of the system, but the status of the current view of the system. Input events have an interaction nature, covering selection and text input triggered by the user. Output events are events issued by the system to adapt the presentation of the user interface to the current status of the system. They allow FUIs to be synchronized when the presentation changes.

A FUI presented on an end device can issue focus or input events whenever a user interaction occurs and receives output events when the presentation has to be updated. The mapping of FUI-specific events to a supported interaction event is provided by the channel managing the communication with the specific FUI. The interaction channel thus provides a device and modality abstraction, introducing a common interaction mechanism. In our implementation we use Java Message Service (JMS)-based messaging, allowing the flexible distribution of messages to the affected system components. Events received through an interaction channel are propagated to the backend model through a number of topics, allowing the classification of the received events and their appropriate distribution.
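As a rough illustration of this JMS-based distribution, the sketch below shows how an input channel might forward an abstracted focus event to the backend via a topic. The topic name, the message layout and the FocusEventPublisher class are assumptions made for illustration; only the use of JMS topics for event classification is taken from the description above.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;

    // Publishes abstracted focus events to a JMS topic so that the affected model
    // components (and other channels) can subscribe to them.
    class FocusEventPublisher {
        private final ConnectionFactory factory;     // typically obtained via JNDI

        FocusEventPublisher(ConnectionFactory factory) {
            this.factory = factory;
        }

        void publishFocus(String elementId) throws JMSException {
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Topic topic = session.createTopic("masp.interaction.focus");  // assumed topic name
                MessageProducer producer = session.createProducer(topic);
                TextMessage message = session.createTextMessage("focus");
                message.setStringProperty("elementId", elementId);            // lets subscribers filter by element
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }

A corresponding subscriber on the abstract UI level could register a javax.jms.MessageListener on the same topic to relate incoming focus events to the task model.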
In addition to the interaction events, we defined additional events on the level of the task model (task done, task disabled, task enabled) to communicate changes on the task level. We define that specific input events can be mapped to task done events by the abstract user interface. Task enabled and task disabled events are mapped to output events, taking care that the specific task presentations are shown on or hidden from the specific FUIs. In our implementation the task model is defined using the Concurrent Task Tree notation [7], interpreted at runtime.

Using connections, the runtime system can be dynamically connected to various devices by setting up the required channels. Once a channel is set up, the system can render a final user interface for the channel and deliver it to the device. A user interface can be distributed across multiple devices and modalities when multiple channels are available. A mechanism of sending updates to the presented user interfaces via output events allows the redistribution and adaptation of user interfaces when new devices enter or leave the interaction environment.

5. THE VIRTUAL COOK
As an example demonstrating the usability of our framework, we created the Virtual Cook, a cooking aid presenting the required steps to support the user when preparing a meal. Figure 3 shows the graphical user interface of the Virtual Cook. As a person usually does not have the hands free for mouse and keyboard during cooking, we equipped the Virtual Cook with a voice-based interface, which can support the control of the visual output. Besides the possibility to dynamically add and remove a voice channel while using the application, we also added support for a gesture recognition channel. The voice channel is realized via SIP-based communication, allowing a loose coupling of the voice channel. To connect the gesture channel we created an interface defining five gestures for navigation in the cooking aid user interface (back, forward, up, down and step done). This interface can be delivered to a gesture recognition device we built ourselves, extending the possible interaction modalities via gesture-based interaction.

Figure 3: The graphical user interface of the cooking aid

Our implementation of the Virtual Cook application, using the MASP framework to realize an enhanced multimodal user interface, allows the distribution of the user interface across various modalities based on the availability of the devices. The connection abstraction allows us to dynamically add and remove devices from the environment, which results in interaction modalities being added to or removed from the application.
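As an illustration, the gesture interface could map the five gestures onto the interaction event types introduced in section 4 roughly as follows. The GestureChannelAdapter class and the concrete event names are assumptions; only the gesture set itself (back, forward, up, down, step done) is taken from the description above.

    // The five navigation gestures of the cooking aid user interface.
    enum CookingGesture { BACK, FORWARD, UP, DOWN, STEP_DONE }

    // Translates a recognized gesture into an interaction event type and target element.
    class GestureChannelAdapter {
        String[] toInteractionEvent(CookingGesture gesture) {
            switch (gesture) {
                case BACK:      return new String[] {"focus", "previousStep"};
                case FORWARD:   return new String[] {"focus", "nextStep"};
                case UP:        return new String[] {"focus", "scrollUp"};
                case DOWN:      return new String[] {"focus", "scrollDown"};
                case STEP_DONE: return new String[] {"input", "currentStepDone"}; // may map to a task done event
                default:        throw new IllegalArgumentException("unknown gesture: " + gesture);
            }
        }
    }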
6. CONCLUSION
In this paper we introduced the event-based synchronization of distributed user interfaces, based on a hierarchical user interface model defining the different aspects of the UI on multiple levels of abstraction. The framework we presented allows the processing of user input events and the synchronization of dynamically distributed user interfaces. A connection-based communication mechanism combining multiple channels can be used to communicate with the user via multiple modalities.

However, in our work we focused on the extension of a primary modality with additional redundant interaction capabilities. The extension of the approach to support any mixture of modalities as well as the usage of complementary user interfaces requires more research considering the elimination of ambiguous events and the fusion of multipart events.

We also did not set a strong focus on the rendering of user interfaces for the different modalities from one common model, but rather annotated a task tree with user interfaces for the different supported modalities.

In the future, we also want to support more gesture and voice commands and a more flexible definition of the user interface model, considering the different interaction styles of the modalities in a more appropriate way.

The presented approach provides an event-based mechanism incorporating the multi-level structure of model-based user interfaces to coordinate distributed user interfaces. However, the presented implementation is still not complete and can be extended towards better support for the new possibilities provided by multimodal human-computer interaction in smart environments.

7. ACKNOWLEDGMENTS
We thank the German Federal Ministry of Economics and Technology for supporting our work as part of the Service Centric Home project in the "Next Generation Media" program.

8. REFERENCES
[1] Bouchet, J.; Nigay, L. & Ganille, T. (2004), ICARE software components for rapidly developing multimodal interfaces, in 'ICMI '04: Proceedings of the 6th International Conference on Multimodal Interfaces', ACM Press, New York, NY, USA, pp. 251-258.
[2] Calvary, G.; Coutaz, J.; Thevenin, D.; Limbourg, Q.; Bouillon, L. & Vanderdonckt, J. (2003), A unifying reference framework for multi-target user interfaces, Interacting with Computers 15(3), 289-308.
[3] Coninx, K.; Luyten, K.; Vandervelpen, C.; Van den Bergh, J. & Creemers, B. (2003), Dygimes: Dynamically generating interfaces for mobile computing devices and embedded systems, in 'Mobile HCI', pp. 256-270.
[4] Eisenstein, J.; Vanderdonckt, J. & Puerta, A.R. (2001), Applying model-based techniques to the development of UIs for mobile computers, in 'Intelligent User Interfaces', pp. 69-76.
[5] Klug, T. & Kangasharju, J. (2005), Executable task models, in 'Proceedings of TAMODIA 2005', ACM Press, Gdansk, Poland, pp. 119-122.
[6] Mori, G.; Paterno, F. & Santoro, C. (2003), Tool support for designing nomadic applications, in 'IUI '03: Proceedings of the 8th International Conference on Intelligent User Interfaces', ACM Press, New York, NY, USA, pp. 141-148.
[7] Paterno, F. (1999), Model-Based Design and Evaluation of Interactive Applications, Springer Verlag, Berlin.
[8] Paterno, F. & Giammarino, F. (2006), Authoring interfaces with combined use of graphics and voice for both stationary and mobile devices, in 'AVI '06: Proceedings of the Working Conference on Advanced Visual Interfaces', ACM Press, New York, NY, USA, pp. 329-335.
[9] Reithinger, N.; Alexandersson, J.; Becker, T.; Blocher, A.; Engel, R.; Löckelt, M.; Müller, J.; Pfleger, N.; Poller, P.; Streit, M. & Tschernomas, V. (2003), SmartKom: adaptive and flexible multimodal access to multiple applications, in 'ICMI '03: Proceedings of the 5th International Conference on Multimodal Interfaces', ACM Press, New York, NY, USA, pp. 101-108.
[10] Stanciulescu, A.; Limbourg, Q.; Vanderdonckt, J.; Michotte, B. & Montero, F. (2005), A transformational approach for multimodal web user interfaces based on UsiXML, in 'ICMI '05: Proceedings of the 7th International Conference on Multimodal Interfaces', ACM Press, New York, NY, USA, pp. 259-266.
[11] Vandervelpen, C. & Coninx, K. (2004), Towards model-based design support for distributed user interfaces, in 'NordiCHI '04: Proceedings of the Third Nordic Conference on Human-Computer Interaction' (Tampere, Finland, October 23-27, 2004), vol. 82, ACM Press, New York, NY, pp. 61-70.