Event-based Synchronization of Model-Based Multimodal User Interfaces

Marco Blumendorf, Sebastian Feuerstack, Sahin Albayrak
DAI-Labor, Technische Universität Berlin
Franklinstrasse 28/29, D-10587 Berlin, Germany
[Marco.Blumendorf, Sebastian.Feuerstack, Sahin.Albayrak]@dai-labor.de

ABSTRACT
Smart environments utilize computers as tools supporting the user in his daily life, moving interaction with computers from a single system to a complex, distributed environment. User interfaces available in this environment need to adapt to the specifics of the various available devices and are distributed across several devices at the same time. A problem arising with distributed user interfaces is the required synchronization of the different parts. In this paper we present an approach allowing the event-based synchronization of distributed user interfaces based on a multi-level user interface model. We also describe a runtime system we created, allowing the execution of model-based user interface descriptions and the distribution of user interfaces across various devices and modalities using channels established between the system and the end devices.

Categories and Subject Descriptors
H.5 [Information Interfaces and Presentation]: User Interfaces; D.2.2 [Software Engineering]: Design Tools and Techniques - User Interfaces; H.1.2 [Models and Principles]: User/Machine Systems - Human factors; H.5.2 [Information Interfaces and Presentation]: User Interfaces - graphical user interfaces, interaction styles, input devices and strategies, voice I/O.

General Terms
Design, Human Factors

Keywords
Multimodal interaction, user interface model, distributed user interfaces, synchronization, ubiquitous computing, smart environments

1. INTRODUCTION
The ever increasing processing power of current personal computers supports the development of increasingly complex applications. The focus has moved from interacting with the computer to utilizing the computer as a tool supporting users in solving everyday problems. Computer systems increasingly move to the background and become silent servants ubiquitously available in smart environments. In combination with the emergence of new devices supporting different interaction modalities (pen-based input, voice-, mouse-, touch-, and gesture-based interaction) this offers new interaction possibilities, allowing the user to choose the most feasible device for a specific task. The simultaneous availability of these capabilities also allows the combination of multiple devices and modalities, increasing the available communication bandwidth to interact with the computer system.

However, the dynamic distribution of user interfaces required for this kind of interaction raises several technical problems. The device independent description and the decomposition of user interfaces are currently tackled by several model-based approaches [3][4][10][2], researching new ways to define user interfaces in a device independent manner. Such a system is required to dynamically adapt to changes in the environment to support flexible human-computer interaction, allowing the user to change, add and remove interaction devices according to the executed task. Distributing user interfaces in such a manner requires the coordination of the different presentations and the resulting input from the user. A mechanism to synchronize the views and update the presentation is needed, as well as a mechanism allowing the interpretation of the user input. The system has to ensure that the different views are consistent and provide a usable view of the system.

In this paper we present an approach supporting multimodal human-computer interaction, allowing the user to increase interaction capabilities and expressiveness by dynamically combining multiple modalities. The coordination of the different parts of the user interface takes place via event propagation through a multi-level model-based user interface, as defined in the Cameleon reference architecture proposed in [2]. An implementation of the approach is described, based on our runtime environment for model-based multimodal user interfaces supporting event-based coordination, the Multi-Access Service Platform (MASP).
The remainder of this paper is structured as follows. In section 2 we present the related work in this area. Section 3 describes our approach to multi-level event propagation, allowing the coordination of distributed user interfaces. Afterwards we describe our implementation of the Multi-Access Service Platform, incorporating a first prototype of the approach. We conclude with a summary and outlook in the final section.

2. RELATED WORK
Common authoring approaches rely on model-based mechanisms such as [8][10][6] and use transformations on different levels of abstraction to generate multimodal user interfaces. The basis for most approaches is a task tree notation based on the Concurrent Task Tree notation [7]. Most of the current approaches focus on the definition of models for the creation of user interfaces at design time, but there are also ongoing efforts to realize the user interface generation using a model interpreter at runtime [5][6] to dynamically adapt to the interaction capabilities offered by the connected modalities. Most multimodal approaches we are aware of render the user interface model to a single multimodal final user interface definition like XHTML+Voice, as described in [10] for example. These approaches are limited to single devices and handle the synchronization of input and output modalities internally.

Other frameworks offering comprehensive multimodal user interfaces such as [1] concentrate on specific environments like cockpit control or on multimodal interaction with an avatar [9]. These approaches have a strong focus on specific domains and usually connect the supported modalities closely together, as all participating modalities are known in advance.

The availability of dynamic environments, providing a combination of devices unknown at design time, requires approaches allowing the dynamic derivation of user interfaces at runtime and their distribution and fission based on the available devices [11]. In most cases bridging different technical standards to connect all devices available in smart environments is still a challenge, especially when devices are dynamically selected and a synchronization of the distributed user interface is required.

The various approaches allow the definition of user interfaces that can be delivered to different devices as well as the design of distributed user interfaces. However, it is yet unclear how the dynamic (re-)distribution of user interfaces at runtime and the coordination of the distributed parts can be realized in detail. In the next section, we present our approach to the dynamic coordination of distributed multimodal user interfaces.
3. MULTILEVEL EVENT PROPAGATION
Our approach to the synchronization of distributed user interfaces is based on a messaging mechanism, allowing the propagation of events through a multi-level user interface model. The model our approach is based on incorporates the following levels: conceptual level, abstract UI, concrete UI, and final UI, as proposed by the Cameleon reference framework in [2]. The different levels of the user interface model and the mappings between the levels refine the presentation of the UI when moving from the conceptual model to the final user interface (FUI) and add semantic meaning to the presented elements when moving from the FUI to the conceptual model. Final user interfaces (FUIs) are thereby generated by top-down reification mechanisms, refining the presentation information based on the different abstraction levels of the model.

As we focus on smart environments, we target the combination of multiple interaction devices to simultaneously access one application. The actual view of the system is thus defined by a set of generated FUIs, distributed across multiple devices, with each FUI being adapted to the specific capabilities of its device. This system of distributed FUIs forms a highly dynamic and complex environment that requires the synchronization of the different parts at runtime.

In our approach we realize the required synchronization via the propagation of messages through the defined user interface model. In the same way the reification can be used to derive user interface presentations for specific devices, the abstraction can be used to interpret user input events communicated bottom-up. Input events issued by the FUI are propagated step by step through the user interface levels and semantically enriched, to allow their interpretation on the abstract and conceptual level of the model. In the same way events from the FUI are abstracted, it is also possible to use the reification to derive output messages, updating the specific presentation from abstract events resulting from the interpretation of the user input. The combination of the two mechanisms allows the coordination of the different parts of the distributed user interface based on event propagation. The fact that events are either directly interpreted by the specific layer or propagated to the next layer without directly affecting it avoids event conflicts occurring when different FUIs receive conflicting input. This allows recognizing and handling conflicts at the affected layer before lower layers have been altered. Figure 1 depicts the model responsible for the creation of the distributed FUI, which spans a tree across the different levels of abstraction. This entails that the different final user interfaces share a common root node, allowing the synchronization of the different representations via the propagation of events through this root node.

Figure 1: Hierarchical multilevel event propagation using the Cameleon Reference Architecture

As illustrated in Figure 1, an event fired by user interaction (for instance moving the mouse over a widget) is first processed by the final user interface and mapped to a concrete interaction object (CIO). The platform specific "onmouseover" event could thus be transformed to a more abstract focus event on the concrete UI model (1). This abstraction involves looking up the CIO that has been associated to the widget that fired the "onmouseover" event. In our approach each CIO knows its parent abstract interaction object (AIO) on the next abstraction level (whereas the AIO does not know all its derived CIOs). Before the event is propagated to the abstract UI layer, it is abstracted to a "focus" event and associated to an AIO (2). On the next level, the abstract UI processes the event and relates it to the task model of the user interface (3). A specific task receiving the focus results in a "setfocus" event, propagated along the same path backwards, as all final user interfaces displaying the element now have to be notified about the changed focus. During this top-down event propagation (reification) the "setfocus" event issued from the task level is propagated to the derived abstract UIs (4). On the AUI level, the events are related to the involved AIOs and then further propagated to the CUI level (5). Here the events are again mapped to the associated CIOs and interpreted depending on the targeted output modality. Finally the adapted events are delivered to the FUI level (6.1+6.2), where they result in an update of the specific presentation. In a visual modality, the "setfocus" event could result in the highlighting of a widget, whereas in a voice-based modality, the event could result in a speech output.
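To make the propagation steps (1) to (6.2) more tangible, the following minimal Java sketch traces one such round trip: a platform specific "onmouseover" event is abstracted bottom-up to a focus on a task, and the resulting "setfocus" event is reified top-down to the connected FUIs. All class and method names are hypothetical illustrations rather than the actual MASP code, and the simple EventBus merely stands in for the topic-based message distribution described in section 4.

    import java.util.ArrayList;
    import java.util.List;

    // A propagated interaction event, e.g. "focus" or "setfocus" (names are illustrative).
    class InteractionEvent {
        final String type;
        final String elementId;
        InteractionEvent(String type, String elementId) {
            this.type = type;
            this.elementId = elementId;
        }
    }

    // Stand-in for the topic-based distribution: routes reified events to registered CIOs.
    class EventBus {
        private final List<ConcreteInteractionObject> subscribers = new ArrayList<ConcreteInteractionObject>();
        void subscribe(ConcreteInteractionObject cio) { subscribers.add(cio); }
        void publish(InteractionEvent event) {
            for (ConcreteInteractionObject cio : subscribers) {
                cio.handleOutputEvent(event);        // steps (5) and (6): reification towards the FUIs
            }
        }
    }

    // Task level: a task receiving the focus issues the top-down "setfocus" event, steps (3) and (4).
    class Task {
        private final String id;
        private final EventBus bus;
        Task(String id, EventBus bus) { this.id = id; this.bus = bus; }
        void receiveFocus() {
            bus.publish(new InteractionEvent("setfocus", id));
        }
    }

    // Abstract UI level: relates abstract focus events to the task model, steps (2) and (3).
    class AbstractInteractionObject {
        private final Task task;
        AbstractInteractionObject(Task task) { this.task = task; }
        void handleAbstractEvent(InteractionEvent event) {
            if ("focus".equals(event.type)) {
                task.receiveFocus();
            }
        }
    }

    // Concrete UI level: each CIO knows its parent AIO, but not vice versa, steps (1), (2) and (6).
    class ConcreteInteractionObject {
        private final String widgetId;
        private final String modality;               // e.g. "visual" or "voice"
        private final AbstractInteractionObject parent;
        ConcreteInteractionObject(String widgetId, String modality, AbstractInteractionObject parent) {
            this.widgetId = widgetId;
            this.modality = modality;
            this.parent = parent;
        }
        // Abstraction: a platform specific FUI event becomes an abstract focus event.
        void onPlatformEvent(String platformEvent) {
            if ("onmouseover".equals(platformEvent)) {
                parent.handleAbstractEvent(new InteractionEvent("focus", widgetId));
            }
        }
        // Reification: the same abstract event is interpreted depending on the modality.
        void handleOutputEvent(InteractionEvent event) {
            if ("setfocus".equals(event.type)) {
                System.out.println("voice".equals(modality)
                        ? "speech output: " + widgetId + " received the focus"
                        : "highlight widget " + widgetId);
            }
        }
    }

In this simplified form every subscribed CIO reacts to the "setfocus" event; in the described approach only the final user interfaces actually displaying the affected element would be notified.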
To evaluate the described event propagation mechanism and the classification of the events for the dynamic coordination of distributed user interfaces, we developed a runtime system for model-based user interfaces. Based on a task tree model, the event abstraction and reification mechanisms as well as an event classification are combined to provide multimodal user interfaces, allowing the multimodal usage of web-based applications via multiple interaction channels that can be added and removed independently at runtime. In the following section we describe our implementation of the Multi-Access Service Platform, allowing access to an application described by a user interface model via various channels.

4. THE MULTI-ACCESS SERVICE PLATFORM
The Multi-Access Service Platform (MASP) has been realized as a framework allowing the event-based synchronization of distributed user interfaces based on a hierarchical user interface model. The delivery of final user interfaces and messages is realized via connections to devices supporting two-way client-server communication (Figure 2).

Figure 2: Synchronization via coordination topics of loosely coupled connections in the MASP architecture

Connections established between the MASP and any interaction device accessing the MASP realize event-based, two-way client-server communication by abstracting from the underlying communication mechanism (e.g. HTTP). In our understanding a connection is a way to describe the communication with a specific device, acting as a container combining different communication channels to abstract from the device specifics. A communication channel is part of a connection and responsible for the one-way communication of events. We distinguish between input channels, providing user input events to the system, and output channels, allowing the manipulation of the FUI via output events. Each channel provides eventing capabilities and is connected to different topics, allowing the classification of events.
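The connection and channel abstraction described above could be captured by interfaces along the following lines. This is a sketch under assumed names (DeviceConnection, InputChannel, OutputChannel, InteractionEventListener); the actual MASP types are not specified in this paper and may differ.

    import java.util.List;

    // One-way transport of events; part of exactly one connection.
    interface CommunicationChannel {
        String getTopic();                               // topic used to classify this channel's events
    }

    // Carries focus and input events from the device to the system.
    interface InputChannel extends CommunicationChannel {
        void setListener(InteractionEventListener listener);
    }

    // Delivers output events that manipulate the FUI on the device.
    interface OutputChannel extends CommunicationChannel {
        void send(String eventType, String elementId);   // e.g. ("setfocus", "recipeStep3")
    }

    // Container combining the channels of one device and hiding its specifics.
    interface DeviceConnection {
        String getDeviceId();
        List<InputChannel> getInputChannels();
        List<OutputChannel> getOutputChannels();
    }

    // Callback used by the system to receive focus and input events from an input channel.
    interface InteractionEventListener {
        void onEvent(String eventType, String elementId);
    }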
Interaction with the system takes place via events fired by the user interface through the channels. These events are processed by our system and delivered to the affected parts of the user interface model. To provide a general abstraction layer over the multitude of events that can be fired by the final user interface (e.g. onmouseover, onmouseout, onclick, onblur in HTML), we introduce three types of interaction events: focus, input and output. Focus events have a navigational nature, covering events that do not change the status of the system, but the status of the current view of the system. Input events have an interaction nature, covering selection and text input triggered by the user. Output events are events issued by the system to adapt the presentation of the user interface to the current status of the system. They allow FUIs to be synchronized when the presentation changes.

A FUI presented on an end device can issue focus or input events whenever a user interaction occurs and receives output events when the presentation has to be updated. The mapping of FUI-specific events to a supported interaction event is provided by the channel managing the communication with the specific FUI. The interaction channel thus provides a device and modality abstraction, introducing a common interaction mechanism. In our implementation we use Java Message Service (JMS)-based messaging, allowing the flexible distribution of messages to the affected system components. Events received through an interaction channel are propagated to the backend model through a number of topics, allowing the classification of the received events and their appropriate distribution.
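As a rough illustration of this JMS-based distribution, the sketch below shows how an input channel might forward an abstracted focus event to the backend via a topic. The topic name, the message layout and the FocusEventPublisher class are assumptions made for illustration; only the use of JMS topics for event classification is taken from the description above.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;

    // Publishes abstracted focus events to a JMS topic so that the affected model
    // components (and other channels) can subscribe to them.
    class FocusEventPublisher {
        private final ConnectionFactory factory;     // typically obtained via JNDI

        FocusEventPublisher(ConnectionFactory factory) {
            this.factory = factory;
        }

        void publishFocus(String elementId) throws JMSException {
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Topic topic = session.createTopic("masp.interaction.focus");  // assumed topic name
                MessageProducer producer = session.createProducer(topic);
                TextMessage message = session.createTextMessage("focus");
                message.setStringProperty("elementId", elementId);            // lets subscribers filter by element
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }

A corresponding subscriber on the abstract UI level could register a javax.jms.MessageListener on the same topic to relate incoming focus events to the task model.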
In addition to the interaction events, we defined additional events on the level of the task model (task done, task disabled, task enabled) to communicate changes on the task level. We define that specific input events can be mapped to task done events by the abstract user interface. Task enabled and task disabled events are mapped to output events, taking care that the specific task presentations are shown on or hidden from the specific FUIs. In our implementation the task model is defined using the Concurrent Task Tree notation [7], interpreted at runtime.

Using connections, the runtime system can be dynamically connected to various devices by setting up the required channels. Once a channel is set up, the system can render a final user interface for the channel and deliver it to the device. A user interface can be distributed across multiple devices and modalities when multiple channels are available. A mechanism of sending updates to the presented user interfaces via output events allows the redistribution and adaptation of user interfaces when new devices enter or leave the interaction environment.

5. THE VIRTUAL COOK
As an example demonstrating the usability of our framework, we created the Virtual Cook, a cooking aid presenting the required steps to support the user when preparing a meal. Figure 3 shows the graphical user interface of the Virtual Cook. As a person usually does not have the hands free for mouse and keyboard during cooking, we equipped the Virtual Cook with a voice-based interface, which can support the control of the visual output. Besides the possibility to dynamically add and remove a voice channel while using the application, we also added support for a gesture recognition channel. The voice channel is realized via SIP-based communication, allowing a loose coupling of the voice channel. To connect the gesture channel we created an interface defining five gestures for navigation in the cooking aid user interface (back, forward, up, down and step done). This interface can be delivered to a gesture recognition device we built ourselves, extending the possible interaction modalities via gesture-based interaction.

Figure 3: The graphical user interface of the cooking aid

Our implementation of the Virtual Cook application, using the MASP framework to realize an enhanced multimodal user interface, allows the distribution of the user interface across various modalities based on the availability of the devices. The connection abstraction allows us to dynamically add and remove devices from the environment, which results in interaction modalities being added to or removed from the application.
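As an illustration, the gesture interface could map the five gestures onto the interaction event types introduced in section 4 roughly as follows. The GestureChannelAdapter class and the concrete event names are assumptions; only the gesture set itself (back, forward, up, down, step done) is taken from the description above.

    // The five navigation gestures of the cooking aid user interface.
    enum CookingGesture { BACK, FORWARD, UP, DOWN, STEP_DONE }

    // Translates a recognized gesture into an interaction event type and target element.
    class GestureChannelAdapter {
        String[] toInteractionEvent(CookingGesture gesture) {
            switch (gesture) {
                case BACK:      return new String[] {"focus", "previousStep"};
                case FORWARD:   return new String[] {"focus", "nextStep"};
                case UP:        return new String[] {"focus", "scrollUp"};
                case DOWN:      return new String[] {"focus", "scrollDown"};
                case STEP_DONE: return new String[] {"input", "currentStepDone"}; // may map to a task done event
                default:        throw new IllegalArgumentException("unknown gesture: " + gesture);
            }
        }
    }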
6. CONCLUSION
In this paper we introduced the event-based synchronization of distributed user interfaces, based on a hierarchical user interface model defining the different aspects of the UI on multiple levels of abstraction. The framework we presented allows the processing of user input events and the synchronization of dynamically distributed user interfaces. A connection-based communication mechanism combining multiple channels can be used to communicate with the user via multiple modalities.

However, in our work we focused on the extension of a primary modality with additional redundant interaction capabilities. The extension of the approach to support any mixture of modalities as well as the usage of complementary user interfaces requires more research considering the elimination of ambiguous events and the fusion of multipart events.

We also did not set a strong focus on the rendering of user interfaces for the different modalities from one common model, but rather annotated a task tree with user interfaces for the different supported modalities.

In the future, we also want to support more gesture and voice commands and a more flexible definition of the user interface model, considering the different interaction styles of the modalities in a more appropriate way.

The presented approach provides an event-based mechanism incorporating the multi-level structure of model-based user interfaces to coordinate distributed user interfaces. However, the presented implementation is still not complete and can be extended towards better support for the new possibilities provided by multimodal human-computer interaction in smart environments.

7. ACKNOWLEDGMENTS
We thank the German Federal Ministry of Economics and Technology for supporting our work as part of the Service Centric Home project in the "Next Generation Media" program.

8. REFERENCES
[1] Bouchet, J.; Nigay, L. & Ganille, T. (2004), ICARE software components for rapidly developing multimodal interfaces, in 'ICMI '04: Proceedings of the 6th International Conference on Multimodal Interfaces', ACM Press, New York, NY, USA, pp. 251-258.
[2] Calvary, G.; Coutaz, J.; Thevenin, D.; Limbourg, Q.; Bouillon, L. & Vanderdonckt, J. (2003), A unifying reference framework for multi-target user interfaces, Interacting with Computers 15(3), 289-308.
[3] Coninx, K.; Luyten, K.; Vandervelpen, C.; Van den Bergh, J. & Creemers, B. (2003), Dygimes: Dynamically generating interfaces for mobile computing devices and embedded systems, in 'Mobile HCI', pp. 256-270.
[4] Eisenstein, J.; Vanderdonckt, J. & Puerta, A.R. (2001), Applying model-based techniques to the development of UIs for mobile computers, in 'Intelligent User Interfaces', pp. 69-76.
[5] Klug, T. & Kangasharju, J. (2005), Executable task models, in 'Proceedings of TAMODIA 2005', ACM Press, Gdansk, Poland, pp. 119-122.
[6] Mori, G.; Paterno, F. & Santoro, C. (2003), Tool support for designing nomadic applications, in 'IUI '03: Proceedings of the 8th International Conference on Intelligent User Interfaces', ACM Press, New York, NY, USA, pp. 141-148.
[7] Paterno, F. (1999), Model-Based Design and Evaluation of Interactive Applications, Springer Verlag, Berlin.
[8] Paterno, F. & Giammarino, F. (2006), Authoring interfaces with combined use of graphics and voice for both stationary and mobile devices, in 'AVI '06: Proceedings of the Working Conference on Advanced Visual Interfaces', ACM Press, New York, NY, USA, pp. 329-335.
[9] Reithinger, N.; Alexandersson, J.; Becker, T.; Blocher, A.; Engel, R.; Löckelt, M.; Müller, J.; Pfleger, N.; Poller, P.; Streit, M. & Tschernomas, V. (2003), SmartKom: adaptive and flexible multimodal access to multiple applications, in 'ICMI '03: Proceedings of the 5th International Conference on Multimodal Interfaces', ACM Press, New York, NY, USA, pp. 101-108.
[10] Stanciulescu, A.; Limbourg, Q.; Vanderdonckt, J.; Michotte, B. & Montero, F. (2005), A transformational approach for multimodal web user interfaces based on UsiXML, in 'ICMI '05: Proceedings of the 7th International Conference on Multimodal Interfaces', ACM Press, New York, NY, USA, pp. 259-266.
[11] Vandervelpen, C. & Coninx, K. (2004), Towards model-based design support for distributed user interfaces, in 'NordiCHI '04: Proceedings of the Third Nordic Conference on Human-Computer Interaction' (Tampere, Finland, October 23-27, 2004), vol. 82, ACM Press, New York, NY, pp. 61-70.