A Notation and a Layered Architecture to Model Dynamic Instantiation of Input Devices and Interaction Techniques: Application to Multi-Touch Interactions

Arnaud Hamon(1,2), Eric Barboni(1), Philippe Palanque(1), Raphaël André(2)
(1) ICS-IRIT, University Toulouse 3, 118 route de Narbonne, 31062 Toulouse Cedex 9, France — {lastname}@irit.fr
(2) AIRBUS Operations, 316 route de Bayonne, 31060 Toulouse Cedex 9, France — {Firstname.Lastname}@airbus.com

ABSTRACT
Representing the behavior of multi-touch interactive systems in a complete, concise and non-ambiguous way is still a challenge for formal description techniques. Indeed, multi-touch interactive systems embed specific constraints that are either cumbersome or impossible to capture with classical formal description techniques. This is due both to the idiosyncratic nature of multi-touch technology (e.g. the fact that each finger represents an input device and that gestures are performed directly on the surface without an additional instrument) and to the high dynamicity of the interactions usually encountered in this kind of system. This paper presents a formal description technique able to model multi-touch interactive systems. A layered architecture is also proposed that provides a generic structure for organizing models of multi-touch systems. We focus the presentation on how to represent the dynamic instantiation of input devices (i.e. fingers) and how they can then be exploited dynamically to offer a multiplicity of interaction techniques which are themselves dynamically instantiated.

Author Keywords
Multi-touch interactions; model-based approaches; formal description techniques.

ACM Classification Keywords
D.2.2 [Software] Design Tools and Techniques – Computer-aided software engineering (CASE); H.5.2 [Information Interfaces]: User Interfaces – Interaction styles.

EGMI 2014, 1st International Workshop on Engineering Gestures for Multimodal Interfaces, June 17 2014, Rome, Italy. Copyright 2014 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. http://ceur-ws.org/Vol-1190/.

INTRODUCTION
Over the last decade the field of interactive systems engineering has had to face multiple challenges at a pace never encountered before. Indeed, while new interaction techniques have been proposed on a regular basis by the research community (e.g. multimodal gesture+voice interactions by R. Bolt [5], post-WIMP interactions such as [4], …), recent years have seen the adoption and deployment of such interaction techniques in many different types of systems. Together with this evolution of interaction techniques, the appearance and adoption of new input devices is also a significant change with respect to the past. Indeed, mass-market computers remained equipped with a standard mouse and keyboard for nearly 20 years, while nowadays one interacts with more sophisticated input devices such as multi-touch surfaces, the Kinect or the Wiimote.

However, these new input devices and their associated interaction techniques have significantly increased the development complexity of interactive systems. For instance, multimodal interaction techniques are now common both as input and output modalities. One of the most challenging examples is that of multi-touch systems(1). Indeed, even though some studies [4] show that they improve the bandwidth between the users and the system, they bring specific challenges such as the dynamic management of input devices (the fingers) and of their associated interaction techniques, including fusion and fission of input (e.g. input fusion for a pinch) as well as fusion and fission of rendering (e.g. output fusion for finger clustering).

(1) We use in this paper "multi-touch systems" as a shortcut for interactive systems offering multi-touch interactions.

This paper first presents a formal description technique able to describe in a complete and unambiguous way the behavior of multi-touch systems. As it consists of extensions to previous work, we make explicit the changes that have been made to the ICO notation. We present the basic constructs of the extensions and show how they can be applied to a simple example, making particularly explicit how the dynamic management of both input devices and interaction techniques is accounted for. This paper addresses more specifically multi-touch input devices and interaction techniques, but the concepts are applicable to any interactive system where input devices are connected and disconnected at runtime, requiring reconfiguration of interaction techniques. Secondly, the paper presents a layered architecture that structures models of multi-touch systems.

MODELLING CHALLENGES DUE TO DYNAMIC ASPECTS OF MULTI-TOUCH SYSTEMS
In classical interactive systems, the set of input and output devices is identified at design time, and the interaction techniques used for interacting with the application are based on this predefined set and also defined beforehand [3]. Multi-touch systems challenge this by requiring the capacity to handle input devices (i.e. fingers) that may appear and disappear dynamically while the interaction takes place. In such a context, when the interactive system is started, input devices are not present and thus not identified. Users' fingers are considered as input devices and are only detected as they touch (or get close enough to) the tactile surface. The input devices (fingers) detected at execution time need to be dynamically instantiated in order to be registered and listened to. While this can easily be managed using programming languages, such aspects are usually not addressed by the modelling techniques in the literature.

While model-based approaches provide well-identified benefits such as abstract description, possible reasoning about models, and complete and unambiguous descriptions, in order to deal with multi-touch systems they have to address the following challenges:

- Describe the dynamic management of input devices. This includes the description (inside models) of the dynamic creation (instantiation) of input devices and of how many of them are present at any time. This management also requires the removal of the devices from the models when they are freed;
- Make explicit in the models the connection between the hardware (input devices) and their software counterparts (i.e. device drivers and transducers, as introduced in [6] and formalized in [1]);
- Describe the set of states, the events produced and the events consumed by the device drivers and the transducers;
- Describe the interaction techniques that have to handle references to dynamically instantiated models related to the input devices (drivers and transducers);
- Describe how the behavior of interaction techniques evolves according to the addition and removal of input devices. Such capability is extremely demanding on the specification techniques, requiring dynamic management of interaction techniques as demonstrated in [13];
- Describe fusion and fission of input and output within the interaction technique. Indeed, the use of multiple input devices (fingers) makes it possible for interaction designers to define very sophisticated interaction techniques, for instance making use of several fingers grouped together. Such grouping requires the fusion of events from the group of fingers but also the fusion of output information to provide feedback to the users about the current state of recognition of the interaction. For example, interaction techniques featuring a group of two fingers will require modifying the initial rendering of each finger's graphical feedback, as in Figure 1-b). Figure 1-a) presents the graphical feedback of three fingers on a multi-touch application.

These challenges go beyond the ones brought by the multimodal interactions identified in [12].

Figure 1 - a) 3 input devices detected; b) output of the clustering of two input devices (merged disks, bottom left)

THE EXTENDED ICO NOTATION
Based on the study of the related work and the dimensions described in [9], only the ICO notation allows the modelling of all the multi-touch characteristics. However, extensive modelling of multi-touch systems has demonstrated the need for modifying the ICO notation in order to provide primitives for handling the specificities of multi-touch systems. It is important to note that these primitives do not constitute extensions to the expressive power of ICOs but bring the formal description technique closer to what is needed to model multi-touch systems. This is why the proposed extensions contribute beyond ICOs, as such extensions could be added to other notations, provided their expressive power is sufficient for modelling multi-touch systems.

Introduction
The ICO notation (Interactive Cooperative Objects) is a formal description technique devoted to the specification of interactive systems. Using high-level Petri nets [8] for dynamic behavior description, the notation also relies on an object-oriented approach (dynamic instantiation, classification, encapsulation, inheritance and client/server relationships) to describe the structural or static aspects of systems.

The ICO notation is based on a behavioral description of the interactive system using the Cooperative Objects formalism, which describes how the object reacts to external stimuli according to its inner state. This behavior, called the Object Control Structure (ObCS), is described by means of an Object Petri Net (OPN). An ObCS can have multiple places and transitions that are linked with arcs, as in standard Petri nets. As an extension to these standard arcs, ICO allows the use of test arcs and inhibitor arcs. Each place has an initial marking (represented by one or several tokens in the place) describing the initial state of the system. As the paper mainly focuses on behavioral aspects, we do not describe them further (more can be found in [14]).

ICO notation objects are composed of four components: a cooperative object for the behavior description, a presentation part (i.e. the graphical interface), and two functions (activation and rendering) describing the links between the cooperative object and the presentation part. ICOs have been used for various types of multimodal interfaces [11] and in particular for multi-touch [9]. The notation is also currently applied for formal specification in the fields of Air Traffic Control interactive applications [14], space command and control ground systems [15], and interactive military [2] or civil cockpits [1].

Informal description of dynamic instantiation
ICOs, due to their Petri net underpinning, are particularly efficient at creating and destroying elements when they are represented as tokens. As ICO tokens may refer to objects or to other ICOs, it is possible to use such high-level tokens to represent input devices such as fingers on a touchscreen. Such tokens refer to other ICO models describing the detailed behavior of the input device. For instance, Figure 4 presents the behavior of a finger both in terms of states (values for position, pressure, ...) and events (e.g. update, corresponding to move events).

The ICO model in Figure 3 describes how new input devices are instantiated and stored in a manager. The top-left transition in Figure 3 illustrates how new input devices can be added to an ICO model with the creation of a model of finger type (instruction finger=create Finger(touchinfo)). The newly created reference is then stored in a waiting place (called ToAddFinger) in order to be connected to an interaction technique in charge of handling the events that will be produced by the new device.

Handling events from dynamically instantiated sources
An ICO model may act as an event handler for events emitted by other models or Java instances. The detailed description of these mechanisms is available in [16]. In addition, the different transition blocks of Figure 3 (top-left transition) are presented in Table 1.

| Block | Field Name | Field Description |
| 1: Name block | name | unique name, not necessarily linked to the eventName |
| 2: Precondition block | precondition | boolean expression independent of the event but depending on the marking |
| 3: Event block | eventName | the explicit name of the event the transition is linked to |
| | eventSource | the source of the received event |
| | eventParameters | the collection of the parameters of the received event |
| | eventCondition | boolean expression based on the eventParameters' values, used for the firing |
| 4: Action block | action | an action block |

Table 1 - Properties of the generic event transition

Formal description
Due to space constraints, the formal definition of the extensions is not given here, but its denotational semantics is given in terms of "standard" ICOs as defined in [14].

A LAYERED ARCHITECTURE TO SUPPORT DYNAMIC HANDLING OF INPUT DEVICES
This section proposes a layered architecture (see Figure 2) making explicit the various models needed to describe multi-touch systems as well as the way they communicate. This architecture allows handling the dynamicity aspects of input devices and interaction techniques.
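Outside the ICO formalism, the dynamic instantiation pattern described above — one model instantiated per detected finger, with its reference held by a manager — can be sketched in plain Java. This is an illustrative sketch only, not the paper's ICO semantics: all names (FingerModel, FingerManager, down/update/up) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (plain Java, not ICO semantics): each detected finger
// becomes an object instantiated at runtime, mirroring the idea of a
// high-level token referring to a finger model. All names are hypothetical.
class FingerModel {
    final int id;
    double x, y, pressure;

    FingerModel(int id, double x, double y) {
        this.id = id;
        this.x = x;
        this.y = y;
    }

    void update(double x, double y, double pressure) {
        this.x = x;
        this.y = y;
        this.pressure = pressure;
    }
}

// Plays the role of the manager holding finger references (cf. the
// ToAddFinger / FingerPool places): "down" instantiates a model,
// "update" is routed to it, "up" frees it.
class FingerManager {
    private final Map<Integer, FingerModel> pool = new HashMap<>();

    FingerModel down(int touchId, double x, double y) {
        FingerModel finger = new FingerModel(touchId, x, y); // finger = create Finger(touchInfo)
        pool.put(touchId, finger);
        return finger;
    }

    void update(int touchId, double x, double y, double pressure) {
        FingerModel finger = pool.get(touchId);
        if (finger != null) {
            finger.update(x, y, pressure);
        }
    }

    void up(int touchId) {
        pool.remove(touchId); // device freed: the "token" is removed
    }

    int activeFingers() {
        return pool.size();
    }
}
```

The point of the sketch is only that device objects are created and destroyed at runtime, keyed by touch identifier — the property that the ICO extensions capture at the model level rather than in code.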
The proposed architecture features the "good" properties of software systems mentioned in [17] (flexibility, separation of concerns, extensibility and hardware independence). Our proposed architecture is similar to the layer-based architecture described in [7] but explicitly describes the dynamic aspects related to multi-touch, such as dynamic instantiation.

Figure 2 presents this architecture, making explicit the flow of events from the hardware (lower level, at the bottom) to the multi-touch interactive application (higher level). The architecture relies on existing coded layers and ICO models and is OS independent. The hardware layer describes the hardware equipment providing tactile input; in our case, this layer relates to the touchscreens considered. The hardware driver layer refers to the drivers used by the operating system and produces low-level events (such as "downs" with their basic attributes (posX, posY, …)) that propagate to the upper levels. On top of this layer, the JavaFX layer provides object-oriented events through an OS-independent layer.

Low-level transducer description
The model presented in Figure 3 is called a transducer as it is located (in terms of software architecture) between the hardware devices and the interaction techniques, as illustrated in Figure 2. There could be a chain of such models handling events from the lower level (raw events or data from the hardware input devices) up to high-level events such as a double click (see [1] for more details on transducers).

The low-level transducer encapsulates the references towards the upper-level models of the handling mechanism, such as the FingerModels and the interaction technique ClusteringModel. The role of this low-level transducer is to forward events received from the hardware, as low-level events, to the FingerModels (which model the fingers' behavior).
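As a rough illustration of this forwarding role — again in plain Java rather than ICO, with hypothetical names (RawTouchEvent, ClusteringListener, LowLevelTransducer) — a transducer might route raw driver events to per-finger state and notify the interaction-technique layer when fingers appear or disappear:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the forwarding role of a low-level transducer
// (plain Java, hypothetical names; the paper's models are ICO Petri nets).
// Raw driver events are routed to per-finger state, and the interaction
// technique layer is notified when fingers appear or disappear.
class RawTouchEvent {
    final String kind; // "down", "update" or "up" (cf. the rawToucheventf_* events)
    final int touchId;
    final double x, y;

    RawTouchEvent(String kind, int touchId, double x, double y) {
        this.kind = kind;
        this.touchId = touchId;
        this.x = x;
        this.y = y;
    }
}

interface ClusteringListener { // stands in for the reference to the ClusteringModel
    void fingerAdded(int touchId);
    void fingerRemoved(int touchId);
}

class LowLevelTransducer {
    private final Map<Integer, double[]> fingerPool = new HashMap<>(); // cf. FingerPool place
    private final ClusteringListener clustering;

    LowLevelTransducer(ClusteringListener clustering) {
        this.clustering = clustering;
    }

    void receive(RawTouchEvent e) {
        switch (e.kind) {
            case "down": // a new input device appears: instantiate and register it
                fingerPool.put(e.touchId, new double[] { e.x, e.y });
                clustering.fingerAdded(e.touchId);
                break;
            case "update": // route the event to the matching finger state
                double[] pos = fingerPool.get(e.touchId);
                if (pos != null) { pos[0] = e.x; pos[1] = e.y; }
                break;
            case "up": // the input device disappears: free it
                fingerPool.remove(e.touchId);
                clustering.fingerRemoved(e.touchId);
                break;
        }
    }

    int poolSize() {
        return fingerPool.size();
    }
}
```

The design choice sketched here matches the text: "down" and "up" are propagated upwards as add/remove notifications, whereas "update" is resolved directly against the stored reference because the transducer already knows which finger the event belongs to.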
The low-level transducer (bigger box in Figure 2) is in charge of producing the high-level events corresponding to the interaction techniques recognized by the system. This layer is modelled using ICO and manages the dynamicity of the input devices. The detailed description of this layer and its components is given in the following sections.

Figure 2 - Layered architecture to support dynamic handling of input devices

DEMONSTRATING HANDLING OF INPUT DEVICES: A SIMPLE EXAMPLE USING ICOS
This section describes the ICO models used for the example presented in Figure 1-b, which handles dynamically referenced input devices and corresponds to the main components of the architecture presented in Figure 2.

Figure 3 - Excerpt of the model of a low-level transducer

During the initialization, the low-level transducer instantiates the ClusteringModel through the createClustering transition and stores its reference in the ClusteringModel place. When the low-level transducer receives a "rawToucheventf_down" event from the hardware, the fingerInstantiation transition is fired; the event parameters (the touch ID and its additional information) are retrieved and used to dynamically instantiate a new instance of FingerModel. The addFingerToClustering transition then adds the FingerModel reference to the cluster model. This is how the interaction technique is informed of the detection of new fingers. The low-level transducer then stores the reference of the FingerModel in the FingerPool place (which contains the list of all the detected fingers). When the transducer receives "rawToucheventf_update" (resp. "rawToucheventf_up") events from the hardware, the transition updatingFinger (resp. freeFinger) is triggered and updates the proper FingerModel accordingly. These updates are provided using the communication mechanism of ICO services and not using events, since the low-level transducer contains references towards the FingerModels and is able to match the hardware events with the right model.

Modelling touch fingers
Each time the low-level transducer receives an event corresponding to the detection of a finger on the hardware, it creates the model and links it with the interaction technique model(s). When the event received corresponds to an update of an already detected finger, the low-level transducer notifies the corresponding finger model using the service "update". When the finger is removed from the hardware, the low-level transducer fires the transition freeFinger, which destroys the corresponding FingerModel.

Figure 4 - Generic model of finger

For readability purposes, the model presented in Figure 4 features a limited set of finger properties: position and pressure. However, more complex finger models have been described, offering various properties such as finger tilt angle, acceleration and direction of the movements. Lastly, this finger model is an extensible model that can describe very complex behaviors. For example, if one needs to describe the behavior of a finger input as in Proton++ [10], this can be done in a finger model like the one presented. Indeed, this model specifies when the touch events are broadcast, and such broadcasting can be controlled in order to match a sequential system sending user events every 30 ms as in [10].

Modelling the interaction technique "finger clustering"
This section describes how the ICO notation handles interaction techniques including output fusion of information related to the reception of events produced by dynamically instantiated input devices (see Figure 5). In this example, the interaction technique model is in charge of pairing co-located input devices so they can be handled as a group of fingers. This corresponds to the interaction presented in Figure 1, where the right-hand side of the figure presents the rendering associated with the detection of a pair of fingers (bottom left of the figure) while the other finger remains ungrouped.

Figure 5 - Model of the interaction technique "finger clustering"

The model presented in Figure 5 is composed of a service (addFinger), two places (ListOfFingerPairs, storing the pairs of fingers, and SingleFingerList, storing the "single" fingers) and event transitions to update the clustering according to the evolution of the position of the fingers on the touchscreen. Each time a finger model is created (a new finger touches the screen), the low-level transducer calls the "addFinger" service and a reference to the new finger model is set in place SingleFingerList. When a finger from SingleFingerList (called finger1 for instance) moves close enough to another finger (e.g. finger2) in that place too, two cases are represented:

- finger2 is close enough to finger1 (the condition in the event condition zone of transition cluster2Fingers is true): transition cluster2Fingers is fired, finger1 and finger2 are removed from place SingleFingerList, and a new token consisting of the pair (finger1, finger2) and their respective positions is stored in place ListOfFingerPairs.
- finger2 is too far from finger1 (the condition in the event condition zone of transition noClusterDetected is true): that transition is fired and the new position of the finger is updated.

When a pair is detected, the user interface should graphically display such dynamic grouping. This is defined by the rendering function associated with the interaction technique and presented in Table 2. When two fingers are merged, the tokens referencing these two models are removed from the SingleFingerList place, which triggers the method hideFingerRendering for each model; this method hides the elementary rendering associated with each finger. Both references are then combined in a token added to the place ListOfFingerPairs, which calls the method createPairedFingerRendering, which displays the rendering associated with the two-finger cluster. It is important to note that output is thus connected to state changes in the models (which only occur when tokens are added to or removed from places) while inputs are event-based and thus associated with transitions.

| ObCS node name | ObCS event | Rendering method |
| SingleFingerList | tokenAdded | showFingerRendering |
| SingleFingerList | tokenRemoved | hideFingerRendering |
| ListOfFingerPairs | tokenAdded | createPairedFingerRendering |
| ListOfFingerPairs | tokenRemoved | removePairedFingerRendering |

Table 2 - Rendering functions of the interaction technique

CONCLUSION
This paper has identified a set of challenges towards the production of complete and unambiguous specifications of multi-touch systems. The main issues deal with the dynamic instantiation of input devices and the dynamic reconfiguration of interaction techniques. We have highlighted the fact that such concerns had not previously been encountered (at least at this large scale) when engineering interactive systems. This paper has presented a twofold way of addressing these issues:

- A layered software architecture made up of communicating models, which makes explicit a set of components and their inter-relations in order to address this dynamicity challenge;
- A formal description technique able to describe such dynamic behaviors in a complete and unambiguous way.

While the formal notation contribution is very specific to the work presented here, the layered architecture is independent from it and can be reused within any framework dealing with multi-touch interactions.

REFERENCES
1. Accot J., Chatty S., Maury S. & Palanque P. Formal Transducers: Models of Devices and Building Bricks for Highly Interactive Systems. DSVIS 1997, Springer Verlag, pp. 234-259.
2. Bastide R., Navarre D., Palanque P., Schyn A. & Dragicevic P. A Model-Based Approach for Real-Time Embedded Multimodal Systems in Military Aircrafts. Int. Conference on Multimodal Interfaces (ICMI'04), ACM DL, 10 pages.
3. Bellik Y., Rebaï I., Machrouh E., Barzaj Y., Jacquet C., Pruvost G. & Sansonnet J.-P. Multimodal Interaction within Ambient Environments: An Exploratory Study. INTERACT (2) 2009: 89-92.
4. Bi X., Grossman T., Matejka J. & Fitzmaurice G. Magic desk: bringing multi-touch surfaces into desktop work. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). ACM, New York, NY, USA, 2011, 2511-2520.
5. Bolt R. & Herranz E. Two-Handed Gesture in Multi-Modal Natural Dialog. Proceedings of the fifth annual ACM symposium on User interface software and technology, ACM Press, 1992, pp. 7-14.
6. Buxton W. A three-state model of graphical input. IFIP TC 13 INTERACT'90, 1990, pp. 449-456.
7. Echtler F. & Klinker G. A multitouch software architecture. Proceedings of the 5th Nordic conference on Human-computer interaction: building bridges (NordiCHI '08). ACM, New York, NY, USA, 2008, 463-466.
8. Genrich H. J. Predicate/Transition Nets. In High-Level Petri Nets: Theory and Application, K. Jensen and G. Rozenberg (Eds.), Springer Verlag, 1991, pp. 3-43.
9. Hamon A., Palanque P., Silva J-L., Deleris Y. & Barboni E. Formal description of multi-touch interactions. 5th symposium on Engineering interactive computing systems (EICS '13). ACM, 2013, 207-216.
10. Kin K., Hartmann B., DeRose T. & Agrawala M. Proton++: a customizable declarative multitouch framework. Proceedings of the ACM symposium on User interface software and technology (UIST '12). ACM, 2012, 477-486.
11. Ladry J-F., Navarre D. & Palanque P. Formal description techniques to support the design, construction and evaluation of fusion engines for sure (safe, usable, reliable and evolvable) multimodal interfaces. ICMI 2009: 185-192.
12. Lalanne D., Nigay L., Palanque P., Robinson P., Vanderdonckt J. & Ladry J-F. Fusion engines for multimodal input: a survey. ACM ICMI 2009: 153-160, ACM DL.
13. Spano L-D., Cisternino A., Paternò F. & Fenu G. GestIT: a declarative and compositional framework for multiplatform gesture definition. EICS 2013: 187-196.
14. Navarre D., Palanque P., Ladry J-F. & Barboni E. ICOs: A model-based user interface description technique dedicated to interactive systems addressing usability, reliability and scalability. ACM Trans. Comput.-Hum. Interact., 16(4), 18:1-18:56, 2009.
15. Palanque P., Bernhaupt R., Navarre D., Ould M. & Winckler M. Supporting Usability Evaluation of Multimodal Man-Machine Interfaces for Space Ground Segment Applications Using Petri net Based Formal Specification. Ninth Int. Conference on Space Operations, Italy, June 18-22, 2006.
16. Palanque P. & Schyn A. A Model-Based Approach for Engineering Multimodal Interactive Systems. INTERACT 2003, IFIP TC 13 conference on HCI, 10 pages.
17. Khandkar S.H. & Maurer F. A domain specific language to define gestures for multi-touch applications. Proceedings of the 10th Workshop on Domain-Specific Modeling (DSM '10). ACM, New York, NY, USA, Article 2, 6 pages.