A Notation and a Layered Architecture to Model Dynamic Instantiation of Input Devices and Interaction Techniques: Application to Multi-Touch Interactions

Arnaud Hamon(1,2), Eric Barboni(1), Philippe Palanque(1), Raphaël André(2)
(1) ICS-IRIT, University Toulouse 3, 118 route de Narbonne, 31062 Toulouse Cedex 9, France — {lastname}@irit.fr
(2) AIRBUS Operations, 316 route de Bayonne, 31060 Toulouse Cedex 9, France — {Firstname.Lastname}@airbus.com

ABSTRACT
Representing the behavior of multi-touch interactive systems in a complete, concise and non-ambiguous way is still a challenge for formal description techniques. Indeed, multi-touch interactive systems embed specific constraints that are either cumbersome or impossible to capture with classical formal description techniques. This is due both to the idiosyncratic nature of multi-touch technology (e.g. the fact that each finger represents an input device and that gestures are performed directly on the surface without an additional instrument) and to the high dynamicity of the interactions usually encountered in this kind of system. This paper presents a formal description technique able to model multi-touch interactive systems. A layered architecture is also proposed that provides a generic structure for organizing models of multi-touch systems. We focus the presentation on how to represent the dynamic instantiation of input devices (i.e. fingers) and how they can then be exploited dynamically to offer a multiplicity of interaction techniques which are themselves dynamically instantiated.

Author Keywords
Multi-touch interactions; model-based approaches; formal description techniques.

ACM Classification Keywords
D.2.2 [Software] Design Tools and Techniques – Computer-aided software engineering (CASE); H.5.2 [Information Interfaces]: User Interfaces – Interaction styles.

EGMI 2014, 1st International Workshop on Engineering Gestures for Multimodal Interfaces, June 17 2014, Rome, Italy. Copyright 2014 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. http://ceur-ws.org/Vol-1190/.

INTRODUCTION
Over the last decade the field of interactive systems engineering has had to face multiple challenges at a pace never encountered before. Indeed, while new interaction techniques have been proposed on a regular basis by the research community (e.g. multimodal gesture+voice interactions by R. Bolt [5], post-WIMP interactions such as [4], …), recent years have seen the adoption and deployment of such interaction techniques in many different types of systems. Together with this evolution of interaction techniques, the appearance and adoption of new input devices is also a significant change with respect to the past. Indeed, mass-market computers remained equipped with a standard mouse and keyboard for nearly 20 years, while nowadays one interacts with more sophisticated input devices such as multi-touch surfaces, the Kinect or the Wiimote.

However, these new input devices and their associated interaction techniques have significantly increased the development complexity of interactive systems. For instance, multimodal interaction techniques are now common both as input and output modalities. One of the most challenging examples is that of multi-touch systems(1). Indeed, even though some studies [4] show that they improve the bandwidth between the users and the system, they bring specific challenges such as the dynamic management of input devices (the fingers) and of their associated interaction techniques, including fusion and fission of input (e.g. input fusion for a pinch) as well as fusion and fission of rendering (e.g. output fusion for finger clustering).

(1) We use in this paper "multi-touch systems" as a shortcut for interactive systems offering multi-touch interactions.

This paper first presents a formal description technique able to describe in a complete and unambiguous way the behavior of multi-touch systems. As it consists of extensions to previous work, we make explicit the changes that have been made to the ICO notation. We present the basic constructs of the extensions and show how they can be applied to a simple example, making particularly explicit how the dynamic management of both input devices and interaction techniques is accounted for. This paper addresses more specifically multi-touch input devices and interaction techniques, but the concepts are applicable to any interactive system where input devices are connected and disconnected at runtime, requiring reconfiguration of interaction techniques. Secondly, the paper presents a layered architecture that structures models of multi-touch systems.

MODELLING CHALLENGES DUE TO DYNAMIC ASPECTS OF MULTI-TOUCH SYSTEMS
In classical interactive systems, the set of input and output devices is identified at design time, and the interaction techniques used for interacting with the application are based on this predefined set and also defined beforehand [3]. Multi-touch systems challenge this by requiring the capacity to handle input devices (i.e. fingers) that may appear and disappear dynamically while the interaction takes place. In such a context, when the interactive system is started, input devices are not present and thus not identified. Users' fingers are considered as input devices and are only detected as they touch (or get close enough to) the tactile surface. The input devices (fingers) detected at execution time need to be dynamically instantiated in order to be registered and listened to. While this can easily be managed using programming languages, such aspects are usually not addressed by the modelling techniques in the literature.

While model-based approaches provide well-identified benefits such as abstract description, possible reasoning about models, and complete and unambiguous descriptions, in order to deal with multi-touch systems they have to address the following challenges:

- Describe the dynamic management of input devices. This includes the description (inside models) of the dynamic creation (instantiation) of input devices and of how many of them are present at any time. This management also requires the removal of the devices from the models when they are freed;
- Make explicit in the models the connection between the hardware (input devices) and their software counterparts (i.e. device drivers and transducers, as introduced in [6] and formalized in [1]);
- Describe the set of states, the events produced and the events consumed by the device drivers and the transducers;
- Describe the interaction techniques that have to handle references to dynamically instantiated models related to the input devices (drivers and transducers);
- Describe how the behavior of interaction techniques evolves according to the addition and removal of input devices. Such capability is extremely demanding on the specification techniques, requiring dynamic management of interaction techniques as demonstrated in [13];
- Describe fusion and fission of input and output within the interaction technique. Indeed, the use of multiple input devices (fingers) makes it possible for interaction designers to define very sophisticated interaction techniques, for instance making use of several fingers grouped together. Such grouping requires the fusion of events from the group of fingers but also the fusion of output information to provide feedback to the users about the current state of recognition of the interaction. For example, interaction techniques featuring a group of two fingers will require modifying the initial rendering of each finger's graphical feedback, as in Figure 1-b). Figure 1-a) presents the graphical feedback of three fingers on a multi-touch application.

These challenges go beyond the ones brought by the multimodal interactions identified in [12].

Figure 1 - a) 3 input devices detected; b) output of the clustering of two input devices (merged disks, bottom left)

THE EXTENDED ICO NOTATION
Based on the study of the related work and the dimensions described in [9], only the ICO notation allows the modelling of all the multi-touch characteristics. However, extensive modelling of multi-touch systems has demonstrated the need for modifying the ICO notation in order to provide primitives for handling the specificities of multi-touch systems. It is important to note that these primitives do not constitute extensions to the expressive power of ICOs but bring the formal description technique closer to what is needed to model multi-touch systems. This is why the proposed extensions contribute beyond ICOs, as such extensions could be added to other notations, provided their expressive power is sufficient for modelling multi-touch systems.

Introduction
The ICO notation (Interactive Cooperative Objects) is a formal description technique devoted to the specification of interactive systems. Using high-level Petri nets [8] for dynamic behavior description, the notation also relies on an object-oriented approach (dynamic instantiation, classification, encapsulation, inheritance and client/server relationships) to describe the structural or static aspects of systems.

The ICO notation is based on a behavioral description of the interactive system using the Cooperative Objects formalism, which describes how the object reacts to external stimuli according to its inner state. This behavior, called the Object Control Structure (ObCS), is described by means of an Object Petri Net (OPN). An ObCS can have multiple places and transitions that are linked with arcs, as in standard Petri nets. As an extension to these standard arcs, ICO allows the use of test arcs and inhibitor arcs. Each place has an initial marking (represented by one or several tokens in the place) describing the initial state of the system. As the paper mainly focuses on behavioral aspects, we do not describe them further (more can be found in [14]).

ICO notation objects are composed of four components: a cooperative object for the behavior description, a presentation part (i.e. the graphical interface), and two functions (activation and rendering) describing the links between the cooperative object and the presentation part. ICOs have been used for various types of multimodal interfaces [11] and in particular for multi-touch [9]. The notation is also currently applied for formal specification in the fields of Air Traffic Control interactive applications [14], space command and control ground systems [15], and interactive military [2] or civil cockpits [1].

Informal description of dynamic instantiation
ICOs, due to their Petri net underpinning, are particularly efficient at creating and destroying elements when they are represented as tokens. As ICO tokens may refer to objects or to other ICOs, it is possible to use such high-level tokens to represent input devices such as fingers on a touchscreen. Such tokens refer to other ICO models describing the detailed behavior of the input device. For instance, Figure 4 presents the behavior of a finger both in terms of states (values for position, pressure, ...) and events (e.g. update, corresponding to move events).

The ICO model in Figure 3 describes how new input devices are instantiated and stored in a manager. The top-left transition in Figure 3 illustrates how new input devices can be added to an ICO model with the creation of a model of finger type (instruction finger=create Finger(touchinfo)). The newly created reference is then stored in a waiting place (called ToAddFinger) in order to be connected to an interaction technique in charge of handling the events that will be produced by the new device.

Handling events from dynamically instantiated sources
An ICO model may act as an event handler for events emitted by other models or Java instances. The detailed description of these mechanisms is available in [16]. In addition, the different transition blocks of Figure 3 (top-left transition) are presented in Table 1.

| Block | Field Name | Field Description |
| 1: Name block | name | unique name, not necessarily linked to the eventName |
| 2: Precondition block | precondition | boolean expression independent of the event but depending on the marking |
| 3: Event block | eventName | the explicit name of the event the transition is linked to |
| | eventSource | the source of the received event |
| | eventParameters | the collection of the parameters of the received event |
| | eventCondition | boolean expression based on the eventParameters' values, used for the firing |
| 4: Action block | action | an action block |

Table 1 - Properties of the generic event transition

Formal description
Due to space constraints, the formal definition of the extensions is not given here, but its denotational semantics is given in terms of "standard" ICOs as defined in [14].

A LAYERED ARCHITECTURE TO SUPPORT DYNAMIC HANDLING OF INPUT DEVICES
This section proposes a layered architecture (see Figure 2) making explicit the various models needed to describe multi-touch systems as well as the way they communicate. This architecture allows handling the dynamicity aspects of input devices and interaction techniques.
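Outside the ICO formalism, the dynamic instantiation pattern described above — one model instantiated per detected finger, with its reference held by a manager — can be sketched in plain Java. This is an illustrative sketch only, not the paper's ICO semantics: all names (FingerModel, FingerManager, down/update/up) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (plain Java, not ICO semantics): each detected finger
// becomes an object instantiated at runtime, mirroring the idea of a
// high-level token referring to a finger model. All names are hypothetical.
class FingerModel {
    final int id;
    double x, y, pressure;

    FingerModel(int id, double x, double y) {
        this.id = id;
        this.x = x;
        this.y = y;
    }

    void update(double x, double y, double pressure) {
        this.x = x;
        this.y = y;
        this.pressure = pressure;
    }
}

// Plays the role of the manager holding finger references (cf. the
// ToAddFinger / FingerPool places): "down" instantiates a model,
// "update" is routed to it, "up" frees it.
class FingerManager {
    private final Map<Integer, FingerModel> pool = new HashMap<>();

    FingerModel down(int touchId, double x, double y) {
        FingerModel finger = new FingerModel(touchId, x, y); // finger = create Finger(touchInfo)
        pool.put(touchId, finger);
        return finger;
    }

    void update(int touchId, double x, double y, double pressure) {
        FingerModel finger = pool.get(touchId);
        if (finger != null) {
            finger.update(x, y, pressure);
        }
    }

    void up(int touchId) {
        pool.remove(touchId); // device freed: the "token" is removed
    }

    int activeFingers() {
        return pool.size();
    }
}
```

The point of the sketch is only that device objects are created and destroyed at runtime, keyed by touch identifier — the property that the ICO extensions capture at the model level rather than in code.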
The proposed architecture features the "good" properties of software systems mentioned in [17] (flexibility, separation of concerns, extensibility and hardware independence). Our proposed architecture is similar to the layer-based architecture described in [7] but explicitly describes the dynamic aspects related to multi-touch, such as dynamic instantiation.

Figure 2 presents this architecture, making explicit the flow of events from the hardware (lower level, at the bottom) to the multi-touch interactive application (higher level). The architecture relies on existing coded layers and ICO models and is OS independent. The hardware layer describes the hardware equipment providing tactile input; in our case, this layer relates to the touchscreens considered. The hardware driver layer refers to the drivers used by the operating system and produces low-level events (such as "downs" with their basic attributes (posX, posY, …)) that propagate to the upper levels. On top of this layer, the JavaFX layer provides object-oriented events through an OS-independent layer.

Low-level transducer description
The model presented in Figure 3 is called a transducer as it is located (in terms of software architecture) between the hardware devices and the interaction techniques, as illustrated in Figure 2. There could be a chain of such models handling events from the lower level (raw events or data from the hardware input devices) up to high-level events such as a double click (see [1] for more details on transducers).

The low-level transducer encapsulates the references towards the upper-level models of the handling mechanism, such as the FingerModels and the interaction technique ClusteringModel. The role of this low-level transducer is to forward events received from the hardware, as low-level events, to the FingerModels (which model the fingers' behavior).
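As a rough illustration of this forwarding role — again in plain Java rather than ICO, with hypothetical names (RawTouchEvent, ClusteringListener, LowLevelTransducer) — a transducer might route raw driver events to per-finger state and notify the interaction-technique layer when fingers appear or disappear:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the forwarding role of a low-level transducer
// (plain Java, hypothetical names; the paper's models are ICO Petri nets).
// Raw driver events are routed to per-finger state, and the interaction
// technique layer is notified when fingers appear or disappear.
class RawTouchEvent {
    final String kind; // "down", "update" or "up" (cf. the rawToucheventf_* events)
    final int touchId;
    final double x, y;

    RawTouchEvent(String kind, int touchId, double x, double y) {
        this.kind = kind;
        this.touchId = touchId;
        this.x = x;
        this.y = y;
    }
}

interface ClusteringListener { // stands in for the reference to the ClusteringModel
    void fingerAdded(int touchId);
    void fingerRemoved(int touchId);
}

class LowLevelTransducer {
    private final Map<Integer, double[]> fingerPool = new HashMap<>(); // cf. FingerPool place
    private final ClusteringListener clustering;

    LowLevelTransducer(ClusteringListener clustering) {
        this.clustering = clustering;
    }

    void receive(RawTouchEvent e) {
        switch (e.kind) {
            case "down": // a new input device appears: instantiate and register it
                fingerPool.put(e.touchId, new double[] { e.x, e.y });
                clustering.fingerAdded(e.touchId);
                break;
            case "update": // route the event to the matching finger state
                double[] pos = fingerPool.get(e.touchId);
                if (pos != null) { pos[0] = e.x; pos[1] = e.y; }
                break;
            case "up": // the input device disappears: free it
                fingerPool.remove(e.touchId);
                clustering.fingerRemoved(e.touchId);
                break;
        }
    }

    int poolSize() {
        return fingerPool.size();
    }
}
```

The design choice sketched here matches the text: "down" and "up" are propagated upwards as add/remove notifications, whereas "update" is resolved directly against the stored reference because the transducer already knows which finger the event belongs to.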
The low-level transducer (bigger box in Figure 2) is in charge of producing the high-level events corresponding to the interaction techniques recognized by the system. This layer is modelled using ICO and manages the dynamicity of the input devices. The detailed description of this layer and its components is given in the following sections.

Figure 2 - Layered architecture to support dynamic handling of input devices

DEMONSTRATING HANDLING OF INPUT DEVICES: A SIMPLE EXAMPLE USING ICOS
This section describes the ICO models used for the example presented in Figure 1-b, which handles dynamically referenced input devices and corresponds to the main components of the architecture presented in Figure 2.

Figure 3 - Excerpt of the model of a low-level transducer

During the initialization, the low-level transducer instantiates the ClusteringModel through the createClustering transition and stores its reference in the ClusteringModel place. When the low-level transducer receives a "rawToucheventf_down" event from the hardware, the fingerInstantiation transition is fired; the event parameters (the touch ID and its additional information) are retrieved and used to dynamically instantiate a new instance of FingerModel. The addFingerToClustering transition then adds the FingerModel reference to the cluster model. This is how the interaction technique is informed of the detection of new fingers. The low-level transducer then stores the reference of the FingerModel in the FingerPool place (which contains the list of all the detected fingers). When the transducer receives "rawToucheventf_update" (resp. "rawToucheventf_up") events from the hardware, the transition updatingFinger (resp. freeFinger) is triggered and updates the proper FingerModel accordingly. These updates are provided using the communication mechanism of ICO services and not using events, since the low-level transducer contains references towards the FingerModels and is able to match the hardware events with the right model.

Modelling touch fingers
Each time the low-level transducer receives an event corresponding to the detection of a finger on the hardware, it creates the model and links it with the interaction technique model(s). When the event received corresponds to an update of an already detected finger, the low-level transducer notifies the corresponding finger model using the service "update". When the finger is removed from the hardware, the low-level transducer fires the transition freeFinger, which destroys the corresponding FingerModel.

Figure 4 - Generic model of finger

For readability purposes, the model presented in Figure 4 features a limited set of finger properties: position and pressure. However, more complex finger models have been described, offering various properties such as finger tilt angle, acceleration and direction of the movements. Lastly, this finger model is an extensible model that can describe very complex behaviors. For example, if one needs to describe the behavior of a finger input as in Proton++ [10], this can be done in a finger model like the one presented. Indeed, this model specifies when the touch events are broadcast, and such broadcasting can be controlled in order to match a sequential system sending user events every 30 ms as in [10].

Modelling the interaction technique "finger clustering"
This section describes how the ICO notation handles interaction techniques including output fusion of information related to the reception of events produced by dynamically instantiated input devices (see Figure 5). In this example, the interaction technique model is in charge of pairing co-located input devices so they can be handled as a group of fingers. This corresponds to the interaction presented in Figure 1, where the right-hand side of the figure presents the rendering associated with the detection of a pair of fingers (bottom left of the figure) while the other finger remains ungrouped.

Figure 5 - Model of the interaction technique "finger clustering"

The model presented in Figure 5 is composed of a service (addFinger), two places (ListOfFingerPairs, storing the pairs of fingers, and SingleFingerList, storing the "single" fingers) and event transitions to update the clustering according to the evolution of the position of the fingers on the touchscreen. Each time a finger model is created (a new finger touches the screen), the low-level transducer calls the "addFinger" service and a reference to the new finger model is set in place SingleFingerList. When a finger from SingleFingerList (called finger1 for instance) moves close enough to another finger (e.g. finger2) in that place too, two cases are represented:

- finger2 is close enough to finger1 (the condition in the event condition zone of transition cluster2Fingers is true): transition cluster2Fingers is fired, finger1 and finger2 are removed from place SingleFingerList, and a new token consisting of the pair (finger1, finger2) and their respective positions is stored in place ListOfFingerPairs.
- finger2 is too far from finger1 (the condition in the event condition zone of transition noClusterDetected is true): that transition is fired and the new position of the finger is updated.

When a pair is detected, the user interface should graphically display such dynamic grouping. This is defined by the rendering function associated with the interaction technique and presented in Table 2. When two fingers are merged, the tokens referencing these two models are removed from the SingleFingerList place, which triggers the method hideFingerRendering for each model; this method hides the elementary rendering associated with each finger. Both references are then combined in a token added to the place ListOfFingerPairs, which calls the method createPairedFingerRendering, which displays the rendering associated with the two-finger cluster. It is important to note that output is thus connected to state changes in the models (which only occur when tokens are added to or removed from places) while inputs are event-based and thus associated with transitions.

| ObCS node name | ObCS event | Rendering method |
| SingleFingerList | tokenAdded | showFingerRendering |
| SingleFingerList | tokenRemoved | hideFingerRendering |
| ListOfFingerPairs | tokenAdded | createPairedFingerRendering |
| ListOfFingerPairs | tokenRemoved | removePairedFingerRendering |

Table 2 - Rendering functions of the interaction technique

CONCLUSION
This paper has identified a set of challenges towards the production of complete and unambiguous specifications of multi-touch systems. The main issues deal with the dynamic instantiation of input devices and the dynamic reconfiguration of interaction techniques. We have highlighted the fact that such concerns had not previously been encountered (at least at this large scale) when engineering interactive systems. This paper has presented a twofold way of addressing these issues:

- A layered software architecture made up of communicating models, which makes explicit a set of components and their inter-relations in order to address this dynamicity challenge;
- A formal description technique able to describe such dynamic behaviors in a complete and unambiguous way.

While the formal notation contribution is very specific to the work presented here, the layered architecture is independent from it and can be reused within any framework dealing with multi-touch interactions.

REFERENCES
1. Accot J., Chatty S., Maury S. & Palanque P. Formal Transducers: Models of Devices and Building Bricks for Highly Interactive Systems. DSVIS 1997, Springer Verlag, pp. 234-259.
2. Bastide R., Navarre D., Palanque P., Schyn A. & Dragicevic P. A Model-Based Approach for Real-Time Embedded Multimodal Systems in Military Aircrafts. Int. Conference on Multimodal Interfaces (ICMI'04), ACM DL, 10 pages.
3. Bellik Y., Rebaï I., Machrouh E., Barzaj Y., Jacquet C., Pruvost G. & Sansonnet J.-P. Multimodal Interaction within Ambient Environments: An Exploratory Study. INTERACT (2) 2009: 89-92.
4. Bi X., Grossman T., Matejka J. & Fitzmaurice G. Magic desk: bringing multi-touch surfaces into desktop work. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). ACM, New York, NY, USA, 2011, 2511-2520.
5. Bolt R. & Herranz E. Two-Handed Gesture in Multi-Modal Natural Dialog. Proceedings of the fifth annual ACM symposium on User interface software and technology, ACM Press, 1992, pp. 7-14.
6. Buxton W. A three-state model of graphical input. IFIP TC 13 INTERACT'90, 1990, pp. 449-456.
7. Echtler F. & Klinker G. A multitouch software architecture. Proceedings of the 5th Nordic conference on Human-computer interaction: building bridges (NordiCHI '08). ACM, New York, NY, USA, 2008, 463-466.
8. Genrich H. J. Predicate/Transition Nets. In High-Level Petri Nets: Theory and Application, K. Jensen and G. Rozenberg (Eds.), Springer Verlag, 1991, pp. 3-43.
9. Hamon A., Palanque P., Silva J-L., Deleris Y. & Barboni E. Formal description of multi-touch interactions. 5th symposium on Engineering interactive computing systems (EICS '13). ACM, 2013, 207-216.
10. Kin K., Hartmann B., DeRose T. & Agrawala M. Proton++: a customizable declarative multitouch framework. Proceedings of the ACM symposium on User interface software and technology (UIST '12). ACM, 2012, 477-486.
11. Ladry J-F., Navarre D. & Palanque P. Formal description techniques to support the design, construction and evaluation of fusion engines for sure (safe, usable, reliable and evolvable) multimodal interfaces. ICMI 2009: 185-192.
12. Lalanne D., Nigay L., Palanque P., Robinson P., Vanderdonckt J. & Ladry J-F. Fusion engines for multimodal input: a survey. ACM ICMI 2009: 153-160, ACM DL.
13. Spano L-D., Cisternino A., Paternò F. & Fenu G. GestIT: a declarative and compositional framework for multiplatform gesture definition. EICS 2013: 187-196.
14. Navarre D., Palanque P., Ladry J-F. & Barboni E. ICOs: A model-based user interface description technique dedicated to interactive systems addressing usability, reliability and scalability. ACM Trans. Comput.-Hum. Interact., 16(4), 18:1-18:56, 2009.
15. Palanque P., Bernhaupt R., Navarre D., Ould M. & Winckler M. Supporting Usability Evaluation of Multimodal Man-Machine Interfaces for Space Ground Segment Applications Using Petri net Based Formal Specification. Ninth Int. Conference on Space Operations, Italy, June 18-22, 2006.
16. Palanque P. & Schyn A. A Model-Based Approach for Engineering Multimodal Interactive Systems. INTERACT 2003, IFIP TC 13 conference on HCI, 10 pages.
17. Khandkar S.H. & Maurer F. A domain specific language to define gestures for multi-touch applications. Proceedings of the 10th Workshop on Domain-Specific Modeling (DSM '10). ACM, New York, NY, USA, Article 2, 6 pages.