Criteria, Challenges and Opportunities for Gesture Programming Languages

Lode Hoste and Beat Signer
Web & Information Systems Engineering Lab
Vrije Universiteit Brussel
Pleinlaan 2, 1050 Brussels, Belgium
{lhoste,bsigner}@vub.ac.be

ABSTRACT
An increasing number of today's consumer devices such as mobile phones or tablet computers are equipped with various sensors. The extraction of useful information such as gestures from sensor-generated data based on mainstream imperative languages is a notoriously difficult task. Over the last few years, a number of domain-specific programming languages have been proposed to ease the development of gesture detection. Most of these languages have adopted a declarative approach allowing programmers to describe their gestures rather than having to manually maintain a history of event data and intermediate gesture results. While these declarative languages represent a clear advancement in gesture detection, a number of issues are still unresolved. In this paper we present relevant criteria for gesture detection and provide an initial classification of existing solutions based on these criteria in order to foster a discussion and identify opportunities for future gesture programming languages.

Author Keywords
Gesture language; multimodal interaction; declarative programming.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

INTRODUCTION
With the increasing interest in multi-touch surfaces (e.g. Sony Tablet, Microsoft Surface or Apple iPad), controller-free sensors (e.g. Leap Motion, Microsoft Kinect or Intel's Perceptual SDK) and numerous sensing appliances (e.g. Seeeduino Films and Nike+ Fuel), developers are facing major challenges in integrating these modalities into common applications. Existing mainstream imperative programming languages cannot cope with user interaction requirements due to the inversion of control where the execution flow is defined by input events rather than by the program, the high programming effort for maintaining an event history and the difficulty of expressing complex patterns.

For example, commercial multi-touch hardware has evolved from simple two-finger support to multi-user tracking with up to 60 fingers (e.g. the 3M Multi-Touch Display C4667PW). Similarly, commercial depth sensors such as the Microsoft Kinect were introduced in 2010 and supported the tracking of 20 skeletal joints (i.e. tracking arms and limbs in 3D space). Nowadays, numerous depth sensors such as the Leap sensors or the DepthSense cameras by SoftKinetic also provide short-range finger tracking. More recently, the Kinect for Windows added support for facial expressions and the Kinect 2 supports heart beat and energy level tracking. This rapid evolution of novel input modalities continues with announcements such as the Myo electromyography gesture armband [21] and tablet computers with integrated depth sensors, fingerprint scanning and eye tracking.

In this paper, we consider a gesture to be a movement of the hands, face or other parts of the body in time. Due to the high implementation complexity, most gesture recognition solutions rely on machine learning algorithms to extract gestural information from sensors. However, the costs of applying machine learning algorithms are not to be underestimated. The capture and annotation of training and test data requires substantial resources. Further, the tweaking of the correct learning parameters and the analysis for overfitting require some expert knowledge. Last but not least, one cannot decisively observe and control what has actually been learned. Therefore, it is desirable to have the possibility to program gestures and to ease the programming of gestural interaction. We argue that research in software engineering abstractions is of utmost importance for gesture computing.
In software engineering, a problem can be divided into its accidental and essential complexity [1]. Accidental complexity relates to the difficulties a programmer faces due to the choice of software engineering tools and can be reduced by selecting or developing better tools. On the other hand, essential complexity is caused by the characteristics of the problem to be solved and cannot be reduced. The goal of gesture programming languages is to reduce the accidental complexity as much as possible. In this paper, we define a number of criteria to gain an overview about the focus of existing gesture programming languages and to identify open challenges to be further discussed and investigated.

MOTIVATION AND RELATED WORK
Gesture programming languages are designed to support developers in specifying their gestural interaction requirements more easily than with general purpose programming languages. A domain-specific language might help to reduce the repetitive boilerplate that cannot be removed in existing languages, as described by Van Cutsem (http://soft.vub.ac.be/~tvcutsem/invokedynamic/node/11). General purpose programming languages such as Java sometimes require an excessive amount of constructs to express a developer's intention which makes them hard to read and maintain. Van Cutsem argues that languages can shape our thought, for instance when a gesture can be declaratively described by its requirements rather than through an imperative implementation with manual state management. A gesture programming language can also be seen as a simplifier where, for example, multiple inheritance might not be helpful to describe gestural interaction. Finally, domain-specific languages can be used as a law enforcer. Some gesture languages, such as Proton [16], disallow a specific sequence of events simply because it overlaps with another gesture definition. It further enables the inference of properties that help domain-specific algorithms to obtain better classification results or reduced execution time.
Midas [7, 22] by Hoste et al., the Gesture Description Language (GDL) by Khandkar et al. [14], GeForMT by Kammer et al. [13] and the Gesture Description Language (GDL) by Echtler et al. [4] form a first generation of declarative languages that allow programmers to easily describe multi-touch gestures rather than having to imperatively program the gestures. This fundamental change in the development of gestures moved large parts of the accidental complexity, such as the manual maintenance of intermediate results and the extension of highly entangled imperative code, to the processing engine.

With these existing solutions, gestures are described in a domain-specific language as a sequence or simultaneous occurrences of events from one or multiple fingers. The modularisation and outsourcing of the event matching process paved the way for the rapid development of more complex multi-touch gestures. With the advent of novel hardware such as the Microsoft Kinect sensor with a similar or even higher level of complexity, domain-specific solutions quickly became a critical component for supporting advanced gestural interaction. The definition of complex gestures involves a number of concerns that have to be addressed. The goal of this paper is to enumerate these criteria for gesture programming languages and to provide an overview of how existing approaches focus on different subsets of these criteria.

CRITERIA
We define a number of criteria which can shape (1) the choice of a particular framework, (2) the implementation of the gesture and (3) novel approaches to solve open issues in gesture programming languages. These criteria were compiled based on various approaches that we encountered over the last few years, also including the domains of machine learning or template matching which are out of the scope of this paper.

We aligned the terminology, such as gesture spotting and segmentation, and performed an indicative evaluation of nine existing gesture languages as shown in Figure 1. For each individual criterion of these nine approaches we provide a score ranging from 0 to 5 together with a short explanation which can be found in the online data set. A score of 1 means that the approach could theoretically support the feature but it was not discussed in the literature. A score of 2 indicates that there is some initial support but with a lack of additional constructs that would make it useful. Finally, a score in the range of 3-5 provides an indication about the completeness and extensiveness regarding a particular criterion. Our data set, an up-to-date discussion as well as some arguments for the indicative scoring for each criterion of the different approaches is available at http://soft.vub.ac.be/~lhoste/research/criteria/images/img-data.js. We tried to cluster the approaches based on the most up-to-date information. However, some of the criteria could only be evaluated subjectively and might therefore be adjusted later based on discussions during the workshop. An interactive visualisation of the criteria for each of the approaches can be accessed via http://soft.vub.ac.be/~lhoste/research/criteria. Note that the goal of this assessment was to identify general trends rather than to draw a conclusive categorisation for each approach.

[Figure 1 consists of nine radar charts, one per approach, plotting the scores for the 30 criteria: (a) Midas [22], (b) GDL (Khandkar) [14], (c) GeForMT [13], (d) GDL (Echtler) [4, 2], (e) Proton [16], (f) GestureAgents [10], (g) EventHurdle [15], (h) GestIT [23], (i) ICO [6].]
Figure 1. Indicative classification of gesture programming solutions. The labels are defined as follows: (1) modularisation, (2) composition, (3) customisation, (4) readability, (5) negation, (6) online gestures, (7) offline gestures, (8) partially overlapping gestures, (9) segmentation, (10) event expiration, (11) concurrent interaction, (12) portability, serialisation and embeddability, (13) reliability, (14) graphical user interface symbiosis, (15) activation policy, (16) dynamic binding, (17) runtime definitions, (18) scalability in terms of performance, (19) scalability in terms of complexity, (20) identification and grouping, (21) prioritisation and enabling, (22) future events, (23) uncertainty, (24) verification and user profiling, (25) spatial specification, (26) temporal specification, (27) other spatio-temporal features, (28) scale and rotation invariance, (29) debug tooling, (30) editor tooling.

Software Engineering and Processing Engine
The following criteria have an effect on the software engineering properties of the gesture implementation. Furthermore, these criteria might require the corresponding features to be implemented by the processing engine.

Modularisation
By modularising gesture definitions we can reduce the effort to add an extra gesture. In many existing approaches the entanglement of gesture definitions requires developers to have a deep knowledge about already implemented gestures. This is a clear violation of the separation of concerns principle, one of the main principles in software engineering which dictates that different modules of code should have as little overlapping functionality as possible. Therefore, in modular approaches, each gesture specification is written in its own separate context (i.e. separate function, rule or definition).

Composition
Composition allows programmers to abstract low-level complexity by building complex gestures from simpler building blocks. For instance, the definition of a double tap gesture can be based on the composition of two tap gestures with a defined maximum time and space interval between them. A tap gesture can then be defined by a touch down event shortly followed by a touch up event and with minimal spatial movement in between. Composition is supported by approaches when a developer can reuse multiple modular specifications to define more complex gestures without much further effort.
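To make the composition idea concrete, the following sketch assembles a double tap from a reusable tap predicate. It is a minimal illustration in plain Python rather than the syntax of any of the discussed languages; the TouchEvent class, the helper names and all threshold values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical primitive event; real frameworks deliver richer touch data.
@dataclass
class TouchEvent:
    kind: str   # "down" or "up"
    x: float
    y: float
    t: float    # timestamp in seconds

TAP_MAX_DURATION = 0.25         # assumed maximum time between down and up
TAP_MAX_MOVEMENT = 10.0         # assumed maximum movement in pixels
DOUBLE_TAP_MAX_GAP = 0.4        # assumed maximum time between the two taps
DOUBLE_TAP_MAX_DISTANCE = 20.0  # assumed maximum distance between the two taps

def is_tap(down: TouchEvent, up: TouchEvent) -> bool:
    """A tap: a touch down shortly followed by a touch up with minimal movement."""
    return (down.kind == "down" and up.kind == "up"
            and 0 <= up.t - down.t <= TAP_MAX_DURATION
            and abs(up.x - down.x) <= TAP_MAX_MOVEMENT
            and abs(up.y - down.y) <= TAP_MAX_MOVEMENT)

def is_double_tap(d1: TouchEvent, u1: TouchEvent,
                  d2: TouchEvent, u2: TouchEvent) -> bool:
    """A double tap composed of two taps within a maximum time and space interval."""
    return (is_tap(d1, u1) and is_tap(d2, u2)
            and 0 <= d2.t - u1.t <= DOUBLE_TAP_MAX_GAP
            and abs(d2.x - u1.x) <= DOUBLE_TAP_MAX_DISTANCE
            and abs(d2.y - u1.y) <= DOUBLE_TAP_MAX_DISTANCE)
```

The point of composition is that the tap predicate is written once and reused; a declarative language additionally frees the developer from selecting the four candidate events by hand.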
Customisation
Customisation is concerned with the effort a developer faces to modify a gesture definition in order that it can be used in a different context. How easy is it, for example, to adapt the existing definition of a gesture when an extra condition is required or the order of events should be changed? For graphical programming toolkits, the customisation aspect is broadened to how easy it is to modify the automatically generated code and whether this is possible at all. Note that in many machine learning approaches customisation is limited due to the lack of a decent external representation [11].

Readability
Kammer et al. [13, 12] identified that gesture definitions are more readable when understandable keywords are used. They present a statistical evaluation of the readability of various gesture languages which has been conducted with a number of students in a class setting. In contrast to the readability, they further define complexity as the number of syntactic rules that need to be followed for a correct gesture description, including for example the number of brackets, colons or semicolons. Languages with a larger number of syntactic rules are perceived to be more complex. However, it should be noted that the complexity of a language as defined by Kammer et al. is different from the level of support to form complex gestures and we opted to include their definition under the readability criterion.

Negation
Negation is a feature that allows developers to express a context that should not be true in a particular gesture definition. Many approaches partially support this feature by requiring a strict sequence of events, implying that no other events should happen in between. However, it is still crucial to be able to describe explicit negation for some scenarios, such as that there should be no other finger in the spatial neighbourhood or that a finger should not have moved up before the start of a gesture.

Other approaches try to use a garbage state in their gesture model to reduce false positives. This garbage state is similar to silence models in speech processing and captures non-gestures by "stealing" partial gesture state and resetting the recognition process.
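As an illustration of explicit negation, the sketch below expresses the condition that no other finger may be present in the spatial neighbourhood of a candidate event. This is plain Python with hypothetical names and values, not the negation construct of any particular language.

```python
import math

NEIGHBOURHOOD_RADIUS = 80.0  # assumed exclusion radius in pixels

def no_other_finger_nearby(candidate, active_touches) -> bool:
    """Negated context: holds only if *no* other active touch lies within the radius.

    The candidate and the elements of active_touches are assumed to expose
    x and y attributes (e.g. the hypothetical TouchEvent from the previous sketch).
    """
    return not any(
        touch is not candidate and
        math.hypot(touch.x - candidate.x, touch.y - candidate.y) < NEIGHBOURHOOD_RADIUS
        for touch in active_touches
    )
```

A gesture definition would use such a predicate as an additional, negated condition; languages without explicit negation can only approximate it by demanding a strict event sequence.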
Online Gestures
Some gestures such as a pinch gesture for zooming require feedback while the gesture is being performed. These so-called online gestures can be supported in a framework by allowing small online gesture definitions or by providing advanced constructs that offer a callback mechanism with a percentage of the progress of the larger gesture. Note that small online gesture definitions are linked with the segmentation criterion which defines that gestures can form part of a continuous event stream without an explicit begin and end condition.

Offline Gestures
Offline gestures are executed when the gesture is completely finished and typically represent a single command. These gestures are easier to support in gesture programming languages as they need to pass the result to the application once. Offline gestures also increase the robustness due to the ability to validate the entire gesture. The number of future events that can change the correctness of the gesture is limited when compared to online gestures.

Partially Overlapping Gestures
Several conditions of a gesture definition can be partially or fully contained in another gesture definition. This might be intentional (e.g. if composition is not supported nor preferred) or unintentional (e.g. if two different gestures start with the same movement). Keeping track of multiple partial matches is a complex mechanism that is supported by some approaches, intentionally blocked by others (e.g. Proton) or ignored by some approaches.

Segmentation
A stream of sensor input events might not contain explicit hints about the start and end of a gesture. The segmentation concern (also called gesture spotting) gains importance given the trend towards continuous capturing and free-air interaction such as the Kinect sensor and Z-touch [25], where a single event stream can contain many potential start events. The difficulty of gesture segmentation manifests itself when one cannot know beforehand which potential start events should be used until a middle or even an end candidate event is found to form the decisive gesture trajectory. It is possible that potential begin and end events can still be replaced by better future events. For instance, how does one decide when a flick right gesture (in free air) starts or ends without any knowledge about the future? This generates a lot of gesture candidates and increases the computational complexity. Some approaches tackle this issue by using a velocity heuristic with a slack variable (i.e. a global constant defined by the developer) or by applying an efficient incremental computing engine. However, most language-based approaches are lacking this functionality. Many solutions make use of a garbage gesture model to increase the accuracy of the gesture segmentation process.
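The sketch below hints at why segmentation inflates the number of gesture candidates: without an explicit start event, every sufficiently fast sample opens a new candidate, and candidates can only be resolved or discarded based on later evidence. The velocity threshold and the timeout play the role of the slack values mentioned above; all names and values are hypothetical and the matching is deliberately naive.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    x: float
    t: float  # timestamp in seconds

@dataclass
class Candidate:
    samples: list = field(default_factory=list)

VELOCITY_THRESHOLD = 0.5  # assumed minimal rightward speed to open a candidate
CANDIDATE_TIMEOUT = 1.0   # assumed maximal duration of a flick in seconds
FLICK_DISTANCE = 200.0    # assumed minimal travel for a "flick right"

def detect_flicks(stream):
    """Naive candidate tracking for a flick-right gesture in a continuous stream."""
    candidates, previous, detected = [], None, []
    for sample in stream:
        if previous is not None:
            velocity = (sample.x - previous.x) / max(sample.t - previous.t, 1e-6)
            if velocity > VELOCITY_THRESHOLD:
                # Any fast sample is a potential start: open a new candidate.
                candidates.append(Candidate([previous]))
        for candidate in candidates:
            candidate.samples.append(sample)
        # Resolve or discard candidates based on the events that arrive later.
        still_open = []
        for candidate in candidates:
            if sample.x - candidate.samples[0].x >= FLICK_DISTANCE:
                detected.append(candidate)      # decisive trajectory found
            elif sample.t - candidate.samples[0].t < CANDIDATE_TIMEOUT:
                still_open.append(candidate)    # keep waiting for future events
        candidates = still_open
        previous = sample
    return detected
```

Even this toy detector keeps several overlapping candidates alive at the same time, which is exactly the computational complexity the segmentation criterion refers to.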
Event Expiration
The expiration of input events is required to keep the memory and processing complexity within certain limits. The manual maintenance of events is a complex task and most frameworks offer at least a simple heuristic to automatically expire old events. In multi-touch frameworks, a frequently used approach is to keep track of events from the first touch down event to the last touch up of any finger. This might introduce some issues when dealing with multiple users if there is always at least one active finger touching the table. Another approach is to use a timeout parameter, effectively creating a sliding window solution. An advantage of this approach is that the maximum memory usage is predefined; however, a slack value is required. A static analysis of the gesture definitions could help to avoid the need for such a static value.
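A sliding-window policy as described above can be sketched as follows, with the window length acting as the slack value that a static analysis of the gesture definitions could in principle derive automatically. The buffer below is an assumption for illustration, not the mechanism of any specific framework.

```python
from collections import deque

class SlidingWindowBuffer:
    """Keeps only the events of the last window_seconds, bounding memory usage."""

    def __init__(self, window_seconds: float = 2.0):  # assumed slack value
        self.window_seconds = window_seconds
        self.events = deque()

    def add(self, event, timestamp: float):
        self.events.append((timestamp, event))
        self._expire(timestamp)

    def _expire(self, now: float):
        # Expired events can no longer contribute to a partial match.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()
```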
Concurrent Interaction
In order to allow concurrent interaction, one has to keep track of multiple partial instances of a gesture recognition process. For instance, multiple fingers, hands, limbs or users can perform the same gesture at the same time. To separate these instances, the framework can offer constructs or native support for concurrent gesture processing. In some scenarios, it is hard to decide which touch events belong to which hand or user. For example, in Proton the screen can be split in half to support some two player games. A better method is to set a maximum bounding box of the gesture [4] or to define the spatial properties of each gesture condition. The use of GUI-specific contextual information can also serve as a separation mechanism. Nevertheless, it is not always possible to know in advance which combination of fingers will form a gesture, leading to similar challenges as discussed for the segmentation criterion where multiple gesture candidates need to be tracked.

Portability, Serialisation and Embeddability
Concerns such as portability, serialisation and embeddability form the platform independence of an approach. Portability is defined by how easy it is to run the framework on different platforms. Some approaches are tightly interwoven with the host language which limits portability and the transfer of a gesture definition over the network. This transportation can be used to exchange gesture sets between users or even to offload the gesture recognition process to a dedicated server with more processing power. The exchange requires a form of serialisation of the gesture definitions which is usually already present in domain-specific languages. Embeddability has to do with the way the approach can be used. Is it necessary to have a daemon process or can the abstractions be delivered as a library? Another question is whether it is possible to use the abstractions in a different language or whether this requires a reimplementation.

Reliability
The dynamic nature of user input streams implies the possibility of an abundance of information in a short period of time. A framework might offer a maximum computational boundary for a given setting [19]. Without such a boundary, users might trigger a denial of service when many complex interactions have to be processed at the same time. Additionally, low-level functionality should be encapsulated without providing leaky language abstractions that could form potential security issues.

Graphical User Interface Symbiosis
The integration of graphical user interface (GUI) components removes the need for a single entry point of gesture callbacks on the application level. With contextual information and GUI-specific gesture conditions the complexity is reduced. For instance, a scroll gesture can only happen when both fingers are inside the GUI region that supports scrolling [4]. This further aids the gesture disambiguation process and thus increases the gesture recognition quality. Another use case is when a tiny GUI object needs to be rescaled or rotated. The gesture can be defined such that one finger should be on top of the GUI component while the other two fingers are executing a pinch or rotate gesture in the neighbourhood.

Activation Policy
Whenever a gesture is recognised, an action can be executed. In some cases the developer wants to provide a more detailed activation policy such as trigger only once or trigger when entering and leaving a pose. Another example is the sticky bit [4] option that activates the gesture for a particular GUI object. A shoot-and-continue policy [9] denotes the execution of a complete gesture followed by an online gesture activation. The latter can be used for a lasso gesture where at least one complete circular movement is required and afterwards each incremental part (e.g. per quarter) causes a gesture activation.

Dynamic Binding
Dynamic binding is a feature that allows developers to define a variable without a concrete value. For instance, the x location of an event A should be between 10 and 50 but should be equal to the x location of an event B. At runtime, a value of 20 for the x location of event A will therefore require an event B with the same value of 20. This is particularly useful to correlate different events if the specification of concrete values is not feasible.
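The example from the text (the x location of event A lies between 10 and 50 and must equal the x location of event B) can be mimicked with an explicit binding environment, as sketched below. The dictionary-based matcher is only an illustration of the idea; declarative languages offer this correlation as a language construct.

```python
def match_event_a(event, bindings: dict) -> bool:
    """Event A: x between 10 and 50; bind the concrete value for later reuse."""
    if 10 <= event.x <= 50:
        bindings["ax"] = event.x
        return True
    return False

def match_event_b(event, bindings: dict) -> bool:
    """Event B: x must equal whatever value was bound for event A (e.g. 20)."""
    return "ax" in bindings and event.x == bindings["ax"]

# Usage sketch with hypothetical events exposing an `x` attribute:
#   bindings = {}
#   match_event_a(event_a, bindings) and match_event_b(event_b, bindings)
```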
Runtime Definitions
Refining gesture parameters or outsourcing gesture definitions to gesture services requires a form of runtime modification support by the framework. The refinement can be instantiated by an automated algorithm (e.g. an optimisation heuristic), by the developer (during a debugging session) or by the user to provide their preferences.

Scalability in Terms of Performance
The primary goal of gesture languages is to provide an abstraction level which helps developers to express complex relations between input events. However, with multimodal setups, input continuously enters the system and the user expects the system to immediately react to their gestures. Therefore, performance and the scalability when having many gesture definitions is important. Some approaches, such as Midas, exploit the language constructs to form an optimised directed acyclic graph based on the Rete algorithm [5]. This generates a network of the gesture conditions, allows the computational sharing between them and keeps track of partial matches without further developer effort. In recent extensions, Midas has been parallelised and benchmarked with up to 64 cores [19] and then distributed such that multiple machines can share the workload [24]. Other approaches such as Proton and GDL by Echtler et al. rely on finite state machines. However, it is unclear how these approaches can be used with continuous sensor input where segmentation is a major issue. EventHurdle [15] tackles this problem by using relative positions between the definitions but might miss some gestures due to the non-exhaustive search [8].
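To hint at why language-level knowledge about gesture conditions pays off, the sketch below evaluates every distinct condition only once per incoming event and forwards the outcome to all gesture definitions that share it. This is merely a toy illustration of computational sharing, not the Rete network used by Midas nor the state machines of Proton or GDL.

```python
def build_condition_index(gestures: dict) -> dict:
    """gestures maps a gesture name to a list of condition functions.
    Returns a mapping from each distinct condition to the gestures sharing it."""
    index = {}
    for name, conditions in gestures.items():
        for condition in conditions:
            index.setdefault(condition, []).append(name)
    return index

def satisfied_conditions(event, index: dict) -> dict:
    """Each shared condition is evaluated a single time per incoming event."""
    hits = {}
    for condition, users in index.items():
        if condition(event):
            for name in users:
                hits.setdefault(name, []).append(condition)
    return hits

# Usage sketch: a condition shared by two definitions is only evaluated once.
#   is_down = lambda e: e.kind == "down"
#   index = build_condition_index({"tap": [is_down], "double tap": [is_down]})
```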
Scalability in Terms of Complexity
The modularity of gestures allows for a much better scalability in terms of complexity. When adding an extra gesture, no or minimal knowledge about existing definitions is required. However, when multiple gestures are recognised in the same pool of events, the developer needs to check whether they can co-exist (e.g. performed by different users) or are conflicting (i.e. deciding between rotation, scaling or both). A lot of work remains to be done to disambiguate gestures. For example, how do we cope with the setting of priorities or disambiguation rules between gestures when they are detected at a different timestamp? How can we cope with these disambiguation issues when there are many gesture definitions? Furthermore, it is unclear how we need to deal with many variants of a similar gesture in order that the correct one is used during the composition of a complex gesture.

Gesture Disambiguation
When multiple gesture candidates are detected from the same event source, the developer needs a way to discriminate between them. However, this is not a simple task due to the lack of detail in sensor information, unknown future events and the uncertainty of the composition of the gesture.

Identification and Grouping
The identification problem is related to the fact that sensor input does not always provide enough details to disambiguate a scenario. Echtler et al. [3] demonstrate that two fingers from different hands cannot be distinguished from two fingers of the same hand on a multi-touch table due to the lack of shadowing information. Furthermore, when a finger is lifted from the table and put down again, there is no easy way to verify whether it is the same finger. Therefore, a double tap gesture cannot easily be distinguished from a two finger roll. Similar issues exist with other types of sensors, such as when a user leaves the viewing angle of a sensor and later enters again. Multimodal fusion helps to address the identification problem. The grouping problem is potentially more complex to solve. For instance, when multiple people are dancing in pairs, it is sometimes hard to see who is dancing with whom. Therefore the system needs to keep track of alternative combinations for a longer time period to group the individuals. Many combinations of multi-touch gestures are possible when fingers are located near each other.

Prioritisation and Enabling
The annotation of gestures with different priority levels is a first form of prioritisation which can have an impact on the gesture recognition accuracy. However, it requires knowledge about existing gestures and if there are many gestures it might not be possible to maintain a one-dimensional priority schema. Nacenta et al. [20] demonstrate that we should not distinguish between a scale and rotate gesture on the frame-by-frame level but by using specialised prioritisation rules such as magnitude filtering or visual handles. A developer can further decide to enable or disable certain gestures based on the application context or other information.

Future Events
One of the major issues with gesture disambiguation is that information in the near future can lead to a completely different interpretation of a gesture. A gesture definition can, for instance, fully overlap with a larger, higher prioritised gesture. At a given point in time, it is difficult to decide whether the application should be informed that a particular gesture has been detected or whether we should wait for a small time period. If future events show that the larger gesture does not match, users might perceive the execution of the smaller gesture as unresponsive. Late contextual information might also influence the fusion process of primitive events that are still in the running to form part of more complex gestures. The question is whether these fused events should be updated to reflect the new information and how a framework can support this.

Uncertainty
Noise is an important parameter when dealing with gestural interaction. The jittering of multi-touch locations or limb positions might invalidate intended gestures or unintentionally activate other gestures. To analyse gestures based on imprecise primitive events, a form of uncertainty is required. This might also percolate to higher level gestures (e.g. when two uncertain subgestures are being composed). The downside is that this introduces more complexity for the developer.

Verification and User Profiling
Whenever a gesture candidate is found, it might be verified using an extra gesture classifier. An efficient segmentation approach may, for example, be combined with a more elaborate classification process to verify whether a detected recognition is adequate. Verification can also be used to further separate critical gestures (e.g. file deletion) from simple gestures (e.g. scaling). Note that combinations of classifiers (i.e. ensembles) are frequently used in the machine learning domain. In order to further increase the gesture recognition accuracy, a developer can offer a form of user profiling for gestures that are known to cause confusion. Either a gesture is specified too precisely for a broader audience or it is specified too loosely for a particular user. This influences the recognition results and accidental activations. Therefore, the profiling of users by tracking undo operations or multiple similar invocations could lead to an adaptation of the gesture definition for that particular user. User profiling is valuable to improve recognition rates but it might also be interesting to exploit it for context-sensitive cases. Future work is needed to offer profiling as a language feature.
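A verification step as described above can be as simple as routing a detected candidate through an ensemble of additional classifiers before its action is executed. The sketch below is hypothetical; the classifiers are assumed to be callables returning a confidence between 0 and 1.

```python
def verify(candidate, classifiers, threshold: float = 0.8) -> bool:
    """Combine the scores of an ensemble of extra classifiers for a candidate."""
    scores = [classify(candidate) for classify in classifiers]
    return sum(scores) / len(scores) >= threshold

# Usage sketch: a critical gesture (e.g. file deletion) gets a stricter threshold
# than a simple scaling gesture.
#   if verify(candidate, [rule_based_check, template_match_check], threshold=0.95):
#       delete_file(candidate.target)
```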
Gesture Specification
The description of a gesture requires a number of primitive statements such as spatial and temporal relations between multiple events, as described in the following.

Spatial Specification
We define the spatial specification of a gesture as the primitive travelled path to which it has to adhere. The path can be formed by sequential or parallel conditions (expressed by using temporal constructs) where events are constrained in a spatial dimension such as 10 < event1.x < 50. The use of relative spatial operators (e.g. event1.x + 30 > event2.x as used in [8, 15]) also seems useful to process non-segmented sensor information. Note that approximation is required to support the variability of a gesture execution.

Temporal Specification
Gestures can be described by multiple conditions. However, these conditions cannot always be listed in a sequential order. Therefore, most gesture languages allow the developer to express explicit temporal relations between the conditions. An example of such a temporal relation is that two events should (or should not) happen within a certain time period.
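The constraints used as examples in these two subsections translate directly into small predicates over events, as sketched below. Events are assumed to expose x and t attributes; the tolerance stands in for the approximation mentioned above and all values are hypothetical.

```python
SPATIAL_TOLERANCE = 5.0  # assumed tolerance for approximate spatial matching
MAX_DELAY = 0.5          # assumed temporal window in seconds

def absolute_spatial(event1) -> bool:
    # 10 < event1.x < 50, relaxed by a tolerance to support execution variability
    return 10 - SPATIAL_TOLERANCE < event1.x < 50 + SPATIAL_TOLERANCE

def relative_spatial(event1, event2) -> bool:
    # event1.x + 30 > event2.x, a relative operator useful for non-segmented input
    return event1.x + 30 > event2.x

def within_time(event1, event2) -> bool:
    # the two events should happen within a certain time period
    return 0 <= event2.t - event1.t <= MAX_DELAY
```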
Other Spatio-temporal Features
Many frameworks offer additional features to describe a gesture. For instance, Kammer et al. [13] use atomic blocks to specify a gesture. These atomic blocks are preprocessors such as direction (e.g. north or southwest) that abstract all low-level details from the developer. They can also rely on a template engine to offer more complex atomic building blocks. This integration is a form of composition and is an efficient way to describe gestures. Kinematic features can be used to filter gestures based on motion vectors, translation, divergence, curl or deformation. Khandkar et al. [14] offer a closed loop feature to describe that the beginning and end event of a gesture should be approximately at the same location.

Scale and Rotation Invariance
Scale invariance deals with the recognition of a single gesture trajectory regardless of its scale. Similarly, rotation invariance is concerned with the rotation. Most approaches offer this feature by rescaling and rotating the trajectory to a standard predefined size and centroid. However, a major limitation is that scale and rotation invariance requires segmentation and therefore does not work well for online gestures.
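Rescaling and rotating a trajectory to a predefined size and centroid, as most approaches do, can be sketched as follows. The target size and the indicative-angle rotation are assumptions in the spirit of common template matchers rather than the method of a specific language.

```python
import math

def normalise(points, target_size: float = 250.0):
    """Translate a trajectory to its centroid, remove its indicative angle and rescale.

    points is a list of (x, y) tuples; a new list of (x, y) tuples is returned."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centred = [(x - cx, y - cy) for x, y in points]

    # Rotation invariance: rotate so that the first point lies on the positive x axis.
    angle = math.atan2(centred[0][1], centred[0][0])
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    rotated = [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in centred]

    # Scale invariance: rescale the bounding box to a predefined size.
    width = max(x for x, _ in rotated) - min(x for x, _ in rotated) or 1.0
    height = max(y for _, y in rotated) - min(y for _, y in rotated) or 1.0
    scale = target_size / max(width, height)
    return [(x * scale, y * scale) for x, y in rotated]
```

The limitation mentioned above is visible here as well: the complete trajectory, and hence a segmentation of the event stream, is needed before the normalisation can take place.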
Debug Tooling
In order to debug gestures, developers usually apply prerecorded positive and negative gesture sets to see whether the given definition is compatible with the recorded data. Gesture debugging tools have received little attention in research and are typically limited to the testing of accuracy percentages. It might be interesting to explore more advanced debugging support such as notifying the developer of closely related gesture trajectories (e.g. gesture a is missed by x units [17]).

Editor Tooling
Gesture development can be carried out in a code-compatible graphical manner such as done with tablatures [16], hurdles [15] or spatially [18]. When dealing with 3D input events from a Kinect, a graphical representation is valuable to get to the correct spatial coordinates.

DISCUSSION
We can identify some general trends with regard to open issues and underrepresented criteria in existing work. Figure 2 reveals that some criteria such as (1) modularisation and (7) offline gestures are well supported by most approaches and are important to provide a minimal ability to program gestures in a structured manner. Commonly used multi-touch gestures such as pinch and rotate are (6) online gestures supported by most frameworks. However, additional work can be done to streamline the implementation of online gestures by providing (2) composition support, a deeper (14) GUI integration and more advanced (15) activation policies. We believe that these challenges can be resolved with additional engineering effort in existing systems.

[Figure 2 shows a stacked bar chart over the 30 criteria (summed values from 0 to 40), with one segment per approach: Midas, GDL-K, GeForMT, GDL-E, Proton, GestureAgents, EventHurdle, GestIT and ICO.]
Figure 2. A stacked bar chart of the summed values for each criterion.

However, we see that more challenging issues such as (9) segmentation, (19) scalability in terms of complexity or dealing with (22) future events are poorly or not at all supported in existing approaches. (9) Segmentation is crucial to deal with continuous streams of information where no hints are given by the sensor with regard to potential start and end conditions. The recent trend towards near-touch sensors and skeletal tracking algorithms makes the segmentation concern of crucial importance.

With the adoption of the discussed frameworks there is an increasing demand to deal with (19) scalability in terms of complexity. It is currently rather difficult to get an overview of how multiple gestures work together. For instance, a two-finger swipe might need to be implemented by composing two primitive swipes. However, more than one primitive swipe variation might exist (long/short, fast/slow) and choosing the correct one without introducing conflicts is challenging. A way to deal with this issue is to provide test data, but this is work intensive. There might be software engineering-based abstractions that could improve this situation.

Information coming from (22) events in the near future can lead to a completely different interpretation of a gesture. This is especially the case in a multimodal context where contextual or clarifying information comes from additional sensors. An active gesture might fully overlap with a larger and higher prioritised gesture or be part of a composition that might or might not succeed. In other cases the gesture should not be triggered at all due to (late) context data. There are currently no adequate abstractions helping developers to deal with these concerns.

Finally, we would like to highlight that (23) uncertainty and (24) user profiling abstractions are also lacking. Dealing with uncertainty is currently completely hidden from the programmer. Nevertheless, a gesture might be better neglected when it is composed from two or more uncertain subgestures. Additionally, when a user consistently undoes an operation executed by a particular gesture, the gesture might require some adjustments. This can be checked by profiling users and by offering several resolution strategies.

CONCLUSION
Gesture programming languages allow developers to more easily express their interaction patterns. When designing such a language, a number of concerns need to be addressed. Our goal was to categorise and explicitly expose these design decisions to provide a better understanding and foster a discussion about challenges, opportunities and future directions for gesture programming languages. We observed a number of underrepresented concerns in existing work and highlighted challenges for future gesture programming languages.
REFERENCES
1. Brooks, Jr., F. P. No Silver Bullet: Essence and Accidents of Software Engineering. IEEE Computer 20, 4 (April 1987), 10–19.
2. Echtler, F., and Butz, A. GISpL: Gestures Made Easy. In Proceedings of TEI 2012, 6th International Conference on Tangible, Embedded and Embodied Interaction (Kingston, Canada, February 2012), 233–240.
3. Echtler, F., Huber, M., and Klinker, G. Shadow Tracking on Multi-Touch Tables. In Proceedings of AVI 2008, 9th International Working Conference on Advanced Visual Interfaces (Napoli, Italy, May 2008), 388–391.
4. Echtler, F., Klinker, G., and Butz, A. Towards a Unified Gesture Description Language. In Proceedings of HC 2010, 13th International Conference on Humans and Computers (Aizu-Wakamatsu, Japan, December 2010), 177–182.
5. Forgy, C. L. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence 19, 1 (1982), 17–37.
6. Hamon, A., Palanque, P., Silva, J. L., Deleris, Y., and Barboni, E. Formal Description of Multi-Touch Interactions. In Proceedings of EICS 2013, 5th International Symposium on Engineering Interactive Computing Systems (London, UK, June 2013), 207–216.
7. Hoste, L. Software Engineering Abstractions for the Multi-Touch Revolution. In Proceedings of ICSE 2010, 32nd International Conference on Software Engineering (Cape Town, South Africa, May 2010), 509–510.
8. Hoste, L., De Rooms, B., and Signer, B. Declarative Gesture Spotting Using Inferred and Refined Control Points. In Proceedings of ICPRAM 2013, 2nd International Conference on Pattern Recognition Applications and Methods (Barcelona, Spain, February 2013), 144–150.
9. Hoste, L., and Signer, B. Water Ball Z: An Augmented Fighting Game Using Water as Tactile Feedback. In Proceedings of TEI 2014, 8th International Conference on Tangible, Embedded and Embodied Interaction (Munich, Germany, February 2014), 173–176.
10. Julia, C. F., Earnshaw, N., and Jorda, S. GestureAgents: An Agent-based Framework for Concurrent Multi-Task Multi-User Interaction. In Proceedings of TEI 2013, 7th International Conference on Tangible, Embedded and Embodied Interaction (Barcelona, Spain, February 2013), 207–214.
11. Kadous, M. W. Learning Comprehensible Descriptions of Multivariate Time Series. In Proceedings of ICML 1999, 16th International Conference on Machine Learning (Bled, Slovenia, June 1999), 454–463.
12. Kammer, D. Formalisierung gestischer Interaktion für Multitouch-Systeme. PhD thesis, Technische Universität Dresden, 2013.
13. Kammer, D., Wojdziak, J., Keck, M., Groh, R., and Taranko, S. Towards a Formalization of Multi-Touch Gestures. In Proceedings of ITS 2010, 5th International Conference on Interactive Tabletops and Surfaces (Saarbrücken, Germany, November 2010), 49–58.
14. Khandkar, S. H., and Maurer, F. A Domain Specific Language to Define Gestures for Multi-Touch Applications. In Proceedings of DSM 2010, 10th Workshop on Domain-Specific Modeling (Reno/Tahoe, USA, October 2010).
15. Kim, J.-W., and Nam, T.-J. EventHurdle: Supporting Designers' Exploratory Interaction Prototyping with Gesture-based Sensors. In Proceedings of CHI 2013, 31st ACM Conference on Human Factors in Computing Systems (Paris, France, April 2013), 267–276.
16. Kin, K., Hartmann, B., DeRose, T., and Agrawala, M. Proton: Multitouch Gestures as Regular Expressions. In Proceedings of CHI 2012, 30th ACM Conference on Human Factors in Computing Systems (Austin, Texas, USA, May 2012), 2885–2894.
17. Long, Jr., A. C., Landay, J. A., and Rowe, L. A. Implications for a Gesture Design Tool. In Proceedings of CHI 1999, 17th ACM Conference on Human Factors in Computing Systems (Pittsburgh, USA, 1999), 40–47.
18. Marquardt, N., Kiemer, J., Ledo, D., Boring, S., and Greenberg, S. Designing User-, Hand-, and Handpart-Aware Tabletop Interactions with the TouchID Toolkit. Tech. Rep. 2011-1004-16, Department of Computer Science, University of Calgary, Calgary, Canada, 2011.
19. Marr, S., Renaux, T., Hoste, L., and De Meuter, W. Parallel Gesture Recognition with Soft Real-Time Guarantees. Science of Computer Programming (February 2014).
20. Nacenta, M. A., Baudisch, P., Benko, H., and Wilson, A. Separability of Spatial Manipulations in Multi-touch Interfaces. In Proceedings of GI 2009, 35th Graphics Interface Conference (Kelowna, Canada, May 2009), 175–182.
21. Nuwer, R. Armband Adds a Twitch to Gesture Control. New Scientist 217, 2906 (March 2013).
22. Scholliers, C., Hoste, L., Signer, B., and De Meuter, W. Midas: A Declarative Multi-Touch Interaction Framework. In Proceedings of TEI 2011, 5th International Conference on Tangible, Embedded and Embodied Interaction (Funchal, Portugal, January 2011), 49–56.
23. Spano, L. D., Cisternino, A., Paternò, F., and Fenu, G. GestIT: A Declarative and Compositional Framework for Multiplatform Gesture Definition. In Proceedings of EICS 2013, 5th International Symposium on Engineering Interactive Computing Systems (London, UK, June 2013), 187–196.
24. Swalens, J., Renaux, T., Hoste, L., Marr, S., and De Meuter, W. Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors. In Proceedings of AGERE! 2013, 3rd International Workshop on Programming based on Actors, Agents, and Decentralized Control (Indianapolis, USA, October 2013), 3–12.
25. Takeoka, Y., Miyaki, T., and Rekimoto, J. Z-Touch: An Infrastructure for 3D Gesture Interaction in the Proximity of Tabletop Surfaces. In Proceedings of ITS 2010, 5th International Conference on Interactive Tabletops and Surfaces (Saarbrücken, Germany, November 2010), 91–94.