Criteria, Challenges and Opportunities for Gesture Programming Languages

Lode Hoste and Beat Signer
Web & Information Systems Engineering Lab
Vrije Universiteit Brussel
Pleinlaan 2, 1050 Brussels, Belgium
{lhoste,bsigner}@vub.ac.be

ABSTRACT
An increasing number of today's consumer devices such as mobile phones or tablet computers are equipped with various sensors. The extraction of useful information such as gestures from sensor-generated data based on mainstream imperative languages is a notoriously difficult task. Over the last few years, a number of domain-specific programming languages have been proposed to ease the development of gesture detection. Most of these languages have adopted a declarative approach allowing programmers to describe their gestures rather than having to manually maintain a history of event data and intermediate gesture results. While these declarative languages represent a clear advancement in gesture detection, a number of issues are still unresolved. In this paper we present relevant criteria for gesture detection and provide an initial classification of existing solutions based on these criteria in order to foster a discussion and identify opportunities for future gesture programming languages.

Author Keywords
Gesture language; multimodal interaction; declarative programming.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

INTRODUCTION
With the increasing interest in multi-touch surfaces (e.g. Sony Tablet, Microsoft Surface or Apple iPad), controller-free sensors (e.g. Leap Motion, Microsoft Kinect or Intel's Perceptual SDK) and numerous sensing appliances (e.g. Seeeduino Films and Nike+ Fuel), developers are facing major challenges in integrating these modalities into common applications. Existing mainstream imperative programming languages cannot cope with user interaction requirements due to the inversion of control where the execution flow is defined by input events rather than by the program, the high programming effort for maintaining an event history and the difficulty of expressing complex patterns.

For example, commercial multi-touch hardware has evolved from simple two-finger support to multi-user tracking with up to 60 fingers (e.g. the 3M Multi-Touch Display C4667PW). Similarly, commercial depth sensors such as the Microsoft Kinect were introduced in 2010 and supported the tracking of 20 skeletal joints (i.e. tracking arms and limbs in 3D space). Nowadays, numerous depth sensors such as the Leap sensors or the DepthSense cameras by SoftKinetic also provide short-range finger tracking. More recently, the Kinect for Windows added support for facial expressions and the Kinect 2 supports heart beat and energy level tracking. This rapid evolution of novel input modalities continues with announcements such as the Myo electromyography gesture armband [21] and tablet computers with integrated depth sensors, fingerprint scanning and eye tracking.

In this paper, we consider a gesture to be a movement of the hands, face or other parts of the body in time. Due to the high implementation complexity, most gesture recognition solutions rely on machine learning algorithms to extract gestural information from sensors. However, the costs of applying machine learning algorithms are not to be underestimated. The capture and annotation of training and test data requires substantial resources. Further, the tweaking of the correct learning parameters and the analysis for overfitting require some expert knowledge. Last but not least, one cannot decisively observe and control what has actually been learned. Therefore, it is desirable to have the possibility to program gestures and to ease the programming of gestural interaction. We argue that research in software engineering abstractions is of utmost importance for gesture computing.
In software engineering, a problem can be divided into its accidental and essential complexity [1]. Accidental complexity relates to the difficulties a programmer faces due to the choice of software engineering tools and can be reduced by selecting or developing better tools. On the other hand, essential complexity is caused by the characteristics of the problem to be solved and cannot be reduced. The goal of gesture programming languages is to reduce the accidental complexity as much as possible. In this paper, we define a number of criteria to gain an overview about the focus of existing gesture programming languages and to identify open challenges to be further discussed and investigated.

MOTIVATION AND RELATED WORK
Gesture programming languages are designed to support developers in specifying their gestural interaction requirements more easily than with general purpose programming languages. A domain-specific language might help to reduce the repetitive boilerplate that cannot be removed in existing languages, as described by Van Cutsem (http://soft.vub.ac.be/~tvcutsem/invokedynamic/node/11). General purpose programming languages such as Java sometimes require an excessive amount of constructs to express a developer's intention which makes them hard to read and maintain. Van Cutsem argues that languages can shape our thought, for instance when a gesture can be declaratively described by its requirements rather than through an imperative implementation with manual state management. A gesture programming language can also be seen as a simplifier where, for example, multiple inheritance might not be helpful to describe gestural interaction. Finally, domain-specific languages can be used as a law enforcer. Some gesture languages, such as Proton [16], disallow a specific sequence of events simply because it overlaps with another gesture definition. It further enables the inference of properties that help domain-specific algorithms to obtain better classification results or reduced execution time.
Midas [7, 22] by Hoste et al., the Gesture Description Language (GDL) by Khandkar et al. [14], GeForMT by Kammer et al. [13] and the Gesture Description Language (GDL) by Echtler et al. [4] form a first generation of declarative languages that allow programmers to easily describe multi-touch gestures rather than having to imperatively program the gestures. This fundamental change in the development of gestures moved large parts of the accidental complexity, such as the manual maintenance of intermediate results and the extension of highly entangled imperative code, to the processing engine.

With these existing solutions, gestures are described in a domain-specific language as a sequence or simultaneous occurrences of events from one or multiple fingers. The modularisation and outsourcing of the event matching process paved the way for the rapid development of more complex multi-touch gestures. With the advent of novel hardware such as the Microsoft Kinect sensor with a similar or even higher level of complexity, domain-specific solutions quickly became a critical component for supporting advanced gestural interaction. The definition of complex gestures involves a number of concerns that have to be addressed. The goal of this paper is to enumerate these criteria for gesture programming languages and to provide an overview of how existing approaches focus on different subsets of these criteria.

CRITERIA
We define a number of criteria which can shape (1) the choice of a particular framework, (2) the implementation of the gesture and (3) novel approaches to solve open issues in gesture programming languages. These criteria were compiled based on various approaches that we encountered over the last few years, also including the domains of machine learning or template matching which are out of the scope of this paper.

We aligned the terminology, such as gesture spotting and segmentation, and performed an indicative evaluation of nine existing gesture languages as shown in Figure 1. For each individual criterion of these nine approaches we provide a score ranging from 0 to 5 together with a short explanation which can be found in the online data set. A score of 1 means that the approach could theoretically support the feature but it was not discussed in the literature. A score of 2 indicates that there is some initial support but with a lack of additional constructs that would make it useful. Finally, a score in the range of 3-5 provides an indication about the completeness and extensiveness regarding a particular criterion. Our data set, an up-to-date discussion as well as some arguments for the indicative scoring for each criterion of the different approaches is available at http://soft.vub.ac.be/~lhoste/research/criteria/images/img-data.js. We tried to cluster the approaches based on the most up-to-date information. However, some of the criteria could only be evaluated subjectively and might therefore be adjusted later based on discussions during the workshop. An interactive visualisation of the criteria for each of the approaches can be accessed via http://soft.vub.ac.be/~lhoste/research/criteria. Note that the goal of this assessment was to identify general trends rather than to draw a conclusive categorisation for each approach.

[Figure 1 consists of nine radar charts, one per approach, plotting the scores for the 30 criteria: (a) Midas [22], (b) GDL (Khandkar) [14], (c) GeForMT [13], (d) GDL (Echtler) [4, 2], (e) Proton [16], (f) GestureAgents [10], (g) EventHurdle [15], (h) GestIT [23], (i) ICO [6].]
Figure 1. Indicative classification of gesture programming solutions. The labels are defined as follows: (1) modularisation, (2) composition, (3) customisation, (4) readability, (5) negation, (6) online gestures, (7) offline gestures, (8) partially overlapping gestures, (9) segmentation, (10) event expiration, (11) concurrent interaction, (12) portability, serialisation and embeddability, (13) reliability, (14) graphical user interface symbiosis, (15) activation policy, (16) dynamic binding, (17) runtime definitions, (18) scalability in terms of performance, (19) scalability in terms of complexity, (20) identification and grouping, (21) prioritisation and enabling, (22) future events, (23) uncertainty, (24) verification and user profiling, (25) spatial specification, (26) temporal specification, (27) other spatio-temporal features, (28) scale and rotation invariance, (29) debug tooling, (30) editor tooling.

Software Engineering and Processing Engine
The following criteria have an effect on the software engineering properties of the gesture implementation. Furthermore, these criteria might require the corresponding features to be implemented by the processing engine.

Modularisation
By modularising gesture definitions we can reduce the effort to add an extra gesture. In many existing approaches the entanglement of gesture definitions requires developers to have a deep knowledge about already implemented gestures. This is a clear violation of the separation of concerns principle, one of the main principles in software engineering which dictates that different modules of code should have as little overlapping functionality as possible. Therefore, in modular approaches, each gesture specification is written in its own separate context (i.e. separate function, rule or definition).

Composition
Composition allows programmers to abstract low-level complexity by building complex gestures from simpler building blocks. For instance, the definition of a double tap gesture can be based on the composition of two tap gestures with a defined maximum time and space interval between them. A tap gesture can then be defined by a touch down event shortly followed by a touch up event and with minimal spatial movement in between. Composition is supported by approaches when a developer can reuse multiple modular specifications to define more complex gestures without much further effort.
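To make the composition idea concrete, the following sketch assembles a double tap from a reusable tap predicate. It is a minimal illustration in plain Python rather than the syntax of any of the discussed languages; the TouchEvent class, the helper names and all threshold values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical primitive event; real frameworks deliver richer touch data.
@dataclass
class TouchEvent:
    kind: str   # "down" or "up"
    x: float
    y: float
    t: float    # timestamp in seconds

TAP_MAX_DURATION = 0.25         # assumed maximum time between down and up
TAP_MAX_MOVEMENT = 10.0         # assumed maximum movement in pixels
DOUBLE_TAP_MAX_GAP = 0.4        # assumed maximum time between the two taps
DOUBLE_TAP_MAX_DISTANCE = 20.0  # assumed maximum distance between the two taps

def is_tap(down: TouchEvent, up: TouchEvent) -> bool:
    """A tap: a touch down shortly followed by a touch up with minimal movement."""
    return (down.kind == "down" and up.kind == "up"
            and 0 <= up.t - down.t <= TAP_MAX_DURATION
            and abs(up.x - down.x) <= TAP_MAX_MOVEMENT
            and abs(up.y - down.y) <= TAP_MAX_MOVEMENT)

def is_double_tap(d1: TouchEvent, u1: TouchEvent,
                  d2: TouchEvent, u2: TouchEvent) -> bool:
    """A double tap composed of two taps within a maximum time and space interval."""
    return (is_tap(d1, u1) and is_tap(d2, u2)
            and 0 <= d2.t - u1.t <= DOUBLE_TAP_MAX_GAP
            and abs(d2.x - u1.x) <= DOUBLE_TAP_MAX_DISTANCE
            and abs(d2.y - u1.y) <= DOUBLE_TAP_MAX_DISTANCE)
```

The point of composition is that the tap predicate is written once and reused; a declarative language additionally frees the developer from selecting the four candidate events by hand.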
Customisation
Customisation is concerned with the effort a developer faces to modify a gesture definition in order that it can be used in a different context. How easy is it, for example, to adapt the existing definition of a gesture when an extra condition is required or the order of events should be changed? For graphical programming toolkits, the customisation aspect is broadened to how easy it is to modify the automatically generated code and whether this is possible at all. Note that in many machine learning approaches customisation is limited due to the lack of a decent external representation [11].

Readability
Kammer et al. [13, 12] identified that gesture definitions are more readable when understandable keywords are used. They present a statistical evaluation of the readability of various gesture languages which has been conducted with a number of students in a class setting. In contrast to the readability, they further define complexity as the number of syntactic rules that need to be followed for a correct gesture description, including for example the number of brackets, colons or semicolons. Languages with a larger number of syntactic rules are perceived to be more complex. However, it should be noted that the complexity of a language as defined by Kammer et al. is different from the level of support to form complex gestures and we opted to include their definition under the readability criterion.

Negation
Negation is a feature that allows developers to express a context that should not be true in a particular gesture definition. Many approaches partially support this feature by requiring a strict sequence of events, implying that no other events should happen in between. However, it is still crucial to be able to describe explicit negation for some scenarios, such as that there should be no other finger in the spatial neighbourhood or that a finger should not have moved up before the start of a gesture.

Other approaches try to use a garbage state in their gesture model to reduce false positives. This garbage state is similar to silence models in speech processing and captures non-gestures by "stealing" partial gesture state and resetting the recognition process.
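As an illustration of explicit negation, the sketch below expresses the condition that no other finger may be present in the spatial neighbourhood of a candidate event. This is plain Python with hypothetical names and values, not the negation construct of any particular language.

```python
import math

NEIGHBOURHOOD_RADIUS = 80.0  # assumed exclusion radius in pixels

def no_other_finger_nearby(candidate, active_touches) -> bool:
    """Negated context: holds only if *no* other active touch lies within the radius.

    The candidate and the elements of active_touches are assumed to expose
    x and y attributes (e.g. the hypothetical TouchEvent from the previous sketch).
    """
    return not any(
        touch is not candidate and
        math.hypot(touch.x - candidate.x, touch.y - candidate.y) < NEIGHBOURHOOD_RADIUS
        for touch in active_touches
    )
```

A gesture definition would use such a predicate as an additional, negated condition; languages without explicit negation can only approximate it by demanding a strict event sequence.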
Online Gestures
Some gestures such as a pinch gesture for zooming require feedback while the gesture is being performed. These so-called online gestures can be supported in a framework by allowing small online gesture definitions or by providing advanced constructs that offer a callback mechanism with a percentage of the progress of the larger gesture. Note that small online gesture definitions are linked with the segmentation criterion which defines that gestures can form part of a continuous event stream without an explicit begin and end condition.

Offline Gestures
Offline gestures are executed when the gesture is completely finished and typically represent a single command. These gestures are easier to support in gesture programming languages as they need to pass the result to the application once. Offline gestures also increase the robustness due to the ability to validate the entire gesture. The number of future events that can change the correctness of the gesture is limited when compared to online gestures.

Partially Overlapping Gestures
Several conditions of a gesture definition can be partially or fully contained in another gesture definition. This might be intentional (e.g. if composition is not supported nor preferred) or unintentional (e.g. if two different gestures start with the same movement). Keeping track of multiple partial matches is a complex mechanism that is supported by some approaches, intentionally blocked by others (e.g. Proton) or ignored by some approaches.

Segmentation
A stream of sensor input events might not contain explicit hints about the start and end of a gesture. The segmentation concern (also called gesture spotting) gains importance given the trend towards continuous capturing and free-air interaction such as the Kinect sensor and Z-touch [25], where a single event stream can contain many potential start events. The difficulty of gesture segmentation manifests itself when one cannot know beforehand which potential start events should be used until a middle or even an end candidate event is found to form the decisive gesture trajectory. It is possible that potential begin and end events can still be replaced by better future events. For instance, how does one decide when a flick right gesture (in free air) starts or ends without any knowledge about the future? This generates a lot of gesture candidates and increases the computational complexity. Some approaches tackle this issue by using a velocity heuristic with a slack variable (i.e. a global constant defined by the developer) or by applying an efficient incremental computing engine. However, most language-based approaches are lacking this functionality. Many solutions make use of a garbage gesture model to increase the accuracy of the gesture segmentation process.
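The sketch below hints at why segmentation inflates the number of gesture candidates: without an explicit start event, every sufficiently fast sample opens a new candidate, and candidates can only be resolved or discarded based on later evidence. The velocity threshold and the timeout play the role of the slack values mentioned above; all names and values are hypothetical and the matching is deliberately naive.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    x: float
    t: float  # timestamp in seconds

@dataclass
class Candidate:
    samples: list = field(default_factory=list)

VELOCITY_THRESHOLD = 0.5  # assumed minimal rightward speed to open a candidate
CANDIDATE_TIMEOUT = 1.0   # assumed maximal duration of a flick in seconds
FLICK_DISTANCE = 200.0    # assumed minimal travel for a "flick right"

def detect_flicks(stream):
    """Naive candidate tracking for a flick-right gesture in a continuous stream."""
    candidates, previous, detected = [], None, []
    for sample in stream:
        if previous is not None:
            velocity = (sample.x - previous.x) / max(sample.t - previous.t, 1e-6)
            if velocity > VELOCITY_THRESHOLD:
                # Any fast sample is a potential start: open a new candidate.
                candidates.append(Candidate([previous]))
        for candidate in candidates:
            candidate.samples.append(sample)
        # Resolve or discard candidates based on the events that arrive later.
        still_open = []
        for candidate in candidates:
            if sample.x - candidate.samples[0].x >= FLICK_DISTANCE:
                detected.append(candidate)      # decisive trajectory found
            elif sample.t - candidate.samples[0].t < CANDIDATE_TIMEOUT:
                still_open.append(candidate)    # keep waiting for future events
        candidates = still_open
        previous = sample
    return detected
```

Even this toy detector keeps several overlapping candidates alive at the same time, which is exactly the computational complexity the segmentation criterion refers to.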
Event Expiration
The expiration of input events is required to keep the memory and processing complexity within certain limits. The manual maintenance of events is a complex task and most frameworks offer at least a simple heuristic to automatically expire old events. In multi-touch frameworks, a frequently used approach is to keep track of events from the first touch down event to the last touch up of any finger. This might introduce some issues when dealing with multiple users if there is always at least one active finger touching the table. Another approach is to use a timeout parameter, effectively creating a sliding window solution. An advantage of this approach is that the maximum memory usage is predefined; however, a slack value is required. A static analysis of the gesture definitions could help to avoid the need for such a static value.
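A sliding-window policy as described above can be sketched as follows, with the window length acting as the slack value that a static analysis of the gesture definitions could in principle derive automatically. The buffer below is an assumption for illustration, not the mechanism of any specific framework.

```python
from collections import deque

class SlidingWindowBuffer:
    """Keeps only the events of the last window_seconds, bounding memory usage."""

    def __init__(self, window_seconds: float = 2.0):  # assumed slack value
        self.window_seconds = window_seconds
        self.events = deque()

    def add(self, event, timestamp: float):
        self.events.append((timestamp, event))
        self._expire(timestamp)

    def _expire(self, now: float):
        # Expired events can no longer contribute to a partial match.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()
```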
Concurrent Interaction
In order to allow concurrent interaction, one has to keep track of multiple partial instances of a gesture recognition process. For instance, multiple fingers, hands, limbs or users can perform the same gesture at the same time. To separate these instances, the framework can offer constructs or native support for concurrent gesture processing. In some scenarios, it is hard to decide which touch events belong to which hand or user. For example, in Proton the screen can be split in half to support some two player games. A better method is to set a maximum bounding box of the gesture [4] or to define the spatial properties of each gesture condition. The use of GUI-specific contextual information can also serve as a separation mechanism. Nevertheless, it is not always possible to know in advance which combination of fingers will form a gesture, leading to similar challenges as discussed for the segmentation criterion where multiple gesture candidates need to be tracked.

Portability, Serialisation and Embeddability
Concerns such as portability, serialisation and embeddability form the platform independence of an approach. Portability is defined by how easy it is to run the framework on different platforms. Some approaches are tightly interwoven with the host language which limits portability and the transfer of a gesture definition over the network. This transportation can be used to exchange gesture sets between users or even to offload the gesture recognition process to a dedicated server with more processing power. The exchange requires a form of serialisation of the gesture definitions which is usually already present in domain-specific languages. Embeddability has to do with the way the approach can be used. Is it necessary to have a daemon process or can the abstractions be delivered as a library? Another question is whether it is possible to use the abstractions in a different language or whether this requires a reimplementation.

Reliability
The dynamic nature of user input streams implies the possibility of an abundance of information in a short period of time. A framework might offer a maximum computational boundary for a given setting [19]. Without such a boundary, users might trigger a denial of service when many complex interactions have to be processed at the same time. Additionally, low-level functionality should be encapsulated without providing leaky language abstractions that could form potential security issues.

Graphical User Interface Symbiosis
The integration of graphical user interface (GUI) components removes the need for a single entry point of gesture callbacks on the application level. With contextual information and GUI-specific gesture conditions the complexity is reduced. For instance, a scroll gesture can only happen when both fingers are inside the GUI region that supports scrolling [4]. This further aids the gesture disambiguation process and thus increases the gesture recognition quality. Another use case is when a tiny GUI object needs to be rescaled or rotated. The gesture can be defined such that one finger should be on top of the GUI component while the other two fingers are executing a pinch or rotate gesture in the neighbourhood.

Activation Policy
Whenever a gesture is recognised, an action can be executed. In some cases the developer wants to provide a more detailed activation policy such as trigger only once or trigger when entering and leaving a pose. Another example is the sticky bit [4] option that activates the gesture for a particular GUI object. A shoot-and-continue policy [9] denotes the execution of a complete gesture followed by an online gesture activation. The latter can be used for a lasso gesture where at least one complete circular movement is required and afterwards each incremental part (e.g. per quarter) causes a gesture activation.

Dynamic Binding
Dynamic binding is a feature that allows developers to define a variable without a concrete value. For instance, the x location of an event A should be between 10 and 50 but should be equal to the x location of an event B. At runtime, a value of 20 for the x location of event A will therefore require an event B with the same value of 20. This is particularly useful to correlate different events if the specification of concrete values is not feasible.
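The example from the text (the x location of event A lies between 10 and 50 and must equal the x location of event B) can be mimicked with an explicit binding environment, as sketched below. The dictionary-based matcher is only an illustration of the idea; declarative languages offer this correlation as a language construct.

```python
def match_event_a(event, bindings: dict) -> bool:
    """Event A: x between 10 and 50; bind the concrete value for later reuse."""
    if 10 <= event.x <= 50:
        bindings["ax"] = event.x
        return True
    return False

def match_event_b(event, bindings: dict) -> bool:
    """Event B: x must equal whatever value was bound for event A (e.g. 20)."""
    return "ax" in bindings and event.x == bindings["ax"]

# Usage sketch with hypothetical events exposing an `x` attribute:
#   bindings = {}
#   match_event_a(event_a, bindings) and match_event_b(event_b, bindings)
```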
Runtime Definitions
Refining gesture parameters or outsourcing gesture definitions to gesture services requires a form of runtime modification support by the framework. The refinement can be instantiated by an automated algorithm (e.g. an optimisation heuristic), by the developer (during a debugging session) or by the user to provide their preferences.

Scalability in Terms of Performance
The primary goal of gesture languages is to provide an abstraction level which helps developers to express complex relations between input events. However, with multimodal setups, input continuously enters the system and the user expects the system to immediately react to their gestures. Therefore, performance and the scalability when having many gesture definitions is important. Some approaches, such as Midas, exploit the language constructs to form an optimised directed acyclic graph based on the Rete algorithm [5]. This generates a network of the gesture conditions, allows the computational sharing between them and keeps track of partial matches without further developer effort. In recent extensions, Midas has been parallelised and benchmarked with up to 64 cores [19] and then distributed such that multiple machines can share the workload [24]. Other approaches such as Proton and GDL by Echtler et al. rely on finite state machines. However, it is unclear how these approaches can be used with continuous sensor input where segmentation is a major issue. EventHurdle [15] tackles this problem by using relative positions between the definitions but might miss some gestures due to the non-exhaustive search [8].
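To hint at why language-level knowledge about gesture conditions pays off, the sketch below evaluates every distinct condition only once per incoming event and forwards the outcome to all gesture definitions that share it. This is merely a toy illustration of computational sharing, not the Rete network used by Midas nor the state machines of Proton or GDL.

```python
def build_condition_index(gestures: dict) -> dict:
    """gestures maps a gesture name to a list of condition functions.
    Returns a mapping from each distinct condition to the gestures sharing it."""
    index = {}
    for name, conditions in gestures.items():
        for condition in conditions:
            index.setdefault(condition, []).append(name)
    return index

def satisfied_conditions(event, index: dict) -> dict:
    """Each shared condition is evaluated a single time per incoming event."""
    hits = {}
    for condition, users in index.items():
        if condition(event):
            for name in users:
                hits.setdefault(name, []).append(condition)
    return hits

# Usage sketch: a condition shared by two definitions is only evaluated once.
#   is_down = lambda e: e.kind == "down"
#   index = build_condition_index({"tap": [is_down], "double tap": [is_down]})
```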
Scalability in Terms of Complexity
The modularity of gestures allows for a much better scalability in terms of complexity. When adding an extra gesture, no or minimal knowledge about existing definitions is required. However, when multiple gestures are recognised in the same pool of events, the developer needs to check whether they can co-exist (e.g. performed by different users) or are conflicting (i.e. deciding between rotation, scaling or both). A lot of work remains to be done to disambiguate gestures. For example, how do we cope with the setting of priorities or disambiguation rules between gestures when they are detected at a different timestamp? How can we cope with these disambiguation issues when there are many gesture definitions? Furthermore, it is unclear how we need to deal with many variants of a similar gesture in order that the correct one is used during the composition of a complex gesture.

Gesture Disambiguation
When multiple gesture candidates are detected from the same event source, the developer needs a way to discriminate between them. However, this is not a simple task due to the lack of detail in sensor information, unknown future events and the uncertainty of the composition of the gesture.

Identification and Grouping
The identification problem is related to the fact that sensor input does not always provide enough details to disambiguate a scenario. Echtler et al. [3] demonstrate that two fingers from different hands cannot be distinguished from two fingers of the same hand on a multi-touch table due to the lack of shadowing information. Furthermore, when a finger is lifted from the table and put down again, there is no easy way to verify whether it is the same finger. Therefore, a double tap gesture cannot easily be distinguished from a two finger roll. Similar issues exist with other types of sensors, such as when a user leaves the viewing angle of a sensor and later enters again. Multimodal fusion helps to address the identification problem. The grouping problem is potentially more complex to solve. For instance, when multiple people are dancing in pairs, it is sometimes hard to see who is dancing with whom. Therefore the system needs to keep track of alternative combinations for a longer time period to group the individuals. Many combinations of multi-touch gestures are possible when fingers are located near each other.

Prioritisation and Enabling
The annotation of gestures with different priority levels is a first form of prioritisation which can have an impact on the gesture recognition accuracy. However, it requires knowledge about existing gestures and if there are many gestures it might not be possible to maintain a one-dimensional priority schema. Nacenta et al. [20] demonstrate that we should not distinguish between a scale and rotate gesture on the frame-by-frame level but by using specialised prioritisation rules such as magnitude filtering or visual handles. A developer can further decide to enable or disable certain gestures based on the application context or other information.

Future Events
One of the major issues with gesture disambiguation is that information in the near future can lead to a completely different interpretation of a gesture. A gesture definition can, for instance, fully overlap with a larger, higher prioritised gesture. At a given point in time, it is difficult to decide whether the application should be informed that a particular gesture has been detected or whether we should wait for a small time period. If future events show that the larger gesture does not match, users might perceive the execution of the smaller gesture as unresponsive. Late contextual information might also influence the fusion process of primitive events that are still in the running to form part of more complex gestures. The question is whether these fused events should be updated to reflect the new information and how a framework can support this.

Uncertainty
Noise is an important parameter when dealing with gestural interaction. The jittering of multi-touch locations or limb positions might invalidate intended gestures or unintentionally activate other gestures. To analyse gestures based on imprecise primitive events, a form of uncertainty is required. This might also percolate to higher level gestures (e.g. when two uncertain subgestures are being composed). The downside is that this introduces more complexity for the developer.

Verification and User Profiling
Whenever a gesture candidate is found, it might be verified using an extra gesture classifier. An efficient segmentation approach may, for example, be combined with a more elaborate classification process to verify whether a detected recognition is adequate. Verification can also be used to further separate critical gestures (e.g. file deletion) from simple gestures (e.g. scaling). Note that combinations of classifiers (i.e. ensembles) are frequently used in the machine learning domain. In order to further increase the gesture recognition accuracy, a developer can offer a form of user profiling for gestures that are known to cause confusion. Either a gesture is specified too precisely for a broader audience or it is specified too loosely for a particular user. This influences the recognition results and accidental activations. Therefore, the profiling of users by tracking undo operations or multiple similar invocations could lead to an adaptation of the gesture definition for that particular user. User profiling is valuable to improve recognition rates but it might also be interesting to exploit it for context-sensitive cases. Future work is needed to offer profiling as a language feature.
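A verification step as described above can be as simple as routing a detected candidate through an ensemble of additional classifiers before its action is executed. The sketch below is hypothetical; the classifiers are assumed to be callables returning a confidence between 0 and 1.

```python
def verify(candidate, classifiers, threshold: float = 0.8) -> bool:
    """Combine the scores of an ensemble of extra classifiers for a candidate."""
    scores = [classify(candidate) for classify in classifiers]
    return sum(scores) / len(scores) >= threshold

# Usage sketch: a critical gesture (e.g. file deletion) gets a stricter threshold
# than a simple scaling gesture.
#   if verify(candidate, [rule_based_check, template_match_check], threshold=0.95):
#       delete_file(candidate.target)
```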
Gesture Specification
The description of a gesture requires a number of primitive statements such as spatial and temporal relations between multiple events, as described in the following.

Spatial Specification
We define the spatial specification of a gesture as the primitive travelled path to which it has to adhere. The path can be formed by sequential or parallel conditions (expressed by using temporal constructs) where events are constrained in a spatial dimension such as 10 < event1.x < 50. The use of relative spatial operators (e.g. event1.x + 30 > event2.x as used in [8, 15]) also seems useful to process non-segmented sensor information. Note that approximation is required to support the variability of a gesture execution.

Temporal Specification
Gestures can be described by multiple conditions. However, these conditions cannot always be listed in a sequential order. Therefore, most gesture languages allow the developer to express explicit temporal relations between the conditions. An example of such a temporal relation is that two events should (or should not) happen within a certain time period.
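The constraints used as examples in these two subsections translate directly into small predicates over events, as sketched below. Events are assumed to expose x and t attributes; the tolerance stands in for the approximation mentioned above and all values are hypothetical.

```python
SPATIAL_TOLERANCE = 5.0  # assumed tolerance for approximate spatial matching
MAX_DELAY = 0.5          # assumed temporal window in seconds

def absolute_spatial(event1) -> bool:
    # 10 < event1.x < 50, relaxed by a tolerance to support execution variability
    return 10 - SPATIAL_TOLERANCE < event1.x < 50 + SPATIAL_TOLERANCE

def relative_spatial(event1, event2) -> bool:
    # event1.x + 30 > event2.x, a relative operator useful for non-segmented input
    return event1.x + 30 > event2.x

def within_time(event1, event2) -> bool:
    # the two events should happen within a certain time period
    return 0 <= event2.t - event1.t <= MAX_DELAY
```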
Other Spatio-temporal Features
Many frameworks offer additional features to describe a gesture. For instance, Kammer et al. [13] use atomic blocks to specify a gesture. These atomic blocks are preprocessors such as direction (e.g. north or southwest) that abstract all low-level details from the developer. They can also rely on a template engine to offer more complex atomic building blocks. This integration is a form of composition and is an efficient way to describe gestures. Kinematic features can be used to filter gestures based on motion vectors, translation, divergence, curl or deformation. Khandkar et al. [14] offer a closed loop feature to describe that the beginning and end event of a gesture should be approximately at the same location.

Scale and Rotation Invariance
Scale invariance deals with the recognition of a single gesture trajectory regardless of its scale. Similarly, rotation invariance is concerned with the rotation. Most approaches offer this feature by rescaling and rotating the trajectory to a standard predefined size and centroid. However, a major limitation is that scale and rotation invariance requires segmentation and therefore does not work well for online gestures.
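Rescaling and rotating a trajectory to a predefined size and centroid, as most approaches do, can be sketched as follows. The target size and the indicative-angle rotation are assumptions in the spirit of common template matchers rather than the method of a specific language.

```python
import math

def normalise(points, target_size: float = 250.0):
    """Translate a trajectory to its centroid, remove its indicative angle and rescale.

    points is a list of (x, y) tuples; a new list of (x, y) tuples is returned."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centred = [(x - cx, y - cy) for x, y in points]

    # Rotation invariance: rotate so that the first point lies on the positive x axis.
    angle = math.atan2(centred[0][1], centred[0][0])
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    rotated = [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in centred]

    # Scale invariance: rescale the bounding box to a predefined size.
    width = max(x for x, _ in rotated) - min(x for x, _ in rotated) or 1.0
    height = max(y for _, y in rotated) - min(y for _, y in rotated) or 1.0
    scale = target_size / max(width, height)
    return [(x * scale, y * scale) for x, y in rotated]
```

The limitation mentioned above is visible here as well: the complete trajectory, and hence a segmentation of the event stream, is needed before the normalisation can take place.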
Debug Tooling
In order to debug gestures, developers usually apply prerecorded positive and negative gesture sets to see whether the given definition is compatible with the recorded data. Gesture debugging tools have received little attention in research and are typically limited to the testing of accuracy percentages. It might be interesting to explore more advanced debugging support such as notifying the developer of closely related gesture trajectories (e.g. gesture a is missed by x units [17]).

Editor Tooling
Gesture development can be carried out in a code-compatible graphical manner such as done with tablatures [16], hurdles [15] or spatially [18]. When dealing with 3D input events from a Kinect, a graphical representation is valuable to get to the correct spatial coordinates.

DISCUSSION
We can identify some general trends with regard to open issues and underrepresented criteria in existing work. Figure 2 reveals that some criteria such as (1) modularisation and (7) offline gestures are well supported by most approaches and are important to provide a minimal ability to program gestures in a structured manner. Commonly used multi-touch gestures such as pinch and rotate are (6) online gestures supported by most frameworks. However, additional work can be done to streamline the implementation of online gestures by providing (2) composition support, a deeper (14) GUI integration and more advanced (15) activation policies. We believe that these challenges can be resolved with additional engineering effort in existing systems.

[Figure 2 shows a stacked bar chart over the 30 criteria (summed values from 0 to 40), with one segment per approach: Midas, GDL-K, GeForMT, GDL-E, Proton, GestureAgents, EventHurdle, GestIT and ICO.]
Figure 2. A stacked bar chart of the summed values for each criterion.

However, we see that more challenging issues such as (9) segmentation, (19) scalability in terms of complexity or dealing with (22) future events are poorly or not at all supported in existing approaches. (9) Segmentation is crucial to deal with continuous streams of information where no hints are given by the sensor with regard to potential start and end conditions. The recent trend towards near-touch sensors and skeletal tracking algorithms makes the segmentation concern of crucial importance.

With the adoption of the discussed frameworks there is an increasing demand to deal with (19) scalability in terms of complexity. It is currently rather difficult to get an overview of how multiple gestures work together. For instance, a two-finger swipe might need to be implemented by composing two primitive swipes. However, more than one primitive swipe variation might exist (long/short, fast/slow) and choosing the correct one without introducing conflicts is challenging. A way to deal with this issue is to provide test data, but this is work intensive. There might be software engineering-based abstractions that could improve this situation.

Information coming from (22) events in the near future can lead to a completely different interpretation of a gesture. This is especially the case in a multimodal context where contextual or clarifying information comes from additional sensors. An active gesture might fully overlap with a larger and higher prioritised gesture or be part of a composition that might or might not succeed. In other cases the gesture should not be triggered at all due to (late) context data. There are currently no adequate abstractions helping developers to deal with these concerns.

Finally, we would like to highlight that (23) uncertainty and (24) user profiling abstractions are also lacking. Dealing with uncertainty is currently completely hidden from the programmer. Nevertheless, a gesture might be better neglected when it is composed from two or more uncertain subgestures. Additionally, when a user consistently undoes an operation executed by a particular gesture, the gesture might require some adjustments. This can be checked by profiling users and by offering several resolution strategies.

CONCLUSION
Gesture programming languages allow developers to more easily express their interaction patterns. When designing such a language, a number of concerns need to be addressed. Our goal was to categorise and explicitly expose these design decisions to provide a better understanding and foster a discussion about challenges, opportunities and future directions for gesture programming languages. We observed a number of underrepresented concerns in existing work and highlighted challenges for future gesture programming languages.
REFERENCES
1. Brooks, Jr., F. P. No Silver Bullet: Essence and Accidents of Software Engineering. IEEE Computer 20, 4 (April 1987), 10–19.
2. Echtler, F., and Butz, A. GISpL: Gestures Made Easy. In Proceedings of TEI 2012, 6th International Conference on Tangible, Embedded and Embodied Interaction (Kingston, Canada, February 2012), 233–240.
3. Echtler, F., Huber, M., and Klinker, G. Shadow Tracking on Multi-Touch Tables. In Proceedings of AVI 2008, 9th International Working Conference on Advanced Visual Interfaces (Napoli, Italy, May 2008), 388–391.
4. Echtler, F., Klinker, G., and Butz, A. Towards a Unified Gesture Description Language. In Proceedings of HC 2010, 13th International Conference on Humans and Computers (Aizu-Wakamatsu, Japan, December 2010), 177–182.
5. Forgy, C. L. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence 19, 1 (1982), 17–37.
6. Hamon, A., Palanque, P., Silva, J. L., Deleris, Y., and Barboni, E. Formal Description of Multi-Touch Interactions. In Proceedings of EICS 2013, 5th International Symposium on Engineering Interactive Computing Systems (London, UK, June 2013), 207–216.
7. Hoste, L. Software Engineering Abstractions for the Multi-Touch Revolution. In Proceedings of ICSE 2010, 32nd International Conference on Software Engineering (Cape Town, South Africa, May 2010), 509–510.
8. Hoste, L., De Rooms, B., and Signer, B. Declarative Gesture Spotting Using Inferred and Refined Control Points. In Proceedings of ICPRAM 2013, 2nd International Conference on Pattern Recognition Applications and Methods (Barcelona, Spain, February 2013), 144–150.
9. Hoste, L., and Signer, B. Water Ball Z: An Augmented Fighting Game Using Water as Tactile Feedback. In Proceedings of TEI 2014, 8th International Conference on Tangible, Embedded and Embodied Interaction (Munich, Germany, February 2014), 173–176.
10. Julia, C. F., Earnshaw, N., and Jorda, S. GestureAgents: An Agent-based Framework for Concurrent Multi-Task Multi-User Interaction. In Proceedings of TEI 2013, 7th International Conference on Tangible, Embedded and Embodied Interaction (Barcelona, Spain, February 2013), 207–214.
11. Kadous, M. W. Learning Comprehensible Descriptions of Multivariate Time Series. In Proceedings of ICML 1999, 16th International Conference on Machine Learning (Bled, Slovenia, June 1999), 454–463.
12. Kammer, D. Formalisierung gestischer Interaktion für Multitouch-Systeme. PhD thesis, Technische Universität Dresden, 2013.
13. Kammer, D., Wojdziak, J., Keck, M., Groh, R., and Taranko, S. Towards a Formalization of Multi-Touch Gestures. In Proceedings of ITS 2010, 5th International Conference on Interactive Tabletops and Surfaces (Saarbrücken, Germany, November 2010), 49–58.
14. Khandkar, S. H., and Maurer, F. A Domain Specific Language to Define Gestures for Multi-Touch Applications. In Proceedings of DSM 2010, 10th Workshop on Domain-Specific Modeling (Reno/Tahoe, USA, October 2010).
15. Kim, J.-W., and Nam, T.-J. EventHurdle: Supporting Designers' Exploratory Interaction Prototyping with Gesture-based Sensors. In Proceedings of CHI 2013, 31st ACM Conference on Human Factors in Computing Systems (Paris, France, April 2013), 267–276.
16. Kin, K., Hartmann, B., DeRose, T., and Agrawala, M. Proton: Multitouch Gestures as Regular Expressions. In Proceedings of CHI 2012, 30th ACM Conference on Human Factors in Computing Systems (Austin, Texas, USA, May 2012), 2885–2894.
17. Long, Jr., A. C., Landay, J. A., and Rowe, L. A. Implications for a Gesture Design Tool. In Proceedings of CHI 1999, 17th ACM Conference on Human Factors in Computing Systems (Pittsburgh, USA, 1999), 40–47.
18. Marquardt, N., Kiemer, J., Ledo, D., Boring, S., and Greenberg, S. Designing User-, Hand-, and Handpart-Aware Tabletop Interactions with the TouchID Toolkit. Tech. Rep. 2011-1004-16, Department of Computer Science, University of Calgary, Calgary, Canada, 2011.
19. Marr, S., Renaux, T., Hoste, L., and De Meuter, W. Parallel Gesture Recognition with Soft Real-Time Guarantees. Science of Computer Programming (February 2014).
20. Nacenta, M. A., Baudisch, P., Benko, H., and Wilson, A. Separability of Spatial Manipulations in Multi-touch Interfaces. In Proceedings of GI 2009, 35th Graphics Interface Conference (Kelowna, Canada, May 2009), 175–182.
21. Nuwer, R. Armband Adds a Twitch to Gesture Control. New Scientist 217, 2906 (March 2013).
22. Scholliers, C., Hoste, L., Signer, B., and De Meuter, W. Midas: A Declarative Multi-Touch Interaction Framework. In Proceedings of TEI 2011, 5th International Conference on Tangible, Embedded and Embodied Interaction (Funchal, Portugal, January 2011), 49–56.
23. Spano, L. D., Cisternino, A., Paternò, F., and Fenu, G. GestIT: A Declarative and Compositional Framework for Multiplatform Gesture Definition. In Proceedings of EICS 2013, 5th International Symposium on Engineering Interactive Computing Systems (London, UK, June 2013), 187–196.
24. Swalens, J., Renaux, T., Hoste, L., Marr, S., and De Meuter, W. Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors. In Proceedings of AGERE! 2013, 3rd International Workshop on Programming based on Actors, Agents, and Decentralized Control (Indianapolis, USA, October 2013), 3–12.
25. Takeoka, Y., Miyaki, T., and Rekimoto, J. Z-Touch: An Infrastructure for 3D Gesture Interaction in the Proximity of Tabletop Surfaces. In Proceedings of ITS 2010, 5th International Conference on Interactive Tabletops and Surfaces (Saarbrücken, Germany, November 2010), 91–94.