User Interface Paradigms for Visually Authoring Mid-Air Gestures: A Survey and a Provocation

Mehmet Aydın Baytaş 1, Yücel Yemez 2, Oğuzhan Özcan 1
1 Design Lab, Koç University, 34450 İstanbul
2 Department of Computer Engineering, Koç University, 34450 İstanbul
{mbaytas, yyemez, oozcan}@ku.edu.tr

EGMI 2014, 1st International Workshop on Engineering Gestures for Multimodal Interfaces, June 17 2014, Rome, Italy. Copyright © 2014 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. http://ceur-ws.org/Vol-1190/.
ABSTRACT
Gesture authoring tools enable the rapid and experiential prototyping of gesture-based interfaces. We survey visual authoring tools for mid-air gestures and identify three paradigms used for representing and manipulating gesture information: graphs, visual markup languages and timelines. We examine the strengths and limitations of these approaches and propose a novel paradigm for authoring location-based mid-air gestures based on space discretization.

Author Keywords
Gestural interaction; gesture authoring; visual programming; interface prototyping.

ACM Classification Keywords
H.5.2 Information Interfaces & Presentation (e.g. HCI): User Interfaces

INTRODUCTION
The recent proliferation of commercial input devices that can sense mid-air gestures, led by the introduction of the Nintendo Wii and the Microsoft Kinect, has enabled both professional developers and end-users to harness the power of full-body gestural interaction. However, despite the availability of the hardware, applications that leverage gestural interaction have not been thriving. A striking fact is that while the Kinect has broken records as the fastest-selling consumer electronics device in history, sales of games that utilize the Kinect have been poor [5]. This has been associated with design and user experience issues stemming from difficulties in designing and developing software [7]. Specifically, for both adept programmers and comparatively non-technical but creative users such as students, designers, artists and hobbyists, the amounts of time, effort and domain-specific knowledge required to implement custom gestural interactions are prohibitive.

Ongoing research aims to support gestural interaction design and development with gesture authoring tools. These tools aim at enabling rapid and experiential prototyping, which are essential practices for creating compelling designs [2]. However, few projects have gained widespread adoption. One issue that contributes to the low rate of adoption is the difficulty of balancing the trade-offs between the complexity and the expressive power of the paradigm used to represent and manipulate gesture information: interfaces employed for gesture authoring may become convoluted and difficult to use in order to fully tap into the expressive power of human gesture, or they may omit useful features as they aim for usability and rapidity.

In this paper, we survey existing paradigms for visually authoring mid-air gestures and present a provocation: a novel gesture authoring paradigm, which we have implemented in the form of an end-to-end application for introducing gesture control to existing software and novel prototypes.

The rest of this paper is organized as follows: We first present three user interface paradigms – graphs, visual markup languages and timelines – used in current visual gesture authoring tools. Existing implementations of each paradigm are examined and discussed in terms of their capabilities and limitations. Results from evaluations with real users, if published, are emphasized. We then present a provocation in the form of a novel user interface paradigm for authoring mid-air gestures, based on space discretization and influenced by existing paradigms. We discuss future work and conclude by presenting a summary of our results.

PARADIGMS FOR AUTHORING MID-AIR GESTURES
Authoring tools for mid-air gestural interfaces are still in their infancy. Development tools provided by vendors of gesture-sensing input devices are focused on textual programming. Ongoing research suggests a set of diverse approaches to the problem of how to represent and manipulate three-dimensional gesture data. Existing works approach the issue in three ways that constitute distinct paradigms. These are:

1. using 2-dimensional graphs of the data from the sensors that detect movement;
2. using a visual markup language; and,
3. representing movement information using a timeline of frames.

These paradigms often interact with two programming approaches: demonstration and declaration. Programming by demonstration enables developers to describe behavior by example. In the case of gestures, many examples of the

same behavior are often provided in order to account for the differences in gesturing between users and over time. Declarative programming of gestures involves describing behavior using a high-level specification language. This specification language may be textual or graphical.

The paradigms we list above do not have to be used exclusively, and neither do demonstration and declarative programming. Aspects of different paradigms may find their place within the same authoring tool. A popular approach to authoring gestures is to introduce gestures by demonstration, convert the gesture data into a visual representation, and then declaratively modify it.

In this section, we describe the above approaches in detail, with examples from the literature. We comment on their strengths and weaknesses based on evaluations conducted with software that implements them.

Using Graphs of Movement Data
Visualizing and manipulating movement data using 2-dimensional graphs that represent low-level kinematic information is a popular approach for authoring mid-air gestures. This approach is often preferred when gesture detection is performed using inertial sensors such as accelerometers and gyroscopes. It also accommodates other sensors that read continuously variable data such as bending, light and pressure. Commonly, the horizontal axis of the graph represents time while the vertical axis corresponds to the reading from the sensor. Often a "multi-waveform" occupies the graph, in order to represent data coming in from multiple axes of the sensor. Below, we study three software tools that implement graphs for representing gesture data: Exemplar, MAGIC and GIDE.

Exemplar
Exemplar [3] relies on demonstration to acquire gesture data from a variety of sensors: accelerometers, switches, light sensors, bend sensors, pressure sensors and joysticks. Once a signal is acquired via demonstration, the developer marks, on the resulting graph, the area of interest that corresponds to the desired gesture. The developer may interactively apply filters on the signal for offset, scaling, smoothing and first-order differentiation. (Figure 1)

Exemplar offers two methods for recognition: One is pattern matching, where the developer introduces many examples of a gesture using the aforementioned method and new input is compared to the examples. The other is thresholding, where the developer manually introduces thresholds on the raw or filtered graph and gestures are recognized when motion data falls between the thresholds. This type of thresholding also supports hysteresis, where the developer introduces multiple thresholds that must be crossed for a gesture to be registered.

Figure 1: The Exemplar gesture authoring environment [3]. From left to right, the interface reflects the developer's workflow: data from the various sensors connected to the system is displayed as thumbnails and the sensor of interest is selected; filters are applied to the incoming signal; areas of interest are marked for pattern recognition or thresholds are set; and the resulting gesture is mapped to output events.

Exemplar's user studies suggest that this implementation of the paradigm is successful in increasing developer engagement with the workings and limitations of the sensors used. Possible areas of improvement include a technique for displaying multiple sensor visualizations and events together, and finer control over timing for pattern matching.




System for Multiple Action Gesture Interface Creation (MAGIC)
Ashbrook and Starner's MAGIC [1] is another tool that implements the 2-dimensional graphing paradigm. The focus of MAGIC is programming by demonstration. It supports the creation of training sets with multiple examples of the same gesture. It allows the developer to keep track of the internal consistency of the provided training set, and to check against conflicts with other gestures in the vocabulary and with an "Everyday Gesture Library" of unintentional, automatic gestures that users perform during daily activities. MAGIC uses the graph paradigm only to visualize gesture data and does not support manipulation on the graph. (Figure 2)

One important feature in MAGIC is that the motion data graph may be augmented by a video of the gesture example being performed. Results from user studies indicate that this feature has been highly favored by users, during both gesture recording and retrospection. Interestingly, it is reported that the "least-used visualization" in MAGIC "was the recorded accelerometer graph," with most users being "unable to connect the shape of the three lines" that correspond to the 3 axes of the accelerometer reading "to the arm and wrist movements that produced them." The features preferred by developers turned out to be the videos, the "goodness" scores assigned to each gesture according to how well it matches gestures in and not in its own class, and a sorted list depicting the "distance" of a selected example to every other example.

Figure 2: MAGIC's gesture creation interface [1].

Gesture Interaction Designer (GIDE)
More recently, GIDE [8] features an implementation of the graph paradigm for authoring accelerometer-based mid-air gestures. GIDE leverages a "modified" hidden Markov model approach to learn from a single example for each gesture in the vocabulary. The user interface implements two distinct features: (1) Each gesture in the vocabulary is housed in a "gesture editor" component which contains the sensor waveform, a video of the gesture being performed, an audio waveform recorded during the performance, and other information related to the gesture. (2) A "follow" mode allows the developer to perform gestures and get real-time feedback on the system's estimate of which gesture is being performed (via transparency and color) and where they are within that gesture. (Figure 3) This feedback on the temporal position within a gesture is multimodal: the sensor multi-waveform, the video and the audio waveform from the video are aligned and follow the gestural input. GIDE also supports "batch testing" by recording a continuous performance of multiple gestures and running it against the whole vocabulary to check if the correct gestures are recognized at the correct times.

Figure 3: The "follow" mode in the GIDE interface [8].

User studies on GIDE reveal that the combination of multi-waveform, video and audio was useful in making sense of gesture data. Video was favored particularly since it allows developers to still remember the gestures they recorded after an extended period of not working on the gesture vocabulary. Another finding from the user studies was the suggestion that the "batch testing" feature, where the developer records a continuous flow of many gestures to test against, could be leveraged as a design strategy: gestures could be extracted from a recorded performance of continuous movement.

Discussion
Graphs that display acceleration data seem to be the standard paradigm for representing mid-air gestures tracked using acceleration sensors. This paradigm supports direct manipulation for segmenting and filtering gesture data, but manipulating acceleration data directly to modify gestures is unwieldy. User studies show that graphs depicting accelerometer (multi-)waveforms are not effective as the sole representation of a gesture, but work well as a component within a multimodal representation along with video.

Visual Markup Languages
Using a visual markup language for authoring gestures can allow for rich expression and may accommodate a wide variety of gesture-tracking devices, e.g. accelerometers and skeletal tracking, at the same time. The syntax of these visual markup languages can be of varying degrees of complexity, but depending on the sensor(s) used for gesture detection, making use of the capabilities of the hardware may not require a very detailed syntax. In this section we examine a software tool, EventHurdle, that implements a visual markup language for gesture authoring, and we discuss a gesture spotting approach based on control points which has not been implemented as a gesture authoring tool, but provides valuable insight.

EventHurdle
Kim and Nam describe a declarative hurdle-driven visual gesture markup language implemented in the EventHurdle authoring tool [6]. The EventHurdle syntax supports gesture input from single-camera-based, physical-sensor-based and touch-based devices. In lieu of a timeline or graph, EventHurdle projects the gesture trajectory onto a 2-dimensional workspace. The developer may perform the gestures, see the resulting trajectory on the workspace, and declaratively author gestures on the workspace by placing "hurdles" that intersect the gesture trajectory. Hurdles may be placed in ways that result in serial, parallel and/or recursive compositions. (Figure 4) "False hurdles" are available for specifying unwanted trajectories. While this is an intuitive way to visualize movement data from pointing devices, touch gestures and blob detection, the approach does not support the full range of expression inherent in 3-dimensional mid-air gesturing.
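To make the hurdle mechanism concrete, the sketch below checks whether a 2-dimensional trajectory crosses a series of hurdles in order, modelling each hurdle as a line segment. This is our illustrative reading of serial composition, not EventHurdle's published implementation, and all names are ours; parallel composition could be expressed by accepting a crossing of any hurdle in a set at a given step.

```python
def _ccw(a, b, c):
    # Positive if the points a, b, c make a counter-clockwise turn.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _segments_cross(p1, p2, q1, q2):
    # True if segment p1-p2 strictly crosses segment q1-q2
    # (collinear and touching cases are ignored for brevity).
    return (_ccw(p1, p2, q1) * _ccw(p1, p2, q2) < 0 and
            _ccw(q1, q2, p1) * _ccw(q1, q2, p2) < 0)

def gesture_recognized(trajectory, hurdles):
    """Serial composition: every hurdle (a pair of 2-D endpoints) must be
    crossed by the trajectory, in the given order."""
    next_hurdle = 0
    for a, b in zip(trajectory, trajectory[1:]):
        if next_hurdle == len(hurdles):
            break
        if _segments_cross(a, b, *hurdles[next_hurdle]):
            next_hurdle += 1
    return next_hurdle == len(hurdles)


# A rightward "swipe" defined by two vertical hurdles the hand must cross.
swipe_right = [((1.0, 0.0), (1.0, 2.0)), ((2.0, 0.0), (2.0, 2.0))]
trajectory = [(0.0, 1.0), (1.5, 1.0), (2.5, 1.0)]
print(gesture_recognized(trajectory, swipe_right))  # True
```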




Figure 4: EventHurdle's visual markup language allows for a variety of compositions: (from top left) a simple gesture with one hurdle; serial and parallel compositions; combinations of serial and parallel compositions; recursive gesturing. [6]

Gestures defined in EventHurdle are configurable to be location-sensitive or location-invariant. By design, orientation- and scale-invariance are not implemented, in order to avoid unnecessary technical options that may distract from "design thinking."

User studies on EventHurdle comment that the concept of hurdles and paths is "easily understood" and that it "supports advanced programming of gesture recognition." Beyond this, the user studies focus on supporting features rather than on the strengths and weaknesses of the paradigm or on comparisons with other paradigms.

Control Points
Hoste, De Rooms and Signer describe a versatile and promising approach that uses spatiotemporal constraints around control points to describe gesture trajectories [4]. While the focus of the approach is on gesture spotting (i.e. segmentation of a continuous trajectory into discrete gestures) and not gesture authoring, they do propose a human-readable and manipulable external representation. (Figure 5) This external representation has significant expressive power and supports programming constructs such as negation (for declaring unwanted trajectories) and user-defined temporal constraints. While the authors' approach is to infer control points for a desired gesture from an example, the representation they propose also enables the manual placement of control points.

The authors do not describe an implementation that has been subjected to user studies. However, they discuss a number of concepts that add to the expressive power of using control points as a visual markup language to represent and manipulate gesture information. The first is that it is possible to add temporal constraints to the markup; i.e. a floor or ceiling value can be specified for the time taken by the tracked limb or device to travel between control points. This is demonstrated not on the graphical markup (where it could be done easily), but on textual code generated to describe a gesture – another valuable feature. The second such concept is that the control points are surrounded by boundaries whose size can be adjusted to introduce spatial flexibility and accommodate "noisy" gestures. Third, boundaries can be set for negation when the variation in the gesture trajectory is too large. The authors discuss linear or planar negation boundaries only, but introducing negative control points into the syntax could also be explored. Finally, a "coupled recognition process" is introduced, where a trained classifier can be called to distinguish between potentially conflicting gestures, e.g. a circle and a rectangle that share the same control points.

One limitation of this approach is the lack of support for scale invariance. One way of introducing scale invariance may be to automatically scale boundary sizes and temporal constraints with the distance between control points. However, it is likely that the relationship between the optimal values for these variables is nonlinear, which could make automatic scaling infeasible.

Figure 5: Using control points to represent gestures [4]. (Left) A "noisy" gesture still gets picked up due to relaxed boundaries around control points. (Right) Negation is introduced via vertical boundaries so that large movements in the vertical axis are distinguished from the desired gesture.
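As a rough sketch of how such a control-point description could be evaluated against tracking data, the code below checks that a 3-dimensional trajectory visits spherical boundaries around ordered control points, each with an optional ceiling on the travel time from the previous point. The data layout and the policy of failing the match outright when a temporal constraint is violated are our assumptions, not details from Hoste, De Rooms and Signer's paper; negation boundaries could be added analogously by rejecting a match when a forbidden region is entered.

```python
import math
from dataclasses import dataclass

@dataclass
class ControlPoint:
    center: tuple                       # (x, y, z) position of the control point
    radius: float                       # boundary size; larger values tolerate "noisier" gestures
    max_seconds: float = float("inf")   # ceiling on travel time from the previous point

def _inside(pos, cp):
    return math.dist(pos, cp.center) <= cp.radius

def matches(samples, control_points):
    """samples: list of (timestamp, (x, y, z)) positions of the tracked limb.
    True if the boundary of every control point is entered in order,
    respecting each point's temporal constraint."""
    index = 0
    last_hit = None
    for t, pos in samples:
        cp = control_points[index]
        if _inside(pos, cp):
            if last_hit is not None and t - last_hit > cp.max_seconds:
                return False            # took too long to reach this control point
            last_hit = t
            index += 1
            if index == len(control_points):
                return True
    return False
```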


Discussion
The expressive power and usability of a visual markup language may vary drastically depending on the specifics of the language and the implementation. The general advantage of this paradigm is that it is suitable for describing and manipulating location-based gesture information (rather than the acceleration-based information commonly depicted using graphs). This makes visual markup languages suitable for mid-air gestures detected by depth-sensing cameras, where the interaction space is fixed and the limbs of the users move in relation to each other. Either the motion sensing device or certain parts of the skeletal model could be used to define a reference frame, and gesture trajectories could be authored in a location-based manner using a visual markup language.

Timelines
Timelines of frames are commonly used in video editing applications. They often consist of a series of ordered thumbnails and/or markers that represent the content of the moving picture and any editing done on it, such as adding transitions.

Gesture Studio
One application that implements a timeline to visualize gesture information is the commercial Gesture Studio (http://gesturestudio.ca/). The application works only with sensors that detect gestures through skeletal tracking using an infrared depth camera. Developers introduce gestures in Gesture Studio by demonstration, through performing and recording examples. The timeline is used to display thumbnails for each frame of the skeleton information coming from the depth sensor. The timeline is updated after the developer finishes recording a gesture; during recording, a rendering of the skeletal model tracked by the depth sensor provides feedback. After recording, the developer may remove unwanted frames from the timeline to trim gesture data for segmentation. Reordering frames is not supported, since gestures are captured at a high frame rate (depending on the sensor, usually around 30 frames per second), which would make manual frame-by-frame editing inconvenient. The process through which these features have been selected is opaque, since there are no published studies that present the design process or evaluate Gesture Studio in use.

Discussion
In gesture authoring interfaces, timelines make sense when gesture tracking encompasses many limbs and dynamic movements that span more than a few seconds. Spatial and temporal concerns for gestures in 2 dimensions, such as those performed on surfaces, can be represented on the same workspace. The representation of mid-air gestures requires an additional component, such as a timeline, to show the change over time.

Discussion
We have presented a number of systems that exemplify three user interface paradigms for visually authoring mid-air gestures for computing applications (see Table 1 for a summary). For sensor-based gesturing, the standard paradigm used to represent gesture information appears to be projecting the sensor waveforms onto a graph. Graphs appear to work well as components that represent sensor-based gestures, allow experimentation with filters and gesture recognition methods, and support direct manipulation to some extent. User studies show that while the graphs alone may not allow developers to fully grasp the connection between movements and the waveform [1], they have been deemed useful as part of a multimodal gesture representation [8]. Using hurdles as a visual markup language offers an intuitive and expressive medium for gesture authoring, but it is not able to depict fully 3-dimensional gestures. Using spherical control points may be more conducive to direct manipulation while still affording an expressive syntax, but no implementation of this paradigm exists for authoring mid-air gestures. Finally, timelines of frames may come in handy for visualizing dynamic gestures with many moving elements, such as in skeletal tracking; but used in this fashion they allow only visualization and not manipulation.

System             | UI Paradigm                        | Programming Approach        | Insights from user studies
Exemplar [3]       | Graphs                             | Demonstration               | Increases engagement with sensor workings and limitations.
MAGIC [1]          | Graphs (multi-waveform)            | Demonstration               | Users unable to connect waveform to physical movements. Optional video is favored.
GIDE [8]           | Graphs (multi-waveform with video) | Demonstration               | Multimodal representation helps make sense of gesture data.
EventHurdle [6]    | Visual markup language             | Declaration                 | Easily understood. Supports "advanced" programming.
Control Points [4] | Visual markup language             | Declaration / Demonstration | Not implemented.
Gesture Studio     | Timeline                           | Demonstration               | Not published.

Table 1: Summary of studies on systems that exemplify three user interface paradigms for visually authoring mid-air gestures.

There are paradigms that allow for the authoring of sensor-based gestures both declaratively and through demonstration. For skeletal tracking interfaces, tools based on demonstration exist, but we have not come across visual declarative programming tools for skeletal tracking interfaces. In the next section, we propose a user interface paradigm for declaratively authoring mid-air gestures for skeletal tracking interfaces.

PROVOCATION: SPACE DISCRETIZATION AS A NOVEL PARADIGM FOR AUTHORING MID-AIR GESTURES
The paradigms that we surveyed above each have their strengths and weaknesses. We wish to propose a novel paradigm for declaratively authoring mid-air gestures, which we will call space discretization. This paradigm conceptually supports both declaration and demonstration as ways to introduce gestures, and direct manipulation to edit them. The paradigm is adaptable for sensor-based interactions and touch gestures. We will present a rendition aimed at authoring gestures for skeletal tracking interfaces.


Figure 6: A 2-dimensional "Z" gesture defined using ordered hotspots in discretized space.

Overview and Implementation
We have implemented this paradigm as part of an application called Hotspotizer. The application has been developed as an end-to-end suite to facilitate the rapid prototyping of gesture-based interactions and the adaptation of arbitrary interfaces for gesture control. Collections of gestures can be created, saved, loaded, modified and mapped to a keyboard emulator within the application. The current version is configured to work with the Microsoft Kinect sensor and is available online as a free download (http://designlab.ku.edu.tr/design-thinking-research-group/hotspotizer/).

The paradigm we implemented works by partitioning the space around the tracked skeletal model into discrete spatial compartments. In a manner that is similar to the use of control points in Hoste, De Rooms and Signer's approach, these discrete compartments can be marked and activated to become "hotspots" that register movement when a tracked limb enters them. (Figure 6) Our approach may be likened to modifying the control points paradigm to use cubic instead of spherical boundaries and to allow the placement of control points only at discrete locations in space. This is due to the difficulty of manipulating continuously moveable control points in 3 dimensions. Furthermore, using discrete hotspots instead of control points allows the boundaries of the control points to take custom shapes rather than spheres only. Considering the precision of current skeletal tracking devices, the added difficulty of manipulating free-form regions rather than discrete compartments does not pay off.

In Hotspotizer, the compartments are cubes that measure 15 cm on each side, and the workspace is a cube, 300 cm on each side, whose centroid is fixed to the tracked skeleton's "hip center" joint returned by the Kinect sensor. (Figure 7) The workspace has been sized to accommodate larger users, and the compartments have been sized, through empirical observations, to reflect the sensor's precision. The alignment of the workspace to the user's body results in gestures being location-invariant with respect to the user's position relative to the depth camera.

Figure 7: A 3-dimensional "swipe" gesture to be performed with the right hand, implemented in Hotspotizer. The front view (A) and the side view (B) depict the third frame, selected from the timeline (C). The 3D viewport (D) depicts all three frames, using transparency to imply the order.

However, gestures in Hotspotizer are always location-dependent with respect to the gesturing limb's position relative to the rest of the body. Scale- and orientation-invariance are not automatically supported, but it is possible to arrange hotspots in creative ways that allow the same gesture to be executed at different scales.

Splitting gesture data into frames, which are navigated using a timeline, supports authoring dynamic movements. The side view and front view grids only display hotspots that belong to one frame at a time, since placing all of the hotspots that belong to different frames of a gesture on the same grids results in a convoluted interface. During gesture tracking, if the tracked limb enters any one of the hotspots that belongs to a frame, the entire frame registers a "hit." For a gesture to be registered, its frames must be hit in the correct order, and the time that elapses between subsequent frames registering a hit must not exceed a pre-defined timeout. Conceptually the timeout could be adjustable; in the current implementation, for the sake of a simple user interface, it is hard-coded to 500 ms.

In essence, we propose a design for an expressive user interface paradigm for authoring mid-air gestures detected through skeletal tracking. Aspects of this design are based on the control points paradigm described in [4]. We modified that paradigm to confine the locations of the control points to discrete pre-defined positions and to use cubic control point boundaries of fixed size, which can be added together to create custom shapes. We also introduce a timeline component so that spatial and temporal constraints can be manipulated unambiguously.
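The following sketch summarizes the recognition logic described in this section: limb positions are discretized into 15 cm cells of a 300 cm workspace centred on the hip-center joint, a frame registers a hit when the tracked limb enters any of its hotspots, and the frames of a gesture must be hit in order with no more than 500 ms between consecutive hits. This is a minimal re-statement in code under our own naming, not an excerpt from Hotspotizer.

```python
CELL_CM = 15.0         # side length of one spatial compartment
WORKSPACE_CM = 300.0   # side of the cubic workspace centred on the hip-center joint
FRAME_TIMEOUT_S = 0.5  # time allowed between consecutive frames registering a hit

def to_cell(limb_cm, hip_center_cm):
    """Map a limb position (in cm, in the same space as the skeleton) to a
    discrete cell index relative to the hip-center joint."""
    return tuple(
        int((limb_cm[i] - hip_center_cm[i] + WORKSPACE_CM / 2) // CELL_CM)
        for i in range(3)
    )

class GestureTracker:
    """A gesture is a list of frames; each frame is a set of hotspot cells.
    Frames must be hit in order, each within FRAME_TIMEOUT_S of the last."""

    def __init__(self, frames):
        self.frames = frames
        self.next_frame = 0
        self.last_hit_time = None

    def update(self, t, limb_cm, hip_center_cm):
        """Feed one skeleton sample; returns True when the gesture completes."""
        # Too much time since the previous frame was hit: start over.
        if (self.last_hit_time is not None
                and t - self.last_hit_time > FRAME_TIMEOUT_S):
            self.next_frame, self.last_hit_time = 0, None
        cell = to_cell(limb_cm, hip_center_cm)
        if cell in self.frames[self.next_frame]:
            self.next_frame += 1
            self.last_hit_time = t
            if self.next_frame == len(self.frames):
                self.next_frame, self.last_hit_time = 0, None
                return True
        return False
```

A grid-authored gesture would then simply be a list of such frames, each holding the cell indices that the developer marked as hotspots.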


Future Work
Future work includes features to enrich the expressiveness of the paradigm and evaluating its performance in use.

The current implementation of the paradigm in Hotspotizer supports only declaration – manually specifying hotspots by selecting relevant areas on a grid. The interface may be extended to allow the introduction of gestures through demonstration, by inferring hotspots automatically from recorded gestures.

"Negative hotspots" that mark compartments which should not be crossed while gesturing are a possibility for future iterations of Hotspotizer. So is support for gestures performed by multiple limbs, possibly by using a multi-track timeline and coupling keyframes where the movements of the limbs should be synchronized.

In order to describe more complex gestures, it may make sense to introduce classifier-coupled gesture recognition. One shortcoming of the paradigm is that it does not accommodate the repeated use of hotspots within different frames of a gesture well. If a gesture requires that a certain hotspot be hit twice, for example, the current implementation does not afford a way of detecting whether the first or the second hit is registered as a user performs the gesture.

Finally, as the precision of skeletal tracking devices increases, and in order to accommodate devices that track smaller body parts such as the hands, adjustable workspace and compartment sizing may be introduced.

Formative evaluations have been conducted throughout the development of Hotspotizer, focusing on prioritizing features and on the visual design of the interface. Results of these, along with summative evaluations that compare the application to existing solutions and uncover user strategies for using the tool, will be published in the future.

CONCLUSION
We reviewed existing paradigms for authoring mid-air gestures and discussed how graphs of sensor waveforms are suitable components for representing acceleration-based gesture data; how visual markup languages are better suited for location-based gesture data; and how timelines are used to communicate dynamic gesturing. We presented a novel paradigm for authoring mid-air gestures sensed by skeletal tracking: a visual markup language based on space discretization, supported by a timeline to visualize temporal aspects of gesturing. Future work may build supporting features onto this paradigm and evaluate its performance in use by developers.

ACKNOWLEDGEMENT
The work presented in this paper is part of research supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK), project number 112E056.

REFERENCES
1. Ashbrook, D. and Starner, T. MAGIC: A Motion Gesture Design Tool. Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10, ACM Press (2010), 2159.
2. Buxton, B. Sketching User Experiences: Getting the Design Right and the Right Design. Morgan Kaufmann, Boston, 2007.
3. Hartmann, B., Abdulla, L., Mittal, M., and Klemmer, S.R. Authoring Sensor-Based Interactions by Demonstration with Direct Manipulation and Pattern Recognition. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '07, ACM Press (2007), 145.
4. Hoste, L., De Rooms, B., and Signer, B. Declarative Gesture Spotting Using Inferred and Refined Control Points. Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2013), (2013).
5. Hughes, D. Microsoft Kinect Shifts 10 Million Units, Game Sales Remain Poor. HULIQ, 2012. http://www.huliq.com/10177/microsoft-kinect-shifts-10-million-units-game-sales-remain-poor.
6. Kim, J.-W. and Nam, T.-J. EventHurdle. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '13, ACM Press (2013), 267.
7. Stein, S. Kinect, 2011: Where Art Thou, Motion? CNET, 2011. http://www.cnet.com/news/kinect-2011-where-art-thou-motion/.
8. Zamborlin, B., Bevilacqua, F., Gillies, M., and D'inverno, M. Fluid Gesture Interaction Design. ACM Transactions on Interactive Intelligent Systems 3, 4 (2014), 1–30.



