=Paper=
{{Paper
|id=Vol-1910/paper0108
|storemode=property
|title=Gestural Interaction in Virtual Environments: User Studies and Applications
|pdfUrl=https://ceur-ws.org/Vol-1910/paper0108.pdf
|volume=Vol-1910
|authors=Fabio Marco Caputo
|dblpUrl=https://dblp.org/rec/conf/chitaly/Caputo17
}}
==Gestural Interaction in Virtual Environments: User Studies and Applications==
Fabio Marco Caputo, Department of Computer Science, University of Verona, fabiomarco.caputo@univr.it

Abstract. With the technology currently available there has been an increased interest in the development of virtual reality (VR) applications, some of which are already becoming commercial products. This has also raised many issues related to their usability. One of the main challenges is the design of interfaces to interact with these virtual environments (VE), in particular for setups relying on hand tracking as a means of input. This research project aims to tackle some of the critical aspects of interaction based on hand tracking in VEs by proposing novel interaction techniques for object manipulation and gesture-based interfaces, with the intent of addressing relevant issues affecting the usability of this kind of VR application.

1 Background of the research project

Virtual Reality (VR) is a term used for applications that can simulate physical presence in places of the real world or in imagined worlds. These VR applications recreate sensory experiences, which could in theory include virtual taste, sight, smell, sound, touch, etc. Most current VR environments consist mainly of visual experiences, delivered through a regular computer screen or special stereoscopic displays, together with some additional sensory information such as sound through headphones or speakers targeted towards the user.

The availability of low-cost devices for visualization and gestural interaction is reviving interest in VR, and the technology has matured enough to offer many applications of potential interest, not only in games and entertainment but also in other specific uses such as scientific and medical visualization, augmented reality, etc. What limits the use of immersive 3D environments in these areas is often the difficulty of interaction which, despite increasingly sophisticated and relatively cheap tracking devices, makes applications that could have significant utility and a large number of users unusable. It therefore becomes crucial to study and test the different tasks, and to evaluate which interaction paradigm works best, before proposing the applications themselves to potential users.

Even though the ability to interact with the virtual environment is not required in order to speak of VR, the most interesting applications of this technology come with the ability to perform one or more tasks, depending on the application's purpose. These tasks involve interaction between the user and the virtual objects present in the environment (e.g. grabbing a virtual object). Such a feature is anything but trivial; depending on the complexity of the task, a large number of problems may arise.

One of the main goals of research in this field is to achieve naturalness of interaction with virtual environments [4]. To achieve this goal, research is currently focusing on two main aspects. The first is understanding people's mental models of interaction with virtual environments. Due to the interface/device layer of interaction, users may develop interaction models different from those used in a real physical environment, regardless of the kind of task they wish to perform. Understanding this is a key factor in developing mid-air interaction interfaces for a number of different tasks (e.g. virtual assembly, shape modeling, etc.) in order to design an interface perceived as natural by most people.
The second aspect is the development of actual interaction interfaces, possibly based on guidelines derived from the knowledge acquired through the kind of research described above [8].

The aim of my PhD activity is to investigate open issues related to object manipulation and gesture-based interfaces in immersive virtual environments, both by analyzing user preferences and interaction metaphors and by implementing and testing practical solutions with low-cost hardware.

2 Open issues and research aims

Since the early days of virtual environments, interaction with the virtual objects in the scene has been one of the main objects of study. In three-dimensional virtual environments, interaction is not trivial, mainly because of the required mapping between traditional input devices (2D) and the virtual environment (3D). The most common solutions resort to techniques that somehow relate the actions performed in the two-dimensional space of the input device (e.g. mouse cursor or touch) to three-dimensional transformations.

Since people usually interact with these kinds of environments through traditional displays, 3D content is shown as a 2D rendered image, which hinders the perception of the content. To overcome the limitations of both input and output devices, mainstream solutions for creating and editing 3D virtual content, namely computer-aided design (CAD) tools, resort to different orthogonal views of the environment. This allows a more direct two-dimensional interaction with limited degrees of freedom. Solutions that offer a single perspective view usually either apply the transformation in a plane parallel to the view plane, or resort to widgets that constrain interactions and ease the 2D-3D mapping [9]. Research has shown that the first approach can sometimes result in unexpected transformations when users are allowed to navigate freely through the virtual environment, and that constrained interactions allow for more accurate manipulations [10].
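As a rough illustration of the view-plane mapping just described, the sketch below (Python with numpy) maps a 2D cursor displacement to a 3D translation constrained to a plane parallel to the view plane. The camera basis vectors and the pixel-to-world scale factor are illustrative assumptions, not part of the cited works.

```python
import numpy as np

def view_plane_translation(cursor_delta_px, cam_right, cam_up, px_to_world):
    """Map a 2D cursor displacement (pixels) to a 3D translation
    constrained to a plane parallel to the view plane.

    cursor_delta_px : (dx, dy) cursor movement in screen pixels
    cam_right, cam_up : unit vectors spanning the camera's view plane
    px_to_world : assumed scale factor from pixels to world units
                  (in practice it depends on object depth and field of view)
    """
    dx, dy = cursor_delta_px
    # The translation has no component along the view direction,
    # so object depth is preserved and the mapping stays predictable.
    return px_to_world * (dx * np.asarray(cam_right) + dy * np.asarray(cam_up))

# Example: drag 30 px right and 10 px up with an axis-aligned camera.
t = view_plane_translation((30, 10),
                           cam_right=[1.0, 0.0, 0.0],
                           cam_up=[0.0, 1.0, 0.0],
                           px_to_world=0.01)
print(t)  # -> [0.3, 0.1, 0.0]
```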
Recent technological advances have led to an increased interest in immersive virtual reality settings. Affordable hardware for immersive visualization of virtual environments, such as the Oculus Rift head-mounted display (HMD), eases the perception of three-dimensional content. Moreover, advances in user tracking solutions make it possible to know where the user's head, limbs and hands are in space. This allows for more direct interactions, mimicking the ones with physical objects. First results of our work showed that mid-air direct interactions with 3D virtual content can reduce task duration and are appealing to users.

Although mid-air interactions show promising results, the accuracy of human spatial interactions is limited. Moreover, the limited dexterity of mid-air hand gestures, aggravated by the lack of precision of tracking systems and the low definition of current HMDs, constrains precise manipulations. This precision is of extreme importance when creating or assembling engineering models or architectural mock-ups, for instance. Even though a large number of methods have been proposed in the literature, most demo systems do not provide easy-to-use interfaces or the possibility of reaching a high level of precision. This is also due to more intrinsic issues of the tracking task, such as inaccuracy in tracking, occlusions, the difficulty of segmenting different gestural primitives, and other non-trivial problems.

There is surely a lot of room for improvement, not necessarily related to better tracking of body landmarks, but also to smart global processing of 3D keypoint trajectories. Several methods have recently been presented for characterizing salient points and for global shape description, with invariance properties and robustness against various kinds of perturbation. Methods with similar characteristics, but adapted to the different kind of shape data provided by tracking devices, could in principle be applied to solve open issues in gestural interaction. An example of a smart application of simple geometric processing to realize effective interaction comes from 2D touchscreen interaction, where many gesture recognition applications are not based on complex time-series analysis, but on reducing the problem to simple template matching of 2D shapes. The popular $1 recognizer [17] and similar derived methods (also proposed for 3D interaction, e.g. [11]) are a clear demonstration of the usefulness of this simplification. This seems to indicate that, in this case, the dynamic information can be neglected without losing the meaningful part of the signal.

In the following, two of the main aspects covered by the thesis are presented, along with the respective issues that are relevant topics of interest for the HCI community and still need to be investigated further.

Object manipulation in VR environments

There are different ways to interact with the virtual objects present in a VR scene. The type of interaction is determined by the kind of task the user is allowed to perform and, ultimately, by the purpose of the application featuring such interactions. One specific kind of interaction is object manipulation. In this case the user is allowed, more or less directly depending on the metaphor proposed by the implemented technique, to select an object in the scene and perform one or more transformations (i.e. translations, rotations or scaling operations). This kind of interaction has many limitations, however. First, the user's arm has a limited range of interaction, due to real-world constraints (e.g. it is safe to assume arms are always anchored to the user's body) and, in the case of deviceless setups, to the sensor's detection volume. This limit can be worked around with a navigation technique. Such a technique is useful as it also allows different visual perspectives, but it should not be strictly required, depending on the application. Manipulation of large objects is also an issue, as they tend to obscure the user's view during positioning tasks, unless additional workarounds are implemented to give the user a way to change the point of view for a more convenient perspective [3].

Gestural interaction for VR interfaces

Gestural interaction is particularly successful on 2D touchscreens, where it is easy to determine the beginning and the end of a gesture by using the contact between fingers and the display surface. In 3D deviceless interaction, the beginning of a gesture must be detected automatically from the gesture itself. The use of pattern recognition tools may allow robust recognition of a learned gesture after the gesture has been completed; this may be good for a sign-language interpreter, for example, but not for a manipulation tool that should give visual feedback within a reasonable time.
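To make the segmentation problem concrete, the following minimal sketch detects grab start and end from the thumb-index fingertip distance with a hysteresis threshold, so that tracking noise near the boundary does not toggle the state on every frame. The thresholds and the idea of using only two fingertips are simplifying assumptions for illustration, not the recognition approach discussed in the text.

```python
import numpy as np

class GrabDetector:
    """Detect grab start/end from thumb and index fingertip positions.

    Two thresholds (hysteresis) keep tracking noise near the boundary
    from toggling the state on every frame. Thresholds are assumed values.
    """
    def __init__(self, close_dist=0.03, open_dist=0.05):
        self.close_dist = close_dist  # metres: pinch closes below this
        self.open_dist = open_dist    # metres: pinch opens above this
        self.grabbing = False

    def update(self, thumb_tip, index_tip):
        """Return 'grab_start', 'grab_end' or None for this frame."""
        d = np.linalg.norm(np.asarray(thumb_tip) - np.asarray(index_tip))
        if not self.grabbing and d < self.close_dist:
            self.grabbing = True
            return "grab_start"
        if self.grabbing and d > self.open_dist:
            self.grabbing = False
            return "grab_end"
        return None

# Example frames: fingers move together, hover inside the hysteresis band, then apart.
det = GrabDetector()
for thumb, index in [((0, 0, 0), (0.06, 0, 0)),
                     ((0, 0, 0), (0.02, 0, 0)),
                     ((0, 0, 0), (0.04, 0, 0)),
                     ((0, 0, 0), (0.07, 0, 0))]:
    print(det.update(thumb, index))
# -> None, grab_start, None, grab_end
```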
A possible workaround for this issue, typically applied in interfaces, is to use the second hand, a vocal interface, or the recognition of a coded gesture to tell the system that the gesture is starting or has finished. Furthermore, in a manipulation interaction the gesture start also involves the "grabbing" of the object of interest, and the gesture end also involves its release. This means that the algorithm should localize with sufficient accuracy the position of the grab and, which is more difficult, the desired location of the object release. The latter task especially requires, in our opinion, both a smart geometrical representation of the hand/finger trajectories, possibly invariant to how users realize their gestures, and a smart learning procedure to characterize the key actions and the corresponding desired object position. It is a really challenging task, but the ideas that could be applied to find a reasonable solution may be the same as those applied in classical robust point matching and landmark location. Keypoint trajectories could be simplified and normalized, mappings between trajectories could be encoded as functions, and the evolution of connected points could be treated as a surface. And, in the same way that learning approaches are used to find discriminative keypoints for specialized recognition tasks on 3D meshes [7], it is conceivable that, on a geometric encoding of hand trajectories, keypoints enabling a user-independent recognition of gesture limits could be learned from collected example data. The use of data collection to learn gestural features is already applied in HCI, for example in gesture elicitation experiments [14, 1], and, in some sense, with the same approach we could learn how users behave when "naturally" performing simple gestures like grabbing, translating and rotating a virtual object. By registering example gestures and decoupling 3D trajectories from velocity patterns, it would be possible to find specific and invariant keypoints to identify the beginning and end of gestures.

Another big problem is related to the accuracy of gestures, which is particularly important in manipulation tasks. The accuracy of object positioning in manipulation is limited by the lack of tracking accuracy and by possible occlusions of keypoints. However, this problem could be mitigated by the redundancy of the data, and all the research on robust descriptors and partial retrieval can surely help in finding ad hoc solutions for the task. Approaches for partial shape retrieval, such as Bag of Words [13, 16], could for example be applied to select only the partial information that correctly describes the gesture and is captured by the device/tracking library used. Furthermore, learning from examples, using regression techniques, the relationships between keypoint positions and the desired manipulation position could help improve the accuracy of gesture localization. Another problem in this particular case is the delay in visual feedback: since the detection of the release gesture requires a backwards analysis, if the grabbed object is moved together with the hand there is a discrepancy between the expected position of the manipulated object and the position visualized in the virtual representation.
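Before moving to the work plan, a minimal sketch of the decoupling idea mentioned above: resampling a 3D keypoint trajectory uniformly by arc length discards its velocity profile while preserving the spatial path, and a simple translation/scale normalization makes gestures from different users easier to compare. The sample count and the specific normalization are illustrative assumptions.

```python
import numpy as np

def resample_by_arc_length(points, n=32):
    """Resample a 3D trajectory to n points spaced uniformly along its
    arc length, discarding the original velocity profile."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])        # cumulative arc length
    target = np.linspace(0.0, s[-1], n)
    out = np.empty((n, 3))
    for k in range(3):                                  # interpolate per axis
        out[:, k] = np.interp(target, s, pts[:, k])
    return out

def normalize(points):
    """Translate to the centroid and scale to unit bounding size."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)
    extent = np.ptp(pts, axis=0).max()
    return pts / extent if extent > 0 else pts

# A gesture traced quickly at first and slowly at the end follows the same
# quarter circle; after resampling the representation no longer depends on speed.
angles = np.concatenate([np.linspace(0.0, 1.0, 5),
                         np.linspace(1.0, np.pi / 2, 40)[1:]])
fast_slow = [(np.cos(a), np.sin(a), 0.0) for a in angles]
canonical = normalize(resample_by_arc_length(fast_slow))
print(canonical.shape)  # (32, 3), independent of how fast the gesture was made
```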
3 Research work plan

The research project has been summarized in a list of practical goals to achieve by the end of the PhD period; the idea is to cover, and provide solutions for, interaction with VEs based on hand tracking:

– Improve the knowledge of the mental models and perception mechanisms naturally developed by users when approaching VR applications, through ad-hoc experiments.
– Derive useful and necessary guidelines to design novel interfaces aimed at achieving natural interaction with different VR systems.
– Design and implement interaction paradigms and interface modules to be tested with users, by creating a set of prototypes to be used with a hardware setup appropriate to the task.
– Test the validated paradigms and modules in real-world applications (e.g. 3D scene design, virtual museums) with specific user categories.
– Explore different solutions to improve interaction through hand tracking, such as machine learning techniques applied to gesture recognition.

At the current state of the research project we have already achieved a number of these goals, while working on the remaining ones in the most recent and future works. The specific methodology applied for each work is described in the next section.

4 Current work and results

A preliminary study was carried out to strengthen knowledge of the VR and Human-Computer Interaction (HCI) fields in scientific research, by exploring the most recent literature along with the most relevant works of the past that marked this branch of the field with crucial findings and results. The topics covered by the surveyed works include, but are not limited to, those mentioned in Section 1. Subsequent effort has been put into the development of VR prototypes using novel navigation and interaction methods, with some preliminary studies to evaluate their usability. Both these aspects of the work have been conducted according to the research plan and have led to some publications and works in progress.

Classification of manipulation literature

The study of the literature was a major contribution to the making of a survey on 3D object manipulation, in which we worked to identify a convenient new taxonomy aimed at classifying most of the research works and papers on object manipulation in 3D environments, both immersive VR and not. In this work we examined over 50 works on 3D manipulation techniques and featured over 30 in the survey, classifying them through the new taxonomy. The proposed taxonomy offers two non-exclusive approaches to classify any given work presenting a manipulation technique: the classification can in fact be based both on its "Environment Properties" (Figure 1) and on the characteristics of the "Manipulation Metaphor" underlying the technique (Figure 2).

Fig. 1. Taxonomy of Environment Properties
Fig. 2. Taxonomy of Manipulation Metaphors

The literature examined for the survey greatly improved our understanding of the key issues in designing novel interaction paradigms for object manipulation in immersive virtual environments, paradigms that could bring an interesting contribution to this specific topic when compared to other, already established ones.

Prototyping and evaluation of gestural interfaces

One of the first works completed, in order to derive useful guidelines for future prototype development, was an evaluation of performance and user preferences across four different interaction techniques [6].
The interactive environments for our first works were realized using low-cost setups composed of a Leap Motion controller for hand tracking and an Oculus Rift DK2 for stereoscopic rendering.

In a paper oriented towards virtual museum experiences we presented the evaluation results of a series of techniques in VR. These techniques are specifically aimed at covering the basic interaction requirements of a virtual museum experience, such as navigation in the museum space (Figure 3) and browsing through the information about the displayed items. The paper features the details of the interaction design and of the evaluation tests used to validate all the techniques, along with the collected data.

Fig. 3. Examples of two of the four navigation techniques examined [5]

The main contribution of this work was the implementation and evaluation of a number of techniques for information display and environment navigation:

Information display techniques:
– Display Buttons
– Swipe
– Object Selection
– Object Picking

Navigation techniques:
– Palm Rotation
– Forward Button
– Mobile Control

Data was collected over thirty sessions for both tasks. The most interesting results derive from the total task completion time; these results already show relevant performance differences between the proposed solutions [5].

In more recent works we presented novel solutions for "natural" single-hand manipulation (e.g. picking/translating, rotating and scaling) of objects in virtual reality environments. The solution is based on the combination of natural gestures with easily recognizable start and end, no need for explicit or bimanual gesture transitions, and smart feedback suggesting potential actions and gesture activation status. The proposed techniques are the Knob metaphor, for performing fast and accurate rotations around a selected axis, and the Pin method, which shows competitive performance compared to a well-known bimanual solution (i.e. the Handle Bar metaphor [2, 15]).

Our goal was to create a set of solutions allowing intuitive and easy manipulation of objects, given the hand and finger information captured by a cheap sensor (e.g. Leap Motion, RealSense), easily adaptable to different contexts and devices. The interaction system acquires the data stream provided by the tracker and processes hand/finger trajectories to determine internal state changes, activate manipulation modes and drive feedback visualization. Following the hints coming from the literature, we aimed at:

– performing the manipulation control with a single hand;
– separating translation from rotation and constraining rotation axes;
– being robust against the limits of tracking and gesture segmentation accuracy provided by the chosen device/API, especially for object release;
– finding an intuitive metaphor for rotation;
– avoiding as much as possible the need for non-intuitive gestures to switch modes.

The scheme of the interaction system designed for the Knob metaphor is represented in Figure 4. The system starts in an idle state, in which hands are tracked and their position is displayed in the scene, but no interaction with scene objects is activated. If the hand falls within the interaction region of an object, specific gestures can trigger the switch to different modes. Furthermore, if the hand is in the right position to start a rotation gesture on the object, a slightly different state is enabled, which also gives visual feedback about the enabled rotation axis. In this state a knob rotation gesture starts the rotation, locking a rotation direction. Translation and scaling can likewise be activated when the hand is in the interaction region, with unambiguous gestures. In this way the basic manipulation modes are completely decoupled, but can be activated easily with a single hand gesture.

Fig. 4. Knob metaphor FSM representation.
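The sketch below summarizes the state logic just described as a small finite-state machine. The state names, gesture labels and transition conditions are illustrative assumptions reconstructed from the description, not the exact implementation of the Knob metaphor.

```python
from enum import Enum, auto

class Mode(Enum):
    IDLE = auto()          # hands tracked, no object interaction
    ROTATE_READY = auto()  # hand posed for a knob rotation, axis highlighted
    ROTATING = auto()
    TRANSLATING = auto()
    SCALING = auto()

def next_mode(mode, hand_in_region, rotation_pose, gesture):
    """One illustrative transition step of the single-hand manipulation FSM.

    hand_in_region : hand is inside the object's interaction region
    rotation_pose  : hand is positioned to start a knob rotation on the object
    gesture        : None, 'knob_turn', 'grab', 'pinch_scale' or 'release'
    """
    if mode in (Mode.ROTATING, Mode.TRANSLATING, Mode.SCALING):
        # Active manipulation ends only with an explicit release.
        return Mode.IDLE if gesture == "release" else mode
    if not hand_in_region:
        return Mode.IDLE
    # Hand is in the interaction region: unambiguous gestures switch modes.
    if gesture == "grab":
        return Mode.TRANSLATING
    if gesture == "pinch_scale":
        return Mode.SCALING
    if rotation_pose:
        # Axis feedback is shown; a knob turn locks that rotation axis.
        return Mode.ROTATING if gesture == "knob_turn" else Mode.ROTATE_READY
    return Mode.IDLE

# Example: approach the object in a rotation pose, then turn the knob.
m = Mode.IDLE
m = next_mode(m, hand_in_region=True, rotation_pose=True, gesture=None)        # ROTATE_READY
m = next_mode(m, hand_in_region=True, rotation_pose=True, gesture="knob_turn") # ROTATING
print(m)
```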
For the Pin method we aim to provide a technique that allows DOF separation between translation, rotation and scaling, while keeping the interaction and the switching between manipulation actions as smooth and fluent as possible. Figure 5 shows a snapshot of the prototype currently in development, with the shape of the Pin widget. The two caps (blue and red) work as grabbing points that enable the rotation and scaling modes with simple tracking of the hand's palm. Translation is enabled in the same way, using the object itself instead of a pin cap. The idea is to mix the advantages of natural interaction for translation with the advantages of a widget like the Pin for the other transformations.

Fig. 5. Sneak peek of the Pin method (work in progress).

For both techniques, validation has been conducted through an information retrieval task in which the user has to make use of all the possible manipulation actions (translation, rotation, scaling) to extract a "hidden" piece of information (i.e. a small text) on the object rendered in the scene. This particular task was chosen to put more emphasis on learnability and interaction fluency rather than on high accuracy. Nonetheless, the techniques also provide a good level of accuracy when considered in the context of object display applications.

In our latest work in progress we present a simple 3D gesture recognizer based on trajectory matching, in which we provide classification and retrieval scores for command gestures based on the tracking of single-hand 3D trajectories. The work shows good scores and some interesting results, such as good performance with trajectories resampled to only a few points (from 100 down to 3) or with just the initial portion of the original trajectory (down to 20% of the trajectory). Additional results include KNN classification scores for different values of K. The tests have been made on two datasets of 26 and 14 gestures respectively. Our method shows how proper pre-processing of the data, in particular normalization and rigid transformations, followed by a simple point-to-point distance measure, can outperform a previous work presenting a solution for the same task [12].
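As an illustration of this kind of pipeline, the sketch below resamples and normalizes 3D trajectories and classifies them by nearest neighbour under a mean point-to-point distance. The preprocessing and distance shown here are simplified assumptions and may differ from the actual method of the work in progress (which also includes rigid alignment).

```python
import numpy as np

def preprocess(points, n=16):
    """Resample to n points, centre on the centroid and scale to unit norm.
    A simplified stand-in for the normalization described above."""
    pts = np.asarray(points, dtype=float)
    idx = np.linspace(0, len(pts) - 1, n)
    res = np.stack([np.interp(idx, np.arange(len(pts)), pts[:, k])
                    for k in range(3)], axis=1)
    res -= res.mean(axis=0)
    scale = np.linalg.norm(res)
    return res / scale if scale > 0 else res

def classify(trajectory, templates, n=16):
    """Nearest-neighbour label under the mean point-to-point distance.

    templates : list of (label, trajectory) pairs of example gestures.
    """
    q = preprocess(trajectory, n)
    best_label, best_d = None, np.inf
    for label, t in templates:
        d = np.linalg.norm(q - preprocess(t, n), axis=1).mean()
        if d < best_d:
            best_label, best_d = label, d
    return best_label, best_d

# Toy example: a straight swipe versus a circular stir.
line = [(t, 0.0, 0.0) for t in np.linspace(0, 1, 50)]
circle = [(np.cos(a), np.sin(a), 0.0) for a in np.linspace(0, 2 * np.pi, 50)]
templates = [("swipe", line), ("circle", circle)]
noisy_swipe = [(t, 0.01 * np.sin(7 * t), 0.0) for t in np.linspace(0, 1.2, 80)]
print(classify(noisy_swipe, templates))  # -> ('swipe', small distance)
```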
5 Short Bio

I am Fabio Marco Caputo, a PhD student at the University of Verona. I work on Human-Computer Interaction with a focus on virtual environments and 3D manipulation. I am currently at the end of the second year of my PhD programme and my work is supervised by Prof. Andrea Giachetti.

References

1. Aigner, R., Wigdor, D., Benko, H., Haller, M., Lindbauer, D., Ion, A., Zhao, S., Koh, J.: Understanding mid-air hand gestures: A study of human preferences in usage of gesture types for HCI. Microsoft Research TechReport MSR-TR-2012-111 (2012)
2. Bettio, F., Giachetti, A., Gobbetti, E., Marton, F., Pintore, G.: A practical vision based approach to unencumbered direct spatial manipulation in virtual worlds. In: Eurographics Italian Chapter Conference (2007) 145–150
3. Bowman, D.A., Hodges, L.F.: An evaluation of techniques for grabbing and manipulating remote objects in immersive virtual environments. In: Proceedings of the 1997 Symposium on Interactive 3D Graphics, ACM (1997) 35–ff
4. Bowman, D.A., McMahan, R.P., Ragan, E.D.: Questioning naturalism in 3D user interfaces. Communications of the ACM 55(9) (2012) 78–88
5. Caputo, F.M., Ciortan, I.M., Corsi, D., De Stefani, M., Giachetti, A.: Gestural interaction and navigation techniques for virtual museum experiences (2016)
6. Caputo, F.M., Giachetti, A.: Evaluation of basic object manipulation modes for low-cost immersive virtual reality. In: Proceedings of the 11th Biannual Conference on Italian SIGCHI Chapter, ACM (2015) 74–77
7. Creusot, C., Pears, N., Austin, J.: A machine-learning approach to keypoint detection and landmarking on 3D meshes. International Journal of Computer Vision 102(1-3) (2013) 146–179
8. Cui, J., Kuijper, A., Fellner, D.W., Sourin, A.: Understanding people's mental models of mid-air interaction for virtual assembly and shape modeling. In: Proceedings of the 29th International Conference on Computer Animation and Social Agents, ACM (2016) 139–146
9. Hand, C.: A survey of 3D interaction techniques. In: Computer Graphics Forum. Volume 16, Wiley Online Library (1997) 269–281
10. Jankowski, J., Hachet, M.: Advances in interaction with 3D environments. In: Computer Graphics Forum. Volume 34, Wiley Online Library (2015) 152–190
11. Kratz, S., Rohs, M.: A $3 gesture recognizer: simple gesture recognition for devices equipped with 3D acceleration sensors. In: Proceedings of the 15th International Conference on Intelligent User Interfaces, ACM (2010) 341–344
12. Kratz, S., Rohs, M.: Protractor3D: a closed-form solution to rotation-invariant 3D gestures. In: Proceedings of the 16th International Conference on Intelligent User Interfaces, ACM (2011) 371–374
13. Lavoué, G.: Combination of bag-of-words descriptors for robust partial shape retrieval. The Visual Computer 28(9) (2012) 931–942
14. North, C., Dwyer, T., Lee, B., Fisher, D., Isenberg, P., Robertson, G., Inkpen, K.: Understanding multi-touch manipulation for surface computing. In: IFIP Conference on Human-Computer Interaction, Springer (2009) 236–249
15. Song, P., Goh, W.B., Hutama, W., Fu, C.W., Liu, X.: A handle bar metaphor for virtual object manipulation with mid-air interaction. In: Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems (CHI '12), ACM (2012) 1297–1306
16. Wang, X., Feng, B., Bai, X., Liu, W., Latecki, L.J.: Bag of contour fragments for robust shape classification. Pattern Recognition 47(6) (2014) 2116–2125
17. Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, ACM (2007) 159–168