=Paper=
{{Paper
|id=Vol-1516/p3
|storemode=property
|title=Web-Powered Virtual Site Exploration Based on Augmented 360 Degree Video via Gesture-Based Interaction
|pdfUrl=https://ceur-ws.org/Vol-1516/p3.pdf
|volume=Vol-1516
|dblpUrl=https://dblp.org/rec/conf/tvx/WijnantsRDQLL15
}}
==Web-Powered Virtual Site Exploration Based on Augmented 360 Degree Video via Gesture-Based Interaction==
Maarten Wijnants, Gustavo Rovelo Ruiz, Donald Degraen,
Peter Quax, Kris Luyten, Wim Lamotte
Hasselt University – tUL – iMinds, Expertise Centre for Digital Media
Wetenschapspark 2, 3590 Diepenbeek, Belgium
firstname.lastname@uhasselt.be
ABSTRACT
Physically attending an event or visiting a venue might not always be practically feasible (e.g., due to travel overhead). This article presents a system that enables users to remotely navigate in and interact with a real-world site using 360° video as primary content format. To showcase the system, a demonstrator has been built that affords virtual exploration of a Belgian museum. The system blends contributions from multiple research disciplines into a holistic solution. Constituent technological building blocks include 360° video, the Augmented Video Viewing (AVV) methodology that allows for Web-driven annotation of video content, a walk-up-and-use mid-air gesture tracking system to enable natural user interaction with the system, Non-Linear Video (NLV) constructs to unlock semi-free visitation of the physical site, and the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard for adaptive media delivery purposes. The system's feature list will be enumerated and a high-level discussion of its technological foundations will be provided. The resulting solution is completely HTML5-compliant and therefore portable to a gamut of devices.

Author Keywords
Virtual exploration; HTML5; 360° video; Non-Linear Video; gestural interaction; MPEG-DASH; Augmented Video Viewing.

ACM Classification Keywords
C.2.5 Computer-Communication Networks: Local and Wide-Area Networks—Internet; H.5.1 Information Interfaces and Presentation: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 Information Interfaces and Presentation: User Interfaces—Input devices and strategies, Interaction styles; H.5.4 Information Interfaces and Presentation: Hypertext/Hypermedia

INTRODUCTION AND MOTIVATION
There is an increasing tendency to disclose real-world events and spaces in the virtual realm, to enable hindered people to participate from a remote location. Systems that allow for cyber presence at physically distant sites hold value for heterogeneous application domains, including tourism, entertainment and education.

In recent years, technological advances have emerged in divergent research disciplines that hold great promise to increase the persuasiveness of interactive video-driven virtual explorations. A first important example is situated in the video capturing field, in the form of 360° video authoring. 360° video cameras produce video footage that has an omni-directional (i.e., cylindrical or spherical) Field of View. Compared to classical video, 360° video content unlocks options for increased user immersion and engagement. Secondly, the traditional keyboard/mouse interaction technique has recently been facing increasing competition from more natural alternatives like, for example, touch- or gesture-based schemes. In this context, an exploratory study performed by Bleumers et al. found mid-air gesture-based interaction to be the preferred input method for the consumption of the 360° video content format [1]. A final notable evolution is the increasing maturity of the Web as a platform for media dissemination and consumption. The HTML5 specification, for instance, covers all necessary tools to develop a 360° video player that affords typical Pan-Tilt-Zoom (PTZ) adaptation of the viewing angle inside the omni-directionally captured video scene.

SYSTEM OVERVIEW AND USE CASE DESCRIPTION
The work described in this article can best be summarized as offering an interactive multimedia content consumption experience akin to Google Street View, yet relying on 360° video instead of static imagery, and at the same time offering advanced interaction opportunities that go beyond basic "on rails" virtual navigation. The specific use case that this manuscript focuses on is the virtual visitation of a particular Belgian museum. Users can move along predefined paths that have been video captured in 360 degrees. When reaching the end of such a path, users are offered the choice between a number of alternative contiguous traveling directions, very much like choosing a direction at a crossroad. As such, a NLV scenario arises in which the user is granted a considerable amount of freedom to tour the museum at their personal discretion. While navigating along the predetermined paths, users can play/pause the video sequence, dynamically change their viewing direction, and perform zoom operations.
The use case furthermore includes a gamification component in the form of simple "treasure hunting" gameplay. In particular, users are encouraged to look for salient museum items that have been transparently annotated to increase user engagement with the content. Finally, both mouse-based and gestural interaction are supported by the use case demonstrator. The two control interfaces are expressively identical, in the sense that they grant access to the exact same set of functionality (i.e., direct manipulation of the user's viewport into the 360° video content, making navigational decisions and performing gamification interaction through pointing and selection, and video playback control).

3rd International Workshop on Interactive Content Consumption at TVX'15, June 3rd, 2015, Brussels, Belgium.
Copyright is held by the author(s)/owner(s).
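The expressive parity between the two control interfaces can be pictured as both input modalities funneling into one shared action set. The following JavaScript sketch illustrates that idea only; the action names, the dispatcher, and the player stub are all hypothetical, as the paper does not disclose its actual event model:

```javascript
// Shared action set: a mouse front-end and a gesture front-end both
// reduce their recognized intents to the same dispatcher calls, which
// is what keeps the two control interfaces expressively identical.
const ACTIONS = ["pan", "tilt", "zoom", "choosePath", "select", "playPause"];

function createDispatcher(player) {
  return {
    dispatch(action, payload) {
      if (!ACTIONS.includes(action)) {
        throw new Error(`Unknown action: ${action}`);
      }
      return player[action](payload); // same handler, regardless of modality
    },
  };
}

// Hypothetical player stub that merely records what it was asked to do.
const log = [];
const player = Object.fromEntries(
  ACTIONS.map((a) => [a, (p) => log.push({ action: a, payload: p })])
);

const dispatcher = createDispatcher(player);
// A mouse drag and a mid-air gesture both end up as the same call:
dispatcher.dispatch("pan", { degrees: 15 });
dispatcher.dispatch("playPause");
```

The design point is that modality-specific code stops at intent recognition; everything downstream is shared.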
IMPLEMENTATION
Except for its gesture-related functionality, the demonstrator has been realized exclusively using platform-independent Web standards. The typical execution environment of the player component of the demonstrator is therefore a Web browser.
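As a concrete illustration of the state such a browser-based 360° player has to maintain, the sketch below tracks a Pan-Tilt-Zoom viewport of the kind adjusted on every interaction. The field names and numeric limits are illustrative assumptions, not values taken from the paper:

```javascript
// Minimal PTZ viewport state for a 360° video player.
// Yaw wraps around the full cylinder; pitch and zoom are clamped.
// PITCH_LIMIT and ZOOM_RANGE are illustrative assumptions.
const PITCH_LIMIT = 75;    // degrees up/down
const ZOOM_RANGE = [1, 4]; // magnification factor

function clamp(v, lo, hi) {
  return Math.min(hi, Math.max(lo, v));
}

function applyPtz(view, { dYaw = 0, dPitch = 0, dZoom = 0 }) {
  return {
    // ((x % 360) + 360) % 360 keeps yaw in [0, 360) for negative pans too
    yaw: (((view.yaw + dYaw) % 360) + 360) % 360,
    pitch: clamp(view.pitch + dPitch, -PITCH_LIMIT, PITCH_LIMIT),
    zoom: clamp(view.zoom + dZoom, ZOOM_RANGE[0], ZOOM_RANGE[1]),
  };
}

let view = { yaw: 350, pitch: 0, zoom: 1 };
view = applyPtz(view, { dYaw: 20 });    // wraps past 360°: 350 + 20 → 10
view = applyPtz(view, { dPitch: -90 }); // clamped to -75
```

In a real player the resulting yaw/pitch/zoom triple would drive the projection of the omni-directional frame onto the viewport (e.g., via WebGL or CSS transforms).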
Figure 1. Gesture-based interaction with the demonstrator.

The involved media content was recorded using an omni-directional sensor setup consisting of 7 GoPro Hero3+ Black cameras mounted in a 360Heros rig (http://www.360heros.com/). The resulting video material was temporally segmented according to the physical layout of the museum in order to yield individual clips for each of the traversable paths. The collection of paths (and their mutual relationships) is encoded as a directed graph. This graph dictates the branching options in the NLV playback.
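The path graph, and the pre-fetch bookkeeping it drives during playback, can be sketched as follows. The path names, durations and the 2-second segment length are illustrative; only minBufferTime is an actual MPD attribute:

```javascript
// Directed graph of 360°-captured paths: each key is a traversable
// clip, each entry in its list a follow-up route selectable when the
// clip ends. Path names are hypothetical.
const nlvGraph = {
  entranceHall: ["eastWing", "westWing"], // crossroad with two options
  eastWing: ["sculptureRoom"],
  westWing: ["sculptureRoom"],
  sculptureRoom: [],                      // dead end: tour finishes here
};

// Per-path MPD metadata (illustrative values; times in seconds).
const mpd = {
  eastWing: { minBufferTime: 6, segmentDuration: 2 },
  westWing: { minBufferTime: 4, segmentDuration: 2 },
};

// While a path is playing, pre-fetch the initial segments of every
// potential follow-up route; the count per route is dictated by that
// route's minBufferTime.
function segmentsToPrefetch(currentPath) {
  return nlvGraph[currentPath].map((next) => ({
    path: next,
    segments: Math.ceil(mpd[next].minBufferTime / mpd[next].segmentDuration),
  }));
}

const plan = segmentsToPrefetch("entranceHall");
```

With these assumed values, playing the entrance hall clip would schedule three initial segments of the east wing and two of the west wing, so whichever branch the user picks can start without a buffering stall.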
Media streaming is implemented by means of MPEG-DASH. The separate video clips from the content authoring phase were each transcoded into multiple qualities, temporally split into consecutive media segments of identical duration (e.g., 2 seconds), and described by means of an MPD (Media Presentation Description). The resulting content was published by hosting it on an off-the-shelf HTTP server. The W3C Media Source Extensions specification is exploited to allow for the HTML5-powered decoding and rendering of media segments that are downloaded in an adaptive fashion using JavaScript code. While the playback of a path is active, the initial media segments that pertain to each of the potential follow-up routes (as derived from the NLV graph representation) are pre-fetched from the HTTP server. The total number of media segments to pre-fetch is dictated by the corresponding path's minBufferTime MPD attribute. By making initial media data locally available ahead of time, the startup delay of the selected follow-up path is minimized.

The gesture set that was defined for the demonstrator consists of composite gestures, in the sense that they involve performing a sequence of discrete, gradually refining postures. As such, it becomes feasible to organize the available gestures in a tree-like topology, where intermediate layers represent necessary steps towards reaching a leaf node, at which point the gesture (and its corresponding action) is actually actuated. This also allows gesture clustering and organization on the basis of their respective sequences of encompassed postures. Two gestures whose posture series are identical up to some intermediate level share a branch in the tree up to that level and only then diverge topologically.

The gestural interface is implemented by means of a mid-air gesture recognizer (which currently relies on a Kinect 2.0 for skeleton tracking purposes). It adheres to a walk-up-and-use design, which implies that it provides supportive measures that empower users to leverage the system without requiring training. The supportive measures take the form of a hierarchical gesture guidance system that exploits the tree-like organization of the gesture set to visually walk the user through the subsequent steps needed to perform a particular gesture (see Figure 1).

Figure 2. Three applications of the AVV methodology in the demonstrator.

To encode and present the navigation options at NLV decision points and to add interactivity to the treasure hunt objects that appear in the video footage, the demonstrator resorts to the AVV methodology [2]. This methodology (and its Web-compliant implementation) is intended to transform video consumption from a passive into a more lean-forward type of experience by providing the ability to superimpose (potentially interactive) overlays on top of the media content. The navigation options are represented as arrows indicating the direction of potential follow-up paths; their visualization is toggled when the playback of the current path is about to end. The treasure hunt objects on the other hand are (invisibly) annotated by means of a transparent overlay. When the user points to such an object, the visual style of the associated AVV annotation is transformed on the fly (through CSS operations) into a semi-transparent one. If the item is subsequently selected, an informative AVV-managed call-out widget is visualized. Finally, the AVV methodology is also exploited to present visual feedback of the user's current pointing location. This is realized by dynamically instantiating an overlay (carrying a green hand icon) as soon as the user enters pointing mode and by continuously updating its coordinates as the pointing operation is being performed. When pointing offscreen (only possible with the gestural interface), the hand icon is clamped to the nearest on-screen position and turns red. Figure 2 illustrates the three applications of the AVV approach in the demonstrator.

ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 610370, ICoSOLE ("Immersive Coverage of Spatially Outspread Live Events", http://www.icosole.eu).

REFERENCES
1. Bleumers, L., Van den Broeck, W., Lievens, B., and Pierson, J. Seeing the Bigger Picture: A User Perspective on 360° TV. In Proc. EuroITV 2012, ACM (2012), 115–124.
2. Wijnants, M., Leën, J., Quax, P., and Lamotte, W. Augmented Video Viewing: Transforming Video Consumption into an Active Experience. In Proc. MMSys 2014, ACM (2014), 164–167.