=Paper= {{Paper |id=Vol-1716/WSICC_2016_paper_7 |storemode=property |title=The Case for Virtual Director Technology - Enabling Individual Immersive Media Experiences via Live Content Selection and Editing |pdfUrl=https://ceur-ws.org/Vol-1716/WSICC_2016_paper_7.pdf |volume=Vol-1716 |authors=Rene Kaiser,Manolis Falelakis,Wolfgang Weiss,Marian F. Ursu |dblpUrl=https://dblp.org/rec/conf/tvx/KaiserFWU16 }} ==The Case for Virtual Director Technology - Enabling Individual Immersive Media Experiences via Live Content Selection and Editing== https://ceur-ws.org/Vol-1716/WSICC_2016_paper_7.pdf
The Case for Virtual Director Technology – Enabling Individual Immersive Media Experiences via Live Content Selection and Editing

Rene Kaiser
Institute for Information and Communication Technologies
JOANNEUM RESEARCH
Graz, Austria
rene.kaiser@joanneum.at

Manolis Falelakis
Electrical and Computer Engineering Dept.
Aristotle University of Thessaloniki
Thessaloniki, Greece
manf@issel.ee.auth.gr

Wolfgang Weiss
Institute for Information and Communication Technologies
JOANNEUM RESEARCH
Graz, Austria
wolfgang.weiss@joanneum.at

Marian F. Ursu
Department of Theatre, Film and Television
University of York
York, United Kingdom
marian.ursu@york.ac.uk


ABSTRACT
Recent years have seen the emergence of many applications based on live audio-visual content streams. While the technological infrastructure in terms of bandwidth and device capabilities has advanced, media formats and related consumption paradigms have not changed as fundamentally. Meanwhile, a considerable amount of research has addressed the automatic personalization of multimedia content to enable immersive multimedia experiences, but mostly for pre-recorded rather than live content. This paper makes the case for more research on what we refer to as Virtual Director technology, one key enabling technology for the hyper-personalization of live content delivery. A Virtual Director is software that automatically selects, frames, mixes and cuts from a number of AV content streams. It aims to automate the complex and challenging tasks that a broadcast director and team undertake during a live event broadcast. Virtual Director software can be applied in a range of use cases, taking the individual's needs into account. There is virtually no limit to the factors such components could reason about in their decision making. While Virtual Director software has been developed in research prototypes, many challenges remain before its full potential can be unlocked. This paper presents recent technological achievements and reflects on the potential of the approach in two selected application domains: interactive live event broadcast and group videoconferencing.

Categories and Subject Descriptors
H.4.3 [Information Systems Applications]: Communications Applications—Computer conferencing, teleconferencing, and videoconferencing

Keywords
Virtual Director; social multimedia; telepresence; cinematographic principles; live event broadcast; camera selection; viewpoint selection

4th International Workshop on Interactive Content Consumption at TVX'16, June 22, 2016, Chicago, IL, USA.
Copyright is held by the author/owner(s).

1. INTRODUCTION
The growth of live video streaming services is clearly visible to consumers in the high rate of new multimedia applications with ever-improving audio-visual quality. Available bandwidth now allows high-resolution streams to be transmitted with sufficiently low delay. Still, most services follow a broadcast model and do not aim to deliver a truly personal experience by taking individual user preferences into account. Relatively few commercial systems adapt content at an atomic level in pursuit of hyper-personalization. The value of such capabilities, however, is widely acknowledged.
Research prototypes have been developed that aim to address this problem space. Metaphorically, Virtual Director software aims to mimic and automate the work and knowledge of a live TV broadcast team. This concept [5] [14] [1] is a key enabler for immersive experiences on top of multimedia systems. Beyond the basic tasks of automatically selecting, framing, mixing and cutting from a number of available live media streams, it can enhance existing multimedia experiences through new levels of personalization, automatic adaptation of content to playout devices, and similar capabilities. Such components must make decisions under real-time constraints to deliver more interactive forms of media consumption.
The real-time aspect is key, since what happens in a scene observed by cameras cannot be predicted for the most
part. User preferences may change dynamically as well, limiting how much can be computed in advance. With services realized through a Virtual Director approach, every user may receive different content, and user profiles or preferences, expressed through any (possibly abstract) form of interaction with the system, may also change during consumption; the system needs to respond to such changes immediately.
Virtual Director technology can make decisions for an individual user, but it can also enhance experiences for social groups, whether co-located or distributed across several places. Example scenarios range from rather passive ones, e.g. watching remote theatre performances together with friends, to rather active ones, such as a group of friends attending a language course via videoconferencing. A Virtual Director can also decide on media presentation for a social group as a collective, not just for each node individually. It can help create an immersive social experience in which geographically distant people feel part of a group, or combine any activity with a social communication link.
On a technical level, across multiple application domains, we aim to build a generic Virtual Director software framework, using a rule-based approach with event processing technology. The two main technical challenges are (i) making use of low-level sensor information to achieve an understanding of the scene covered by the media streams, and (ii) executing a set of pragmatic and cinematographic principles for decision making in real time.
For example, in video-mediated communication between larger groups, low-level cues from speech (audio) and face detection (video) are interpreted to understand communication patterns, a process called Semantic Lifting (first challenge). Based on this, the Virtual Director decides what to show to each individual participant, taking cinematographic rules into account when deciding when to cut to another camera (see [1] and references therein; second challenge). Such behavior can ultimately lead to immersive experiences and contribute to effects like telepresence.
The following sections discuss two application examples, from the domains of videoconferencing and live event broadcast, to illustrate the impact that automatic content selection on live audio and video streams can have.

2. GROUP TELEPRESENCE
We have built a Virtual Director system (called Orchestration Engine) for social group communication in different setups, using multiple microphones, loudspeakers, cameras and screens. The design of our system for social group communication has been informed by higher values such as togetherness [12]. See Figure 1 for an example setup. Based on that system, a number of evaluation experiments have been conducted and published [3] [15] [14] [2].

Figure 1: Evaluation setup for social video communication with four people in two rooms, each equipped with one screen and three physical HD cameras. The Virtual Director decides which video streams to render on screen depending on the communication situation.

To support telepresence and other communication goals in such a context, the quality of a number of factors is key to enabling a conversation that is both effective and enjoyable [8]. Audio and video have to be in sync, and delays greater than 200 milliseconds are generally regarded as disturbing (see [13]). Audio is regarded as more important than video in group communication; however, a certain balance between the different modalities needs to be maintained. Audio can be enhanced by using microphone arrays and quality-enhancing features (e.g. background noise attenuation, echo cancellation). The resolution of transmitted videos is important for a number of reasons: if gestures, facial expressions, eye gaze, body language and the like can be recognized, communication becomes more natural and closer to face-to-face communication (cf. media naturalness theory [7]). Further, observations from our trials indicate that the ability to show real-life items to people in remote locations is crucial. The distance of participants from the screens and the displayed size of remote participants' heads are also relevant for achieving a natural communication atmosphere.
A Virtual Director is needed in such systems simply because there are too many media streams to play in parallel. Beyond that pragmatic reason, however, such technology can enhance the experience and also influence the social communication through its decisions. Popular solutions like Skype and Hangouts generally work well for a limited number of participants, but have clear limitations in camera selection behavior compared to natural face-to-face conversation or professionally edited motion pictures. Further, unlike a real space where people meet and chat, the conversation topology in today's solutions is rather static: there is no direct way to branch into side conversations without leaving the current communication space, nor to maintain lateral awareness of such side conversations.
We aim to continue evaluating the effectiveness (Are individual communication goals met?) and immersiveness (Do participants like the experience?) of Virtual Director approaches in further experiments. One line of experimentation looks further into the human capabilities that machines try to replicate in this context. Human directors naturally benefit from their implicit knowledge and understanding of the conversational context and from a rich set of verbal and non-verbal cues. The Virtual Director is at a disadvantage here: it has to work within the inherent limitations of real-time AV analysis and other sensors, in terms of the number of features and detection accuracy, and within a closed set of rules.
We have started to explore other sources that could inform the decision-making process, for example extracting information from a social network that a videoconferencing environment could be integrated with (see [10]). The aim of this research was to see how conversation patterns can be predicted and which factors influence them.
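The Orchestration Engine is described here only as a rule-based system built on event processing, so the following Python fragment is not its implementation. It is a minimal sketch, under stated assumptions, of the two technical challenges from the introduction. A simplified Semantic Lifting step turns per-participant microphone energy (assumed here to be the only low-level cue) into an "active speaker" observation. A per-viewer decision step then applies two illustrative rules: viewers never see their own camera, and an assumed minimum shot duration must pass before a cut. All names (`AudioFrame`, `lift_active_speaker`, `ViewerState`, `Director`, `MIN_SHOT_SECONDS`) and threshold values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed rule of thumb: a shot must stay on screen at least this long
# before the next cut (value chosen for illustration only).
MIN_SHOT_SECONDS = 2.0


@dataclass
class AudioFrame:
    """Low-level cue: speech energy at one participant's microphone."""
    time: float
    participant: str
    energy: float


def lift_active_speaker(frames: List[AudioFrame],
                        threshold: float = 0.5) -> Optional[str]:
    """Simplified Semantic Lifting: raw energy values -> 'who is speaking'.

    Returns the loudest participant above the (assumed) threshold,
    or None if nobody is speaking.
    """
    loud = [f for f in frames if f.energy >= threshold]
    if not loud:
        return None
    return max(loud, key=lambda f: f.energy).participant


@dataclass
class ViewerState:
    shown: Optional[str] = None      # participant whose camera is on screen
    last_cut: float = float("-inf")  # time of this viewer's most recent cut


@dataclass
class Director:
    viewers: Dict[str, ViewerState] = field(default_factory=dict)

    def decide(self, now: float,
               speaker: Optional[str]) -> Dict[str, Optional[str]]:
        """Apply two illustrative rules separately for every viewer.

        Rule 1: show the active speaker, but never the viewer's own camera.
        Rule 2: only cut once the current shot has lasted MIN_SHOT_SECONDS.
        """
        decisions: Dict[str, Optional[str]] = {}
        for viewer, state in self.viewers.items():
            if speaker is not None and speaker != viewer:
                wanted = speaker
            else:
                wanted = state.shown  # nothing new to show: keep the shot
            if wanted != state.shown and now - state.last_cut >= MIN_SHOT_SECONDS:
                state.shown, state.last_cut = wanted, now
            decisions[viewer] = state.shown
        return decisions


if __name__ == "__main__":
    director = Director({name: ViewerState() for name in ("anna", "ben", "carl")})
    frames = [AudioFrame(0.0, "anna", 0.9), AudioFrame(0.0, "ben", 0.2)]
    print(director.decide(0.0, lift_active_speaker(frames)))
    # {'anna': None, 'ben': 'anna', 'carl': 'anna'}
    print(director.decide(1.0, "ben"))
    # {'anna': 'ben', 'ben': 'anna', 'carl': 'anna'}  (carl's shot is too short to cut)
    print(director.decide(2.5, "ben"))
    # {'anna': 'ben', 'ben': 'anna', 'carl': 'ben'}
```

Keeping state per viewer is what lets different participants see different cuts at the same moment. The minimum-shot rule is one simple way to trade reaction speed against the visual calm expected from edited video; a real system would combine many more cues (e.g. face detection) and rules.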
3. PERSONALIZED LIVE EVENT BROADCAST – 'NARROWCAST'
In a second application domain, we have been working on a Virtual Director for live event broadcast, based on a scene-capture approach with a panoramic camera [11] and multiple microphones [9]. Note that the well-established term 'broadcast' is used here even though our aim is a personalized narrowcast. In traditional broadcast every user receives the same content, apart from factors such as screen size and distance, color calibration, audio playout quality and aspect ratio cropping. In setups employing a Virtual Director component, by contrast, every user or group of users might receive a personalized output, e.g. including a bias towards a sports team, person, or type of action. Personalization means taking user preferences into account along all subprocesses, from sensor interpretation and camera framing to decisions on when to cut from which viewpoint to another. It also concerns the user's infrastructure, for example adapting to the screen size. The ideal zoom level and panning speed are very different when producing for cinema projection or for small mobile phone screens, yet few live productions take this into account and produce for multiple output channels in parallel.
A sample scene is depicted in Figure 2. The system automatically produces individual content streams for different playout devices and user preferences. Viewers may watch different parts of the scene according to their interests. While for some content types like sports there is at least a superficial common understanding of what is most relevant and how to frame it in video, for performance shows like the one depicted there are few rules.

Figure 2: Performance on a large stage as captured by a panoramic video camera system. The Virtual Director in this setup automatically framed a set of animated virtual cameras within the panorama and made different cutting decisions for different users in parallel.

On the one hand, several quality factors like content transmission delay play a lesser role in this domain than in videoconferencing, since media is sent in one direction only. On the other hand, we found in user evaluations with both production professionals and users without such particular knowledge that expectations regarding visual cinematic quality are based on professionally edited TV programmes, some of which are fully scripted, so that every camera movement and cutting decision can be planned.
The behavior of our Virtual Director prototype [6] for this domain was crafted with limited production grammar engineering resources. We concluded that its decision quality could not match professional live editing skills and was especially lacking in perceived creativity, storytelling skills and intuition. Nevertheless, a Virtual Director offers the advantage of quicker reactions to low-level cues, which seems to play an important role in such a setup, and a consistency in decision making that results in a more reliable experience. Due to factors such as fatigue, difficulty in hearing and seeing the events in the scene, and inherent variation in human mixing responses to salient events, human broadcast professionals will not be consistent in their decisions, while a machine, ceteris paribus, always responds the same way.
We further conclude that purely reactive behavior has certain limitations and that predictive situation and scene understanding is desirable for future research iterations. We argue that the added value of our concept for enhancing multimedia experiences lies in the parallelization of individually tailored content selection decisions at a scale that a human production team cannot achieve for economic reasons.

4. DISCUSSION AND OUTLOOK
We have illustrated the potential of Virtual Director technology in the context of immersive multimedia applications using live audiovisual content streams. Many issues remain to be addressed by future research. For example, it is difficult to extract all the necessary information from low-level cues or to structure a comprehensive set of cinematographic rules. Humans naturally benefit from their implicit knowledge and their intuition in foreseeing certain situations, which is very challenging for software components to replicate.
We have implemented Virtual Director research prototypes in two different application domains. In group videoconferencing, a Virtual Director provides participants with a better communication experience. We hypothesize that the benefit grows with the complexity of the setup, whether in terms of the number of participants, the number of cameras and screens per participant, or any of the many other aspects of such setups. In interactive live event broadcast, a Virtual Director enables mass customization in content production, where viewers can individually select what to watch and how. Ongoing research efforts in both application domains should lead to new forms of interactivity and a more immersive multimedia experience.
Other application domains for Virtual Director approaches include specific group communication scenarios like refugee support, remote learning (e.g. in massive open online courses, MOOCs), telehealth/telemedicine, remote care, distributed theatre performances [4], and new forms of participative democracy. The approach appears to be especially relevant for novel types of media content, e.g. panoramic and 360° video content, or live content for virtual reality (VR) playout devices. In the VR domain especially, there are issues that prevent users from watching lengthy content, namely virtual reality sickness (cybersickness, motion sickness). Even if these issues are solved, the challenge remains of how to enable users to switch between lean-forward interactive content consumption and lean-back passive watching. A Virtual Director approach might be very useful in such scenarios.
Overall, this research area is still in its infancy, but given its obvious potential, more research is needed to deliver components that can serve users in real scenarios outside research labs. On a more detailed level, remaining research challenges include the scalability of the approach in applications that require distributed decision making, standardized representation formats for Virtual Director behaviour, tool support for authoring Virtual Director behaviour, and design patterns that enable the decoupling and re-use of bodies of Virtual Director behaviour.

Acknowledgement
The research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreements no. 214793 TA2 (http://www.ta2-project.eu/) – Together Anywhere, Together Anytime, no. 248138 FascinatE (http://www.fascinate-project.eu/) – Format-Agnostic SCript-based INterAcTive Experience, no. 287760 Vconect (http://vconect-project.eu/) – Video Communications for Networked
Communities and no. 610370 ICoSOLE (http://icosole.eu/) – Immersive Coverage of Spatially Outspread Live Events.

5. REFERENCES
[1] M. Falelakis, M. Groen, M. Frantzis, R. Kaiser, and M. Ursu. Automatic orchestration of video streams to enhance group communication. In Proceedings of the 2012 ACM MM International Workshop on Socially-Aware Multimedia, pages 25–30. ACM, 2012.
[2] M. Falelakis, M. F. Ursu, E. Geelhoed, R. Kaiser, and M. Frantzis. Connecting living rooms: An experiment in orchestrated social video communication. In Proceedings of ACM TVX '16, 2016.
[3] M. Groen, M. Ursu, S. Michalakopoulos, M. Falelakis, and E. Gasparis. Improving video-mediated communication with orchestration. Computers in Human Behavior, 28(5):1575–1579, 2012.
[4] R. Kaiser, M. F. Ursu, M. Falelakis, and A. Horti. Enabling distributed theatre performances through multi-camera telepresence: Capturing system behaviour in a script-based approach. In Proceedings of the 3rd International Workshop on Immersive Media Experiences, ImmersiveME '15, pages 21–26, New York, NY, USA, 2015. ACM.
[5] R. Kaiser and W. Weiss. Media Production, Delivery and Interaction for Platform Independent Systems: Format-Agnostic Media, chapter Virtual Director. Wiley, 2014.
[6] R. Kaiser, W. Weiss, and G. Kienast. The FascinatE Production Scripting Engine. In Advances in Multimedia Modeling, volume 7131 of Lecture Notes in Computer Science, pages 682–692. Springer Berlin Heidelberg, 2012.
[7] N. Kock. The psychobiological model: Towards a new theory of computer-mediated communication based on Darwinian evolution. Organization Science, 15(3):327–348, 2004.
[8] P. Ljungstrand and S. Björk. Supporting group relationships in mediated domestic environments. In MindTrek '08: Proceedings of the 12th International Conference on Entertainment and Media in the Ubiquitous Era, pages 59–63, New York, NY, USA, 2008. ACM.
[9] O. A. Niamut, R. Kaiser, G. Kienast, A. Kochale, J. Spille, and O. Schreer. Towards a format-agnostic approach for production, delivery and rendering of immersive media. In ACM MMSys, Oslo, Norway, 2013.
[10] J. Schantl, C. Wagner, R. Kaiser, and M. Strohmaier. The utility of social and topical factors in anticipating repliers in Twitter conversations. In ACM Web Science (WebSci2013), 2013.
[11] O. Schreer, I. Feldmann, C. Weissig, P. Kauff, and R. Schäfer. Ultrahigh-resolution panoramic imaging for format-agnostic video production. Proceedings of the IEEE, 101(1):99–114, 2013.
[12] M. Steen and I. van de Poel. Making values explicit during the design process. IEEE Technol. Soc. Mag., 31(4):63–72, 2012.
[13] International Telecommunication Union. One-way Transmission Time: Recommendation G.114 (05/03). ITU-T recommendations. ITU, 2003.
[14] M. F. Ursu, M. Falelakis, M. Groen, R. Kaiser, and M. Frantzis. Experimental Enquiry into Automatically Orchestrated Live Video Communication in Social Settings. In Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video, TVX'15, pages 63–72. ACM, 2015.
[15] M. F. Ursu, M. Groen, M. Falelakis, M. Frantzis, V. Zsombori, and R. Kaiser. Orchestration: TV-like Mixing Grammars Applied to Video-communication for Social Groups. In Proceedings of the 21st ACM International Conference on Multimedia, MM'13, pages 333–342. ACM, 2013.