The Case for Virtual Director Technology – Enabling Individual Immersive Media Experiences via Live Content Selection and Editing

Rene Kaiser, Institute for Information and Communication Technologies, JOANNEUM RESEARCH, Graz, Austria, rene.kaiser@joanneum.at
Manolis Falelakis, Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Thessaloniki, Greece, manf@issel.ee.auth.gr
Wolfgang Weiss, Institute for Information and Communication Technologies, JOANNEUM RESEARCH, Graz, Austria, wolfgang.weiss@joanneum.at
Marian F. Ursu, Department of Theatre, Film and Television, University of York, York, United Kingdom, marian.ursu@york.ac.uk

ABSTRACT
Recent years have seen the emergence of applications based on live audio-visual content streams. While the technological infrastructure in terms of bandwidth and device capabilities has advanced, media formats and related consumption paradigms have not changed as fundamentally. Meanwhile, a considerable amount of research has addressed automatic personalization of multimedia content for the sake of enabling immersive multimedia experiences, but mostly for pre-recorded rather than live content. This paper makes the case for more research on what we refer to as Virtual Director technology, one key enabling technology for the hyper-personalization of live content delivery. A Virtual Director is software that automatically selects, frames, mixes and cuts from a number of AV content streams. It aims to automate the complex and challenging tasks that a broadcast director and team undertake during a live event broadcast. Virtual Director software can be applied in a range of use cases, taking the individual's needs into account. There is virtually unlimited scope regarding the factors such components could reason about in decision making. While such Virtual Director software has been developed as research prototypes, manifold challenges remain open to unlock its full potential. This paper presents recent technological achievements and reflects on the potential of the approach in two selected application domains: interactive live event broadcast and group videoconferencing.

Categories and Subject Descriptors
H.4.3 [Information Systems Applications]: Communications Applications – Computer conferencing, teleconferencing, and videoconferencing

Keywords
Virtual Director; social multimedia; telepresence; cinematographic principles; live event broadcast; camera selection; viewpoint selection

4th International Workshop on Interactive Content Consumption at TVX'16, June 22, 2016, Chicago, IL, USA. Copyright is held by the author/owner(s).

1. INTRODUCTION
The growth of live video streaming services is clearly visible to consumers through a high rate of new multimedia applications with ever-improving audio-visual quality. The available bandwidth allows transmitting high-resolution streams with sufficiently low delay. Still, most services follow a broadcast model and do not aim to deliver a truly personal experience by taking individual user preferences into account. Relatively few commercial systems adapt content on an atomic level, aiming for hyper-personalization. The value of such capabilities, however, is unquestioned.

Research prototypes have been developed that aim to address this problem space. Virtual Director software metaphorically aims to mimic and automate the work and knowledge of a live TV broadcast team. This concept [5] [14] [1] is a key enabler for immersive experiences on top of multimedia systems. Beyond the basic tasks of automatically selecting, framing, mixing and cutting from a number of available live media streams, existing multimedia experiences can be enhanced through new levels of personalization, automatic content adaptation to playout devices, etc. Such components are required to take decisions within real-time constraints to deliver more interactive forms of media consumption.
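The Virtual Director is characterized here only by its tasks (selecting, framing, mixing and cutting) and by its real-time, per-user decision making. The following Python sketch illustrates what one rule-based selection step might look like. It is not the authors' implementation, and it rests on several assumptions:

- Each stream carries a single low-level activity score per time step.
- Personalization is modelled as an additive per-viewer bias.
- Cutting is governed by two hypothetical cinematographic rules: a minimum shot duration (MIN_SHOT_SECONDS) and a switching margin (SWITCH_MARGIN).

All names, including DirectorState, decide and the camera ids, are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical rule parameters (assumptions, not values from the paper).
MIN_SHOT_SECONDS = 2.0  # never cut away from a shot earlier than this
SWITCH_MARGIN = 0.2     # a candidate must beat the current shot by this much


@dataclass
class DirectorState:
    """Per-viewer state of a hypothetical rule-based Virtual Director."""
    preferences: Dict[str, float] = field(default_factory=dict)  # stream -> bias
    current: Optional[str] = None  # stream currently shown to this viewer
    shot_start: float = 0.0        # time (s) at which the current shot began


def decide(state: DirectorState, cues: Dict[str, float], now: float) -> str:
    """Return the stream to show one viewer at time `now`.

    `cues` maps stream id -> low-level activity score (e.g. voice activity
    near that camera); higher means more relevant at this moment.
    """
    if not cues:
        raise ValueError("at least one stream is required")
    # Personalization: add the viewer's bias to every stream's cue score.
    scores = {s: c + state.preferences.get(s, 0.0) for s, c in cues.items()}
    best = max(scores, key=scores.get)

    if state.current not in scores:
        # First decision, or the current stream vanished: cut immediately.
        state.current, state.shot_start = best, now
        return best

    held_long_enough = now - state.shot_start >= MIN_SHOT_SECONDS
    clearly_better = scores[best] > scores[state.current] + SWITCH_MARGIN
    if best != state.current and held_long_enough and clearly_better:
        state.current, state.shot_start = best, now  # perform the cut
    return state.current


# Two viewers receive the same cues but have different preferences.
viewers = {
    "alice": DirectorState(preferences={"cam_stage_left": 0.3}),
    "bob": DirectorState(),
}
timeline = [
    (0.0, {"cam_stage_left": 0.2, "cam_stage_right": 0.4}),
    (1.0, {"cam_stage_left": 0.1, "cam_stage_right": 0.9}),
    (3.0, {"cam_stage_left": 0.1, "cam_stage_right": 0.9}),
]
for t, cues in timeline:
    print(t, {name: decide(st, cues, t) for name, st in viewers.items()})
```

Each viewer keeps an independent DirectorState, so identical cues can yield different cuts for different viewers in parallel. In the example, "alice" (biased towards cam_stage_left) stays on that camera at t=0 and t=1, and cuts to cam_stage_right only at t=3, once the minimum shot duration has elapsed and the other stream is clearly more active. "bob" stays on cam_stage_right throughout. A real system would replace the scalar cue with the output of scene analysis and a far richer rule set.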
The real-time aspect is key, since what happens in a scene observed by cameras can, for the most part, not be predicted. User preferences may change dynamically as well, limiting the scope for computing options in advance. With services realized through a Virtual Director approach, every user may receive different content. User profiles or preferences, expressed through whatever (abstract) form of interaction with the system, may also change during consumption, and the system needs to respond to such changes immediately.

Virtual Director technology can take decisions for an individual user, but it can also enhance experiences for social groups, whether co-located or distributed across several places. Example scenarios range from rather passive ones, e.g. watching remote theatre performances together with friends, to rather active ones, e.g. a group of friends attending a language course via videoconferencing. A Virtual Director can also decide on media presentation for a social group as a collective, not just for each node individually. It can help create an immersive social experience in which geographically distant people feel part of a group, or combine any activity with a social communication link.

On a technical level, across multiple application domains, we aim to build a generic Virtual Director software framework using a rule-based approach with event processing technology. The two main technical challenges are (i) to make use of low-level sensor information in order to achieve an understanding of the scene covered by the media streams, and (ii) to execute a set of pragmatic and cinematographic principles for decision making in real time.

As an example, in video-mediated communication between larger groups, low-level cues from speech (audio) and face detection (video) are interpreted to understand communication patterns in a process called Semantic Lifting (first challenge). Based on that, the Virtual Director decides what to show to each individual participant, taking cinematographic rules into account when deciding when to cut to another camera (see [1] and references therein; second challenge). Such behavior can ultimately lead to immersive experiences and contribute to effects like telepresence.

The following sections discuss two application examples, from the domains of videoconferencing and live event broadcast, to illustrate the impact that automatic content selection on live audio and video streams can have.

2. GROUP TELEPRESENCE
We have built a Virtual Director system (called Orchestration Engine) for social group communication in different setups, using multiple microphones, loudspeakers, cameras and screens. The design of our system has been informed by higher values such as togetherness [12]. See Figure 1 for an example setup. Based on that system, a number of evaluation experiments have been conducted and published [3] [15] [14] [2].

Figure 1: Evaluation setup for social video communication with four people in two rooms that are equipped with one screen and three physical HD cameras each. The Virtual Director decides which video streams to render on screen depending on the communication situation.

To support telepresence and other communication goals in such a context, the quality of a number of factors is key to enabling a conversation that is both effective and enjoyable [8].

- Synchronization and delay. Audio and video have to be in sync, and delays greater than 200 milliseconds are generally regarded as disturbing (see [13]).
- Audio. Audio is regarded as more important than video in group communication, although a certain balance needs to be maintained between the different modalities. Audio can be enhanced by using microphone arrays and quality-enhancing features (e.g. suppressing background noise, echo cancellation).
- Video resolution. The resolution of transmitted videos is important for a number of reasons. If gestures, facial expressions, eye gaze, body language and the like can be recognized, communication becomes more natural and closer to face-to-face communication (cf. media naturalness theory [7]). Further, observations from our trials indicate that the ability to show real-life items to people in remote locations is crucial.
- Spatial arrangement. The distance of participants to screens and the size of remote heads are also relevant for achieving a natural communication atmosphere.

A Virtual Director is needed in such systems simply because there are too many media streams to display in parallel. Beyond that pragmatic reason, however, such technology can enhance the experience and also influence the social communication through its decisions. Popular solutions like Skype and Hangouts generally work well for a limited number of participants, but have clear limitations regarding camera selection behavior compared to natural face-to-face conversation or professionally edited motion pictures. Further, unlike a real space where people meet and chat, the conversation topology in today's solutions is rather static. There is no direct way to branch into side-conversations without leaving the current communication space, nor to maintain lateral awareness of such side-conversations.

We aim to continue evaluating the effectiveness (Are individual communication goals met?) and immersiveness (Do participants like the experience?) of Virtual Director approaches in further experiments. One line of experimentation is to look further into the human capabilities that machines try to replicate in this context. Human directors naturally benefit from their implicit knowledge and understanding of the conversational context and from a rich set of verbal and non-verbal cues. The Virtual Director is handicapped in this respect. It has to work with the inherent limitations of real-time AV analysis and other sensors, in terms of the number of features and detection accuracy, and with a closed set of rules.

We have started to explore other sources that could inform the decision-making process, for example information extracted from a social network that a videoconferencing environment could be integrated with (see [10]). The aim of this research was to see how conversation patterns can be predicted and which factors influence them.

3. PERSONALIZED LIVE EVENT BROADCAST – 'NARROWCAST'
In a second application domain, we have been working on a Virtual Director for live event broadcast, based on a scene-capture approach with a panoramic camera [11] and multiple microphones [9]. Note that the well-established term broadcast is used here even though our aim is a personalized narrowcast. In traditional broadcast, every user receives the same content, apart from, e.g., screen size and distance, color calibration, audio playout quality and aspect-ratio cropping. In setups employing a Virtual Director component, by contrast, every user or group of users may receive a personalized output, e.g. one including a bias towards a sports team, a person, or a type of action.

Personalization means taking user preferences into account along all subprocesses, from sensor interpretation and camera framing to decisions on when to cut from one viewpoint to another. It also concerns the user's infrastructure, for example adapting to the screen size. The ideal zoom level and panning speed differ greatly depending on whether one produces for cinema projection or for small mobile phone screens. Yet few live productions take this into account and produce for multiple output channels in parallel.

A sample scene is depicted in Figure 2. The system automatically produces individual content streams for different playout devices and user preferences. Viewers may watch different parts of the scene according to their interests. For some content types, like sports, there is at least a superficial common understanding of what is most relevant and how to frame it in video. For performance shows like the one depicted, however, there are few rules.

Figure 2: Performance on a large stage as captured by a panoramic video camera system. The Virtual Director in this setup automatically framed a set of animated virtual cameras within the panorama and took different automatic cutting decisions for different users in parallel.

On the one hand, several quality factors like content transmission delay play a lesser role in this domain than in videoconferencing, since media is sent in one direction only. On the other hand, we evaluated the system with both production professionals and users without such particular knowledge. In both groups, expectations regarding visual cinematic quality were based on professionally edited TV programmes, some of which are fully scripted so that every camera movement and cutting decision can be planned.

The behavior of our Virtual Director prototype [6] for this domain was crafted with limited production grammar engineering resources. We concluded that its decision quality could not match professional live editing skills and was especially lacking in perceived creativity, storytelling skills and intuition. Nevertheless, a Virtual Director offers two advantages. The first is quicker reaction to low-level cues, which seems to play an important role in such a setup. The second is a consistency in decision making that results in a more reliable experience. Human broadcast professionals will not be consistent in their decisions, due to factors such as fatigue, difficulty in hearing and seeing the events in the scene, and inherent variation in human mixing responses to salient events. A machine, ceteris paribus, always responds the same way.

We further conclude that purely reactive behavior has certain limitations and that predictive situation and scene understanding is desirable for future research iterations. We argue that the added value of our concept for enhancing multimedia experiences lies in parallelizing individually tailored content selection decisions. This works at a scale that a human production team cannot realize for economic reasons.

4. DISCUSSION AND OUTLOOK
We have illustrated the potential of Virtual Director technology in the context of immersive multimedia applications using live audiovisual content streams. Many issues remain to be addressed by future research. For example, it is difficult to extract all the necessary information from low-level cues or to structure a comprehensive set of cinematographic rules. Humans naturally benefit from their implicit knowledge and their feelings to foresee certain situations, which is very challenging for software components to replicate.

We have implemented Virtual Director research prototypes in two different application domains. In group videoconferencing, a Virtual Director benefits participants through a better communication experience. We hypothesize that the benefit grows with the complexity of the setup, whether regarding the number of participants, the number of cameras and screens per participant, or any of the many other aspects of such setups. In interactive live event broadcast, a Virtual Director enables mass customization in content production, where viewers can individually select what to watch and how. Ongoing research efforts in both application domains should lead to new forms of interactivity and a more immersive multimedia experience.

Other application domains for Virtual Director approaches include:

- specific group communication scenarios like refugee support;
- remote learning (e.g. in massive open online courses, MOOCs);
- telehealth/telemedicine and remote care;
- distributed theatre performances [4];
- new forms of participative democracy.

The approach appears to be especially relevant for novel types of media content, e.g. panoramic and 360° video content, or live content for virtual reality (VR) playout devices. In the VR domain especially, there are issues that prevent users from watching lengthy content, notably virtual reality sickness (cybersickness, motion sickness). Even if these issues are solved, the challenge remains of how to enable users to switch between lean-forward interactive content consumption and lean-back passive watching. A Virtual Director approach might be very useful in such scenarios.

Overall, this research area is still in its infancy. Given its potential, more research needs to be conducted to deliver components that can serve users in real scenarios outside research labs. On a more detailed level, remaining research challenges include:

- the scalability of the approach in applications that require distributed decision making;
- standardized representation formats for Virtual Director behaviour;
- tool support for authoring Virtual Director behaviour;
- design patterns that enable the decoupling and re-use of bodies of Virtual Director behaviour.

Acknowledgement
The research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreements no. 214793 TA2¹ – Together Anywhere, Together Anytime, no. 248138 FascinatE² – Format-Agnostic SCript-based INterAcTive Experience, no. 287760 Vconect³ – Video Communications for Networked Communities, and no. 610370 ICoSOLE⁴ – Immersive Coverage of Spatially Outspread Live Events.

¹ http://www.ta2-project.eu/
² http://www.fascinate-project.eu/
³ http://vconect-project.eu/
⁴ http://icosole.eu/

5. REFERENCES
[1] M. Falelakis, M. Groen, M. Frantzis, R. Kaiser, and M. Ursu. Automatic orchestration of video streams to enhance group communication. In Proceedings of the 2012 ACM MM International Workshop on Socially-Aware Multimedia, pages 25–30. ACM, 2012.
[2] M. Falelakis, M. F. Ursu, E. Geelhoed, R. Kaiser, and M. Frantzis. Connecting living rooms: An experiment in orchestrated social video communication. In Proceedings of ACM TVX '16, 2016.
[3] M. Groen, M. Ursu, S. Michalakopoulos, M. Falelakis, and E. Gasparis. Improving video-mediated communication with orchestration. Computers in Human Behavior, 28(5):1575–1579, 2012.
[4] R. Kaiser, M. F. Ursu, M. Falelakis, and A. Horti. Enabling distributed theatre performances through multi-camera telepresence: Capturing system behaviour in a script-based approach. In Proceedings of the 3rd International Workshop on Immersive Media Experiences, ImmersiveME '15, pages 21–26, New York, NY, USA, 2015. ACM.
[5] R. Kaiser and W. Weiss. Media Production, Delivery and Interaction for Platform Independent Systems: Format-Agnostic Media, chapter Virtual Director. Wiley, 2014.
[6] R. Kaiser, W. Weiss, and G. Kienast. The FascinatE Production Scripting Engine. In Advances in Multimedia Modeling, volume 7131 of Lecture Notes in Computer Science, pages 682–692. Springer Berlin Heidelberg, 2012.
[7] N. Kock. The psychobiological model: Towards a new theory of computer-mediated communication based on Darwinian evolution. Organization Science, 15(3):327–348, 2004.
[8] P. Ljungstrand and S. Björk. Supporting group relationships in mediated domestic environments. In MindTrek '08: Proceedings of the 12th International Conference on Entertainment and Media in the Ubiquitous Era, pages 59–63, New York, NY, USA, 2008. ACM.
[9] O. A. Niamut, R. Kaiser, G. Kienast, A. Kochale, J. Spille, and O. Schreer. Towards a format-agnostic approach for production, delivery and rendering of immersive media. In ACM MMSys, Oslo, Norway, 2013.
[10] J. Schantl, C. Wagner, R. Kaiser, and M. Strohmaier. The utility of social and topical factors in anticipating repliers in Twitter conversations. In ACM Web Science (WebSci2013), 2013.
[11] O. Schreer, I. Feldmann, C. Weissig, P. Kauff, and R. Schäfer. Ultrahigh-resolution panoramic imaging for format-agnostic video production. Proceedings of the IEEE, 101(1):99–114, 2013.
[12] M. Steen and I. van de Poel. Making values explicit during the design process. IEEE Technol. Soc. Mag., 31(4):63–72, 2012.
[13] International Telecommunication Union. One-way Transmission Time: Recommendation G.114 (05/03). ITU-T Recommendations. ITU, 2003.
[14] M. F. Ursu, M. Falelakis, M. Groen, R. Kaiser, and M. Frantzis. Experimental Enquiry into Automatically Orchestrated Live Video Communication in Social Settings. In Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video, TVX '15, pages 63–72. ACM, 2015.
[15] M. F. Ursu, M. Groen, M. Falelakis, M. Frantzis, V. Zsombori, and R. Kaiser. Orchestration: TV-like Mixing Grammars Applied to Video-communication for Social Groups. In Proceedings of the 21st ACM International Conference on Multimedia, MM '13, pages 333–342. ACM, 2013.