SALERO - Semantic Audiovisual Entertainment Reusable Objects

Werner Haas, Georg Thallinger, Pedro Cano, Charlie Cullen and Tobias Bürger

Abstract: The Integrated Project SALERO aims to advance the state of the art in digital media to the point where it becomes possible to create audiovisual content for cross-platform delivery using intelligent content tools, with greater quality at lower cost, to provide audiences with more engaging entertainment and information at home or on the move. SALERO will build on and extend research in media technologies, web semantics and context-based image retrieval, to reverse the trend toward ever-increasing cost of creating media.

Index Terms: audiovisual intelligent objects, content creation, context aware behaviour

Manuscript received August 31, 2006. The R&D work carried out for the IP SALERO is partially funded under FP 6 of the European Commission within the IST Workprogramme 2004 (IST FP6-2004-027122).
W. Haas and G. Thallinger are with JOANNEUM RESEARCH, Graz, Austria (phone: +43 316 876 1119; e-mail: firstname.lastname@joanneum.at).
P. Cano is with Universitat Pompeu Fabra, Barcelona, Spain (e-mail: pcano@iua.upf.edu).
Ch. Cullen is with Dublin Institute of Technology, Dublin, Ireland (e-mail: charlie.cullen@dit.ie).
T. Bürger is with University of Innsbruck, Innsbruck, Austria (e-mail: tobias.buerger@deri.org).

I. VISION & OBJECTIVES

SALERO’s [1] overall vision is to define and develop ‘intelligent content’ for media production, consisting of multimedia objects with context-aware behaviour for self-adaptive use and delivery across different platforms. ‘Intelligent Content’ should enable the creation and re-use of complex, compelling media by artists who need to know little of the technical aspects of how the tools that they use actually work.

Complete realisation of SALERO’s vision is a long-term goal. This gives rise to three overarching R&D objectives:
• Address characters, objects, sounds, language sets and behaviours,
• Research into methodologies for creating and finding intelligent content,
• Develop toolsets to create, manage, edit, retrieve and deliver content objects.

II. INTELLIGENT CONTENT CREATION

The first goal is to obtain a better understanding of the relations between media types, genres, workflows and styles as a pre-requisite to the adaptation and transfer of content elements across productions and platforms. To this end, metadata, media semantics and ontologies need to be analysed, researched and developed that define the parameters necessary for the creation and manipulation of semantically aware media objects of various types. Practical methods of context-based information retrieval will be researched that simplify the location and retrieval of characters, sounds, images, movements or behaviours from very large datasets and media storage systems. Improved methods and tools for language processing and speech synthesis, as a means of supporting the generation of multilingual media content, need to be developed.

A. Media Semantics and Ontologies

The objective of this research strand is twofold. The main objective is to devise a machine-processable description for the semantic features of a multimedia object and the context it should be used in. This will be tackled by building up a set of ontologies following a layered approach (using an appropriate representation technique for every level of meta-information) that relies as much as possible on current description standards for multimedia. The second objective is to design and implement the necessary tools and applications to build up, maintain and query ontologies for multimedia objects.

B. Media Forms, Programme Styles & Structures

Media objects have different specification needs on different platforms, from game consoles and online services to DVD, television and cinema, related to the overall expressive and stylistic objectives of the production. Audience expectations are related to the production genre, be it a western, soap opera, comedy, tragedy, thriller, action adventure or a medieval sword & magic MMORPG (Massively Multi-Player Online Role-Playing Game). The genre expectations (whether the engagers are passive watchers of Jerry Seinfeld on television, or active console game players represented on the screen by Lara Croft) are elegantly expressed by the active questions of Philip Parker [2].

C. Context Based Information Retrieval

We expect that the intelligent content elements developed by SALERO will adapt themselves to the context of the production. We will therefore need to research ways of defining, creating (or locating), managing and delivering content objects of different kinds in a range of contexts. We address the context-based retrieval of media objects within a media production environment. The idea is to re-use objects, motion data and other production-related data for the creation of new production materials. It is not only about finding and re-using elements at production time, but also about using retrieval technologies for the creation of interactive media productions. That means, for example, that a character may react or adapt to a scene in a way that is based on the user's input. A number of factors affect the use of retrieval techniques within media production environments. The most important one is that they should be integrated into the production environment: retrieval should happen as part of the programme development and not as a cumbersome, extra activity. We also need intelligent context-sensitive retrieval mechanisms that identify both user context and task context.

D. Speech and Language

The aim of this activity is to enable programmes created in one language to be re-purposed and/or synthesised in another language or dialect by researching and developing a ‘speech corpus/concordancer’. A speech corpus [3], tagged for various features such as rhythm, pitch contours, intensity contours and emotional dimensions [4], will be used to inform lip-synching, character animation and TTS synthesis stages by establishing emotional rules (initially for English) with which to potentially repurpose ‘neutral’ or ‘nearest match’ speech segments in the database for the other language or dialect. Once the rules have been established for English, they will be adapted for Spanish or Catalan.

This requires a framework for defining voice stereotypes for age, genre, emotional dimension etc. and a suitable tagging system for corpus transcripts, initially for Catalan, Spanish and English. In the case of English, tagging for speech rhythms and other acoustic features within the recorded speech clips is seen to play a crucial role in developing a more natural corpus. The tagging of the resultant speech corpus will be applied to rule-based analysis, synthesis, lip-synching and character animation.

E. Characters, Characteristics & Effects

The research in this activity deals with both visual objects (such as characters) and audio objects (such as effects, speech). It provides grounding work for the linking of visual, audio, and behavioural objects, whose initial intelligence is expected to be increased along the lifetime of the project. It develops along different levels, from the low-level provision of basic affordable rendering engines for media; through the intermediate level, such as the modelling and animation of characters; to high-level aspects, e.g. programme generators.

III. TOOLSETS, DEMONSTRATION & TRAINING

Software toolkits, software systems, plug-ins and interfaces will be developed that allow the control of appearances, sounds, semantic behaviour and properties of intelligent content objects for media production and post-production, and can be used in conjunction with existing industry programs. They will be validated and evaluated through a series of experimental productions, based on scenarios defined by artists and creative media professionals.

Results will be promoted by a broad initiative, developing demonstration test beds and training structures for professionals and researchers, as well as by addressing the relevant standardisation bodies.

IV. RELATED WORK

A number of research groups are dealing with ontology-based description of multimedia items, often by applying reasoning to the low-level features extracted, e.g. [5], [7]. Use of ontology languages for media annotation has been investigated in [8], [6] for video and in [9] for audio. Deployment of semantic technologies in media production, in tools used every day by the media professional, has rarely been investigated.

The SMaRT networking cluster [10] (of which SALERO is a member) combines research in the fields of Semantic Web, Multimedia and Signal Analysis to address emerging research challenges in Semantic Multimedia.

REFERENCES

[1] SALERO web page, http://www.SALERO.eu
[2] P. Parker, “The art and science of screenwriting”, Intellect, 1998.
[3] D.F. Campbell, M. Meinardi, and B. Richardson, “Let the corpus speak!”, 40th IATEFL Annual Conference and Exhibition, 9 - 12 April 2006, Harrogate, UK.
[4] K.R. Scherer, “Emotion as a multi-component process: A model and some cross cultural data”, Review of Personality and Social Psychology, Vol. 5, pp. 37 - 63, (1984).
[5] S. Dasiopoulou, V. Mezaris, I. Kompatsiaris, V.K. Papastathis, and M.G. Strintzis, “Knowledge-assisted semantic video object detection”, IEEE Transactions on Circuits and Systems on Video Technology, Vol. 15, No. 10, pp. 1210 - 1224, (2005).
[6] J. Heflin, “OWL Web Ontology Language: Use cases and requirements”, W3C Recommendation, http://www.w3.org/TR/webont-req/, (2004).
[7] V. Mezaris, Y. Kompatsiaris, N. Boulgouris and M. Strintzis, “Real-time compressed domain spatiotemporal segmentation and ontologies for video indexing and retrieval”, IEEE Transactions on Circuits and Systems on Video Technology, Vol. 14, No. 5, pp. 606 - 620, (2004).
[8] J.R. van Ossenbruggen, F.-M. Nack, and L. Hardman, “That obscure object of desire: multimedia metadata on the web (Part II)”, IEEE Multimedia, Vol. 12, No. 1, pp. 54 - 63, (2005).
[9] P. Cano, M. Koppenberger, S. Le Groux, J. Ricard, N. Wack, and P. Herrera, “Nearest-neighbour automatic sound classification with a wordnet taxonomy”, Journal of Intelligent Information Systems, Vol. 24, No. 2, pp. 99 - 111, (2005).
[10] http://kspace.qmul.net:8080/kspace/kspacesmartcluster.jsp