=Paper= {{Paper |id=Vol-233/paper-27 |storemode=property |title=SALERO - Semantic Audiovisual Entertainment Reusable Objects |pdfUrl=https://ceur-ws.org/Vol-233/p55.pdf |volume=Vol-233 |dblpUrl=https://dblp.org/rec/conf/samt/HaasTCCB06 }} ==SALERO - Semantic Audiovisual Entertainment Reusable Objects== https://ceur-ws.org/Vol-233/p55.pdf
                          SALERO -
          Semantic Audiovisual Entertainment Reusable
                           Objects
                         Werner Haas, Georg Thallinger, Pedro Cano, Charlie Cullen and Tobias Bürger



Abstract: The Integrated Project SALERO aims to advance the state of the art in digital media to the point where it becomes possible to create audiovisual content for cross-platform delivery using intelligent content tools, with greater quality at lower cost, to provide audiences with more engaging entertainment and information at home or on the move. SALERO will build on and extend research in media technologies, web semantics and context based image retrieval, to reverse the trend toward ever-increasing cost of creating media.

Index Terms: audiovisual intelligent objects, content creation, context aware behaviour

I. VISION & OBJECTIVES

SALERO's [1] overall vision is to define and develop 'intelligent content' for media production, consisting of multimedia objects with context-aware behaviour for self-adaptive use and delivery across different platforms. 'Intelligent Content' should enable the creation and re-use of complex, compelling media by artists who need to know little of the technical aspects of how the tools that they use actually work.

Complete realisation of SALERO's vision is a long-term goal. This gives rise to three overarching R&D objectives:
•    Address characters, objects, sounds, language sets and behaviours,
•    Research into methodologies for creating and finding intelligent content,
•    Develop toolsets to create, manage, edit, retrieve and deliver content objects.

Manuscript received August 31, 2006. The R&D work carried out for the IP SALERO is partially funded under FP 6 of the European Commission within the IST Workprogramme 2004 (IST FP6-2004-027122).
W. Haas and G. Thallinger are with JOANNEUM RESEARCH, Graz, Austria (phone: +43 316 876 1119; e-mail: firstname.lastname@joanneum.at).
P. Cano is with Universitat Pompeu Fabra, Barcelona, Spain (e-mail: pcano@iua.upf.edu).
Ch. Cullen is with Dublin Institute of Technology, Dublin, Ireland (e-mail: charlie.cullen@dit.ie).
T. Bürger is with University of Innsbruck, Innsbruck, Austria (e-mail: tobias.buerger@deri.org).

II. INTELLIGENT CONTENT CREATION

The first goal is to obtain a better understanding of the relations between media types, genres, workflows and styles as a prerequisite to the adaptation and transfer of content elements across productions and platforms. To this end, metadata, media semantics and ontologies that define the parameters necessary for the creation and manipulation of semantically aware media objects of various types need to be analysed, researched and developed. Practical methods of context-based information retrieval will be researched that simplify the location and retrieval of characters, sounds, images, movements or behaviours from very large datasets and media storage systems. Improved methods and tools for language processing and speech synthesis, as a means of supporting the generation of multilingual media content, need to be developed.

A. Media Semantics and Ontologies

The objective of this research strand is twofold. The main objective is to devise a machine-processable description for the semantic features of a multimedia object and the context it should be used in. This will be tackled by building up a set of ontologies following a layered approach (using an appropriate representation technique for every level of meta-information) that relies as much as possible on current description standards for multimedia. The second objective is to design and implement the necessary tools and applications to build up, maintain and query ontologies for multimedia objects.

B. Media Forms, Programme Styles & Structures

Media objects have different specification needs on different platforms, from game consoles and online services to DVD, television and cinema, related to the overall expressive and stylistic objectives of the production. Audience expectations are related to the production genre, be it a western, soap opera, comedy, tragedy, thriller, action-adventure or a medieval sword & magic MMORPG (Massively Multi-Player Online Role-Playing Game). The genre expectations (whether the engagers are passive watchers of Jerry Seinfeld on television, or active console game players represented on the screen by Lara Croft) are elegantly expressed by the active questions of Philip Parker [2].
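
As an illustration of Sections II.A and II.B, a layered, machine-processable description of a media object might be sketched as follows. This is a minimal sketch under stated assumptions, not the SALERO ontology: the class `MediaObjectDescription`, its three layers, the function `usable_in` and all example values are hypothetical and chosen only to show how semantic features and per-platform specification needs could sit side by side.

```python
from dataclasses import dataclass, field


@dataclass
class MediaObjectDescription:
    """Hypothetical layered description of one reusable media object."""
    name: str
    # Layer 1: low-level features extracted from the media itself.
    low_level: dict = field(default_factory=dict)
    # Layer 2: semantic annotations (what the object is, which genres it suits).
    semantics: dict = field(default_factory=dict)
    # Layer 3: per-platform specification needs (Section II.B).
    platform_specs: dict = field(default_factory=dict)


def usable_in(obj, platform, genre):
    """Return True if the object has a spec for the platform and fits the genre."""
    has_spec = platform in obj.platform_specs
    fits_genre = genre in obj.semantics.get("genres", [])
    return has_spec and fits_genre


# Illustrative values only; none of them come from the SALERO project.
hero = MediaObjectDescription(
    name="hero_character",
    low_level={"polygon_count": 12000},
    semantics={"type": "character", "genres": ["action-adventure", "thriller"]},
    platform_specs={
        "game_console": {"max_polygons": 12000},
        "television": {"resolution": "720x576"},
    },
)

print(usable_in(hero, "game_console", "thriller"))  # True
print(usable_in(hero, "cinema", "thriller"))        # False
```

A real system would presumably express these layers in an ontology language rather than in ad hoc dictionaries; the sketch only shows the kind of query ("can this object be used on this platform, in this genre?") that such a description is meant to answer.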
C. Context Based Information Retrieval

We expect that the intelligent content elements developed by SALERO will adapt themselves to the context of the production. We will therefore need to research ways of defining, creating (or locating), managing and delivering content objects of different kinds in a range of contexts. We address the context-based retrieval of media objects within a media production environment. The idea is to re-use objects, motion data and other production-related data for the creation of new production materials. It is not only about finding and re-using elements at production time, but also about using retrieval technologies for the creation of interactive media productions. That means, for example, that a character may react or adapt to a scene in a way that is based on the user's input. A number of factors affect the use of retrieval techniques within media production environments. The most important one is that they should be integrated into the production environment: retrieval should happen as part of the programme development and not as a cumbersome, extra activity. We also need intelligent context-sensitive retrieval mechanisms that identify both user context and task context.

D. Speech and Language

The aim of this activity is to enable programmes created in one language to be re-purposed and/or synthesised in another language or dialect by researching and developing a 'speech corpus/concordancer'. A speech corpus [3], tagged for various features such as rhythm, pitch contours, intensity contours and emotional dimensions [4], will be used to inform lip-synching, character animation and TTS synthesis stages by establishing emotional rules (initially for English) with which to potentially repurpose 'neutral' or 'nearest match' speech segments in the database for the other language or dialect. Once the rules have been established for English, they will be adapted for Spanish or Catalan.

This requires a framework for defining voice stereotypes for age, genre, emotional dimension etc. and a suitable tagging system for corpus transcripts, initially for Catalan, Spanish and English. In the case of English, tagging for speech rhythms and other acoustic features within the recorded speech clips is seen to play a crucial role in developing a more natural corpus. The tagging of the resultant speech corpus will be applied to rule-based analysis, synthesis, lip-synching and character animation.

E. Characters, Characteristics & Effects

The research in this activity deals with both visual objects (such as characters) and audio objects (such as effects, speech). It provides grounding work for the linking of visual, audio, and behavioural objects, whose initial intelligence is expected to increase over the lifetime of the project. It develops along different levels, from the low-level provision of basic affordable rendering engines for media; through the intermediate level, such as the modelling and animation of characters; to high-level aspects, e.g. programme generators.

III. TOOLSETS, DEMONSTRATION & TRAINING

Software toolkits, software systems, plug-ins and interfaces will be developed that allow the control of appearances, sounds, semantic behaviour and properties of intelligent content objects for media production and post-production, and can be used in conjunction with existing industry programs. They will be validated and evaluated through a series of experimental productions, based on scenarios defined by artists and creative media professionals.

Results will be promoted by a broad initiative, developing demonstration test beds and training structures for professionals and researchers, as well as by addressing the relevant standardisation bodies.

IV. RELATED WORK

A number of research groups are dealing with ontology-based description of multimedia items, often by applying reasoning to extracted low-level features, e.g. [5], [7]. Use of ontology languages for media annotation has been investigated in [8], [6] for video and in [9] for audio. Deployment of semantic technologies in media production, in tools used every day by the media professional, has rarely been investigated.

The SMaRT networking cluster [10] (of which SALERO is a member) combines research in the fields of Semantic Web, Multimedia and Signal Analysis to address emerging research challenges in Semantic Multimedia.

REFERENCES

[1]  SALERO web page, http://www.SALERO.eu
[2]  P. Parker, "The art and science of screenwriting", Intellect, 1998.
[3]  D.F. Campbell, M. Meinardi, and B. Richardson, "Let the corpus speak!", 40th IATEFL Annual Conference and Exhibition, 9-12 April 2006, Harrogate, UK.
[4]  K.R. Scherer, "Emotion as a multi-component process: A model and some cross cultural data", Review of Personality and Social Psychology, Vol. 5, pp. 37-63, (1984).
[5]  S. Dasiopoulou, V. Mezaris, I. Kompatsiaris, V.K. Papastathis, and M.G. Strintzis, "Knowledge-assisted semantic video object detection", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 10, pp. 1210-1224, (2005).
[6]  J. Heflin, "OWL Web Ontology Language: Use cases and requirements", W3C Recommendation, http://www.w3.org/TR/webont-req/, (2004).
[7]  V. Mezaris, Y. Kompatsiaris, N. Boulgouris and M. Strintzis, "Real-time compressed domain spatiotemporal segmentation and ontologies for video indexing and retrieval", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, No. 5, pp. 606-620, (2004).
[8]  J.R. van Ossenbruggen, F.-M. Nack, and L. Hardman, "That obscure object of desire: multimedia metadata on the web (Part II)", IEEE Multimedia, Vol. 12, No. 1, pp. 54-63, (2005).
[9]  P. Cano, M. Koppenberger, S. Le Groux, J. Ricard, N. Wack, and P. Herrera, "Nearest-neighbour automatic sound classification with a wordnet taxonomy", Journal of Intelligent Information Systems, Vol. 24, No. 2, pp. 99-111, (2005).
[10] http://kspace.qmul.net:8080/kspace/kspacesmartcluster.jsp