A Multimedia Service with MPEG-7 Metadata and Context Semantics

Yiwei Cao, Ralf Klamma, and Maziar Khodaei
Lehrstuhl für Informatik 5, RWTH Aachen University
Ahornstr. 55, 52056 Aachen, Germany
{cao|klamma|khodaei}@dbis.rwth-aachen.de

Abstract: With the rapid development of mobile technologies, more and more multimedia applications run on handheld devices in mobile networks. This raises new challenges for mobile multimedia information systems. In this paper, three technical aspects are considered tightly intertwined and crucial for developing mobile multimedia applications: context awareness, multimedia adaptation, and mobility. An approach to integrating multimedia metadata standards such as MPEG-7 and MPEG-21 effectively and seamlessly into context-aware ontologies is proposed, and the related algorithms are extended. A context-aware mobile multimedia application (CA3M) is implemented and evaluated as a proof of concept. The evaluation results of the prototype show the soundness of the concept. Open issues and development potentials for further research in the area of context-aware mobile multimedia information system development are addressed.

1 Introduction

Mobile applications have advanced rapidly in recent years. Mobile devices like the iPhone have revolutionized mobile application development. Conventional barriers of mobile application development, such as limited device capacities, have largely been overcome. It is now common for a device to offer 8 GB or 16 GB of storage, or even more, and display sizes have been maximized within the available device dimensions. All this progress makes mobile multimedia applications increasingly attractive and important for users.

Nowadays, mobile devices and applications feature new foci such as integration, multimodality, and "one device fitting all" or "only carrying one gadget at a time". Multimodality is realized in both information input and output channels. Google supports speech-based keyword search, while the iPhone has three built-in sensors, an accelerometer, a proximity sensor, and an ambient light sensor, to detect movements and even movement intentions of the device, in addition to GPS, a camera, etc. Making a call is becoming a side function of cell phones, because cell phones are also MP3 players, video players, cameras, recorders, gaming consoles, digital book readers, and personal digital assistants for organizing contacts and appointments, and offer much other functionality. A recent report shows that the variety of functionality available on cell phones raises usage complexity as well, and many users still stick only to the phone call function of their cell phones. Thus, the intuitiveness of mobile user interfaces is still limited. Mobile multimedia information systems with enhanced context awareness are a promising approach to reducing this complicated user-device interaction.

In addition, fault tolerance and speed are identified as the most critical aspects for multimedia applications in general, because audiovisual media and metadata including control information are processed simultaneously [24]. Hence, we consider context uncertainty problems an important aspect of mobile multimedia application development. Furthermore, three technical aspects are considered tightly intertwined and crucial for developing mobile multimedia applications: context awareness, multimedia adaptation, and mobility.
Uncertainty reasoning builds on context reasoning, which is enabled both by metadata-based multimedia adaptation and by context-aware multimedia adaptation (cf. Figure 1).

The rest of the paper is organized as follows. The state of the art is discussed in Section 2. An approach to integrating multimedia metadata standards such as MPEG-7 and MPEG-21 effectively and seamlessly into context-aware ontologies is proposed in Section 3. Section 4 provides an insight into the implementation and evaluation of a context-aware mobile multimedia application (CA3M). CA3M delivers a service for context-aware multimedia search and multimedia adaptation which can be accessed from mobile devices. Section 5 concludes the paper with open issues and development potentials for further research in the area of context-aware mobile multimedia information system development.

2 Related work

Much research has been done in the areas of context modeling, context-aware information systems, mobile computing, multimedia systems, multimedia metadata standards, and ontologies. Research progress in these areas contributes to the ubiquitous computing as well as pervasive computing paradigms. Context information might be uncertain, and this uncertainty needs to be reduced via an appropriate context model.

Figure 1: Conceptual approach

2.1 Context awareness, mobility and multimedia adaptation

First of all, these three main concepts are reviewed from the related literature. Any piece of information involved in the interaction between user communities and applications can be regarded as context. Dey defines context as any piece of information characterizing an entity such as a person, a location, or a physically accessible object [7]. Correspondingly, context-aware systems are information systems that deal with and make use of context information. The representation of context information is an important part of research in pervasive computing and is considered a subfield of knowledge engineering, within knowledge management and artificial intelligence. Context-aware mobile systems can be applied in fields such as presenting context information to users, adapting services to mobile users, and handling context information in real time [25].

Mobility assures mobile device users of reaching information according to a certain context, such as user communities, location, time, and device capacities, anywhere and at any time [19]. Kakihara and Sorensen associate mobility with three dimensions which act as a whole: spatial, temporal, and context-dependent [11].

Among the information to be delivered, multimedia content is of high interest. Multimedia refers to content and data from more than one digital resource, such as text, photographs, graphics, animation, audio, and video [24]. Multimedia annotation, adaptation, and the different forms of interaction are the main aspects of multimedia applications. In order to deliver the right information to the right person at the right time and the right location, multimedia search results are adapted to users' preferences and to up-to-date context information in a given environment.

2.2 Context uncertainty

It is difficult to avoid problems with data uncertainty, especially in context-aware systems. Inconsistency occurs between models and the real world, as well as among different local environment models.
Four main factors are stated in [9] to lead to this inconsistency; context information is
• unknown when no information about the property is available;
• ambiguous when several different reports about the property are available (for example, when two distinct location readings for a given person are supplied by separate positioning devices);
• imprecise when the reported state is a correct yet inexact approximation of the true state (for example, when a person's location is known to be within a limited region, but the position within this region cannot be pinpointed to the required, application-determined degree of precision); or
• erroneous when there is a mismatch between the actual and reported states of the property.

Correspondingly, Quality of Context (QoC) or Quality of Information is often discussed in these cases; the quality of context information depends greatly on changes in the context sources by which the context is provided [14].

2.3 Context modeling

In order to deal with context information as well as the context uncertainty mentioned above, specific frameworks and data structures are needed to capture, manage, process, and retrieve context information. In the field of knowledge engineering and artificial intelligence, context information must first be collected and presented to the application in order to enable efficient context-aware adaptation. Therefore, a common representation format for context information is required [17]. A well-defined context model is needed to define and store context data in machine-readable form in order to enhance interoperability. In [23], Strang and Linnhoff-Popien made an in-depth survey across several context-aware systems and compared the most relevant context modeling approaches, including key-value, markup scheme, graphical, object-oriented, logic-based, and ontology-based models. These approaches are based on different data structures and represent context information for machine processing and reasoning.

2.4 MPEG-7, MPEG-21 and RDF

With regard to multimedia metadata, MPEG-7 provides a large set of pre-defined elements to describe multimedia content. These elements are of two different types: Description Schemes (DS) and Descriptors (D). Depending on the field of application, a specific description scheme can be defined by freely combinable descriptors (i.e. tags). Each descriptor itself refers to a specific feature or attribute of multimedia content [12, 15]. In addition, the Description Definition Language (DDL) of MPEG-7 makes this standard more powerful than other metadata standards, since it allows the creation of new descriptors and description schemes within the standard. Hence, the vocabulary of MPEG-7 is extensible by employing XML Schema.

Together with MPEG-7, MPEG-21 provides a comprehensive framework for multimedia adaptation. In general, MPEG-21 consists of twelve parts [2, 12]. These parts are independent of each other, so excerpts of the standard may also be applied on their own. For the purpose of managing multimedia content on mobile end devices alone, we confine the system to MPEG-21 Digital Item Adaptation (DIA) and the MPEG-21 Digital Item Declaration Language (DIDL). MPEG-21 DIDL contains six top-level descriptors (tags), of which the first four are particularly important for mobile data management.

Figure 2: The ramm.x vocabulary [16]

However, neither the MPEG-7 nor the MPEG-21 multimedia metadata standard defines an ontology which can be used for context modeling.
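Since MPEG-7 descriptions are ordinary XML documents, individual descriptors can already be read with standard XML tooling before any ontology mapping takes place. The following Java sketch is illustrative only and is not part of the cited standards or of CA3M; the file name and the assumed document layout are hypothetical, although element names such as Creation, Title, and FreeTextAnnotation come from the MPEG-7 Multimedia Description Schemes.

  // Minimal, illustrative sketch: read two MPEG-7 descriptors with the
  // standard Java XML APIs. The input file name is hypothetical.
  import java.io.File;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.xpath.XPath;
  import javax.xml.xpath.XPathFactory;
  import org.w3c.dom.Document;

  public class Mpeg7DescriptorReader {
      public static void main(String[] args) throws Exception {
          Document doc = DocumentBuilderFactory.newInstance()
                  .newDocumentBuilder()
                  .parse(new File("video-description.mp7.xml"));

          XPath xpath = XPathFactory.newInstance().newXPath();
          // local-name() keeps the expressions independent of the MPEG-7 namespace prefix.
          String title = xpath.evaluate(
                  "//*[local-name()='Creation']/*[local-name()='Title']", doc);
          String annotation = xpath.evaluate(
                  "//*[local-name()='FreeTextAnnotation']", doc);

          System.out.println("Title: " + title);
          System.out.println("Annotation: " + annotation);
      }
  }

Such direct XML access, however, does not expose the semantics of the descriptors to reasoning tools, which motivates the ontology-based bindings surveyed next.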
Several ontology-based MPEG binding specifications have been surveyed in the related research fields. RDFa-deployed Multimedia Metadata (ramm.x) offers a mapping from multimedia standards such as MPEG-7 to the Semantic Web, whose documents are typically described in RDF (Resource Description Framework). ramm.x can be used for the reuse of existing multimedia metadata as well as for converting, validating, and exchanging metadata via an easy-to-use vocabulary set (cf. Figure 2). The main requirements of ramm.x include: the resulting description must be consumable by a Semantic Web agent, i.e. encoded in RDF; a reference to a description in an existing multimedia metadata format can be embedded in (X)HTML; references to services capable of mapping between a specific multimedia metadata format and RDF are provided; and multiple descriptions may be available for one media asset (e.g. in different formats, covering different aspects). Based on ramm.x, an MPEG-7 ontology demonstrates the first successful practical realization [18], built on the MPEG-7 schema version of 2001. The MPEG-7 ontology specified by Rhizomik [18] shows its advantages in comparison to approaches like the MPEG-7 Ontology [10], the Core Ontology for Multimedia (COMM) [4, 5], and the aceMedia Visual Descriptor Ontology (aceMedia VDO) [1] (cf. Table 1).

Ontology                      | Supporting format | Mapping          | Comments
MPEG-7 Ontology [10]          | OWL-Full          | -                | -
COMM [4]                      | OWL-DL            | -                | -
aceMedia VDO [1]              | RDFS-DS           | -                | -
Rhizomik MPEG-7 Ontology [18] | OWL-Full          | XSD2OWL, XML2RDF | Confidence value can be set for uncertainty reasoning

Table 1: A comparison of MPEG-7 ontologies

3 System design of a mobile context-aware multimedia service

A service is designed and implemented to meet the following requirements: How can the preferences of user communities and spatial and temporal context information contribute to context-aware multimedia search? How can the Semantic Web and multimedia metadata standards jointly enhance multimedia search?

Figure 3: CA3M system architecture

3.1 Conceptual approach of CA3M

The CA3M (Context-Aware Mobile Multimedia) service enables multimedia search based on keywords specified by users and on the users' context. CA3M uses Web Service technologies in order to provide easy access for mobile applications on the client side. On the server side, a set of services provides functionality such as context acquisition, context reasoning, and context querying. On the client side, mobile user interfaces process users' requests and retrieve the requested multimedia results. For rapid prototyping we employ our previous research result, the Lightweight Application Server (LAS) [20], as the basic framework. LAS provides HTTP and SOAP connectors, which simplifies the implementation of client-server communication. In addition, the LAS components provide functionality such as a database connector. Ontologies using OWL and RDF, as well as a media repository with metadata, are maintained by a native XML database, e.g. eXist [8]. Moreover, the main benefit of LAS is that its services can be flexibly extended for specific applications by using the LAS Java APIs (cf. Figure 3). Thus, CA3M is designed and implemented as LAS services within the LAS framework.

Figure 4 illustrates the information flow in CA3M. Besides context reasoning and multimedia search functionality, rating and clustering are provided. As the first step, user communities are grouped into different clusters according to their interests or preferences; a minimal clustering sketch is given below.
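The paper does not prescribe a particular clustering algorithm, so the following Java sketch is only an assumption made for illustration: users are represented by sets of preference keywords and join the first cluster whose representative member is sufficiently similar in terms of Jaccard similarity. All class, method, and user names are hypothetical.

  // Illustrative only: group users by overlapping preference keywords.
  import java.util.*;

  public class PreferenceClustering {

      // Jaccard similarity of two keyword sets: |intersection| / |union|.
      static double jaccard(Set<String> a, Set<String> b) {
          Set<String> inter = new HashSet<>(a); inter.retainAll(b);
          Set<String> union = new HashSet<>(a); union.addAll(b);
          return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
      }

      // Each user joins the first cluster whose first member is similar enough;
      // otherwise a new cluster is opened.
      static List<List<String>> cluster(Map<String, Set<String>> prefs, double threshold) {
          List<List<String>> clusters = new ArrayList<>();
          for (Map.Entry<String, Set<String>> user : prefs.entrySet()) {
              List<String> target = null;
              for (List<String> c : clusters) {
                  if (jaccard(prefs.get(c.get(0)), user.getValue()) >= threshold) {
                      target = c;
                      break;
                  }
              }
              if (target == null) {
                  target = new ArrayList<>();
                  clusters.add(target);
              }
              target.add(user.getKey());
          }
          return clusters;
      }

      public static void main(String[] args) {
          Map<String, Set<String>> prefs = new LinkedHashMap<>();
          prefs.put("alice", new HashSet<>(Arrays.asList("archaeology", "video")));
          prefs.put("bob",   new HashSet<>(Arrays.asList("archaeology", "photo")));
          prefs.put("carol", new HashSet<>(Arrays.asList("music", "audio")));
          System.out.println(cluster(prefs, 0.3)); // [[alice, bob], [carol]]
      }
  }

A real deployment would more likely cluster on richer community and context features, but the sketch indicates where users' feedback can later be aggregated per cluster.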
Users' ratings of multimedia search results are collected both per user cluster and per individual user. This community rating mechanism can reduce context uncertainty via the users' feedback on particular multimedia search results.

3.2 The MPEG-7 to RDF converter

The binding of MPEG-7 to the Semantic Web, encoded in RDF, is one of the basic components besides the CA3M service. The contribution of this research work is mainly twofold. The first contribution is an adapted converting service which makes the MPEG-7 and MPEG-21 multimedia standards understandable for the Semantic Web. It extends existing work by the Rhizomik Initiative (http://rhizomik.net). The converter comprises transformations from RDF to HTML, from XML Schema to OWL, and from XML to RDF. In addition, an ontology schema based on a mapping of the MPEG-7 schema to RDF is specified. The second contribution is that CA3M supports both XQuery (XML Query Language) and SPARQL (SPARQL Protocol and RDF Query Language). SPARQL is a query language for RDF documents. The semantics of SPARQL is similar to that of SQL, while the query processing mechanisms of the two differ [21, 22].

Figure 4: CA3M information flow

4 Implementation and evaluation

The MPEG-7 schema is mapped to an ontology before MPEG-7 documents are transformed into RDF. Our converter is adapted to and realized for the latest MPEG-7 schema of 2004. Context-aware query processing is carried out in two sequential steps. First, all queries related to context information are expressed and evaluated in XQuery; the multimedia metadata to be searched at this phase is in MPEG-21 DIDL format. Second, SPARQL queries are executed on the RDF documents after the converter has processed the metadata documents. The MPEG-7 to RDF converter consists of two packages, XSD2OWL and XML2RDF. XSD2OWL transforms the MPEG-7 schema into the MPEG-7 ontology, while XML2RDF transforms XML-based MPEG-7 documents into RDF. A LAS service called MPEG7ToRDF Service has been implemented.

A set of evaluations has been carried out on the CA3M service prototype. In our project Virtual Campfire [6], a great number of multimedia files with metadata are stored on streaming servers, FTP servers, HTTP servers, and several multimedia or XML databases in a distributed computing environment. Over 300 main multimedia metadata files were converted from MPEG-7 into RDF. Then SPARQL queries were executed to obtain context-aware multimedia search results across these 300 MPEG-7 documents and their related MPEG-7 documents, which, for example, store location information using the MPEG-7 SemanticPlaceType. The first evaluation results show that more effort could be spent on optimizing service execution, because around 30 seconds are needed to handle the SPARQL queries at the current development stage.

5 Conclusions and outlook

Context awareness, mobility, and multimedia adaptation have been discussed as the three main factors enabling context-aware multimedia adaptation and search on mobile platforms. This work builds on two completed prototypes of multimedia information systems with metadata-standards-based multimedia adaptation and with context modeling and reasoning from our previous work [13, 3]. To make good use of and to advance the existing tools, a conceptual approach is proposed to enhance multimedia adaptation and search by combining comprehensive multimedia standards such as MPEG-7/21 with context modeling and context awareness.
Both technologies intertwine and together support context reasoning in a better way. Furthermore, they make it possible and promising to devise and perform measurements in order to reduce context uncertainty.

In our ongoing research, we will focus on mobility issues and on context uncertainty problems related to multimedia and context awareness. Within the UMIC (Ultra High-Speed Mobile Information and Communication) research cluster of the German Excellence Initiative, a great deal of challenging research work on mobile context-aware multimedia services lies ahead. P2P data management with regard to the MPEG-7/21 metadata standards will be addressed. Data uncertainty and context uncertainty problems need to be identified, analyzed, and handled in depth. Mobile (Web) services will be developed to bridge mobile social software on the higher application layer and mobile wireless network technologies on the lower network layer. The performance of these services should also be improved.

Acknowledgment

This work has been supported by the UMIC Research Centre, RWTH Aachen University. We would like to thank our colleagues for the fruitful discussions.

References

[1] aceMedia Project, http://www.acemedia.org/aceMedia, accessed December 2008.
[2] Burnett, I.; Van de Walle, R.; Hill, K.; Bormans, J.; Pereira, F.: MPEG-21: Goals and Achievements. IEEE Multimedia, 10(4): 60-70, 2003.
[3] Cao, Y.; Klamma, R.; Hou, M.; Jarke, M.: Follow Me, Follow You - Spatiotemporal Community Context Modeling and Adaptation for Mobile Information Systems. In: Proceedings of the 9th International Conference on Mobile Data Management, Beijing, China, April 27-30, 2008, pp. 108-115.
[4] COMM: Core Ontology for Multimedia, http://comm.semanticweb.org/, accessed December 2008.
[5] Bloehdorn, S.; Petridis, K.; Saathoff, C.; Simou, N.; Tzouvaras, V.; Avrithis, Y.; Handschuh, S.; Kompatsiaris, Y.; Staab, S.; Strintzis, M. G.: Semantic Annotation of Images and Videos for Multimedia Analysis. In: Proceedings of the Second European Semantic Web Conference (ESWC 2005), Springer, 2005, pp. 592-607.
[6] Cao, Y.; Spaniol, M.; Klamma, R.; Renzel, D.: Virtual Campfire - A Mobile Social Software for Cross-Media Communities. In: K. Tochtermann, H. Maurer, F. Kappe, A. Scharl (Eds.): Proceedings of I-MEDIA '07, International Conference on New Media Technology and Semantic Systems, Graz, Austria, September 5-7, 2007, J.UCS (Journal of Universal Computer Science) Proceedings, 2007, pp. 192-195.
[7] Dey, A. K.; Abowd, G. D.: Towards a Better Understanding of Context and Context-Awareness. In: HUC '99: Proceedings of the 1st International Symposium on Handheld and Ubiquitous Computing, vol. 1707, Springer, London, UK, 1999, pp. 304-307.
[8] eXist, Open Source Native XML Database, http://exist.sourceforge.net/, accessed December 2008.
[9] Henricksen, K.; Indulska, J.: Modelling and Using Imperfect Context Information. In: PERCOMW '04: Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, IEEE Computer Society, Washington, DC, USA, 2004, pp. 33-37.
[10] Hunter, J.: Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology. In: Proceedings of the First Semantic Web Working Symposium (SWWS), Stanford, USA, 2001, pp. 261-281.
[11] Kakihara, M.; Sorensen, C.: Mobility: An Extended Perspective. In: Proceedings of the Hawaii International Conference on System Sciences, IEEE Computer Society, January 2002, pp. 1756-1766.
[12] Kosch, H.: Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, 2003.
[13] Klamma, R.; Spaniol, M.; Cao, Y.: Community Aware Content Adaptation for Mobile Technology Enhanced Learning. In: W. Nejdl, K. Tochtermann (Eds.): Innovative Approaches to Learning and Knowledge Sharing, Proceedings of the 1st European Conference on Technology Enhanced Learning (EC-TEL 2006), Hersonissou, Greece, October 1-3, 2006, LNCS 4227, Springer-Verlag, 2006, pp. 227-241.
[14] Lei, H.; Sow, D. M.; Davis, J. S.; Banavar, G.; Ebling, M. R.: The Design and Applications of a Context Service. SIGMOBILE Mobile Computing and Communications Review, 6(4): 45-55, 2002.
[15] Martinez, J. M.; Gonzalez, C.; Fernandez, O.; Garcia, C.; de Ramon, J.: Towards Universal Access to Content Using MPEG-7. In: Proceedings of the 10th ACM International Conference on Multimedia, ACM Press, 2002, pp. 199-202.
[16] ramm.x: RDFa-deployed Multimedia Metadata, http://sw.joanneum.at/rammx/, accessed December 2008.
[17] Rothermel, K.; Bauer, M.; Becker, C.: Sonderforschungsbereich 627: Nexus - Umgebungsmodelle für mobile kontextbezogene Systeme. it - Information Technology, 45(5): 293-, 2003.
[18] Roberto Garcia Gonzalez @ Rhizomik, http://rhizomik.net/~roberto/, accessed December 2008.
[19] Roy, N. L. S.; Scheepers, H.; Kendall, E.; Saliba, A.: A Comprehensive Model Incorporating Mobile Context to Design for Mobile Use. In: Proceedings of the 5th Conference on Human Computer Interaction in Southern Africa, January 2006, pp. 22-30.
[20] Spaniol, M.; Klamma, R.; Janßen, H.; Renzel, D.: LAS: A Lightweight Application Server for MPEG-7 Services in Community Engines. In: K. Tochtermann, H. Maurer (Eds.): Proceedings of I-KNOW '06, 6th International Conference on Knowledge Management, Graz, Austria, September 6-8, 2006, J.UCS (Journal of Universal Computer Science) Proceedings, Springer-Verlag, 2006, pp. 592-599.
[21] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/, accessed December 2008.
[22] SPARQL Tutorial, http://jena.sourceforge.net/ARQ/Tutorial/, accessed December 2008.
[23] Strang, T.; Linnhoff-Popien, C.: A Context Modeling Survey. In: First International Workshop on Advanced Context Modelling, Reasoning and Management at UbiComp, Nottingham, UK, September 2004.
[24] Steinmetz, R.; Nahrstedt, K.: Multimedia Systems. Springer-Verlag, 2004.
[25] Zhang, D.; Gu, T.; Pung, H.: A Middleware for Building Context-Aware Mobile Services. In: Proceedings of the IEEE Vehicular Technology Conference, 2004.