PHAROS – Platform For Search of Audiovisual Resources Across Online Spaces

Stefan Debald, Wolfgang Nejdl, Francesco Saverio Nucci, Raluca Paiu and Michel Plu

Stefan Debald is with Fast Search & Transfer ASA (FAST), N-0120, Oslo, Norway (e-mail: Stefan.Debald@fast.no).
Wolfgang Nejdl and Raluca Paiu are with L3S Research Center, Hannover - 30539, Germany (e-mail: {nejdl, paiu}@l3s.de).
Francesco Saverio Nucci is with Engineering SpA - R&D Lab, Rome – 00158, Italy (e-mail: francesco.nucci@eng.it).
Michel Plu is with France Télécom R&D, 22307 Lannion Cedex, France (e-mail: michel.plu@orange-ft.com).

Abstract – Just as an old lighthouse (PHAROS in ancient Greek) provided a proper navigation tool for people to find their route when lost at sea, PHAROS will ensure the right way to navigate the enormous modern information ocean.

Index Terms – audiovisual search engines, federation, information retrieval.

I. MOTIVATION

The amount of data available on the web, in organizations and in enterprises is multiplying, and data is increasingly becoming audiovisual. Search has become the default way of interacting with data, and by 2008, 50% of applications are predicted to include a search facility as a primary interface for end users. The ever-increasing data complexity calls for a coherent approach to the growing variety of audiovisual formats, standards and tools. Users find themselves overwhelmed by the multitude of new audiovisual search tools, while businesses are at a loss for a stable direction.

Digital data is the greatest value that many organizations possess, and the ability to use it, rather than just store it, will be one of the most important aspects of strategy in the coming decade. Access to digital data is the front window and the operational backbone for most organizations. The growth of data volume is rapidly shifting to audiovisual content, yet the technologies that allow processing and retrieval of this content are either mainly experimental, or only vaguely capable of handling true queries and content.

Audiovisual search is therefore one of the major challenges for organizations and businesses today, and search-based technologies which can provide contextually relevant, integrated and scalable access to distributed and heterogeneous collections of information are essential.

II. PROJECT OBJECTIVES AND MISSION

The PHAROS Integrated Project (IP), to be funded from 2007 to 2009, aims at developing an innovative audiovisual technology platform which will enable consumers, businesses and organizations to unlock the value found in audiovisual content. The platform will take user and search requirements as key design principles and will be deeply integrated with user and context technologies. The developed technology will sustain itself in the future by enabling new players to build on top of the platform. To achieve this ambitious task, PHAROS mobilizes 13 strong technological players, research institutions and user groups, all sharing common goals.

The range of scientific innovations will be tied together and consolidated with existing research through intelligent content publication and subscription mechanisms, as well as scalable, flexible and open frameworks for content processing and search that support the full range of content types, enabling them to be deployed through the PHAROS platform. To guide this process, PHAROS has defined 5 objectives that provide the backbone for the organization of the work throughout the project.

Ob1. (Core Technologies) – Develop a scalable search framework which lets users search, explore, discover and analyze contextually-relevant data which can be audio-visual, structured or unstructured in origin. A scalable content refinement framework will be developed bringing together multilingual transcription, contextual metadata extraction and content-based audio-visual analysis to add semantic meaning to audio-visual, structured data in a way that prepares it for information retrieval.

Ob2. (Context and User Technologies) – Analyze, design and develop the context and user technologies taking into account personalization, trust and adaptability. This will allow a social audio-visual interaction model to be integrated into the search engine, rather than using a traditional non-participatory information access model. PHAROS will create user interaction models where live user traffic continually improves the user experience via core primitives such as social network analysis and TrustRank.

Ob3. (PHAROS Platform) – Ensure the interoperability of the Core Technologies and Context and User Technologies (Ob1 and Ob2 above) in a Service-Oriented Architecture-based application enablement environment that will enable effective deployment of diverse information access solutions incorporating audio-visual content sources.

Ob4. (Showcase) – An innovative showcase, built on the PHAROS Platform, will be used to gather user feedback and validate the PHAROS approach with specific attention to the innovation in PHAROS.

Ob5. (Federation and Impact) – Define a suitable sustainability model with an open, federating and aggregating approach, guaranteeing the replicability of PHAROS results in a multi-industry scenario.

III. PROGRESS OVER STATE-OF-THE-ART

To achieve these ambitious objectives, PHAROS will extend the state-of-the-art in the areas of core search technologies, as well as context and user technologies.

Regarding core search technologies, both XML search and content-based search are relevant. Previous work has addressed representation and semantic interoperability [6], as well as XML retrieval [4], [8]. Content-based retrieval uses features of multimedia objects to facilitate their retrieval; [2] and [10] focus on exactly this topic. However, emerging types of search patterns require both XML and content-based search to be integrated and made mature enough for industrial exploitation. PHAROS will extend the state-of-the-art in this area by developing a scalable search platform with advanced query brokering to orchestrate audiovisual information access, combining pluggable content-based matching engines and schema-agnostic XML-based search kernels.

Context and user technologies have been tackled from various points of view: social media [9], [7], spam detection and ranking [3], [5], as well as security, trust and privacy [1]. PHAROS will address all these aspects and, more specifically, will focus on exploiting user actions and interactions in personal and public spaces to provide advanced and semantically rich recommendations and personalized ranking algorithms. User and community profiles will enable extreme precision for search, and will exploit all kinds of user-generated metadata. Advanced spam detection algorithms, suitable for personalized ranking, will also be provided, and new lightweight forms of content protection will be investigated.

IV. THE PHAROS APPROACH

PHAROS distinguishes itself from other audiovisual search solutions by developing a platform which integrates content refinement and content retrieval with user and context technologies. The platform is distributed in nature, providing the much-needed flexibility that enables a wide variety of applications to be built on top of it. The platform allows user and context models as well as content refinement, retrieval and storage to be client or server-based, thus enabling businesses in several verticals to diversify and provide services in one or more of these areas instead of encouraging single players to monopolize the market.

The project is based on a five-layer structure defined to map the general objectives into operational areas. The following diagram depicts the overall structure of the PHAROS implementation plan in terms of “streams” and references to the work-plan activity.

FIG. 1 Structure of PHAROS Implementation Plan

The research and technological development within PHAROS can be described in terms of the work done in Stream 1 (Core Technologies), Stream 2 (User and Context Technologies) and Stream 3 (PHAROS Platform). These streams provide the foundation on which Stream 4 (Showcase) is built. Stream 5 (Management and Federation) ensures the smooth running of the project and oversees its impact in the outside scientific and business world.

REFERENCES

[1] Aichroth P., Puchta S., Hasselbach J. “Personalized Previews: An Alternative Concept of Virtual Goods Marketing”. Virtual Goods Conference, 2004.
[2] Aslam J., Montague M. “Models for Metasearch”. SIGIR, 2001.
[3] Bharat K., Henzinger M. R. “Improved algorithms for topic distillation in a hyperlinked environment”. SIGIR, 1998.
[4] Carmel D., Maarek Y. S., Mandelbrod M., Mass Y., Soffer A. “Searching XML Documents via XML Fragments”. SIGIR, 2003.
[5] Carvalho A., Chirita P. A., Silva de Moura E., Calado P., Nejdl W. “Site Level Noise Removal for Search Engines”. WWW, 2006.
[6] Dong X., Halevy A. “Malleable Schemas”. WebDB, 2005.
[7] Ghita S., Nejdl W., Paiu R. “Semantically Rich Recommendations in Social Networks for Sharing, Exchanging and Ranking Semantic Context”. ISWC, 2005.
[8] Kakade V., Raghavan P. “Encoding XML in Vector Spaces”. ECIR, 2005.
[9] Kumar R., Novak J., Raghavan P., Tomkins A. “On the bursty evolution of blogspace”. WWW, 2003.
[10] McDonald K., Smeaton A. “A Comparison of Score, Rank and Probability-based Fusion Methods for Video Shot Retrieval”. CIVR, 2005.