X-Media: Large Scale Knowledge Acquisition, Sharing and Reuse across Media

Fabio Ciravegna and Steffen Staab

F. Ciravegna is with the Department of Computer Science of the University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK (e-mail: f.ciravegna@dcs.shef.ac.uk).
S. Staab is with the Department of Computer Science, University Koblenz-Landau, 56016 Koblenz, Germany (e-mail: staab@uni-koblenz.de).

Abstract— X-Media is an Integrated Project funded by the European Commission, which addresses the issue of knowledge management in complex distributed environments. It will study, develop and implement large scale methodologies and techniques for knowledge management able to support sharing and reuse of knowledge that is distributed across different media (images, documents and data) and repositories (databases, knowledge bases, document repositories, etc.). The project started in March 2006 and will last for 4 years.

Index Terms— Cross-Media Knowledge Acquisition, Cross-Media Knowledge Sharing, Architectures for Knowledge Management

I. INTRODUCTION

While in the past medium-size, mainly textual, centralized archives were the only resources for knowledge management, nowadays large companies handle very large quantities of multimedia information in distributed archives. Their intranets connect thousands of computers and reach sizes of dozens of millions of documents. In addition, the increased use of the WWW as a source of information has made the boundary between intranet and Internet very thin. This dramatically increases the size of the information space. Moreover, databases and archives are used to store huge amounts of information that is vital for the life of the organization, such as data on products, financial information, etc. Collecting and aggregating multimedia knowledge is of fundamental importance in order to gain competitiveness and to reduce costs. For example, thousands of documents are produced during the design and manufacturing of a class of jet engines. During service, a single engine produces about 1 GByte of vibration data per flight; if irregularities are found, part of the data is stored. Every time an engine is serviced, financial information is produced. If problems are found, pictures are taken and reports are written. Each individual engine has a potential "folder" of information describing the whole lifecycle of the engine that can easily sum up to several Gigabytes of information, potentially Terabytes, and contains highly interrelated information stored in different media.

The growing size and the multimedia nature of the archives have serious implications for the way knowledge management can be implemented. There are a number of dimensions along which the complexity arises:

• Cross-Media: evidence is often distributed in different media; it is possible that knowledge expressed in just one medium does not carry enough evidence. Connecting information in more than one medium is often required.
• Knowledge integration: large distributed archives require the ability to map the distribution of information, to weigh every single source and to distribute searches carefully; this is very difficult, and often search is performed in just some of the archives, disregarding others that can bring very useful information.
• Focusing: a large amount of information implies that managing knowledge becomes more complex and needs powerful focusing methodologies. The focus of searching changes in time and from user to user, and requires a balanced mixture of exploration and searching.
• Uncertainty and Dynamicity: information is often ambiguous, incomplete, or specific to a particular context; archives can therefore contain noise and imprecision, as well as obsolete information, and each piece of knowledge must be judged based on provenance, evidence, etc.
• Infrastructure: different media cannot easily be shared. A folder of text documents may be sent via email, but a folder of images may not, and may instead require a shared image repository. For 10 GByte of data, remote access to the underlying database must be considered.

Current knowledge management technologies and practices cannot cope with this new situation, as they mainly provide simple mechanisms (e.g. keyword searching) to support knowledge workers who manually piece together the information from different sources.

II. X-MEDIA

X-Media addresses the issue of knowledge management in complex distributed environments. It studies, develops and implements large scale methodologies and techniques for knowledge management able to support sharing and reuse of knowledge that is distributed in different media (images, documents and data) and repositories (databases, knowledge bases, document repositories, etc.).

X-Media studies, designs and develops:
1) Robust and scalable knowledge acquisition and data analysis tools operating across media boundaries (text, images and data) to automatically cross-relate and annotate text and images with metadata.
2) Novel and cutting-edge knowledge fusion methods to support knowledge workers in making decisions when confronted with possibly contradicting knowledge derived from different resources.
3) Effective and efficient new paradigms for knowledge retrieval, sharing and reuse working across media, which enable users to define and parameterize views on the available knowledge according to their needs.
4) Techniques able to represent and manage (i) uncertainty, (ii) trust and provenance as well as (iii) dynamic aspects of knowledge.
5) A methodology and a technical infrastructure able to deliver knowledge from across media to the knowledge workers, taking into account the complexity of managing media with different data sizes.
6) A generic and flexible architecture that end users can easily customize and integrate with their KM practices or needs, as well as a mainly open source reference implementation and libraries which technology-providing companies can reuse.

The technologies will support knowledge workers in an effective way by (i) hiding the complexity of the underlying search/retrieval process, (ii) providing natural access to knowledge, (iii) allowing interoperability between heterogeneous information resources and (iv) handling heterogeneity of data types (data, images, texts). The expected impact on organizations is to dramatically improve access to, sharing of and use of information by humans and between machines. Expected benefits are a dramatic reduction of management costs and increased feasibility of complex knowledge management tasks. The project plan is structured along the four areas described below.

Area 1: Knowledge sharing and reuse
X-Media studies and implements technologies and methodologies for easy and intelligent access to and reuse of formalized and non-formalized knowledge. The reuse takes the user context into consideration to help focus searches and reuse. Reuse and sharing are enabled via cross-media, ontology-supported automatic indexing. The technology works in a largely automated way, but it is centered on supporting users in their work rather than replacing them. This is because the activity of a knowledge worker is complex and humans are irreplaceable agents in this process.

In this context, we are studying, designing and developing:
(1) Effective and efficient new paradigms for knowledge retrieval, sharing and reuse which enable users to define and parameterize views on the available knowledge according to their needs.
(2) Novel and cutting-edge knowledge fusion methods to support knowledge workers in making decisions when confronted with possibly contradicting knowledge derived from different resources.
(3) Techniques able to represent and manage (i) uncertainty, (ii) trust and provenance as well as (iii) dynamic aspects of knowledge.

Area 2: Automated knowledge acquisition from documents, images and raw data
The methodologies for knowledge sharing investigated in Area 1 depend on the ability to acquire knowledge across media in a rich, semantically oriented way. X-Media develops a set of tools able to support sharing methodologies in a seamless and automatic way. The media addressed are raw data, texts and images (e.g. results or parameters in experiments, raw images, textual documents, etc.). The outcome of the acquisition technologies will be a semantic representation of the content (conceptualization) to be used for knowledge management purposes. Enrichment of multimedia documents with additional layers of automatically generated annotation is the main means of associating conceptualizations with resources. Current technology focuses on single-medium techniques to acquire knowledge in multimedia environments; this means that retrieval methods use mainly one medium (e.g. text) even in multimedia environments. X-Media designs and develops technologies for information extraction that work truly cross-media and that can be used in cases where information in one medium is necessary to understand the information in the other.

Area 3: Infrastructure
A knowledge acquisition, integration and sharing environment is defined. Since X-Media is an application-oriented integrated project, integration is required at the implementation level as well as at the conceptual level. The main outcome of this area of activity will be a methodology and a technical infrastructure able to deliver knowledge from across media to the knowledge workers, taking into account the complexity of managing media with different data sizes.

Area 4: Application and Testing
The technology above is used to define showcases and prototype applications. Two main testbeds are defined by the two industrial users (Rolls Royce and Fiat). They concern competitor analysis in the car industry and product lifecycle monitoring in aerospace. System trials with final users will showcase the technology and pave the way to further exploitation.

III. CONSORTIUM

Partners: University of Sheffield (coordinator, UK), University of Koblenz (D), ITC-Irst (I), University of Ljubljana (Slovenia), University of Freiburg (D), CERTH (G), Labri (F), University of Karlsruhe (D) and the Open University (UK). Quinary (I), Ontoprise (D), Solcara (UK), CognIT (N), Rolls Royce (UK) and Centro Ricerche Fiat (I).

ACKNOWLEDGMENT

X-Media is funded by the European Commission as part of Framework 6 of IST, contract no FP6-26978.
Project web page: http://www.x-media-project.org.
For information: Prof. Fabio Ciravegna, email: xmedia-coordinator@dcs.shef.ac.uk