Towards Cross-Media Document Annotation

Ajay Chakravarthy
Department of Computer Science, Regent Court, Portobello Street, Sheffield S1 4DP
A.Chakravarthy@dcs.shef.ac.uk

Collecting and aggregating multimedia knowledge is of fundamental importance for every organisation in order to gain competitiveness and to reduce costs. Knowledge contained in just one medium, e.g. text documents, often does not carry the full evidence looked for, so information stored in more than one medium frequently needs to be connected. Current knowledge management technologies and practices cannot cope with such situations, as they mainly provide simple mechanisms (e.g. keyword searching), and knowledge workers today manually piece together information from different sources. In this report we envisage research methodologies that will enable the semantic enrichment of multimedia documents, both within and across media, through annotation.

Annotation of a document is a complex and labour-intensive task. So far, research has focused on supporting the annotation of single media [1][2][4]; much less attention has been paid to annotating material across media. For this reason there is growing interest in developing methodologies able to capture the content and the context of multimedia documents, in order to enable effective searching (and document-based knowledge management in general). Previous research in personal image management [6] and text annotation [3] demonstrated how annotating images or documents can be a way to organise information and transform it into knowledge that can easily be used later. Metadata enables the creation of a knowledge base which can then be queried both to retrieve documents (via content and context) and to interrogate the structured data (e.g. creating charts illustrating trends).

We address many of these problems with AKTive Media (http://www.dcs.shef.ac.uk/~ajay/html/cresearch.html), a user-centric, ontology-based cross-media annotation system implemented during the PhD. The goal is to automate the process of annotation by means of knowledge sharing and reuse, thereby reducing user effort during the annotation process. The system actively queries web services and central annotation triple stores as a background service to look for context-specific knowledge, with the aim of providing a seamless interface that guides the user through the process and reduces the complexity of the task. Language technologies and a web service architecture are adopted to provide a context-specific annotation mechanism that helps the user with suggestions inferred both from the ontology and from previously stored annotations; the ontology is pre-filtered to present only the top-level concepts (the most generic ones).

The produced knowledge is then used as a way to establish connections with, and to navigate, the information space. For example, when the user annotates a part of an image of a car engine as "abrasion-damage" on a "crank-shaft", the system uses those annotations to retrieve other related images and documents. New relationships can then be established with the found knowledge, e.g. the damage can be related to other previous cases, and through free-text comments the relationship may be made explicit (e.g. this type of failure happens constantly on this blade in hot conditions, and this is proved by document x).
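To make the mechanics of this scenario concrete, the sketch below shows one plausible way such cross-media annotations could be stored and retrieved as RDF triples, here using Python and rdflib. The namespace, property names and URIs are illustrative assumptions made for this report, not AKTive Media's actual schema.

    # A minimal sketch: cross-media annotations as RDF triples.
    # All names and URIs below are illustrative placeholders.
    from rdflib import Graph, Namespace, RDF, URIRef

    EX = Namespace("http://example.org/annotation#")
    g = Graph()

    # Annotate a region of an engine image and a paragraph of a
    # report with the same ontology concepts.
    image_region = URIRef("http://example.org/images/engine-042#region-3")
    report_para = URIRef("http://example.org/reports/strip-17#para-5")
    for ann in (image_region, report_para):
        g.add((ann, RDF.type, EX.Annotation))
        g.add((ann, EX.depicts, EX.AbrasionDamage))
        g.add((ann, EX.locatedOn, EX.CrankShaft))

    # Retrieve every annotated item, in any medium, that mentions
    # abrasion damage on a crank shaft.
    results = g.query("""
        PREFIX ex: <http://example.org/annotation#>
        SELECT ?item WHERE {
            ?item a ex:Annotation ;
                  ex:depicts ex:AbrasionDamage ;
                  ex:locatedOn ex:CrankShaft .
        }""")
    for row in results:
        print(row.item)

Under this representation, relating the damage to a previous case, or attaching a free-text comment that makes the relationship explicit, would simply be further triples added to the same graph.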
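The ontology pre-filtering step mentioned above, which initially shows the annotator only the most generic concepts, admits an equally small sketch: treat a class as top-level when none of its named superclasses is another class in the ontology. Again this is a hypothetical illustration, and the ontology file name is a placeholder.

    # A sketch of pre-filtering an ontology so that only top-level
    # (most generic) concepts are offered to the annotator first.
    from rdflib import Graph, RDF, RDFS
    from rdflib.namespace import OWL

    g = Graph()
    g.parse("aerospace-ontology.owl", format="xml")  # placeholder file

    classes = set(g.subjects(RDF.type, OWL.Class))
    top_level = [
        c for c in classes
        # Keep a class only if none of its declared superclasses is
        # another named class in the same ontology.
        if not any(sup in classes for sup in g.objects(c, RDFS.subClassOf))
    ]
    for concept in sorted(top_level):
        print(concept)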
AKTive Media has information extraction (IE) plug-ins built in (T-Rex) [5], which automate the annotation of textual documents. The main difference between our approach and other state-of-the-art annotation approaches [4][6] is that we use knowledge across media both for annotation and for further relating the resulting annotation instances. This greatly reduces user effort during manual annotation of documents by providing intelligent suggestions derived from across media, from the IE engine and from the central annotation server. The other major difference is that AKTive Media makes an effort to bridge the semantic gap between low-level image features and the semantic metadata provided by users during annotation. We achieve this by providing the means to index image collections and enabling the user to query the index using the visual content of the source image being annotated; the user can then apply free-hand mark-up over regions of the images to perform semi-automatic image segmentation and map high-level ontology concepts to the segmented regions.

This research methodology has been deployed in several research projects, including AKT, Memories for Life and X-Media, and a detailed user evaluation of the annotation of strip reports is scheduled at Rolls-Royce UK at the end of the year.

References

1. Ciravegna, F., Dingli, A., Petrelli, D. and Wilks, Y.: User-System Cooperation in Document Annotation based on Information Extraction. In: Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), 1-4 October 2002, Siguenza, Spain.
2. Dzbor, M., Domingue, J. and Motta, E.: Towards a Semantic Web Browser. Knowledge Media Institute, The Open University, Milton Keynes, UK, 2002.
3. Handschuh, S., Staab, S. and Ciravegna, F.: S-CREAM – Semi-automatic CREAtion of Metadata. In: Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), 1-4 October 2002, Siguenza, Spain. Lecture Notes in Artificial Intelligence 2473, Springer Verlag.
4. Hendler, J., Parsia, B., Grove, M., Schain, A., Golbeck, J. and Halaschek-Wiener, C.: PhotoStuff – An Image Annotation Tool for the Semantic Web. University of Maryland MIND Lab, College Park, MD, USA and NASA Headquarters, Washington, DC, USA, 2003.
5. Iria, J.: T-Rex: A Flexible Relation Extraction Framework. In: Proceedings of the 8th Annual Colloquium for the UK Special Interest Group for Computational Linguistics (CLUK'05), Manchester, January 2005.
6. Kuchinsky, A., Pering, C., Creech, M.L., Freeze, D., Serra, B. and Gwizdka, J.: FotoFile: A Consumer Multimedia Organization and Retrieval System. In: Proceedings of the ACM CHI99 Conference on Human Factors in Computing Systems, pp. 496-503, 1999.