Intelligent Search in a Collection of Video Lectures

Angela Fogarolli
University of Trento, Dept. of Information and Communication Tech.,
Via Sommarive 10, 38050 Trento, Italy
afogarol@dit.unitn.it

1 Abstract

In recent years, the use of streamed digital video as a teaching and learning resource has become an increasingly attractive option for many educators: it expands the range of learning resources available to students by moving away from static text-and-graphic resources towards a video-rich learning environment. Streamed video is already widely used in some universities, mostly for transmitting unenhanced recordings of live lectures. We propose a way of enriching this video-streaming scenario in the e-learning context: we want to extract information from the video and the correlated materials and make them searchable. The aim of this thesis is therefore to create a semantically searchable collection of video lectures.

In the literature, surprisingly little can be found about speech and document retrieval in combination with lecture recording. There are interesting examples of e-lecture creation and delivery, e.g. [5], of audio retrieval of lecture recordings [3] that explores automatic processing of speech, and of systems such as the eLecture portal [1] which indexes the audio as well as the text of the lecture slides. To the best of our knowledge, however, there is no system which combines and synchronizes the different modalities in a searchable collection.

What we propose is to enable search and navigation through the different media types presented in a frontal lecture, with the addition of the video recording. In the video-indexing domain, Snoek and Worring [4] define multimodality as "the capacity of an author of the video document to express a predefined semantic idea, by combining a layout with a specific content, using at least two information channels". The channels or modalities of a video document described in [4] are the visual, the auditory and the textual modality. We believe that using more than one modality, as explained in [2], could also increase productivity in the context of e-learning, where scanning for information is very frequent. Our main focus is to enable search on two modalities: we will index the auditory modality of the video lecture content based on transcriptions obtained with automatic speech recognition tools, and the textual modality using text indexing on the related materials. Furthermore, we do not just want to present an enhanced version of the current state of the art in e-lecture retrieval; we also envision adding semantic capabilities to the search functionality in order to provide a superior learning experience to the student (personalized search, personalized learning paths, relevant contextualization, automatic video content profiling, ...).

The application we are proposing gives students more flexibility in e-lecture consumption. A student could search inside a collection of lectures and related materials (desktop activity recordings, PowerPoint presentations, interactive whiteboard tracks, ...), and the search could be personalized to meet the student's demands. For each hit the system would display the lecture video recording and the temporally synchronized learning materials.
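As an illustration of how the two modalities could be indexed side by side, the short Python sketch below stores automatic speech recognition transcript segments and slide text in a single full-text index, keeping the time offset needed to jump into the video recording for each hit. It uses the Whoosh library; the field names, the lecture identifier, the time stamps and the example texts are invented for illustration and are not part of the proposed system.

import os
from whoosh.fields import Schema, TEXT, ID, NUMERIC
from whoosh.index import create_in
from whoosh.qparser import QueryParser

# One index entry per time-stamped segment, regardless of modality.
schema = Schema(lecture_id=ID(stored=True),
                modality=ID(stored=True),          # "speech" or "slide"
                start_seconds=NUMERIC(stored=True),
                content=TEXT(stored=True))

os.makedirs("lecture_index", exist_ok=True)
ix = create_in("lecture_index", schema)

writer = ix.writer()
# A transcript segment produced by an ASR tool (hypothetical text and timing).
writer.add_document(lecture_id="lec01", modality="speech", start_seconds=734,
                    content="an ontology describes the concepts of a domain")
# Text of the slide shown at roughly the same time (hypothetical).
writer.add_document(lecture_id="lec01", modality="slide", start_seconds=720,
                    content="What is an ontology? A shared conceptualization of a domain")
writer.commit()

# A query can hit both modalities; the stored offset points into the video.
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("ontology")
    for hit in searcher.search(query, limit=10):
        print(hit["lecture_id"], hit["modality"], hit["start_seconds"])

A real system would add fields for the course, the speaker and the related documents, but the time-stamped, modality-tagged segment is the unit that makes a temporally synchronized presentation of video and materials possible.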
Another benefit we want to achieve using Semantic Web techniques is to present profile information about the content of the video lecture; this could improve the state of the art in the video-indexing field by allowing automatic profile annotation of the video content.

The research work specifically addressed by this thesis will investigate the following challenges:

– Finding an innovative way of bridging the gap between information extraction and knowledge representation in our context. For each video and related learning resource an RDF representation would be extracted. The created graph would be navigated during the search task to find the requested information and to suggest related topics using ontology linkage (a minimal sketch of such a graph is given after the references). We will use ontologies for high-level lecture description and query understanding.
– Automatic content description of the presented learning material. A textual description of the content of a video result, lecture or course would be presented to the user. This could be realized by presenting profile information of the knowledge extracted from the video and the related material.
– Evaluation of the value of the tool for improving student performance and for shortening learning time, to be conducted before and after the semantic enhancement.

References

[1] Christoph Hermann, Wolfgang Hürst, and Martina Welte. The eLecture portal: An advanced archive for lecture recordings. In Informatics Education Europe Conference, 2006.
[2] D. Jones, W. Shen, and D. Reynolds. Two experiments comparing reading with listening for human processing of conversational telephone speech. In Interspeech 2005 Eurospeech Conference, 2005.
[3] A. Park, T. Hazen, and J. Glass. Automatic processing of audio lectures for information retrieval: Vocabulary selection and language modeling. In ICASSP, March 2005.
[4] C.G.M. Snoek and M. Worring. Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications, 25:5–35, 2005.
[5] Z. Zhu, C. McKittrick, and W. Li. Virtualized classroom automated production, media integration and user-customized presentation. In Multimedia Data and Document Engineering, July 2004.
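Below is the minimal sketch referred to in the first research challenge. It uses the rdflib Python library; the example.org namespace, the property names and the topics are invented placeholders rather than the ontology the thesis will actually adopt. It only illustrates the idea of representing a lecture as an RDF graph and navigating ontology links to suggest related topics during search.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Hypothetical vocabulary for describing lectures and topics.
LECT = Namespace("http://example.org/lecture-ontology#")

g = Graph()
lecture = URIRef("http://example.org/lectures/lec01")
topic = URIRef("http://example.org/topics/ontologies")
related = URIRef("http://example.org/topics/rdf")

g.add((lecture, RDF.type, LECT.VideoLecture))
g.add((lecture, LECT.title, Literal("Introduction to the Semantic Web")))
g.add((lecture, LECT.coversTopic, topic))
g.add((topic, LECT.relatedTo, related))

# Navigation during search: find the lectures covering a topic,
# then follow ontology links to suggest related topics.
for lec in g.subjects(LECT.coversTopic, topic):
    print("lecture:", lec)
    for suggestion in g.objects(topic, LECT.relatedTo):
        print("  related topic:", suggestion)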