=Paper=
{{Paper
|id=Vol-379/paper-12
|storemode=property
|title=Text Mining As Support For Semantic Video Indexing And Analysis
|pdfUrl=https://ceur-ws.org/Vol-379/paper11.pdf
|volume=Vol-379
}}
==Text Mining As Support For Semantic Video Indexing And Analysis==
TEXT MINING AS SUPPORT FOR SEMANTIC VIDEO INDEXING AND ANALYSIS
Jan Nemrava, Vojtěch Svátek (University of Economics, Prague; W. Churchill Sq. 4, 130 68 Prague, CZ)
Paul Buitelaar, Thierry Declerck (DFKI Saarbrücken; Stuhlsatzenhausweg 3, 66123 Saarbrücken, D)
ABSTRACT

This paper presents our work in the field of semantic multimedia annotation and indexing supported by the analysis of complementary textual resources. We describe the advantages of complementary sources of information as a support for annotation, and we test whether these data can be used for automatic annotation and event detection.

1. INTRODUCTION

In this paper we present our work on using complementary textual resources in video analysis. For the selected domain (soccer, in our case) this concerns various textual sources: structured data (match tables with teams, player names, scored goals, substitutions, etc.) and semi-structured textual web data (minute-by-minute match reports, i.e. unstructured text accompanied by temporal information). Events and entities detected in these sources are marked up with semantic classes derived from a soccer ontology by means of information extraction tools. Since the target audience comes from various research areas, this text focuses on the potential use of the approach rather than on its technical details.

Temporal alignment of the primary video data (soccer match videos) with the semantically organized events and entities from the textual and structured complementary resources can be used as an indicator for video segment extraction and semantic classification; e.g. occurrences of a 'Header' event in the complementary resources are used for training, and later for classifying, the corresponding video segments accordingly. This information can then be used for semantic indexing and retrieval of events in soccer videos, but also for the targeted extraction of audio/visual (A/V) features (motion, audio-pitch, field-line, close-up). We denote such extraction of A/V features based on textual evidence "cross-media feature extraction".
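To make the alignment idea concrete, the following sketch (our own illustration, not the authors' implementation; the one-minute window, the kickoff offset and the event labels are assumptions) maps minute-stamped events from a report onto candidate second ranges in the video:

<pre>
# Illustrative sketch only: map minute-stamped events from a textual
# match report onto ranges of video seconds. Event labels, the kickoff
# offset and the one-minute window size are hypothetical.

def align_events_to_segments(events, kickoff_offset_s=0, window_s=60):
    """events: list of (minute, event_class) pairs from a report.
    Returns (event_class, start_second, end_second) candidate segments."""
    segments = []
    for minute, event_class in events:
        start = kickoff_offset_s + (minute - 1) * 60   # minute 1 covers seconds 0-59
        segments.append((event_class, start, start + window_s))
    return segments

if __name__ == "__main__":
    report_events = [(12, "Header"), (45, "Goal")]
    for event_class, start, end in align_events_to_segments(report_events):
        print(f"{event_class}: candidate video seconds {start}-{end}")
</pre>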
Considerable research effort has been carried out on semantic annotation and indexing in the sports domain. Some of this work, such as [9], also uses complementary resources, but not to the same extent as we do. For further related work see [4], [2] and [1].
2. RESOURCES COMPLEMENTARY TO A/V STREAMS

The exploitation of related (complementary) textual resources, especially when these are endowed with temporal references, can largely increase the quality of video analysis, indexing and retrieval. The number of domains containing freely available, detailed temporal descriptions is of course limited, but those where this information is available on a large scale can be exploited very effectively. Multiple parallel descriptions of one event further increase the coverage and eliminate false events. Good examples can be found in the sports domain. Current research in sports video analysis focuses on event recognition and classification based on the extraction of low-level features and is, when based solely on those low-level features, limited to a very small number of different event types, e.g. 'scoring-event' [8]. Complementary resources, on the other hand, can serve as a valuable source for more fine-grained event recognition and classification.

We distinguish between two kinds of information sources according to their direct vs. indirect connection to the video material. Primary complementary resources include information that is directly attached to the media, namely overlay texts, the audio track and spoken commentaries. Secondary complementary resources include information that is independent from the media itself but related to its content; it must first be identified and processed.

3. COMPLEMENTARY TEXTUAL RESOURCES AND VIDEO INDEXING

Major sports events, such as the FIFA Soccer World Cup tournament held in Germany in 2006, provide a wide range of available textual resources, ranging from semi-structured data in the form of tables on web sites to textual summaries and other match reports. The video material was analyzed independently of the research described here, see [8]. The results of this analysis are taken as input for our research and consist of a video segmentation in which each second is described by a set of feature detectors, i.e. Crowd Detection, Speech-Band Audio Activity, On-Screen Graphics, Motion Activity Measure and Field Line Orientation.
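For concreteness, the per-second detector output described above can be pictured as a simple record per second; the field names and values in this sketch are our own illustration, not the actual data format produced by the detectors of [8]:

<pre>
# Hypothetical per-second record for the five A/V detectors named above;
# field names and value ranges are our own illustration.
from dataclasses import dataclass

@dataclass
class SecondFeatures:
    second: int                # position in the video, in seconds
    crowd: float               # crowd detection confidence
    speech_band_audio: float   # speech-band audio activity
    on_screen_graphics: float  # on-screen graphics presence
    motion_activity: float     # motion activity measure
    field_line: float          # field line orientation measure

# one second of video described by illustrative detector values
example = SecondFeatures(second=725, crowd=0.8, speech_band_audio=0.4,
                         on_screen_graphics=1.0, motion_activity=0.6,
                         field_line=0.3)
</pre>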
A dataset for ontology-based information extraction and ontology learning from text (the SmartWeb corpus) [7] consists of a soccer ontology, a corpus of semi-structured and textual match reports, and a knowledge base of automatically extracted events and entities.

Minute-by-minute reports are usually published on soccer web sites and enable people to 'watch' the match in textual form on the web. Processing several such reports in parallel increases the coverage of events and eliminates false positive events. We therefore rely on six different sources in this case: ARD, bild.de and LigaLive (all in German), and Guardian, DW-World and DFB.de (all in English), and we apply the SProUT tool [3] to them.
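As a minimal sketch of this merging step (our own illustration; the threshold of two agreeing sources and the example events are assumptions, not the actual processing pipeline), an event mention can be kept only when several independent reports confirm it for the same minute:

<pre>
# Sketch: combine events extracted from several minute-by-minute reports.
# A (minute, event type) pair is kept only if at least `min_sources`
# reports mention it; the source names reuse those listed above, while
# the events and the threshold are made-up examples.
from collections import defaultdict

def merge_reports(reports, min_sources=2):
    """reports: {source_name: [(minute, event_type), ...]}"""
    votes = defaultdict(set)                    # (minute, event_type) -> sources
    for source, events in reports.items():
        for minute, event_type in events:
            votes[(minute, event_type)].add(source)
    # keep only events confirmed by enough independent reports
    return sorted(pair for pair, sources in votes.items()
                  if len(sources) >= min_sources)

if __name__ == "__main__":
    reports = {
        "Guardian": [(12, "Header"), (23, "Yellow card")],
        "ARD":      [(12, "Header"), (40, "Substitution")],
        "bild.de":  [(12, "Header")],
    }
    print(merge_reports(reports))   # -> [(12, 'Header')]
</pre>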
Processing these sources resulted in an interactive non-linear event browsing demo presented in [6]. The next section describes experiments with event detection based on the general A/V detectors.
4. CROSS-MEDIA FEATURE EXTRACTION

The aim of the semantic annotation is to allow (semi-)automatic detection of events in the video based on previously learned examples, and the aim of this experiment was to test whether the general detectors are able to serve as a sufficient source of information for it. For the experiment we used two manually annotated soccer match videos, one as a training set and the other for testing. We created additional derived features describing the previous and the next values of the detectors in the same time range as the event instance itself, which gives a better chance of capturing the behavior of a detector over time. We used decision trees as the machine learning algorithm and built a binary classifier for each of the observed event types; the task of each classifier is to decide whether a particular segment is or is not an instance of its event.
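As a concrete reading of this setup, the sketch below (our own; the array layout, the toy data and the use of scikit-learn's DecisionTreeClassifier are assumptions, not the paper's actual code) builds previous/current/next detector features per second and trains one binary decision tree for a single event type:

<pre>
# Sketch of a per-event binary classifier as described above: each
# one-second segment is described by the current detector values plus
# the previous and next values (P, C, N), and a decision tree decides
# whether the segment belongs to the event or not.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def add_context(detectors):
    """detectors: (n_seconds, n_detectors) array of per-second values.
    Returns (n_seconds, 3 * n_detectors) with previous/current/next values;
    the first and last second reuse their own values as padding."""
    prev_vals = np.vstack([detectors[:1], detectors[:-1]])
    next_vals = np.vstack([detectors[1:], detectors[-1:]])
    return np.hstack([prev_vals, detectors, next_vals])

# Toy training data: 200 seconds, 5 detectors, binary labels for one event type
rng = np.random.default_rng(0)
X_train = add_context(rng.random((200, 5)))
y_train = rng.integers(0, 2, 200)          # 1 = segment belongs to the event

clf = DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)

# Classify the seconds of a second (test) match
X_test = add_context(rng.random((50, 5)))
print(clf.predict(X_test)[:10])
</pre>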
By our observation, the detectors we used are too generic for fine-grained event detection, but they can help to detect a certain event within a given time range (usually one minute long) where the event was identified in the text. The table below shows that different detectors are important for different event types. This potentially allows detecting instances of an event type by observing only those detectors that are discriminative for it (an assumption also exploited by the decision tree algorithm). The letters P, C and N denote whether the Previous, Current or Next value of the detector matters for the particular event type. More details can be found in [5].

[Table: Results of the cross-media feature selection]
5. CONCLUSIONS AND FUTURE WORK

We presented an approach to using resources that are complementary to A/V streams, such as videos of football matches, for the semantic indexing of those streams. We further presented an experiment with event detection based on general A/V detectors supported by textual annotation. In [5] we showed that such event detection based on general detectors can act quite satisfactorily as a binary classifier, but that it performs significantly worse when trained to provide classification for more classes. Using classifiers similar to those we have tested, together with complementary textual minute-by-minute information (providing rough, minute-based estimates of where a particular event occurred), can help to refine video indexing and retrieval. The potential of this work lies not only in annotation for multimedia indexing and retrieval but also in providing feedback to the video learning algorithm, so we also see a role for it in OCR and other video analysis areas where text has to be dealt with.

6. ACKNOWLEDGEMENTS

This research was supported by the European Commission under contract FP6-027026 for the K-Space project. We thank D. Sadlier and Noel O'Connor (DCU, Ireland) for providing the A/V data and analysis results.

7. REFERENCES

[1] Bertini M., et al.: Automatic annotation and semantic retrieval of video sequences using multimedia ontologies. In: Proc. of MULTIMEDIA '06, ACM, New York, NY, 2006.
[2] Castano S., et al.: Ontology Dynamics with Multimedia Information: The BOEMIE Evolution Methodology. In: Proc. of the International Workshop on Ontology Dynamics (IWOD), ESWC 2007 Workshop, Innsbruck, Austria, 2007.
[3] Drozdzynski W., et al.: Shallow Processing with Unification and Typed Feature Structures - Foundations and Applications. In: KI 1/2004.
[4] Lanagan J., Smeaton A.F.: SportsAnno: What do you think? In: RIAO 2007 - Large-Scale Semantic Access to Content, Pittsburgh, PA, USA, 30 May - 1 June 2007.
[5] Nemrava J., et al.: Text Mining Support for Semantic Indexing and Analysis of A/V Streams. In: OntoImage Workshop at LREC 2008, Marrakech, Morocco, May 2008.
[6] Nemrava J., et al.: An Architecture for Mining Resources Complementary to Audio-Visual Streams. In: Proc. of the KAMC Workshop at SAMT 2007, Italy, Dec. 2007.
[7] Oberle D., et al.: DOLCE ergo SUMO: On Foundational and Domain Models in SWIntO (SmartWeb Integrated Ontology). Journal of Web Semantics: Science, Services and Agents on the World Wide Web 5 (2007), 156-174.
[8] Sadlier D., O'Connor N.: Event Detection in Field Sports Video using Audio-Visual Features and a Support Vector Machine. IEEE Transactions on Circuits and Systems for Video Technology, Oct. 2005.
[9] Xu H., Chua T.: The fusion of audio-visual features and external knowledge for event detection in team sports video. In: Proceedings of the 6th ACM SIGMM Workshop on Multimedia Information Retrieval, 2004.