      TEXT MINING AS SUPPORT FOR SEMANTIC VIDEO INDEXING AND ANALYSIS


             Jan Nemrava, Vojtěch Svátek                                    Paul Buitelaar, Thierry Declerck
         University of Economics, Prague                                          DFKI Saarbrücken
         W. Churchill Sq. 4, 130 68 Prague-CZ                          Stuhlsatzenhausweg 3, 66123 Saarbrücken-D


                        ABSTRACT

This paper presents our work in the field of semantic multimedia annotation and indexing supported by the analysis of complementary textual resources. We describe the advantages of complementary sources of information as a support for annotation and test whether these data can be used for automatic annotation and event detection.

                    1. INTRODUCTION

In this paper we present our work on the use of complementary textual resources in video analysis. For the selected domain (soccer, in our case) these comprise various textual sources, such as structured data (match tables with teams, player names, goals scored, substitutions, etc.) and semi-structured textual web data (minute-by-minute match reports, i.e. unstructured text accompanied by temporal information). Events and entities detected in these sources are marked up with semantic classes derived from a soccer ontology by means of information extraction tools. Since the target audience comes from various research areas, this text focuses on the potential uses of the approach rather than on its technical details. Temporal alignment of the primary video data (soccer match videos) with semantically organized events and entities from the textual and structured complementary resources can serve as an indicator for video segment extraction and semantic classification; e.g. the occurrence of a 'Header' event in the complementary resources can be used to train a classifier and later to label the corresponding video segment accordingly. This information can then be used for semantic indexing and retrieval of events in soccer videos, but also for the targeted extraction of audio/visual (A/V) features (motion, audio pitch, field lines, close-ups). We denote such extraction of A/V features based on textual evidence "cross-media feature extraction".
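As a minimal illustration of this alignment (a sketch under our own assumptions; the names and the one-minute candidate window are hypothetical, not the actual system), a minute-stamped event extracted from a report can be mapped onto the per-second video range in which the corresponding segment is sought:

    # Illustrative sketch only: map a minute-stamped textual event onto the
    # per-second video window in which the corresponding A/V segment is sought.
    from dataclasses import dataclass

    @dataclass
    class TextEvent:
        minute: int      # minute stamp taken from the match report
        event_type: str  # semantic class from the soccer ontology, e.g. "Header"

    def candidate_window(event: TextEvent, kickoff_second: int = 0) -> range:
        """Seconds of video in which the reported event should be found."""
        start = kickoff_second + event.minute * 60
        return range(start, start + 60)  # assumed one-minute candidate window

    # A 'Header' reported in minute 23 maps to video seconds 1380..1439.
    window = candidate_window(TextEvent(minute=23, event_type="Header"))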
There has been a considerable research effort in the field of semantic annotation and indexing in the sports domain. Some of this work, such as that of [9], also uses complementary resources, though not to the extent that we do. For further related work see [4], [2] and [1].
       2. RESOURCES COMPLEMENTARY TO A/V STREAMS

The exploitation of related (complementary) textual resources, especially when these are endowed with temporal references, can greatly increase the quality of video analysis, indexing and retrieval. The number of domains with freely available, detailed temporal descriptions is of course limited, but those in which such information is available on a large scale can be exploited very effectively. Multiple parallel descriptions of one event further increase the coverage and help to eliminate false events. Good examples can be found in the sports domain. Current research in sports video analysis focuses on event recognition and classification based on the extraction of low-level features and is, when based solely on these features, limited to a very small number of event types, e.g. the 'scoring event' [8]. Complementary resources, on the other hand, can serve as a valuable source for more fine-grained event recognition and classification.
We distinguish two kinds of information sources according to their direct vs. indirect connection to the video material. Primary complementary resources comprise information that is directly attached to the media, namely overlay texts, the audio track and spoken commentary. Secondary complementary resources comprise information that is independent of the media itself but related to its content; it must first be identified and processed.
    3. COMPLEMENTARY TEXTUAL RESOURCES AND VIDEO INDEXING

Major sports events, such as the FIFA Soccer World Cup held in Germany in 2006, provide a wide range of textual resources, from semi-structured data in the form of tables on web sites to textual summaries and other match reports. The video material was analyzed independently of the research described here, see [8]. The results of that analysis are taken as input for our research and consist of a video segmentation in which each second is described by a set of feature detectors,
i.e. Crowd detection, Speech-Band Audio Activity, On-Screen Graphics, Motion activity measure and Field Line orientation.
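Such a segmentation can be pictured as one feature vector per second of video; the layout below is merely our assumption of a plausible representation, not the actual data format of [8]:

    # Hypothetical per-second representation of the detector output; the field
    # names mirror the detectors listed above, but the format is an assumption.
    from dataclasses import dataclass

    @dataclass
    class SecondFeatures:
        crowd: float               # Crowd detection
        speech_band_audio: float   # Speech-Band Audio Activity
        on_screen_graphics: float  # On-Screen Graphics
        motion_activity: float     # Motion activity measure
        field_line: float          # Field Line orientation

    # A whole match is then a list of such vectors indexed by second.
    match_seconds: list[SecondFeatures] = []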
A dataset for ontology-based information extraction and ontology learning from text (the SmartWeb corpus [7]) consists of a soccer ontology, a corpus of semi-structured and textual match reports, and a knowledge base of automatically extracted events and entities.
Minute-by-minute reports are usually published on soccer web sites and enable people to 'watch' a match in textual form on the web. Processing several such reports in parallel increases the coverage of events and eliminates false positives. We therefore rely on six different sources in this case: ARD, bild.de and LigaLive (all in German), and Guardian, DW-World and DFB.de (all in English); we apply the SProUT tool [3] to them. This effort resulted in an interactive non-linear event browsing demo presented in [6].
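The gain from processing reports in parallel can be sketched as a simple voting filter (the threshold, the data layout and all names are our assumptions made for illustration; in the actual pipeline the events are extracted by SProUT [3]):

    # Illustrative majority-style filter over events extracted from several
    # parallel minute-by-minute reports: keep an event only if at least
    # `min_sources` sources report the same event type in the same minute.
    from collections import Counter

    def merge_reports(reports: list[list[tuple[int, str]]],
                      min_sources: int = 2) -> list[tuple[int, str]]:
        """Each report is a list of (minute, event_type) pairs from one source."""
        counts = Counter()
        for report in reports:
            for key in set(report):  # count each source at most once per event
                counts[key] += 1
        return sorted(k for k, n in counts.items() if n >= min_sources)

    ard      = [(23, "Header"), (45, "Goal")]
    guardian = [(23, "Header"), (45, "Goal"), (51, "FreeKick")]
    dfb      = [(45, "Goal")]
    merged = merge_reports([ard, guardian, dfb])  # the lone 'FreeKick' is dropped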
The next section describes experiments with event detection based on the general A/V detectors.

          4. CROSS-MEDIA FEATURE EXTRACTION
The aim of the semantic annotation is to allow (semi-)automatic detection of events in the video based on previously learned examples. The aim of this experiment was to test whether the general detectors can serve as a sufficient source of information. We used two manually annotated soccer match videos, one as the training set and the other for testing. We created additional derived features describing the previous and the next values of the detectors in the same time range as the event instance itself, which gives a better chance of capturing the behavior of a detector over time. We used decision trees as the machine learning algorithm and built a binary classifier for each of the observed event types. The task of each classifier was to decide whether a particular segment is or is not an instance of its event.
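This setup can be reconstructed roughly as follows; scikit-learn, the identifiers and the toy data are our assumptions, and only the use of previous/current/next detector values and one tree per event type comes from the experiment itself:

    # Sketch of the per-event binary classifier: for each time segment we use
    # the Previous, Current and Next values of every detector (the P/C/N
    # features referred to below) and train one decision tree per event type.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def pcn_features(detectors: np.ndarray) -> np.ndarray:
        """detectors: (n_segments, n_detectors) matrix of per-segment values.
        Returns (n_segments, 3 * n_detectors): previous, current and next
        values, padding the first and last segment with their edge values."""
        prev = np.vstack([detectors[:1], detectors[:-1]])
        nxt = np.vstack([detectors[1:], detectors[-1:]])
        return np.hstack([prev, detectors, nxt])

    # Toy data: 6 segments x 5 detectors (crowd, audio, graphics, motion, lines).
    rng = np.random.default_rng(0)
    X = pcn_features(rng.random((6, 5)))
    y = np.array([0, 0, 1, 0, 1, 0])  # 1 = segment contains e.g. a 'Header'

    header_classifier = DecisionTreeClassifier().fit(X, y)
    predictions = header_classifier.predict(X)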
By our observation, the detectors we used are too generic for fine-grained event detection, but they can help to detect a certain event within a given (usually one-minute-long) time range in which the event was identified in the text. The table below shows that different detectors are important for different event types. This potentially allows instances of an event type to be detected by observing only those detectors that are discriminative for it (an assumption also exploited by the decision tree algorithm). The letters P, C and N represent the Previous, Current and Next value of the detector for a particular event type. More details can be found in [5].

            [Table: Results of the cross-media feature selection]
            5. CONCLUSIONS AND FUTURE WORK

We presented an approach to using resources that are complementary to A/V streams, such as videos of football matches, for the semantic indexing of such streams. We further presented an experiment with event detection based on general A/V detectors supported by textual annotation. In [5] we showed that event detection based on general detectors can act quite satisfactorily as a binary classifier, but that it performs significantly worse when trained to distinguish more classes. Using classifiers similar to those we have tested, together with complementary textual minute-by-minute information (providing rough, minute-based estimates of where a particular event occurred), can help to refine video indexing and retrieval. The potential of this work lies not only in annotation for multimedia indexing and retrieval but also in providing feedback to the video learning algorithms, so we also see a role for it in OCR and other video analysis areas that have to deal with text.
                6. ACKNOWLEDGEMENTS

This research was supported by the European Commission under contract FP6-027026 for the K-Space project. We thank D. Sadlier and Noel O'Connor (DCU, Ireland) for providing the A/V data and analysis results.
                   7. REFERENCES

[1] Bertini M., et al.: Automatic annotation and semantic retrieval of video sequences using multimedia ontologies. In Proc. of ACM MULTIMEDIA '06, ACM, New York, NY, 2006.
[2] Castano S., et al.: Ontology Dynamics with Multimedia Information: The BOEMIE Evolution Methodology. In Proc. of the International Workshop on Ontology Dynamics (IWOD), ESWC 2007 Workshop, Innsbruck, Austria, 2007.
[3] Drozdzynski W., et al.: Shallow Processing with Unification and Typed Feature Structures - Foundations and Applications. KI 1/2004.
[4] Lanagan J., Smeaton A.F.: SportsAnno: What do you think? RIAO 2007 - Large-Scale Semantic Access to Content, Pittsburgh, PA, USA, 30 May - 1 June 2007.
[5] Nemrava J., et al.: Text Mining Support for Semantic Indexing and Analysis of A/V Streams. OntoImage Workshop at LREC 2008, Marrakech, Morocco, May 2008.
[6] Nemrava J., et al.: An Architecture for Mining Resources Complementary to Audio-Visual Streams. In Proc. of the KAMC Workshop at SAMT 2007, Italy, Dec. 2007.
[7] Oberle D., et al.: DOLCE ergo SUMO: On Foundational and Domain Models in SWIntO (SmartWeb Integrated Ontology). Journal of Web Semantics: Science, Services and Agents on the World Wide Web 5 (2007) 156-174.
[8] Sadlier D., O'Connor N.: Event Detection in Field Sports Video using Audio-Visual Features and a Support Vector Machine. IEEE Transactions on Circuits and Systems for Video Technology, Oct. 2005.
[9] Xu H., Chua T.: The fusion of audio-visual features and external knowledge for event detection in team sports video. In Proc. of the 6th ACM SIGMM Workshop on Multimedia Information Retrieval, 2004.