=Paper=
{{Paper
|id=Vol-233/paper-2
|storemode=property
|title=Indexing Camera Motion Integrating Knowledge of the Quality of the Encoded Video
|pdfUrl=https://ceur-ws.org/Vol-233/p03.pdf
|volume=Vol-233
|dblpUrl=https://dblp.org/rec/conf/samt/KramerDP06
}}
==Indexing Camera Motion Integrating Knowledge of the Quality of the Encoded Video==
P. Krämer, J. Benois-Pineau, member IEEE, M. Gràcia Pla
P. Krämer and J. Benois-Pineau are with LABRI UMR CNRS/University of Bordeaux 1/Enseirb/INRIA laboratory, 351, crs de la Libération, 33405 Talence Cedex, France; petra.kraemer, jenny.benois@labri.fr; phone 33 5 40 00 84 24, fax 33 5 40 00 66 69. M. Gràcia Pla has been on master position in LABRI on leave from UPC, Barcelona, Spain.

Abstract: Fast indexing of video content in the compressed domain has become an important task as growing quantities of multimedia (MM) digital content are available in this form. In this paper we present a method for fast indexing of camera motion in MPEG1 and MPEG2 compressed video. We use P-frame motion vectors and extract knowledge about the quality of the compensated motion from the compressed stream. This knowledge is then used to decide whether the motion has to be refined. Camera motion is then indexed in terms of physical motions. Results obtained on the TREC Video test data set show an improvement in both precision and recall.

Index Terms: video indexing, camera motion, compressed streams.

I. INTRODUCTION

Indexing and annotating large quantities of film and video material has become an increasing problem for the media industry. Today, indexing for large application areas such as broadcast, archives, and home MM devices follows the MPEG7-compliant way. MPEG7 is a standard [1] for describing multimedia content. For visual media, it defines descriptors that characterize the content on a visual basis. For video, whose intrinsic property is motion, it proposes motion descriptors. Nevertheless, MPEG7 gives no hints on how to produce a standard-compliant description of, e.g., camera motion, or on how to translate this description into features easily interpreted by humans, such as tilt, zoom, or pan.

A lot of multimedia content is already available in compressed form. Furthermore, digitization of existing video content and digital production of new content are today unthinkable without compression. Thus a lot of work [2-4] has been devoted to estimating the camera model from the motion vectors contained in the compressed stream. This work is another step forward in the general framework that we call the “Rough Indexing Paradigm” and that has been developed since [5]. Many indexing tasks, such as shot boundary detection, scene grouping, video summarization, video object extraction, or motion characterization, can be fulfilled on the degraded, low-resolution, low-level data produced by encoding video streams with current encoders (MPEG1, MPEG2, H.264, …). We claim that a compressed stream is a rich source of input data for indexing, and that its intelligent use is only a matter of interpretation.

In this paper we show how to use not only the MPEG (1 or 2) motion vectors but also the information on the quality of their estimation, in order to estimate the camera model (Section II) and to qualify the motion in a humanly interpretable way (Section III). This is, for instance, the task of camera motion characterization in TREC Video 2005, in which we participated. We show how this knowledge helps to improve the indexing results and give the perspectives of this work (Section IV).

II. GLOBAL MOTION ESTIMATION AND CORRECTION FROM MPEG COMPRESSED VIDEO

In this section we address the problem of estimating the global motion (camera model) in a video sequence. We use the motion compensation vectors from P-frames. In order to keep the same temporal resolution and obtain a smooth motion trajectory, we interpolate the motion for I-frames. Finally, as MPEG motion vectors are not computed for analysis purposes but for optimal encoding, they can be highly erroneous (e.g. in the case of strong motion); we propose how to detect such encoder failures and how to correct the motion.

II.1 Global motion estimation from P-frames

Here we rely on our previous work [5] and use a 6-parameter affine camera model. We suppose [5] that an MPEG macro-block displacement vector is expressed as:

<math>\begin{pmatrix} dx \\ dy \end{pmatrix} = \begin{pmatrix} a_1 \\ a_4 \end{pmatrix} + \begin{pmatrix} a_2 & a_3 \\ a_5 & a_6 \end{pmatrix} \begin{pmatrix} x - x_g \\ y - y_g \end{pmatrix} \qquad (1)</math>

where <math>a_1, \ldots, a_6</math> are the global motion parameters of the camera and <math>(x_g, y_g)^T</math> denotes the image center. The robust estimator that we proposed in [5] classifies macro-blocks (MBs) either as conformant to the model, which we call the “dominant estimation support”, or as outliers. The latter contain intra-coded MBs, MBs in moving objects, and MBs in occluding areas. This approach supposes that the current P-frame contains motion vectors that express the apparent camera motion. Unfortunately this is not always the case. In order to recover the real camera motion in such frames, it is necessary to detect encoder failures and to correct the motion.
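As an illustration of the affine model (1), the following minimal sketch evaluates the model displacement at a given macro-block position. The function name `affine_displacement`, the parameter ordering `(a1, ..., a6)`, and the 352×288 frame size implied by the image center in the usage lines are assumptions made for this example; they are not taken from [5].

```python
def affine_displacement(params, x, y, xg, yg):
    """Evaluate the 6-parameter affine model (1) at position (x, y).

    params: (a1, a2, a3, a4, a5, a6), the global motion parameters.
    (xg, yg): image center.
    Returns the model displacement (dx, dy).
    """
    a1, a2, a3, a4, a5, a6 = params
    xc, yc = x - xg, y - yg          # coordinates relative to the image center
    dx = a1 + a2 * xc + a3 * yc
    dy = a4 + a5 * xc + a6 * yc
    return dx, dy


# Hypothetical example: a pure horizontal translation of 2 pixels gives
# the same displacement at every position in the frame.
pan_only = (2.0, 0.0, 0.0, 0.0, 0.0, 0.0)
print(affine_displacement(pan_only, 8, 8, 176, 144))    # (2.0, 0.0)

# Hypothetical example: equal a2 and a6 with a3 = a5 = 0 moves points
# away from the image center in proportion to their distance from it.
zoom_only = (0.0, 0.01, 0.0, 0.0, 0.0, 0.01)
print(affine_displacement(zoom_only, 276, 144, 176, 144))
```

A macro-block whose decoded MPEG vector differs strongly from this model displacement is a candidate outlier; the actual robust classification is the one described in [5] and is not reproduced here.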
II.2 Detection of frames with low-quality motion and motion correction

If the MPEG encoder motion estimator has failed, the motion compensation error encoded in the MPEG stream is strong. Such failures depend heavily on the parameter settings of the encoder and are observed specifically in the case of strong motion (e.g. soccer content).

We compute the mean low-frequency energy <math>E_t</math> on the dominant estimation support <math>D_t</math>, i.e. excluding the motion outliers:

<math>E_t = \frac{1}{|D_t|} \sum_{p \in D_t} \left( DC_P^{err}(p, t) \right)^2 \qquad (2)</math>

Here <math>DC_P^{err}(p, t)</math> are the DC coefficients extracted from the encoded error in P-frames.

To decide whether the motion model has to be corrected, we use the temporal mean <math>\gamma_t</math> of (2). If the instantaneous value of (2) exceeds <math>\alpha \gamma_t</math>, with <math>\alpha \geq 1</math>, then the motion is corrected.

To carry out this correction, we first interpolate the motion model from neighboring P-frames by linear regression. This interpolation is used as the initialization of the model estimate in the gradient descent scheme. Here we minimize the mean square error of the motion compensation at DC resolution on the dominant estimation support:

<math>MSE_t = \frac{1}{|D_t|} \sum_{p \in D_t} \left( I_t(p) - I_{t-1}(p + \vec{d}) \right)^2 \qquad (3)</math>

The optimization is done in the parameter space by gradient descent:

<math>\Theta_t^{i+1} = \Theta_t^{i} - \frac{\varepsilon}{2 |D_t|} G^{i}</math>

with <math>G</math> the gradient of (3) and <math>\varepsilon</math> the adaptive gain matrix.

III. CAMERA MOTION INDEXING

The objective here is to translate the motion model (1) into physical motion that is interpretable by humans, such as pan, tilt, or zoom. To do this we follow [6] and reformulate the model (1) as:

<math>\begin{pmatrix} dx \\ dy \end{pmatrix} = \begin{pmatrix} pan \\ tilt \end{pmatrix} + \begin{pmatrix} zoom \cdot x - rot \cdot y + hyp1 \cdot x + hyp2 \cdot y \\ zoom \cdot y + rot \cdot x - hyp1 \cdot y + hyp2 \cdot x \end{pmatrix} \qquad (4)</math>

Then two statistical hypotheses are tested on each parameter of this model. The first one, <math>H_0</math>, supposes that the parameter is significant; the second one, <math>H_1</math>, assumes that the component is not significant, i.e. equals zero.

The likelihood function <math>f</math> for each hypothesis is defined with respect to the residuals between the estimated model and the MPEG motion vectors. These residuals are assumed to follow a bivariate Gaussian law. The decision on the significance is made by comparing the log-likelihood ratio with a threshold. We used this scheme in our previous work. However, when (2) indicates a bad estimation, we do not compute the residuals between the erroneous MPEG motion vectors and those given by the re-estimated model; in this case the interpolated parameters are used as reference (light correction).

IV. RESULTS AND CONCLUSION

To assess the improvement due to the proposed integration of the knowledge on erroneous motion and the re-estimation of motion (3), we conducted experiments on the evaluation set of the TREC Video camera motion task (http://www-nlpir.nist.gov/projects/trecvid/), in which we participated in 2005. A subset of 4 videos containing visually observable motion was chosen. Using α = 4.0 in the decision rule, about 4% of the P-frame motion is corrected. With this correction we obtain a mean precision of 76% and a mean recall of 86.1%. Without the correction, 74.5% and 78.7% are obtained respectively. We stress that the increase in recall of roughly 8% (from 78.7% to 86.1%) is already highly significant for this task.

Hence in this paper we proposed a new method for motion correction when estimating and indexing camera motion from compressed (MPEG1 and MPEG2) video streams. We tested it for indexing purposes on the MPEG1 compressed TREC Video test set. For video summarization by mosaicing from compressed streams and for other indexing applications (shot boundary detection, object extraction) we work on MPEG2 compressed streams as well. There is no difference in principle, and the method is promising for the whole Rough Indexing Paradigm, which we continue to develop on compressed streams.

REFERENCES

[1] MPEG-7 Requirements Document V.7: Coding of Moving Pictures and Audio.

[2] E. Saez et al., “Global motion estimation algorithm for video segmentation”, Proc. SPIE, VCIP'03, pp. 1540-1550.

[3] R. Ewerth et al., “Estimation of arbitrary camera motion in MPEG videos”, Proc. ICPR'04, pp. 512-515.

[4] C. Doulaverakis et al., “Adaptive Methods for Motion Characterization and Segmentation of MPEG Compressed Frame Sequences”, Proc. ICIAR'04, pp. 310-317.

[5] M. Durik et al., “Robust Motion Characterisation for Video Indexing based on Optical Flow”, Proc. CBMI'01, pp. 57-64.

[6] P. Bouthemy et al., “A unified approach to shot change detection and camera motion characterization”, IEEE Trans. on CSVT, 9(7), pp. 1030-1044.
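Illustrative note on Section II.2: the detection of frames with low-quality motion (energy (2) compared with α times its temporal mean) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the dictionary layout of the DC coefficients, and the choice of taking the temporal mean over all P-frames of the sequence are hypothetical; the text does not specify the averaging window. The default α = 4.0 is the value reported in Section IV.

```python
def mean_dc_energy(dc_err, support):
    """Mean low-frequency energy (2) over the dominant estimation support.

    dc_err: dict mapping a macro-block index p to the DC coefficient of
            the encoded prediction error (hypothetical input layout).
    support: macro-block indices in D_t, i.e. with outliers excluded.
    """
    support = list(support)
    if not support:
        raise ValueError("dominant estimation support is empty")
    return sum(dc_err[p] ** 2 for p in support) / len(support)


def needs_correction(energies, alpha=4.0):
    """Flag P-frames whose energy exceeds alpha times the temporal mean.

    Assumption: the temporal mean gamma_t is computed over all P-frames
    in `energies`; a sliding window would be an equally valid reading.
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1")
    gamma = sum(energies) / len(energies)
    return [e > alpha * gamma for e in energies]


# Hypothetical data: macro-block 2 is an outlier and is left out of D_t.
print(mean_dc_energy({0: 3.0, 1: -4.0, 2: 50.0}, [0, 1]))   # 12.5

# Nine quiet P-frames and one frame with a strong compensation error.
flags = needs_correction([1.0] * 9 + [100.0], alpha=4.0)
print(flags.index(True))                                     # 9
```

With a global mean, a single bad frame raises the threshold for every frame, so a windowed or running mean may behave differently on short sequences.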