=Paper=
{{Paper
|id=Vol-1263/paper17
|storemode=property
|title=CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_17.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/GaluscakovaP14
}}
==CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments==
Petra Galuščáková and Pavel Pecina
Charles University in Prague
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Prague, Czech Republic
{galuscakova,pecina}@ufal.mff.cuni.cz
ABSTRACT
In this paper, we describe our participation in the Search part of the Search and Hyperlinking Task in the MediaEval Benchmark 2014. In our experiments, we compare two types of segmentation: fixed-length segmentation and segmentation employing Decision Trees on a set of various features. We also show the usefulness of exploiting metadata and explore the removal of overlapping retrieved segments.

1. INTRODUCTION
The main aim of the Search sub-task is to find video segments relevant to a given textual query. This problem is an important part of the Spoken Content Retrieval [8, 10] research area, which has been emerging in recent years.
All experiments presented in this paper were conducted on the BBC broadcast data. A total of 1335 hours of video was available for training and 2686 hours for testing. We exploited subtitles and the automatic speech recognition (ASR) transcripts by LIMSI [6], LIUM [9], and NST-Sheffield [7], all of which were available for the task. Detailed information about the task and data can be found in the task description [2].

2. SYSTEM DESCRIPTION
Based on the results of our previous experiments [3], we employed the Terrier IR system¹ and its implementation of the Hiemstra language model [5] with stemming and stopword removal.
Two strategies were used for segmentation of the recordings: 1) we divided the video recordings into segments of fixed length, and 2) we used a segmentation system which employed Decision Trees (DT) [3]. This system makes use of several features, including cue word n-grams (word n-grams frequently occurring at the segment boundary, e.g. “if”, “I’m”, “especially”, “the”), cue tag n-grams (tag n-grams frequently occurring at the segment boundary, e.g. “VBP PRP VBG”), silence between words, the division given in the transcripts, and the output of the TextTiling algorithm [4]. For each word in the transcript, the system decides whether or not the segment ends after this word. The created segments may overlap. The system was trained on the data from the Similar Segments in Social Speech Task in MediaEval 2013 [11].

3. SYSTEM TUNING
Based on our previous experiments, we set the segment length in the fixed-length segmentation to 60 seconds and the shift between the overlapping segments to 10 seconds. The segment length applied in the Decision Tree-based segmentation system was tuned on the training data, and two settings were used for the Search sub-task: 50 seconds and 120 seconds. We also experimented with post-filtering of the retrieved segments: we either used all the retrieved segments, or we removed segments which partially overlapped with another, higher-ranked segment.
We also employed the metadata provided for the task. For each recording, we extracted the title, episode title, description, short episode synopsis, service name, and program variant, and appended this text to each segment from that recording.

4. RESULTS
The results for the Search sub-task are given in Table 1. We present scores of six evaluation measures: Mean Average Precision (MAP), Precision at 5 (P5), Precision at 10 (P10), Precision at 20 (P20), Binned Relevance (MAP-bin), and Tolerance to Irrelevance (MAP-tol) [1].
Unsurprisingly, the best results are achieved in experiments using subtitles. Generally, most of the results obtained with the LIMSI transcripts are higher than the corresponding results with the LIUM and NST-Sheffield transcripts. The only exceptions are the experiments employing overlapping segments. The results with the NST-Sheffield transcripts are higher than the corresponding results with the LIUM transcripts.
In most cases, concatenating the segments with metadata improved the results, despite a drop in the P5 score for all types of transcripts. Apart from several precision and MAP-bin values for the LIUM transcripts, the fixed-length segmentation outperforms the Decision Tree-based segmentation with 120-second segments. Though the 50-second segments created using Decision Trees notably outperform the fixed-length segments as measured by MAP and the precision-based measures, they are outperformed by the fixed-length segmentation on the MAP-bin and MAP-tol measures.
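For readers less familiar with the measures compared above, the textbook binary-relevance definitions of precision at k (P5, P10, P20) and MAP can be sketched as follows. This is an illustration only, not the official task scorer: it assumes each ranked result has already been judged relevant or not, and it does not implement the segment-based MAP-bin and MAP-tol variants of [1]. Note also that standard average precision never exceeds 1, unlike several MAP values in Table 1, which the text attributes to overlapping retrieved segments. The function names are hypothetical.

```python
from typing import List


def precision_at_k(relevant: List[bool], k: int) -> float:
    """Fraction of the top-k results that are relevant (k = 5, 10, 20 give P5, P10, P20).

    Missing positions (fewer than k results) count as non-relevant.
    """
    return sum(relevant[:k]) / k


def average_precision(relevant: List[bool], num_relevant: int) -> float:
    """Mean of the precision values at the ranks where relevant results occur.

    `num_relevant` is the total number of relevant items for the query,
    so relevant items that were never retrieved contribute zero.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_relevant


def mean_average_precision(runs: List[List[bool]], num_relevant: List[int]) -> float:
    """MAP: the average of the per-query AP values."""
    aps = [average_precision(r, n) for r, n in zip(runs, num_relevant)]
    if not aps:
        return 0.0
    return sum(aps) / len(aps)
```

For example, a ranking judged `[True, False, True, False, False]` with two relevant items in total has P5 = 0.4 and AP = (1/1 + 2/3) / 2 = 5/6.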
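The fixed-length segmentation and the overlap post-filtering from Section 3 can be sketched in a few lines, which also shows why 60-second windows shifted by 10 seconds produce many mutually overlapping candidates. This is a minimal sketch under stated assumptions, not the authors' implementation: segments are represented as hypothetical (start, end) pairs in seconds, two segments overlap whenever their time spans intersect, and each retrieved segment is compared only with the segments already kept. The paper does not say whether segments that were themselves removed also count as "higher ranked", so that choice is an assumption. The helper names are illustrative.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) in seconds.
Segment = Tuple[float, float]


def fixed_length_segments(duration: float, length: float = 60.0,
                          shift: float = 10.0) -> List[Segment]:
    """Cut a recording into overlapping windows of `length` seconds,
    each starting `shift` seconds after the previous one.

    Stops once a window reaches the end of the recording, so the last
    window may be shorter than `length`.
    """
    if duration <= 0:
        return []
    segments: List[Segment] = []
    start = 0.0
    while True:
        end = min(start + length, duration)
        segments.append((start, end))
        if end >= duration:
            break
        start += shift
    return segments


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two time spans share an interval of positive length."""
    return a[0] < b[1] and b[0] < a[1]


def remove_overlapping(ranked: List[Segment]) -> List[Segment]:
    """Greedy post-filter over a ranked list (best first): keep a segment
    only if it does not overlap any segment kept so far."""
    kept: List[Segment] = []
    for seg in ranked:
        if not any(overlaps(seg, k) for k in kept):
            kept.append(seg)
    return kept
```

For a 100-second recording, `fixed_length_segments(100.0)` yields five windows starting at 0, 10, 20, 30, and 40 seconds, and filtering the ranked list `[(10.0, 70.0), (0.0, 60.0), (70.0, 130.0), (60.0, 120.0)]` keeps only `(10.0, 70.0)` and `(70.0, 130.0)`, since segments that merely touch at a boundary are not treated as overlapping.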
All measures, except the MAP-tol measure, are notably higher in the experiments in which we did not remove partially overlapping segments from the list of retrieved segments. Due to the nature of these measures, they cannot distinguish whether a user has already seen a retrieved segment or not. Therefore, all the relevant segments, which frequently overlap each other, increase the score. The MAP-tol measure is not influenced by this behavior, as it takes into account only the relevant content which has not already been seen by the user. Therefore, the highest MAP-tol scores are achieved for the fixed-length segmentation when the overlapping retrieved segments are removed.

Transcripts    Segmentation  Length  Metadata  Overlap  MAP      P5      P10     P20     MAP-bin  MAP-tol
Subtitles      Fixed         60 s    No        No       0.4209   0.7933  0.7433  0.5950  0.3192   0.3155
Subtitles      Fixed         60 s    Yes       No       0.5127   0.7467  0.7267  0.6100  0.3433   0.3023
Subtitles      Fixed         60 s    Yes       Yes      4.3527   0.7867  0.7733  0.7683  0.4150   0.1459
Subtitles      DT            120 s   Yes       No       0.3692   0.7467  0.7133  0.6050  0.2606   0.2157
Subtitles      DT            120 s   Yes       Yes      16.3486  0.8400  0.8367  0.8433  0.3172   0.0515
Subtitles      DT            50 s    Yes       No       0.8028   0.7867  0.7667  0.6933  0.3199   0.2350
LIMSI          Fixed         60 s    No        No       0.3534   0.7133  0.6600  0.5317  0.2916   0.2633
LIMSI          Fixed         60 s    Yes       No       0.4725   0.6667  0.6633  0.5467  0.3160   0.2696
LIMSI          Fixed         60 s    Yes       Yes      4.3000   0.6733  0.7133  0.7400  0.3822   0.1344
LIMSI          DT            120 s   Yes       No       0.3750   0.6933  0.6600  0.5383  0.2759   0.2054
LIMSI          DT            120 s   Yes       Yes      4.6366   0.7133  0.7300  0.7617  0.3706   0.1007
LIUM           Fixed         60 s    No        No       0.2836   0.6667  0.6067  0.4800  0.2227   0.2080
LIUM           Fixed         60 s    Yes       No       0.4371   0.6333  0.6400  0.5367  0.2651   0.2327
LIUM           Fixed         60 s    Yes       Yes      3.8328   0.6333  0.6767  0.6817  0.3180   0.1118
LIUM           DT            120 s   Yes       No       0.3538   0.6533  0.6300  0.5450  0.2659   0.2009
LIUM           DT            120 s   Yes       Yes      4.0709   0.6533  0.6800  0.6900  0.3345   0.0990
NST-Sheffield  Fixed         60 s    No        No       0.3279   0.6867  0.6467  0.5050  0.2646   0.2405
NST-Sheffield  Fixed         60 s    Yes       No       0.4645   0.6667  0.6600  0.5667  0.2974   0.2598
NST-Sheffield  Fixed         60 s    Yes       Yes      4.1241   0.6933  0.7000  0.7300  0.3560   0.1209
NST-Sheffield  DT            120 s   Yes       No       0.3627   0.6733  0.6567  0.5633  0.2624   0.2133
NST-Sheffield  DT            120 s   Yes       Yes      10.0198  0.7267  0.7533  0.7650  0.3342   0.0675
Table 1: Results of the Search sub-task for different transcripts, segmentation types, segment lengths, metadata, and removal of overlapping segments. Metadata = Yes: the recording metadata was appended to each segment. Overlap = Yes: partially overlapping retrieved segments were kept; Overlap = No: they were removed. The best results for each transcript are highlighted.

5. CONCLUSION
In our experiments in the Search sub-task, we experimented with subtitles and three ASR transcripts. The subtitles outperformed all the ASR transcripts used. However, the LIMSI transcripts also generally scored well, and they slightly outperformed the NST-Sheffield transcripts. The LIUM transcripts achieved the lowest scores in most cases. Moreover, we confirmed the usefulness of the metadata and the effectiveness of simple segmentation into fixed-length segments.
We have also pointed out the problems with partially overlapping segments occurring in the results. Such segments can greatly increase MAP scores; however, they cannot be expected to be helpful for users. Therefore, the MAP-tol measure could be preferred in such cases.

6. ACKNOWLEDGMENTS
This research is supported by the Czech Science Foundation, grant number P103/12/G084, the Charles University Grant Agency GA UK, grant number 920913, and by SVV project number 260 104.

7. REFERENCES
[1] R. Aly, M. Eskevich, R. Ordelman, and G. J. F. Jones. Adapting Binary Information Retrieval Evaluation Metrics for Segment-based Retrieval Tasks. CoRR, abs/1312.1913, 2013.
[2] M. Eskevich, R. Aly, D. N. Racca, R. Ordelman, S. Chen, and G. J. F. Jones. The Search and Hyperlinking Task at MediaEval 2014. In Proc. of MediaEval, Barcelona, Spain, 2014.
[3] P. Galuščáková and P. Pecina. Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visual Documents. In Proc. of ICMR, pages 217–224, Glasgow, UK, 2014.
[4] M. A. Hearst. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Computational Linguistics, 23(1):33–64, Mar. 1997.
[5] D. Hiemstra. Using Language Models for Information Retrieval. PhD thesis, University of Twente, Enschede, The Netherlands, 2001.
[6] L. Lamel and J.-L. Gauvain. Speech Processing for Audio Indexing. In Proc. of GoTAL, pages 4–15, Gothenburg, Sweden, 2008.
[7] P. Lanchantin, P.-J. Bell, M.-J.-F. Gales, T. Hain, X. Liu, Y. Long, J. Quinnell, S. Renals, O. Saz, M.-S. Seigel, P. Swietojanski, and P.-C. Woodland. Automatic Transcription of Multi-genre Media Archives. In Proc. of SLAM Workshop, pages 26–31, Marseille, France, 2013.
[8] M. A. Larson and G. J. F. Jones. Spoken Content Retrieval: A Survey of Techniques and Technologies, volume 5 of Found. Trends Inf. Retr. Now Publishers Inc., Hanover, MA, USA, 2012.
[9] A. Rousseau, P. Deléglise, and Y. Estève. Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks. In Proc. of LREC, pages 3935–3939, Reykjavik, Iceland, 2014.
[10] S. Rüger. Multimedia Information Retrieval. Synthesis Lectures on Information Concepts, Retrieval and Services. Morgan & Claypool Publishers, San Rafael, CA, USA, 2010.
[11] N. G. Ward, S. D. Werner, D. G. Novick, E. E. Shriberg, C. Oertel, L.-P. Morency, and T. Kawahara. The Similar Segments in Social Speech Task. In Proc. of MediaEval, Barcelona, Spain, 2013.

¹ http://terrier.org

Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain