CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments Petra Galuščáková and Pavel Pecina Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics Prague, Czech Republic {galuscakova,pecina}@ufal.mff.cuni.cz ABSTRACT 3. SYSTEM TUNING In this paper, we describe our participation in the Search Based on our previous experiments, we set the segment part of the Search and Hyperlinking Task in MediaEval length in the fixed-length segmentation to 60 seconds and Benchmark 2014. In our experiments, we compare two types the shift between the overlapping segments to 10 seconds. of segmentation: fixed-length segmentation and segmenta- The segment length applied in the segmentation system was tion employing Decision Trees on a set of various features. tuned on the training data and set to 50 seconds and 120 We also show usefulness of exploiting metadata and explore seconds for the Search sub-task. We also experimented with removal of overlapping retrieved segments. post-filtering of the retrieved segments – we either used all the retrieved segments or we removed segments which par- 1. INTRODUCTION tially overlapped with another higher ranked segment. We also employed the metadata provided for the task. The main aim of the Search sub-task is to find video seg- For each recording we extracted the title, episode title, de- ments relevant to a given textual query. This problem is scription, short episode synopsis, service name and program an important part of the Spoken Content Retrieval [8, 10] variant and appended the text to each segment from that research area, which has been emerging in recent years. recording. All experiments presented in this paper were conducted on the BBC Broadcast data. A total of 1335 hours of video was available for training and 2686 hours for testing. We ex- 4. RESULTS ploited subtitles, automatic speech recognition (ASR) tran- The results for the Search sub-task are given in Table 1. scripts by LIMSI [6], LIUM [9], and NST-Sheffield [7], all We present scores of six evaluation measures: Mean Average available for the task. Detailed information about the task Precision (MAP), Precision at 5 (P5), Precision at 10 (P10), and data can be found in the task description [2]. Precision at 20 (P20), Binned Relevance (MAP-bin), and Tolerance to Irrelevance (MAP-tol) [1]. 2. SYSTEM DESCRIPTION Unsurprisingly, the best results are achieved in experi- ments using subtitles. Generally, most of the results ob- Based on the results of our previous experiments [3], we tained with the LIMSI transcripts are higher than the cor- employed the Terrier IR system1 and its implementation of responding results with the LIUM and NST-Sheffield tran- the Hiemstra language model [5] with stemming and stop- scripts. The only exception are the experiments employing words removal. overlapping segments. The results with the NST-Sheffield Two strategies were used for segmentation of the record- transcripts are higher than the corresponding results with ings: 1) we divided the video recordings into segments of the LIUM transcripts. fixed length and 2) we used segmentation system which em- In most of the cases, the concatenation of the segment ployed Decision Trees (DT) [3]. This system makes use of with metadata improved the results, despite the drop in the several features including cue word n-grams (word n-grams P5 score for all types of transcripts. Apart from several val- frequently occurring at the segment boundary, e.g. “if”, ues of P and MAP-bin for the LIUM transcript, the fixed- “I’m”, “especially”, “the”) and cue tag n-grams (tag n-grams length segmentation outperforms the Decision Trees-based frequently occurring at the segment boundary, e.g. “VBP segmentation with 120-seconds-long segments. Though the PRP VBG”), silence between words, division given in tran- 50-seconds-long segments created using Decision Trees no- scripts, and the output of the TextTiling algorithm [4]. For tably outperform the fixed-length segments measured by each word in the transcript, it decides whether the segment MAP and precision-based measures, they are outperformed ends after this word or not. The created segments may by the fixed-length segmentation using the MAP-bin and overlap. The system was trained on the data from Similar MAP-tol measures. Segments in Social Speech Task in MediaEval 2013 [11]. All measures, except the MAP-tol measure, are notably 1 higher in the experiments in which we did not remove par- http://terrier.org tially overlapping segments from the list of the retrieved segments. Due to the nature of these measures, it is not pos- Copyright is held by the author/owner(s). sible to distinguish, whether a user had already seen the re- MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain trieved segment or not. Therefore, all the relevant segments, Transcripts Segment. Seg. Len. Metadata Overlap MAP P5 P10 P20 MAP-bin MAP-tol Subtitles Fixed 60s No No 0.4209 0.7933 0.7433 0.5950 0.3192 0.3155 Subtitles Fixed 60s Yes No 0.5127 0.7467 0.7267 0.6100 0.3433 0.3023 Subtitles Fixed 60s Yes Yes 4.3527 0.7867 0.7733 0.7683 0.4150 0.1459 Subtitles DT 120s Yes No 0.3692 0.7467 0.7133 0.6050 0.2606 0.2157 Subtitles DT 120s Yes Yes 16.3486 0.8400 0.8367 0.8433 0.3172 0.0515 Subtitles DT 50s Yes No 0.8028 0.7867 0.7667 0.6933 0.3199 0.2350 LIMSI Fixed 60s No No 0.3534 0.7133 0.6600 0.5317 0.2916 0.2633 LIMSI Fixed 60s Yes No 0.4725 0.6667 0.6633 0.5467 0.3160 0.2696 LIMSI Fixed 60s Yes Yes 4.3000 0.6733 0.7133 0.7400 0.3822 0.1344 LIMSI DT 120s Yes No 0.3750 0.6933 0.6600 0.5383 0.2759 0.2054 LIMSI DT 120s Yes Yes 4.6366 0.7133 0.7300 0.7617 0.3706 0.1007 LIUM Fixed 60s No No 0.2836 0.6667 0.6067 0.4800 0.2227 0.2080 LIUM Fixed 60s Yes No 0.4371 0.6333 0.6400 0.5367 0.2651 0.2327 LIUM Fixed 60s Yes Yes 3.8328 0.6333 0.6767 0.6817 0.3180 0.1118 LIUM DT 120s Yes No 0.3538 0.6533 0.6300 0.5450 0.2659 0.2009 LIUM DT 120s Yes Yes 4.0709 0.6533 0.6800 0.6900 0.3345 0.0990 NST-Sheffield Fixed 60s No No 0.3279 0.6867 0.6467 0.5050 0.2646 0.2405 NST-Sheffield Fixed 60s Yes No 0.4645 0.6667 0.6600 0.5667 0.2974 0.2598 NST-Sheffield Fixed 60s Yes Yes 4.1241 0.6933 0.7000 0.7300 0.3560 0.1209 NST-Sheffield DT 120s Yes No 0.3627 0.6733 0.6567 0.5633 0.2624 0.2133 NST-Sheffield DT 120s Yes Yes 10.0198 0.7267 0.7533 0.7650 0.3342 0.0675 Table 1: Results of the Search sub-task for different transcripts, segmentation types, segment lengths, meta- data, and removal of overlapping segments. The best results for each transcript are highlighted. which frequently overlap each other, increase the score. The [3] P. Galuščáková and P. Pecina. Experiments with MAP-tol measure is not influenced by this behavior as it Segmentation Strategies for Passage Retrieval in takes into account only the relevant content which had not Audio-Visual Documents. In Proc. of ICMR, pages been already seen by a user. Therefore, the highest MAP-tol 217–224, Glasgow, UK, 2014. scores are achieved for the fixed-length segmentation when [4] M. A. Hearst. TextTiling: Segmenting Text into the overlapping retrieved segments are removed. Multi-paragraph Subtopic Passages. Computational Linguistics, 23(1):33–64, Mar. 1997. 5. CONCLUSION [5] D. Hiemstra. Using Language Models for Information In our experiments in the Search sub-task, we have ex- Retrieval. PhD thesis, University of Twente, Enschede, perimented with subtitles and three ASR transcripts. The Netherlands, 2001. subtitles outperformed all used ASR transcripts. However, [6] L. Lamel and J.-L. Gauvain. Speech Processing for the LIMSI transcripts also generally scored well and they Audio Indexing. In Proc. of GoTAL, pages 4–15, slightly outperformed the NST-Sheffield transcripts. The Gothenburg, Sweden, 2008. LIUM transcripts achieved the lowest scores in most of the [7] P. Lanchantin, P.-J. Bell, M.-J.-F. Gales, T. Hain, cases. Moreover, we have confirmed usefulness of the meta- X. Liu, Y. Long, J. Quinnell, S. Renals, O. Saz, M.-S. data and effectiveness of simple segmentation into fixed- Seigel, P. Swietojanski, and P.-C. Woodland. length segments. Automatic Transcription of Multi-genre Media We have also pointed out the problems with partially over- Archives. In Proceedings of SLAM Workshop, pages lapping segments occurring in the results. Such segments 26–31, Marseille, France, 2013. can greatly increase MAP scores, however they could not be [8] M. A. Larson and G. J. F. Jones. Spoken Content expected to be helpful for the users. Therefore, the MAP-tol Retrieval: A Survey of Techniques and Technologies, measure could be preferred in such cases. volume 5 of Found. Trends Inf. Retr. Now Publishers Inc., Hanover, MA, USA, 2012. 6. ACKNOWLEDGMENTS [9] A. Rousseau, P. Deléglise, and Y. Estève. Enhancing the TED-LIUM Corpus with Selected Data for This research is supported by the Czech Science Foun- Language Modeling and More TED Talks. In Proc. of dation, grant number P103/12/G084, Charles University LREC, pages 3935–3939, Reykjavik, Iceland, 2014. Grant Agency GA UK, grant number 920913, and by SVV [10] S. Rüger. Multimedia Information Retrieval. Synthesis project number 260 104. Lectures on Information Concepts, Retrieval and Services. Morgan & Claypool Publishers, San Rafael, 7. REFERENCES CA, USA, 2010. [1] R. Aly, M. Eskevich, R. Ordelman, and G. J. F. Jones. [11] N. G. Ward, S. D. Werner, D. G. Novick, E. E. Adapting Binary Information Retrieval Evaluation Shriberg, C. Oertel, L.-P. Morency, and T. Kawahara. Metrics for Segment-based Retrieval Tasks. CoRR, The Similar Segments in Social Speech Task. In Proc. abs/1312.1913, 2013. of MediaEval, Barcelona, Spain, 2013. [2] M. Eskevich, R. Aly, D. N. Racca, R. Ordelman, S. Chen, and G. J. F. Jones. The Search and Hyperlinking Task at MediaEval 2014. In Proc. of MediaEval, Barcelona, Spain, 2014.