TUD-MIR at MediaEval 2011 Genre Tagging Task: Query Expansion from a Limited Number of Labeled Videos

Stevan Rudinac, Martha Larson, Alan Hanjalic
Multimedia Information Retrieval Lab, Delft University of Technology, Delft, The Netherlands
{s.rudinac, m.a.larson, a.hanjalic}@tudelft.nl

ABSTRACT
In this paper we present the results of our initial research on genre tagging. We approach the task from an information retrieval perspective, using a relatively small number of labeled videos in the development set to mine query expansion terms characteristic of each genre. We also investigate which sources of information associated with the videos or extracted from their audio channel, e.g. title, description, tags and automatic speech recognition (ASR) transcripts, yield the highest improvement within our query expansion framework. The experiments performed on the MediaEval 2011 Genre Tagging dataset demonstrate the effectiveness of our approach.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – retrieval models, query formulation.

General Terms
Algorithms, Performance, Experimentation.

Keywords
Genre tagging, query expansion, video retrieval.

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

1. INTRODUCTION
In this paper we present the results of our initial research on genre tagging, conducted as part of our participation in the MediaEval 2011 benchmark. Aiming to create a solid baseline for future work, we investigate which sources of information, including automatic speech recognition (ASR) transcripts and the metadata associated with the videos, e.g. title, description and tags, yield the best performance in the task. Information about genre is generally encoded in both the visual and the spoken channel of a video. In the specific case of the semi-professional user-generated videos used to compose the MediaEval 2011 Genre Tagging datasets, the visual channel usually does not provide enough information to discriminate between videos based on genre [6], because a large number of videos depict a single person talking about a particular topic. For this reason, here we focus on the spoken channel and metadata only, while the possibilities of exploiting visual content to improve performance in a further step are explored in [6].

Motivated by the success of information retrieval approaches to semantic video annotation demonstrated in the Tagging Task Professional and WWW of the MediaEval 2010 benchmark [2], we perform genre tagging within an information retrieval framework. We conjecture that, given the relatively small number of videos in the development set, it would be practically infeasible to train a language model for each individual genre label. Instead, in our approach we take a genre label to be an initial query and mine additional genre-specific query terms from the text associated with the available labeled videos. We choose query expansion because in our previous work [4] it has proven effective in semantic-theme-based video tagging and retrieval.

The experiments reported here were performed on the MediaEval 2011 Genre Tagging datasets [1], which consist of semi-professional documentary videos downloaded from blip.tv together with the associated metadata. The metadata available with the videos includes title, description, tags, as well as the id of the show to which a particular video episode belongs. The development set consists of 247 videos for which the genre labels are provided. The test set is larger and consists of 1727 videos. Each video in the development and the test set belongs to one of 26 genre categories defined by blip.tv (e.g. art, autos and vehicles, business, default category etc.). The task requires the prediction of a genre label for each video in the test set. In the following, we first describe our query expansion approaches as well as the information sources used. Then, we report on experimental results, which confirm the effectiveness of our approach to genre tagging and indicate research directions that should be pursued for further performance improvement.

2. APPROACHES
In all official runs, we expand queries using the videos available in the development set. Additionally, we experiment with several other query expansions and report the results as "unofficial runs".

2.1 Query Expansion via Labeled Videos
We conjecture that a set of terms characteristic of a particular genre can be extracted even from a small number of labeled videos and further used for query expansion. This concept has been widely exploited in e.g. information retrieval approaches with relevance feedback [5]. In our approach, we treat a genre label as the original query and sample additional query terms from the text associated with the videos of that particular genre available in the development set. For each video, the text from the information sources used in a particular run is concatenated into a single document, and then stopword removal and stemming are applied. Further, for each genre we rank all terms in the development set vocabulary in order of decreasing Offer Weight [3] and extend the initial query (genre label) with the 20 top-ranked terms:

OW(i) = r \cdot \log \frac{(r + 0.5)\,(N - n - R + r + 0.5)}{(n - r + 0.5)\,(R - r + 0.5)}    (1)

In the formula above, r is the number of videos of a particular genre in which term t(i) appears, R is the total number of videos of that genre, N is the total number of videos in the collection, and n is the number of videos in the collection in which term t(i) appears.
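To make the term selection step concrete, the following Python sketch illustrates how the Offer Weight of Equation 1 can be computed for one genre. It is a minimal sketch, not our actual implementation: the data layout (a list of per-video (genre label, term set) pairs produced after stopword removal and stemming) and all function and variable names are illustrative assumptions.

    import math
    from collections import defaultdict

    def top_expansion_terms(docs, genre, top_k=20):
        # docs: list of (genre_label, set_of_terms) pairs, one per
        # development video, after stopword removal and stemming.
        N = len(docs)                              # videos in the collection
        R = sum(1 for g, _ in docs if g == genre)  # videos of this genre
        n = defaultdict(int)                       # videos a term appears in
        r = defaultdict(int)                       # genre videos a term appears in
        for g, terms in docs:
            for t in terms:
                n[t] += 1
                if g == genre:
                    r[t] += 1

        def ow(t):  # Offer Weight, Equation (1)
            return r[t] * math.log(
                ((r[t] + 0.5) * (N - n[t] - R + r[t] + 0.5)) /
                ((n[t] - r[t] + 0.5) * (R - r[t] + 0.5)))

        return sorted(n, key=ow, reverse=True)[:top_k]

    # The expanded query is the genre label plus the 20 top-ranked terms, e.g.
    # query_terms = [genre] + top_expansion_terms(docs, genre)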
As the retrieval method we use the negative divergence between multinomial models of the query and the document (video), as implemented in the Lemur Toolkit.

Since the default category is very broad and diverse, we produce the ranked list of videos for this genre independently. We conjecture that videos that are ranked low, or do not appear at all, in the results lists produced for the other 25 genres likely belong to the default category. Therefore, we produce the ranked list for this genre according to the video score VS_i = \sum_{g=1}^{25} (N_g - R_{gi}) / N_g, where N_g is the total number of videos retrieved for a particular genre g and R_{gi} is the rank of video v_i in that list. If a particular video does not appear in the results list produced for a genre g, R_{gi} is set to 0.
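The sketch below illustrates this default-category scoring under the stated convention R_gi = 0 for absent videos. Representing the 25 per-genre results lists as ordered Python lists, the function name, and the decreasing sort order are illustrative assumptions rather than details taken from our implementation.

    def rank_default_category(genre_rankings, all_videos):
        # genre_rankings: dict mapping each of the 25 non-default genres
        # to its ranked list of video ids (position 0 = rank 1).
        scores = {}
        for v in all_videos:
            vs = 0.0
            for ranking in genre_rankings.values():
                N_g = len(ranking)
                # Rank of v in this genre's list; 0 if it does not
                # appear, as specified above.
                R_gi = ranking.index(v) + 1 if v in ranking else 0
                vs += (N_g - R_gi) / N_g
            scores[v] = vs
        # We sort by decreasing VS_i here; the exact ordering convention
        # is left implicit in the text.
        return sorted(all_videos, key=lambda v: scores[v], reverse=True)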
In the official runs we test how different sources of information influence the tagging performance of the approach. The performance of the runs described below is reported in Table 1.

ASR: In the official run_1 we investigate the scenario in which no metadata is available. Expansion terms are sampled from the ASR transcripts of the videos in the development set, and retrieval is performed on the ASR transcripts of the videos in the test set.

Metadata No Tags: In the official run_2 only the video title and description are exploited.

Metadata & ASR: In the official run_3 we use the title, description, tags and ASR transcripts associated with the videos.

Metadata: To produce the results of the official run_4 we index only the title, description and tags associated with the videos.

Reranking With Show ID: In the case of the blip.tv dataset, the show id is a strong genre indicator. Specifically, on the development set we noticed that episodes of the same show are usually of the same genre. However, we decided not to use the show ids in the first four official runs, because we would not be able to localize the performance improvement and isolate the contribution of the other information used. We use the results produced in the official run_4 as the baseline for reranking, because it achieved the highest performance on the development set. For each genre, we utilize show ids to compute the median rank of the videos (episodes) coming from the same show. In the official run_5, we follow the general video search reranking idea, ranking videos of the same show together, in order of their median rank in the starting results list; a sketch of this step is given below. This run is meant to complement the visual reranking runs from [6], which investigate the usefulness of the visual channel for discriminating blip.tv videos based on genre. In the unofficial reranking run_6 we use a similar idea and rank at the top all episodes from the test set belonging to shows that were labeled with a given genre label in the development set. Videos belonging to the same show are sorted according to their rank in the initial results list, and otherwise alphabetically. The remaining videos from the initial results list are sorted according to their initial ranks. Note that we consider the strength of the show id as a genre indicator to be an artifact of this particular dataset and do not expect this approach to generalize.
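The following minimal sketch illustrates the run_5 reranking of a single genre's results list. It assumes the initial ranked list and a video-to-show mapping are given; all names are illustrative, and run_6 differs only in how the shows themselves are ordered.

    from statistics import median

    def rerank_by_show(ranked_videos, show_of):
        # ranked_videos: initial results list for one genre, best first.
        # show_of: dict mapping a video id to its show id.
        ranks = {v: i + 1 for i, v in enumerate(ranked_videos)}
        episodes = {}  # show id -> its episodes, in initial rank order
        for v in ranked_videos:
            episodes.setdefault(show_of[v], []).append(v)
        show_median = {s: median(ranks[v] for v in eps)
                       for s, eps in episodes.items()}
        # Keep episodes of a show together; order shows by median rank.
        return [v for s in sorted(episodes, key=show_median.get)
                for v in episodes[s]]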
2.2 Baseline Query Expansions
Besides the approach described in the previous section, we ran several experiments using unexpanded queries (genre labels – a baseline run) and several query expansions: PRF, WordNet, Google Sets and YouTube. To expand queries via YouTube, we first download the metadata (e.g. title, description and tags) of the top-50 ranked videos returned by YouTube for each genre label except the default category, and sample 20 expansion terms using the Offer Weight as explained in the previous section. For a description of the baseline retrieval run and the remaining three query expansion approaches please refer to [4]. The queries composed in this way are used to query the metadata associated with the videos in the test set. Videos in the default category are ranked as described in the previous section. To conserve space, we report only the performance of a hypothetical oracle query expansion indicator that chooses the best performing query expansion (PRF, WordNet, Google Sets or YouTube) or the baseline for each genre (unofficial run_7 in Table 1). Failure analysis confirms that, across genres, all of these choices contribute to performance in individual cases.

Table 1. Performance of the reported runs expressed in terms of MAP; officially submitted runs are indicated with "^"

run_1^   run_2^   run_3^   run_4^   run_5^   run_6    run_7
0.2146   0.2699   0.3212   0.3937   0.4191   0.5594   0.2175

3. DISCUSSION AND CONCLUSIONS
We presented several approaches to genre tagging for web video classification, based on simple and proven information retrieval concepts. The experimental results summarized in Table 1 confirm their effectiveness for the task. We show that it is possible to make effective use of sampled genre-specific expansion terms even when only a limited set of labeled videos is available. Further, we show that the use of metadata yields the highest performance within our framework. Reranking with show ids (run_5 and run_6) further improves performance, but we have strong reservations about the generality of this conclusion, because it might be an artifact of the blip.tv portal. It is also interesting to note that using ASR transcripts together with metadata does not improve the performance of genre tagging, which is contrary to our earlier findings on "general" video retrieval and tagging [2]. Finally, the results in Table 1 show that the performance of the "naïve" baseline, PRF and the query expansions using thesauri and collateral corpora is far below the level of the approach presented in Section 2.1. In the future we will work on refining the approach and investigate its performance on e.g. substantially larger video collections. We will also investigate how the visual modality could be exploited to further improve genre tagging performance.

4. ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Commission's 7th Framework Programme (FP7) under grant agreement n° 216444 (NoE PetaMedia).

5. REFERENCES
[1] Larson, M. et al. 2011. Overview of MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task. In MediaEval '11.
[2] Larson, M. et al. 2011. Automatic tagging and geotagging in video collections and communities. In Proceedings ACM ICMR '11.
[3] Robertson, S. E. 1991. On term selection for query expansion. J. Doc. 46, 4.
[4] Rudinac, S., Larson, M. and Hanjalic, A. 2010. Exploiting noisy visual concept detection to improve spoken content based video retrieval. In Proceedings ACM MM '10.
[5] Ruthven, I. and Lalmas, M. 2003. A survey on the use of relevance feedback for information access systems. Knowl. Eng. Rev. 18, 2.
[6] Xu, P. et al. 2011. TUD-MM at MediaEval 2011 Genre Tagging Task: Video search reranking for genre tagging. In MediaEval '11.