=Paper=
{{Paper
|id=Vol-1436/Paper49
|storemode=property
|title=RECOD @ Placing Task of MediaEval 2015
|pdfUrl=https://ceur-ws.org/Vol-1436/Paper49.pdf
|volume=Vol-1436
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LiMACPDNMPPSGT15
}}
==RECOD @ Placing Task of MediaEval 2015==
RECOD @ Placing Task of MediaEval 2015 Lin Tzy Li1 , Javier A. V. Muñoz1 , Jurandy Almeida1,2 , Rodrigo T. Calumby1,3 , Otávio A. B. Penatti1,4 , Ícaro C. Dourado1 , Keiller Nogueira6 , Pedro R. Mendes Júnior1 , Luís A. M. Pereira1 , Daniel C. G. Pedronette1,5 , Jefersson A. dos Santos6 , Marcos A. Gonçalves6 , Ricardo da S. Torres1 1 RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), 2 GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo (UNIFESP), 3 University of Feira de Santana, 4 SAMSUNG Research Institute Brazil, 5 Universidade Estadual Paulista (UNESP), 6 Department of Computer Science, Universidade Federal de Minas Gerais (UFMG) – Brazil {lintzyli, pedro.mendes, luis.pereira, rtorres}@ic.unicamp.br, jurandy.almeida@unifesp.br, rtcalumby@ecomp.uefs.br, o.penatti@samsung.com, jalvarm.acm@gmail.com, icaro.dourado@students.ic.unicamp.br, daniel@rc.unesp.br, {keiller.nogueira, jefersson, mgoncalv}@dcc.ufmg.br ABSTRACT set has 4,677 images and 905 videos, while the sub-training In this work, we describe the approach proposed by the RE- set has 12,944 videos and 4,226,559 images. COD team for the Placing Task, Locale-based sub-task, at MediaEval 2015. Our approach is based on the use of as 2.1 Features much evidence as possible (textual, visual, and/or audio de- Textual . The title, description, and tags of photos/videos scriptors) to automatically assign geographical locations to were concatenated as a single field (here named as fusion). images and videos. The versions that used only title, description, or tags were also used for the rank aggregation method. The text was stemmed and stopwords were removed. We used BM25, 1. INTRODUCTION TF-IDF (cosine), information-based similarity (IBSimilar- Geocoding multimedia has gained attention in the latest ity) and language modelling similarity (LMDirichletSimi- years given its importance for providing richer services for larity), which are similarity measures implemented in the users such as placing information on maps and providing Lucene package [8]. geographic searches. Since 2011, the Placing Task [2] at Audio/Visual . For visual place recognition of images, we MediaEval has been challenging participants to assign the used the provided features: edgehistogram (EHD), scalable- geographical locations to images and videos automatically. color (SCD), and tamura. We also extracted BIC [13]. Here we present our approach for the Locale-based sub- Due to time constraint and feature dimensionality, other task. It combines textual, audio, and/or visual descriptors planned visual features could not be applied this year. For by applying rank aggregation and ranked list density anal- videos, we used the provided features (all from LIRE [9] ysis to combine multimodal information encoded in ranked and MFCC20 [3]) besides extracting histograms of motion lists. This year, we evaluated new features and a Genetic patterns (HMP) [1]. Programming (GP) [5] approach to multimodal geocoding. GP provides a good framework for modeling optimization problems even when the variables are functions. 2.2 Rank aggregation, density analysis & Besides combining ranked lists, we also applied combina- geocoding tions of rank aggregation methods by using GP. The idea is We used the full training set as geo-profiles and each test to automatically select a set of suitable features and rank item was compared to the whole training set for each fea- aggregation functions that yield the best result according to ture independently. For a given test item, a ranked list for a given fitness function. Previous works [7, 16] have shown each feature was generated. Given the ranked lists, we ex- that combining rank aggregated lists and rank aggregation plored two distinct strategies: (i) a rank aggregation ap- functions [15] yields very effective results. proach based on genetic programming (GP-Agg); and (ii) a ranked list density analysis. In addition, we also explored the combination of both strategies. 2. PROPOSED APPROACH 1. The GP-Agg method uses genetic programming to com- Our approach estimates location based on rank aggrega- bine a set of methods for rank aggregation in an agglom- tion of a multitude of ranked lists and their density analy- erative way, in order to improve the results of the isolated sis [7]. We extracted a large set of features from the data, methods [15]. We used this method to combine the textual derived ranked lists, and combined them using rank aggre- and visual ranked lists generated for various descriptors. gation methods which in turn are selected and fused by a This method was choosen because in [15] the authors GP-based framework proposed in [15]. showed that GP-Agg produced better or equal results than For evaluation purposes in the training phase (as in the best supervised technique in a wide range of rank aggre- 2014 [7]) we split the whole training set into two parts: (i) gation techniques (supervised and unsupervised). Moreover, a validation set; and (ii) a sub-training set. The validation it required a reasonable time for training (a couple of hours), and it was relatively fast to apply the best individual (dis- covered function) on the test set. Copyright is held by the author/owner(s). The GP-Agg method was trained using 400 queries from MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany the validation set (randomly chosen) and their ranked lists. We stopped the evolution process at the 30th generation. We used the fitness function, genetic operators, and rank Table 2: Runs configurations Images Videos aggregation techniques that yielded the best results in [15]. Run Textual Visual Geocoding Textual Visual/ Geocoding Audio The GP-Agg parameters are shown in Table 1. BM25 + BM25 + For the training phase of GP-Agg, an element of a ranked 1 TF-IDF - GP-Agg TF-IDF - GP-Agg + IBS + + IBS + list was considered relevant if it is located no farther than LMD LMD 1 km from the ground truth location of the query element. RLDA HMP + all GP-Agg 2 - BIC - + RLDA The best individuals discovered in the training phase were (top100) LIRE (top100) applied to the test set. BM25 + BIC + BM25 + HMP + all TF-IDF EHD + TF-IDF 3 GP-Agg LIRE + GP-Agg Table 1: GP-Agg parameters [15]. + IBS + SCD + + IBS + MFCC20 LMD tamura LMD Parameter Value TFIDF+ Number of generations 30 RLDA (top BM25+ RLDA (top 4 - 5 from each BM25 - Genetics operators Reproduction, Mutation, IBS+ 5) list) Crossover LMD Fitness functions FFP1, WAS*, MAP, NDCG Rank Agg. methods CombMAX, CombMIN, items of the aggregated lists in Run 3 as we had planned ini- CombSUM, CombMED, tially. The second best submission was achieved by Run 1. CombANZ, CombMNZ, Runs 2 and 3 showed that there is a room for improvements RLSim, BordaCount, RRF, MRA in our method based on GP. * WAS (Weighted Average Score) as defined in [6]. Table 3: Overall test result: % correct in precision levels. Among the different fitness functions tested, the best re- Km Run 1 Run 2 Run 3 Run 4 sults (more precise) were achivied with the FFP1 as defined 0.001 0.15 0 0.14 0.12 in [4]: 0.01 0.54 0.01 0.53 0.62 0.1 5.49 0.09 5.35 6.44 |N | X 1 19.75 0.44 19.11 21.74 FF F P 1 = li ) × k1 × ln−1 (i + k2 ) r(ˆ (1) 10 36.60 1.99 35.31 38.38 i=1 100 44.89 3.57 43.26 46.91 1000 58.97 20.38 57.67 63.22 where i is the element position after retrieval and ˆ li is the 10000 91.36 88.13 91.53 94.01 element at position i. r(ˆ li ) ∈ {0, 1} is the relevance score Distance in each quartile (km) assigned to an element, being 1 if the element is relevant and Q1 1.8967 1239.6011 2.1244 1.4824 Q2 309.8649 5882.7206 394.889 196.0081 0 otherwise. |N | is the total number of retrieved elements. Q3 5573.9304 8636.9465 5766.8941 3798.6223 k1 , k2 are scaling factors. Based on [15], we choose k1 = 6 and k2 = 1.2 in our experiments. In most of the runs we have dealt with images and video 2. The ranked list density analysis (RLDA)1 explores the through different settings. For instance, in Run 2 we applied idea of finding the maximum point in a probability den- RLDA top-100 of BIC ranked list for images, while for videos sity function (PDF). Firstly, we induce a k-nearest neighbor we combined other descriptors using GP-Agg followed by graph (with k = 3), where the graph nodes are defined as RLDA for the GP aggregated list. Thus, in Table 4, we being the top-n items of the ranked lists. For each node, only show the results regarding the videos in the test set. In we estimate its probability density value by using a Parzen- the validation phase, the geocoding results for videos were Window gaussian kernel. This procedure is the same used to relatively better than the ones for images, but it seems that find root nodes (nodes with maximum density in a PDF) in in the test set this tendency was not preserved. For example the Optimum-path Forest (OPF) clustering algorithm [10]. in Run 4 (Table 4), the rate of correctly geocoded for videos Finally, to assign a lat/long to a test item, we just verify in each precision levels is lower than the overall results for the lat/long of the graph node (a ranked list item) with the Run 4 (Table 3). highest density value. Table 4: Videos test results (% correct) Km 0.001 0.01 0.1 1 10 100 1000 10000 Run 1 0.08 0.40 5.46 17.62 32.44 40.27 54.13 90.57 3. OUR SUBMISSIONS & RESULTS Run 2 0.00 0.00 0.01 0.02 0.11 3.80 20.39 91.97 None of our submissions used extra crawled material or Run 3 0.08 0.37 5.13 16.74 32.10 39.69 53.67 91.50 gazetteers. Based on parameters of our best results on the Run 4 0.06 0.41 5.79 17.89 32.24 39.92 55.68 93.16 evaluation phase, our submissions were configured as shown in Table 2. Runs 1 and 4 were solely based on textual de- 4. FUTURE WORK scriptors, while Run 2 was only-visual and Run 3 was a We plan to evaluate more textual and visual descriptors multimodal submission. and give them as input to GP-Agg to select descriptors From Table 3, our best submission was Run 4, in which and rank aggregation methods. For example: (a) a textual we applied the RLDA over the top-5 items of each tex- descriptor that combines graph representation [11] with a tual ranked list. We observed on the validation set that framework for graph-to-vector synthesis [12]; (b) applying the RLDA of the top-n items from aggregated ranked list results from works that tackle the problem of visual place (visual and textual) seems to improve the results over just recognition [14]. Additionally, we plan to devise a GP fit- taking the first item from a multimodal aggregated ranked ness function that takes advantage of RLDA to geocode, list. However, due to the delay in generating ranked lists since most of the time RLDA improves geocoding results, of the visual features, we did not apply RLDA to the top-n besides exploring clustering analysis of the top-n items. 1 Last year [7], we called RLDA as OPF, however this year Acknowledgments we renamed it, since we only used the OPF step that finds the most dense point. We thank FAPESP, CNPq, CAPES, and Samsung. 5. REFERENCES International Conference on Conference on Information and Knowledge Management, CIKM ’15, [1] J. Almeida, N. J. Leite, and R. da Silva Torres. 2015. (in press). Comparison of video sequences with histograms of [16] M. N. Volkovs and R. S. Zemel. CRF framework for motion patterns. In ICIP, pages 3673–3676, 2011. supervised preference aggregation. In Proceedings of [2] J. Choi, C. Hauff, O. V. Laere, and B. Thomee. The the 22Nd ACM International Conference on Placing Task at MediaEval 2015. In Working Notes Conference on Information; Knowledge Management, Proc. MediaEval Workshop, Wurzen, Germany, Sept. CIKM ’13, pages 89–98, New York, NY, USA, 2013. 2015. [3] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4):357–366, Aug. 1980. [4] W. Fan, E. A. Fox, P. Pathak, and H. Wu. The effects of fitness functions on genetic programming-based ranking discovery for web search. Journal of the American Society for Information Science and Technology, 55(7):628–636, 2004. [5] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992. [6] L. T. Li, D. C. G. Pedronette, J. Almeida, O. A. B. Penatti, R. T. Calumby, and R. da Silva Torres. A rank aggregation framework for video multimodal geocoding. Mult. Tools and App., pages 1–37, 2013. http://dx.doi.org/10.1007/s11042-013-1588-4. [7] L. T. Li, O. A. B. Penatti, J. Almeida, G. Chiachia, R. T. Calumby, P. R. M. Júnio, D. C. G. Pedronette, and R. da S. Torres. Multimedia geocoding: The RECOD 2014 approach. In Working Notes Proc. MediaEval Workshop, volume 1263, page 2, 2014. [8] A. Lucene. Apache Lucene Core. Web Site. http://lucene.apache.org/core/. As of Sept. 2015. [9] M. Lux and S. A. Chatzichristofis. Lire: Lucene image retrieval: An extensible java CBIR library. In Proceedings of the 16th ACM International Conference on Multimedia, MM ’08, pages 1085–1088, New York, NY, USA, 2008. [10] L. M. Rocha, F. A. M. Cappabianco, and A. X. Falcão. Data clustering as an optimum-path forest problem with applications in image analysis. Int J Imag Syst Tech, 19(2):50–68, 2009. [11] A. Schenker, H. Bunke, M. Last, and A. Kandel. Graph-Theoretic Techniques for Web Content Mining. World Scientific Publishing Co., Inc., NJ, USA, 2005. [12] F. B. Silva, S. Tabbone, and R. d. S. Torres. BoG: A New Approach for Graph Matching. In ICPR, pages 82–87. IEEE, Aug. 2014. [13] R. d. O. Stehling, M. Nascimento, and A. Falcão. A compact and efficient image retrieval approach based on border/interior pixel classification. In Proceedings of the 11th International Conference on Information and Knowledge Management, CIKM ’02, pages 102–109, 2002. [14] A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla. 24/7 place recognition by view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’15, pages 1808–1817, 2015. [15] J. A. Vargas, R. d. S. Torres, and M. A. Gonçalves. A soft computing approach for learning to aggregate rankings. In Proceedings of the 24Th ACM