Recod @ MediaEval 2015: Diverse Social Images Retrieval

Rodrigo T. Calumby¹,², Iago B. A. do C. Araujo², Vinícius P. Santana², Javier A. V. Munoz¹, Otávio A. B. Penatti³, Lin T. Li¹, Jurandy Almeida¹,⁵, Giovani Chiachia¹, Marcos A. Gonçalves⁴, and Ricardo da S. Torres¹

¹ RECOD Lab, University of Campinas, Campinas, SP – Brazil
² University of Feira de Santana, Feira de Santana, BA – Brazil
³ SAMSUNG Research Institute Brazil, Campinas, SP – Brazil
⁴ Federal University of Minas Gerais, Belo Horizonte, MG – Brazil
⁵ GIBIS Lab, Federal University of São Paulo, São José dos Campos, SP – Brazil

[rtcalumby,ibacaraujo,vpsantana]@ecomp.uefs.br, jalvarm.acm@gmail.com, o.penatti@samsung.com, jurandy.almeida@unifesp.br, mgoncalv@dcc.ufmg.br, [lintzyli,chiachia,rtorres]@ic.unicamp.br

Copyright is held by the author/owner(s). MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany

ABSTRACT

This paper presents the RECOD team experience in the Retrieving Diverse Social Images Task at MediaEval 2015. The teams were required to develop a diversification approach for social photo retrieval. Our proposal is based on irrelevant image filtering, reranking, rank aggregation, and diversity promotion. We proposed a multimodal approach and exploited image metadata and user credibility information.

1. INTRODUCTION

The relevance-diversity trade-off is an important problem associated with several search scenarios. Promoting diversity in retrieval results has been shown to positively impact the user search experience, especially for ambiguous, underspecified, and visual summarization queries [1]. The Retrieving Diverse Social Images Task [6] integrates it into a tourism-related representative image retrieval challenge. This paper describes the RECOD group contributions via diversity promotion boosted by multimodal rank fusion.

2. PROPOSED APPROACH

As a relevance enhancement step, our approach includes irrelevant image filtering, multimodal image reranking, and rank aggregation. Image filtering was conducted according to face detection data and geographic location of the images. We also evaluated several different visual features and text similarity measures. For reranking the original retrieval list, we exploited textual, visual, geographic, and credibility information. As an additional step, the reranked lists were also aggregated with a Genetic Programming (GP) [7] approach. Our proposal follows the general workflow presented in Figure 1. Each of these steps is described next.

Figure 1: Overview of the proposed approach.

2.1 Filtering

For irrelevant image removal we used three filtering strategies: geographic-based, face-based, and blur-based. The GeoFilter eliminated all images located farther than a given range from the reference lat/long. Following the results on the development set, the filtering radius was set to 10 km. In turn, the face-based procedure was introduced to filter out images containing people as the main subject. We used a face detection module of Face++¹ and all images with more than one face detected were removed. Finally, out-of-focus images were eliminated using the method from [14] with the blur threshold set to 0.8.

¹ http://www.faceplusplus.com (as of June/2015).

2.2 Visual Features and Text Similarity

For visual similarity, besides the provided features, we also extracted: (i) two general purpose global descriptors (BIC [13] and GIST [9]); (ii) a bag of visual words (BoVW) descriptor, based on sparse (Harris-Laplace detector) SIFT, with 512 visual words (randomly selected), soft assignment (σ = 150), and max pooling or using Word Spatial Arrangement (WSA) [11] for encoding the spatial arrangement of visual words; and (iii) fifteen features available in the Lire package [8].²

For text-only and multimodal runs, we used the Cosine, BM25, Dice, Jaccard, and TF-IDF measures, which were computed using the provided TF, DF, and TF-IDF vectors.

² CEDD, FCTH, OpponentHistogram, JointHistogram, AutoColorCorrelogram, ColorLayout, EdgeHistogram, Gabor, JCD, JpegCoefficientHistogram, ScalableColor, SimpleColorHistogram, Tamura, LuminanceLayout, and PHOG. Available at: http://www.lire-project.net/

2.3 Reranking and Aggregation

For improving the original list ranking, we explored visual, textual, credibility, and geographic ranking. For text-only and multimodal reranking, the text-based scores were computed as the similarity between the text vectors associated with the images and the localities' text vectors. The visual method reranked the original list according to the similarity to the location's representative Wikipedia images. The visual distance from each image to the representative set was computed as the minimum of its distances to the representative images. All credibility scores were individually used for reranking. Additionally, lat/long data were used to rank images according to the Haversine distance to the reference point.

For feature fusion, the reranked lists were combined using the GP approach from [15], which uses several rank aggregation methods. This method was trained using the development data and combined order-based (MRA [4], RRF [3], and BordaCount [16]) and score-based (CombMIN, CombMAX, CombSUM, CombMED, CombANZ [12], and RLSim [10]) rank fusion methods.

As a relevance-based filtering step, up to the 150 top-ranked images of the final aggregated list were selected as the input list for the summarization procedure.

2.4 Diversification Method

After the filtering, reranking, and aggregation steps, the improved relevance-based lists were submitted to explicit diversification. We evaluated four methods: clustering-based (k-Medoids and agglomerative) and reranking-based (MMR [2], MSD [5]). In all cases the k-Medoids method achieved significantly superior results on the development set and was used in the submitted runs.

In the clustering step, the initial medoids were selected in an offset fashion, sampling equally from the top to the bottom of the ranked list. The medoid updating procedure simply selected the best connected image of the cluster, using the average distance to all other images in the cluster. The process iterates until no inter-cluster image transition occurs or up to 50 iterations.

For runs 1, 2, 3, and 5, the clusters were ranked according to their sizes in descending order and intra-cluster sorting was applied using average connectivity. For the credibility-based submission (run 4), the images were clustered according to their owner (user) and the clusters were ranked according to the users' credibility, computed as a linear combination of tagSpecificity, uploadFrequency, meanTagRank, faceProportion, meanImageTagClarity, photoCount, visualScore, and locationSimilarity [6].

The representative images were selected in a round-robin fashion from the final clusters. The number of clusters was defined as 30 for runs 1 and 3, and 40 for runs 2 and 5.

3. RUNS SETUP

We submitted five runs (Table 1). The features used for diversification were selected according to the best results on the development set.

In all runs, the geographic filtering and reranking were only applied to one-topic queries, since multi-topic queries do not have a reference geo-location. Additionally, in run 5, the face-based filters were also only applied to one-topic queries, since multi-topic queries have a different relevance constraint in relation to people in the foreground. Finally, in runs 1, 3, and 5 no visual reranking was applied for multi-topic queries.

Table 1: Runs Configurations (*only for one-topic queries)

Run  Filtering           Reranking                             Diversity
1    Geo*, face, blur    Visual*                               BIC
2    Geo*, face, blur    Textual                               Cosine + Jaccard
3    Geo*, face, blur    Visual*, textual                      Jaccard
4    Geo*, face, blur    Credibility                           Users
5    Geo*, face*, blur   Visual*, textual, credibility, geo*   Jaccard

4. RESULTS AND DISCUSSION

Table 2 presents the effectiveness results of the five runs on the development and test sets. The best results (F1@20) on the development set were achieved by run 5, followed by runs 2 and 3, in which textual information was used. However, these were the runs with the greatest effectiveness difference when comparing development and test queries, especially considering the multi-topic queries.

Table 2: DevSet and TestSet Results

     DevSet                    TestSet
Run  P@20    CR@20   F1@20     P@20    CR@20   F1@20
1    0.7487  0.4336  0.5409    0.7129  0.4111  0.5063
2    0.8013  0.4514  0.5694    0.6996  0.4248  0.5101
3    0.7837  0.4436  0.5592    0.7058  0.3881  0.4883
4    0.7644  0.4446  0.5532    0.7198  0.4309  0.5219
5    0.8190  0.4637  0.5853    0.7324  0.4123  0.5084

Table 3 presents the effectiveness results for one-topic and multi-topic test queries. As we can observe, even with no visual reranking, the visual-only run (run 1) achieved slightly superior results for multi-topic queries, both in comparison with the other run types and with its own one-topic results. All other run types achieved superior effectiveness on one-topic queries, especially when the credibility information was used (runs 4 and 5). Our results suggest that visual features are important when considering multi-topic queries, while the textual information seems more suitable for one-topic queries.

Table 3: TestSet Results: One-topic and Multi-topic

     One-topic                 Multi-topic
Run  P@20    CR@20   F1@20     P@20    CR@20   F1@20
1    0.6906  0.4000  0.4991    0.7350  0.4221  0.5133
2    0.7130  0.4316  0.5205    0.6864  0.4181  0.4998
3    0.6942  0.3982  0.4970    0.7171  0.3782  0.4798
4    0.7630  0.4301  0.5390    0.6771  0.4318  0.5051
5    0.7290  0.4286  0.5228    0.7357  0.3963  0.4942

5. CONCLUSIONS

For relevance and diversity maximization we proposed filtering strategies and the combination of multiple features with a rank fusion method. These improved ranked lists were used as input for a clustering-based summarization method. Our experiments suggest that different summarization alternatives may result in different effectiveness for one-topic and multi-topic queries.

6. ACKNOWLEDGMENTS

We thank UEFS/PROBIC and FAPESP for their support.

7. REFERENCES

[1] R. T. Calumby, R. da S. Torres, and M. A. Gonçalves. Diversity-driven learning for multimodal image retrieval with relevance feedback. In Proceedings of the 21st IEEE International Conference on Image Processing, pages 2197–2201, 2014.
[2] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 335–336, 1998.
[3] G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 758–759, 2009.
[4] R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 301–312, 2003.
[5] S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th International Conference on World Wide Web, pages 381–390, 2009.
[6] B. Ionescu, A. Gînscă, B. Boteanu, A. Popescu, M. Lupu, and H. Müller. Retrieving diverse social images at MediaEval 2015: Challenge, dataset and evaluation. In Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, September 14-15 2015. CEUR-WS.org.
[7] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
[8] M. Lux and S. A. Chatzichristofis. LIRE: Lucene image retrieval: an extensible Java CBIR library. In Proceedings of the 16th ACM International Conference on Multimedia, pages 1085–1088, 2008.
[9] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.
[10] D. C. G. Pedronette and R. da S. Torres. Image re-ranking and rank aggregation based on similarity of ranked lists. In Proceedings of the 14th International Conference on Computer Analysis of Images and Patterns - Volume Part I, pages 369–376, 2011.
[11] O. A. B. Penatti, F. B. Silva, E. Valle, V. Gouet-Brunet, and R. da S. Torres. Visual word spatial arrangement for image retrieval and classification. Pattern Recognition, 47(2):705–720, 2014.
[12] J. A. Shaw and E. A. Fox. Combination of multiple searches. In The Second Text REtrieval Conference (TREC-2), pages 243–252, 1994.
[13] R. Stehling, M. Nascimento, and A. Falcão. A compact and efficient image retrieval approach based on border/interior pixel classification. In Proceedings of the 11th International Conference on Information and Knowledge Management, pages 102–109, 2002.
[14] H. Tong, M. Li, H. Zhang, and C. Zhang. Blur detection for digital images using wavelet transform. In Proceedings of the IEEE International Conference on Multimedia and Expo, pages 17–20 Vol.1, 2004.
[15] J. A. Vargas, R. da S. Torres, and M. A. Gonçalves. A soft computing approach for learning to aggregate rankings. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 2015. (in press).
[16] H. P. Young. An axiomatization of Borda's rule. Journal of Economic Theory, 9(1):43–52, 1974.
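
The geographic filtering of Section 2.1 and the Haversine-based ranking of Section 2.3 can be illustrated with a short Python sketch. It is not the authors' implementation. The function names, the `(image_id, lat, lon)` tuple layout, and the mean Earth radius of 6371 km are assumptions made for illustration; only the 10 km radius comes from the paper. For brevity the sketch merges filtering and ranking into one function, whereas the paper treats them as separate steps.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_filter_and_rank(images, ref_lat, ref_lon, radius_km=10.0):
    """Drop images farther than radius_km from the reference point and
    return the remaining ids ordered by increasing distance.

    images: list of (image_id, lat, lon) tuples (hypothetical layout).
    """
    scored = [(haversine_km(lat, lon, ref_lat, ref_lon), image_id)
              for image_id, lat, lon in images]
    kept = sorted((d, i) for d, i in scored if d <= radius_km)
    return [image_id for _, image_id in kept]
```

With a reference point at (0, 0), an image at longitude 0.5 degrees lies roughly 55 km away and is discarded, while images at 0.01 and 0.05 degrees are kept and returned nearest first.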
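
To make the base rank fusion methods named in Section 2.3 more concrete, the following Python sketch shows one order-based method (RRF [3]) and the score-based Comb family [12]. It does not reproduce the GP approach of [15], which learns how to combine such methods. The function names are hypothetical, the constant `k=60` is the value commonly used for RRF and is an assumption here, scores are assumed to be already normalized, and items missing from a list are simply skipped, which is a simplification.

```python
from statistics import median


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank),
    with ranks starting at 1. Returns items by decreasing fused score."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


def _anz(values):
    """CombANZ: sum of scores divided by the number of non-zero scores."""
    nonzero = [v for v in values if v != 0]
    return sum(values) / len(nonzero) if nonzero else 0.0


REDUCERS = {"min": min, "max": max, "sum": sum, "med": median, "anz": _anz}


def comb(score_lists, how="sum"):
    """Score-based fusion (CombMIN, CombMAX, CombSUM, CombMED, CombANZ).

    score_lists: list of dicts mapping item -> normalized score.
    """
    collected = {}
    for scores in score_lists:
        for item, score in scores.items():
            collected.setdefault(item, []).append(score)
    fused = {item: REDUCERS[how](vals) for item, vals in collected.items()}
    return sorted(fused, key=lambda item: (-fused[item], item))
```

Ties are broken by item identifier so that the output is deterministic; a real system would more likely fall back to the original rank.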
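
The k-Medoids diversification of Section 2.4 (offset initialization, medoid update by average distance, cluster ranking by size, and round-robin selection) can be sketched in Python as below. This is an illustration under stated assumptions rather than the submitted system: the evenly spaced offset scheme `i * n // k`, the interpretation of "average connectivity" as the average distance to the other cluster members, and the function names are all assumptions. The input is a precomputed distance matrix indexed by position in the relevance-ranked list.

```python
def avg_dist(dist, item, members):
    """Average distance from item to the other members of its cluster."""
    others = [m for m in members if m != item]
    return sum(dist[item][m] for m in others) / max(len(others), 1)


def kmedoids_diversify(dist, k, max_iter=50):
    """Return a diversified ordering of range(len(dist)).

    dist: symmetric n x n distance matrix; index 0 is the top-ranked image.
    """
    n = len(dist)
    if n == 0:
        return []
    k = min(k, n)
    # Initial medoids sampled at even offsets down the ranked list (assumed scheme).
    medoids = [i * n // k for i in range(k)]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: dist[i][medoids[c]])
                      for i in range(n)]
        if new_assign == assign:  # no image changed cluster: converged
            break
        assign = new_assign
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if members:
                # New medoid: the best connected image of the cluster.
                medoids[c] = min(members, key=lambda m: avg_dist(dist, m, members))
    clusters = [[i for i in range(n) if assign[i] == c] for c in range(k)]
    # Intra-cluster order: best connected images first.
    clusters = [sorted(m, key=lambda x, m=m: avg_dist(dist, x, m))
                for m in clusters if m]
    clusters.sort(key=len, reverse=True)  # largest clusters first
    result = []
    for depth in range(max(len(m) for m in clusters)):
        # Round robin: one image per cluster at each depth.
        result.extend(m[depth] for m in clusters if depth < len(m))
    return result
```

Taking the first 20 entries of the returned list would give the images evaluated by the @20 metrics, with each of the leading entries drawn from a different cluster.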