Recod @ MediaEval 2016: Diverse Social Images Retrieval Cristiano D. Ferreira1 , Rodrigo T. Calumby2 , Iago B. A. do C. Araujo2 Ícaro C. Dourado1 , Javier A. V. Munoz1 , Otávio A. B. Penatti1,3 , Lin T. Li1 Jurandy Almeida1,4 and Ricardo da S. Torres1 1 RECOD Lab, University of Campinas, Campinas, SP – Brazil, 2 University of Feira de Santana, Feira de Santana, BA – Brazil 3 SAMSUNG Research Institute Brazil, Campinas, SP – Brazil 4 GIBIS Lab, Federal University of São Paulo, São José dos Campos, SP – Brazil [rtcalumby,ibacaraujo]@ecomp.uefs.br, [crferreira,icarocd,jalvarm.acm]@gmail.com, o.penatti@samsung.com, jurandy.almeida@unifesp.br, [lintzyli,rtorres]@ic.unicamp.br ABSTRACT This paper presents the RECOD team experience in the Re- trieving Diverse Social Images Task at MediaEval 2016. The teams were required to develop a diversification approach for social photo retrieval. Our proposal is based on re-ranking, rank aggregation, and diversity promotion, allowing employ- ment of textual and visual information apart or fused. 1. INTRODUCTION The relevance-diversity trade-off is an important problem associated with several search scenarios. Promoting diver- sity in retrieval results has been shown to positively impact Figure 1: Overview of the proposed approach. the user search experience specially for ambiguous, under- specified, and visual summarization queries [2]. The Retrieving Diverse Social Images Task 2016 [7] task in the Lire package [10].1 addresses the problem of image search result diversification For text-only and multimodal runs, we used the cosine [1], in the context of social media. This paper describes the RE- BM25 [1], Dice [9], Jaccard [9], and TF-IDF measures which COD group contributions via diversity promotion boosted were computed using the provided TF, DF, and TF-IDF by rank fusion. vectors. 2.2 Re-ranking and Aggregation 2. PROPOSED APPROACH For improving the original list ranking, several textual Our proposal follows the general workflow presented in measures (Section 2.1) were employed for re-ranking. The Figure 1. The first step, Re-ranking, ranks the original list text-based scores were computed as the similarity between provided by Flickr according to a text-based descriptor. The the text vectors associated with the query topic and the Fusion step employs Genetic Programming (GP) [8] to ag- image associated text vectors. For visual only run, the re- gregate lists re-ranked by several text-based descriptors. Fi- ranking step was skipped. nally, the Diversification step exploits visual and textual- For feature fusion, re-ranked lists were combined using based descriptors to promote diversification at the resulting the GP approach from [16], which uses several rank ag- ranking. gregation methods. This method took as input the Flickr The next sections provide a more detailed description of query result re-ranked considering each textual similarity our approach. measures. Then, it was trained using the development data and combined by order-based (MRA [5], RRF [4], and 2.1 Visual Features and Text Similarity BordaCount [17]) and score-based (CombMIN, CombMAX, For visual similarity, besides the provided features, we CombSUM, ComMED, CombANZ [14], and RLSim [12]) also extracted: (i) two general-purpose global descriptors rank fusion methods. (BIC [15] and GIST [11]); (ii) a bag-of-visual-words (BoVW) descriptor, based on sparse (Harris-Laplace detector) SIFT, 2.3 Diversification Method with 1000 visual words (randomly selected), soft assignment After re-ranking and aggregation steps, the improved (σ = 150), and max pooling with spatial pyramids or Word relevance-based lists were submitted to explicit diversifica- Spatial Arrangement (WSA) [13] for encoding the spatial ar- 1 rangement of visual words; and (iii) fifteen features available CEDD, FCTH, OpponentHistogram, JointHistogram, Au- toColorCorrelogram, ColorLayout, EdgeHistogram, Ga- bor, JCD, JpegCoefficientHistogram, ScalableColor, Simple- Copyright is held by the author/owner(s). ColorHistogram, Tamura, LuminanceLayout, and PHOG. MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Available at: http://www.lire-project.net/ (As of Sep. Netherlands. 2016). tion. Visual and textual descriptors were employed (Sec- CombMAX( MRA( BordaCount( CombMNZ( tion 2.1). We evaluated five methods: clustering-based CombMAX( CombMAX( (k-Medoids, agglomerative and Birch [18]) and re-ranking- CombMNZ(COSINE, BM25), CombSUM(BM25, COSINE), based (MMR [3] and MSD [6]). COSINE_ME RRF(JACCARD, COSINE)), Agglomerative and Birch methods achieved significantly ), BM25), superior results on the development set, thus they were used CombMAX( MRA(TFIDF, COSINE_ME)) CombMNZ(DICE, DICE), in the submitted runs. COSINE In the clustering step, for agglomerative method, centroid )), and average link linkage methods were employed using Col- CombMIN( orLayout for distance computing. Forty clusters were con- RRF( CombMNZ(DICE, TFIDF), sidered in our approach. For Birch method, a maximum of COSINE_ME 51 entries per node was admitted, with a distance threshold ), of 0.3, and also considering cluster refining. The representa- CombMAX( tive images were selected in a round robin fashion from the MRA(TFIDF, BM25Orig), final clusters. COSINE ))) 2.4 Workflow Discussions (a) (b) It is important to observe that re-ranking and GP fusion are optional steps at the workflow presented in Figure 1. Figure 2: GP individuals for rank aggregation. Depending on run requirements or on the desired experiment goals, one or both can be skipped. For example, for run 1, Table 1: DevSet and TestSet Results which is visual only, neither of those steps were employed. DevSet TestSet Furthermore, the diversification step can employ textual Run P@20 CR@20 F1@20 P@20 CR@20 F1@20 or visual information alone or together. It allows adherence 1 0.6821 0.4641 0.5359 0.5180 0.4001 0.4258 2 0.6814 0.4643 0.5403 0.5000 0.3709 0.4045 to the single modality requirement of runs 1 and 2. Section 3 3 0.6714 0.4519 0.5268 0.4969 0.3881 0.4107 will present details of each run configuration. 4 0.6614 0.4578 0.5248 0.5156 0.4173 0.4339 5 0.6550 0.4525 0.5196 0.5156 0.4065 0.4379 3. RUNS SETUP We submitted five runs. Run 1 – (required) visual information only. No re-ranking the worst results on this set. However, on the test set, they and no GP-fusion were employed. Diversification provided presented the best results. by Agglomerative method, using average link method with As we can observe, in most cases, the re-ranking over the ColorLayout as visual feature and grouping images into 40 Flickr initial ranking improved the overall results, even when clusters. employing only textual features for this task. Run 2 – (required) text information only. Re-ranking con- Considering test set results of runs 4 and 5 over 1 and 2, we sidering cosine similarity was employed. Diversification pro- can also notice that the retrieval process illustrated in Fig- vided by Birch method, using 51 as maximum entries per ure 1 can benefit from fusing visual and textual information. node with distance threshold of 0.3. We believe that textual and visual information have a com- Run 3 – (required) text-visual fused. Re-ranking consider- plementary nature for an effective image retrieval process. ing cosine similarity was employed. Diversification provided Textual information introduces the notion of context around by Birch method, using 51 as maximum entries per node retrieval, but ignores the image itself by not inspecting its with distance threshold of 0.19; content. On the other hand, content-based image retrieval Run 4 – (optional) general run. GP-fusion rank aggre- lacks context. By initial retrieval and re-ranking based on gation employed (Figure 2 - a). Diversification provided textual information and introducing visual features at the by Agglomerative method, using average link method with diversification step this complementary nature is explored ColorLayout as visual feature and grouping images into 40 and provided the best results. clusters; Run 5 – (optional) general run. GP-fusion rank aggrega- 5. CONCLUSIONS tion employed (Figure 2 - b). Diversification provided by Agglomerative method, using centroid linkage method with For relevance and diversity maximization, we proposed ColorLayout as visual feature and grouping images into 40 re-ranking strategies and the combination of multiple fea- clusters. tures with a rank fusion method. These improved ranked The diversification methods, parameters, features, and lists were used as input for a clustering-based summariza- textual similarities used were selected according to the best tion method. Our experiments suggest that aggregation of results on the development set. multiple re-ranked lists and fusion of visual and textual in- formation can improve retrieval effectiveness. It is interest- ing to observe the recurrent selection of the cosine measure 4. RESULTS AND DISCUSSION on both GP aggregation individuals. Table 1 presents the results for the five runs for the de- velopment and test sets. The best results (F1@20) on the Acknowledgments development set were achieved on run 2, followed by runs 1 and 3, in which textual information was used. Runs 4 and We thank the support of CAPES, CNPq, and FAPESP. 5, which employed GP-fusion rank aggregation, have shown 6. REFERENCES REtrieval Conference (TREC-2), pages 243–252, 1994. [1] R. Baeza-Yates and B. Ribeiro-Neto. Modern [15] R. Stehling, M. Nascimento, and A. Falcão. A Information Retrieval - the concepts and technology compact and efficient image retrieval approach based behind search, Second edition. Pearson Education Ltd., on border/interior pixel classification. In Proceedings Harlow, England, 2011. of the 11th International Conference on Information [2] R. T. Calumby, R. da S. Torres, and M. A. Gonçalves. and Knowledge Management, pages 102–109, 2002. Diversity-driven learning for multimodal image [16] J. A. Vargas, R. da S. Torres, and M. A. Gonçalves. A retrieval with relevance feedback. In Proceedings of the soft computing approach for learning to aggregate 21st IEEE International Conference on Image rankings. In Proceedings of the 24th ACM Processing, pages 2197–2201, 2014. International on Conference on Information and [3] J. Carbonell and J. Goldstein. The use of mmr, Knowledge Management, CIKM ’15, pages 83–92, New diversity-based reranking for reordering documents York, NY, USA, 2015. ACM. and producing summaries. In Proceedings of the 21st [17] H. P. Young. An axiomatization of borda’s rule. Anual International ACM SIGIR Conference on Journal of Economic Theory, 9(1):43–52, 1974. Research and Development in Information Retrieval, [18] T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An pages 335–336, 1998. efficient data clustering method for very large [4] G. V. Cormack, C. L. A. Clarke, and S. Buettcher. databases. In Proceedings of the 1996 ACM SIGMOD Reciprocal rank fusion outperforms condorcet and International Conference on Management of Data, individual rank learning methods. In Proceedings of SIGMOD ’96, pages 103–114, New York, NY, USA, the 32nd International ACM SIGIR Conference on 1996. ACM. Research and Development in Information Retrieval, pages 758–759, 2009. [5] R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 301–312, 2003. [6] S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th International Conference on World Wide Web, pages 381–390, 2009. [7] B. Ionescu, A. L. Gı̂nscă, M. Zaharieva, B. Boteanu, M. Lupu, and H. Müller. Retrieving diverse social images at mediaeval 2016: Challenge, dataset and evaluation. In Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, Oct. 20-21 2016. [8] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992. [9] J. Lewis, S. Ossowski, J. Hicks, M. Errami, and H. R. Garner. Text similarity: an alternative way to search medline. Bioinformatics, 22(18):2298–2304, 2006. [10] M. Lux and S. A. Chatzichristofis. LIRE: lucene image retrieval: an extensible java CBIR library. In Proceedings of the 16th ACM International Conference on Multimedia, pages 1085–1088, 2008. [11] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001. [12] D. C. G. Pedronette and R. da S. Torres. Image re-ranking and rank aggregation based on similarity of ranked lists. In Proceedings of the 14th International Conference on Computer Analysis of Images and Patterns - Volume Part I, pages 369–376, 2011. [13] O. A. B. Penatti, F. B. Silva, E. Valle, V. Gouet-Brunet, and R. da S. Torres. Visual word spatial arrangement for image retrieval and classification. Pattern Recognition, 47(2):705–720, 2014. [14] J. A. Shaw, E. A. Fox, J. A. Shaw, and E. A. Fox. Combination of multiple searches. In The Second Text