=Paper=
{{Paper
|id=Vol-1263/paper38
|storemode=property
|title=Recod @ MediaEval 2014: Diverse Social Images Retrieval
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_38.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/CalumbySCPLCT14
}}
==Recod @ MediaEval 2014: Diverse Social Images Retrieval==
Rodrigo T. Calumby<sup>1,2</sup>, Vinícius P. Santana<sup>2</sup>, Felipe S. Cordeiro<sup>2</sup>, Otávio A. B. Penatti<sup>3</sup>, Lin T. Li<sup>1</sup>, Giovani Chiachia<sup>1</sup>, and Ricardo da S. Torres<sup>1</sup>

<sup>1</sup> RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), Campinas, SP – Brazil

<sup>2</sup> Department of Exact Sciences, University of Feira de Santana (UEFS), Feira de Santana, BA – Brazil

<sup>3</sup> SAMSUNG Research Institute Brazil, Campinas, SP – Brazil

[rtcalumby, vpsantana, fscordeiro]@ecomp.uefs.br, o.penatti@samsung.com, [lintzyli, chiachia, rtorres]@ic.unicamp.br

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

===ABSTRACT===
This paper presents the results of the first participation of our multi-institutional team in the Retrieving Diverse Social Images Task at MediaEval 2014. In this task we were required to develop a summarization and diversification approach for social photo retrieval. Our approach is based on irrelevant image filtering, image re-ranking, and diversity promotion by clustering. We have used visual and textual features, including image metadata and user credibility information.

===1. INTRODUCTION===
Promoting diversity is an effective approach for improving retrieval results and user search experience. For instance, it has been applied for tackling ambiguous or underspecified queries, or producing summaries. The Retrieving Diverse Social Images Task [2] combines such problems into a challenge on visual summarization for social photo retrieval in a tourism-related context. This paper presents our first efforts on relevance improvement and diversity promotion using image visual features, metadata, and user credibility information.

===2. PROPOSED APPROACH===
The proposed approach follows the general pipeline presented in Figure 1. At first, two filtering steps are conducted in order to reduce the number of irrelevant images. Afterwards, re-ranking steps are applied for improving image rank positions according to two different relevance aspects. Finally, clustering is performed, followed by the selection of representative and diverse images. Specific combinations of the proposed steps were set for each submitted run (Section 3).

Figure 1: Proposed approach overview.

====2.1 Filtering====
In order to reduce the number of non-relevant images we adapted two filtering strategies: geographic filtering and face filtering. Eliminating non-relevant images allows higher effectiveness in terms of final relevance and boosts the diversification procedure. This is a consequence of having fewer non-relevant items as candidates for the final diversified list.

The geographic filtering (GeoFilter) takes the reference lat/long of each location and then eliminates all images located farther than a given range. In this case, only geo-tagged images were assessed. According to the results on the development set, a 10 km range limit from the reference point was a good choice.

Since images containing a person or crowds in the foreground are considered non-relevant, we used a face detection module of Face++ (http://www.faceplusplus.com, last accessed on Sept 20, 2014) for filtering. For all images, we computed the features: a) number of faces; b) biggest face size; c) smallest face size; d) average face size; e) total face size. The size values were computed as a fraction of the image spatial domain.

Our first face-based filtering approach (NumFacesFilter) eliminates all images with a number of faces greater than a threshold. According to the experiments on the development set, we eliminated all images with more than one face. The second approach (FaceClassifierFilter) used a 1-NN classifier based on the described features and considering all development images as training instances. All images classified as non-relevant were eliminated.

====2.2 Features====
For the textual and multimodal approaches, we evaluated the TF-IDF, BM25, and Cosine measures. All of them were computed using the provided TF, DF, and TF-IDF. In the development set, the best results were achieved using the Cosine measure. To enable the combination with other distance measures, the Cosine similarity values were converted into distances by subtracting them from 1.0.

For visual approaches, besides the provided features, we also extracted two global descriptors (BIC and LAS) [3] and two bag-of-visual-words (BoVW) descriptors, based on dense (6 pixels) or sparse (Harris-Laplace detector) SIFT, with 1000 visual words (randomly selected), soft assignment (σ = 150), and max pooling.

====2.3 Re-ranking====
Since original lists may present redundant and non-relevant items, their positions may not be optimal for their relevance. Even after the filtering procedures, some non-relevant images may remain, and therefore we proposed two re-ranking strategies: visual-based and user credibility-based.

The visual re-ranking used each location's representative images obtained from Wikipedia as queries. The original lists were re-ranked according to the similarity in relation to the representative sets. The visual distance from each image in a list to the corresponding representative set was computed as the minimum distance value between the image and each representative image. For multiple feature fusion, we used a smoothed version of the Borda Count algorithm. In our version the vote (relevance score) for the nth image in the rank was computed as <math>\frac{1}{\sqrt[4]{n+1}}</math>.

As a different re-ranking strategy, we also exploited the user-credibility descriptors provided with the data. Hence, we combined a relevance-based score (relScore) with another score based on credibility (credScore). The relScore of each image was computed according to its position in the list as described for the visual re-ranking. The credScore was computed as the product of three credibility features: visualScore, faceProportion, and tagSpecificity. The final re-ranking score was computed as relScore × credScore.

====2.4 Diversification Method====
After the filtering and re-ranking procedures, the next step consists of the actual summarization and diversification. We evaluated two diversification methods: MMR [1] and a clustering technique based on kMedoids. Given the superiority of kMedoids over MMR on the development set, we used the kMedoids clustering for the test set runs.

The kMedoids clustering technique is divided into two main steps: medoid definition and cluster construction. Since we were supposed to return 50 representative images, the algorithm was set to create 50 clusters. The initial medoids were defined in an offset fashion. The offset value was computed by dividing the list size by 50. The medoids were then defined as the images in the positions i×offset, with 0 ≤ i < 50. Hence, the initial medoids were picked throughout the list from the top to the bottom. After the clusters are constructed, the process iterates until there is no further transition between the clusters. At each iteration, the new medoids were defined as the best connected images (lowest average distance to all images in the cluster). The distance between two images is computed as the average of their distances computed for each feature. Finally, the images in each cluster are ranked according to their positions in the original non-clustered list. The final output list is composed of the most relevant item from each cluster.

===3. RUN SETUP===
We submitted five runs and their descriptions are presented in Table 1. The features used in each run and each step were selected according to the best results on the development set.

{| class="wikitable"
|+ Table 1: Runs Configurations
! Run !! Filtering !! Re-ranking !! Diversification
|-
| 1 || GeoFilter and NumFacesFilter || - || kMedoids (BoVW<sup>sparse</sup><sub>max</sub> + HOG)
|-
| 2 || GeoFilter and NumFacesFilter || - || kMedoids (Cosine)
|-
| 3 || GeoFilter and NumFacesFilter || Visual re-ranking (CM3x3 + HOG + BIC) || kMedoids (BoVW<sup>sparse</sup><sub>max</sub> + HOG + Cosine)
|-
| 4 || GeoFilter and NumFacesFilter || Visual re-ranking (CM3x3 + HOG + BIC) and Credibility re-ranking || kMedoids (CN3x3)
|-
| 5 || GeoFilter and FaceClassifierFilter || Visual re-ranking (CM3x3 + HOG + BIC) and Credibility re-ranking || kMedoids (CN3x3)
|}

===4. RESULTS AND DISCUSSION===
Table 2 presents the official evaluation measures for the five runs. We can see that the best results (for all measures) were achieved when the proposed full pipeline was applied (Runs 4 and 5). Run 2 (purely textual) slightly outperformed Run 1 (purely visual) in terms of diversity. The multimodal combination (Run 3) slightly outperformed Runs 1 and 2 on CR@20 and F1@20. However, when the credibility re-ranking was applied (Run 4), the best results were achieved by the visual approach, with reasonable improvement on all effectiveness measures. Notice that when the face-based filtering used the classifier (Run 5), the results were lower than using the face number threshold (Run 4) but still superior to Runs 1 to 3 on F1@20.

{| class="wikitable"
|+ Table 2: Runs Effectiveness - Official Measures
! Run !! P@20 !! CR@20 !! F1@20
|-
| 1 || 0.7130 || 0.4030 || 0.5077
|-
| 2 || 0.6976 || 0.4139 || 0.5133
|-
| 3 || 0.7016 || 0.4177 || 0.5168
|-
| 4 || 0.7598 || 0.4288 || 0.5423
|-
| 5 || 0.7407 || 0.4076 || 0.5206
|}

===5. CONCLUSIONS===
We proposed a multimodal approach with the use of filtering and re-ranking approaches in conjunction with a clustering technique for diversification. Our best results were achieved with image re-ranking by combining their relevance score and user credibility information. As future work we would like to evaluate the usage of additional information on the re-ranking and diversification steps and more elaborate fusion approaches.

===6. ACKNOWLEDGMENTS===
We thank UEFS/PROBIC, Samsung Research Institute Brazil, and FAPESP (2013/11359-0) for their support.

===7. REFERENCES===
[1] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335–336, 1998.

[2] B. Ionescu, A. Popescu, M. Lupu, A. L. Gînscă, and H. Müller. Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation. In MediaEval 2014 Workshop, Barcelona, 2014.

[3] O. A. B. Penatti, E. Valle, and R. da S. Torres. Comparative study of global color and texture descriptors for web image retrieval. J. Vis. Commun. Image Repr., 23(2):359–380, 2012.
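
The offset-initialized kMedoids diversification described in Section 2.4 can be illustrated with the following minimal sketch. It is not the authors' implementation: the function names (average_distance, initial_medoids, diversify), the use of integer division for the offset, the max_iter safeguard, the handling of empty clusters, and the ordering of the output by original rank are all assumptions made for illustration. Images are identified by their position in the ranked list, and pairwise distances are assumed to be precomputed per feature.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def average_distance(per_feature: Sequence[Matrix]) -> List[List[float]]:
    """Fuse per-feature distance matrices by element-wise averaging."""
    n = len(per_feature[0])
    m = len(per_feature)
    return [[sum(f[i][j] for f in per_feature) / m for j in range(n)]
            for i in range(n)]


def initial_medoids(list_size: int, k: int = 50) -> List[int]:
    """Pick medoids at positions i * offset, with offset = list_size // k."""
    if list_size == 0:
        return []
    k = min(k, list_size)  # assumption: no more clusters than images
    offset = list_size // k  # assumption: integer division
    return [i * offset for i in range(k)]


def assign(dist: Matrix, medoids: List[int]) -> List[List[int]]:
    """Assign each image to the cluster of its closest medoid."""
    clusters: List[List[int]] = [[] for _ in medoids]
    for img in range(len(dist)):
        closest = min(range(len(medoids)), key=lambda c: dist[img][medoids[c]])
        clusters[closest].append(img)
    return clusters


def update_medoids(dist: Matrix, clusters: List[List[int]],
                   medoids: List[int]) -> List[int]:
    """New medoid = cluster member with the lowest average distance to the cluster."""
    updated = []
    for cluster, old in zip(clusters, medoids):
        if not cluster:  # assumption: keep the old medoid for an empty cluster
            updated.append(old)
            continue
        updated.append(min(
            cluster,
            key=lambda c: sum(dist[c][o] for o in cluster) / len(cluster)))
    return updated


def diversify(dist: Matrix, k: int = 50, max_iter: int = 100) -> List[int]:
    """Return the top-ranked image of each cluster, ordered by original rank."""
    if len(dist) == 0:
        return []
    medoids = initial_medoids(len(dist), k)
    clusters = assign(dist, medoids)
    for _ in range(max_iter):  # max_iter is a hypothetical safeguard
        medoids = update_medoids(dist, clusters, medoids)
        new_clusters = assign(dist, medoids)
        if new_clusters == clusters:  # no image changed cluster
            break
        clusters = new_clusters
    # Indices are original ranks, so min(cluster) is its most relevant image.
    return sorted(min(c) for c in clusters if c)
```

With a fused matrix obtained from average_distance, a call such as diversify(dist, k=50) yields at most 50 list positions, one per non-empty cluster. Sorting the selected images by original rank is one possible choice; the text does not state how the final list is ordered.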