TIA-INAOE’s approach for the 2013 Retrieving Diverse Social Images task∗ Hugo Jair Escalante, Alicia Morales-Reyes Instituto Nacional de Astrofísica, Óptica y Electrónica Luis Enrique Erro 1, 72840, Puebla, Mexico {hugojair, a.morales}@inaoep.mx ABSTRACT to develop a method able to re-rank a list of images, returned This paper describes the approach adopted by the TIA- by an image retrieval system, in such a way that images in INAOE team for the 2013 Retrieving Diverse Social Images the first ranking positions are visually diverse to each other. task of MediaEval. The challenge consists in re-ranking a In addition, the images in the list must be ranked in de- list of images returned by a retrieval system in such a way scending relevance order according to the query. Hence, the that visual diversity among images in the first positions is proposed approach targets diversification as an optimization maximized [5]. A database of partially-annotated images challenge of two objectives: relevance and diversity [3]. of tourist destinations is provided. The problem is tackled as an optimization one with the aim of finding a new im- 2. RELATED WORK age ranking that shows improved diversity according to the Several approaches for result diversification have been pro- criterion defined by MediaEval. The proposal is to apply posed: Arni et al. reported results obtained by several a multi-objective evolutionary algorithm to simultaneously strategies evaluated in the ImageCLEF2008 photographic maximize image diversity in consecutive positions of the list retrieval task which focused on diversification [1]. Cluster- and minimize divergence from the original list. Multi-modal ing, topic modeling and margin-maximization approaches information can be incorporated by the proposed approach. were proposed to re-rank images. Whereas these methods Results obtained in the MediaEval forum are reported and proved to be effective, they still attempt to optimize a sin- analyzed. gle criterion. Deselaers et al. sustained that diversifica- tion involves two conflicting objectives: relevance and diver- Keywords sity [3]. Their proposal is an effective dynamic programming Evolutionary algorithms; Multi-objective optimization; Re- approach to optimize an objective function that combines trieval result diversification; Multi-modal image retrieval relevance and diversity estimates into a single term. This pa- per proposes a multi-objective evolutionary algorithm that explicitly attempts to optimize relevance and diversity ob- 1. INTRODUCTION jectives. Retrieval results diversification has been a very active research-topic and recently has had an increasing popular- ity in information retrieval. In particular, diversification is 3. PROPOSED METHOD essential when searching very large image collections. For Considering that a list of N images (L = ⟨I1 , . . . , IN ⟩) instance, users do not want to see the same or very similar relevant to a particular query (Q) has an associated ranking views/images (e.g., the Eiffel tower) regarding a particular score: S 0 = ⟨s1 , . . . , sN ⟩, where si ≥ sj , ∀i, j : i ̸= j and i < query/topic (e.g., Paris), even when those images are rele- j (if the scoring value is unknown, an estimation is calculated vant to the query. Instead, it is desirable that multiple views by si = 1i ). The goal is to find the ranking score S ∗ = associated to the topic are shown in the first results (e.g., Eif- ⟨s∗1 , . . . , s∗N ⟩ such that the ranking induced by S ∗ maximizes fel Tower, Arch of Triumph, Notre Dame, etc.). Therefore, the objectives of Equations (1) and (2): maximizing relevancy should not be a unique objective for image retrieval systems, but a trade-off between relevance 6 ∑ and diversity must be aimed. ρ(S 0 , S) = 1 − dri (S 0 , S)2 (1) n(n2 − 1) i In this note, the solution to the 2013 Retrieving Diverse Social Images MediaEval Task proposed by the TIA-INAOE1 where dri (S 0 , S) is the difference in rankings at position i research group is outlined. A detailed description of the con- induced by scores lists S 0 and S. Spearman’s correlation co- sidered scenario is provided in [5]. In a nutshell, the goal is efficient (ρ) measures discrepancy between S and the initial ∗This work was partially supported by the LACCIR pro- scores in S 0 . Implicitly, it is assumed that the initial list is gramm under project id R1212LAC006. a good one in terms of relevance. 1 http://ccc.inaoep.mx/∼tia/ Equation (2) is defined to evaluate visual similarity among images ranked by S in consecutive positions: ∑ N Copyright is held by the author/owner(s). β(S) = min(dd (Ii , I1,...,i−1 )) (2) MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain i=2 Table 1: Summary of submitted runs. Table 2: Official results obtained by MORD. ID Description Expert evaluation 1 MORD - HOG / Euclidean distance for dd C@10 C@20 P@10 P@20 F@10 F@20 2 MORD - Tags + descriptions / cosine dissim. for dr 1 0.3808 0.5699 0.7146 0.7143 0.4795 0.6125 3 MORD - 3 objectives: ρ, visual dvd and textual dtd div. 2 0.3885 0.5732 0.7091 0.7136 0.4801 0.6102 3 0.3823 0.5690 0.6977 0.7076 0.4728 0.6067 Crowd-sourcing evaluation where β is the diversity term, dd (Ii , I1,...,i−1 ) measures the C@10 C@20 P@10 P@20 F@10 F@20 visual distance between image in the ith rank position and 1 0.7194 0.8499 0.6755 0.6745 0.6640 0.7245 the rest of the images appearing in previous positions. Since 2 0.7503 0.8625 0.6755 0.6898 0.6790 0.7392 dd (Ii , I1,...,i−1 ) is not associated to a particular feature rep- 3 0.7479 0.8675 0.6714 0.6918 0.6769 0.7464 resentation, it can be estimated by using any visual feature provided for the task [5]. Moreover, dd can be estimated by using textual information or meta-data associated to the tained when using only textual data to estimate diversity images. Calculating β in this way to represent diversity among images (run 2). This is a somewhat unexpected re- modifies the diversity term defined in [3]. sult because the aim is to maximize visual diversity. Re- Aiming to find the score list S ∗ that offers the best trade- garding the crowd-sourcing evaluation, a similar pattern is off between ρ and β is the main objective of this study. This observed, although the results are much higher than in the problem can be tackled by a multi-objective evolutionary expert evaluation. It is difficult to determine how good these optimization technique maximizing simultaneously ρ and β. results are on test data since other systems results have not It was decided to use the NSGA-II algorithm to target this been revealed yet. However, from results reported in this goal [2]. NSGA-II is one of the most used multi-objective working note, it is possible to observe that the achieved per- evolutionary algorithms. Standard operators for selection, formance on test data is similar regardless of the modality crossover and mutation were adopted. The NSGA-II algo- for the proposed method. Nevertheless, it is worth mention- rithm returns as output the set of non-dominated solutions ing that during the development phase, performance of the (i.e., a set of re-ranked lists that optimize ρ and β), an es- proposed method was evaluated and seemed competitive. timate of the Pareto front for the problem at hand. Theo- For instance, the multi-modal run using development data retically, none of these solutions is better or worse than the (keywords only) obtained a C@R10 of 0.5043 ± 0.0111 (10 others, therefore all of them are valid solutions. However, runs of MORD) compared to 0.4635, which is the C@R10 for our problem, a single solution has to be selected, thus obtained when using the base system (using the list induced a strategy for selecting a single solution from the set of so- by S 0 ), this is a relative difference of ≈ 8.9%. lutions is also proposed. Specifically, values normalization of the involved objectives is carried out across the returned solutions and the sum of normalized objectives is ranked. 5. DISCUSSION AND FUTURE WORK In this way, the solution at the first position offers a good In this study, visual results diversification has been tackled trade-off between relevance and diversity. as a multi-objective optimization problem while considering relevance and diversity as two independent but complemen- 4. EXPERIMENTAL RESULTS tary objectives. To the best of our knowledge, this is the first approach to explicitly optimize both objectives by using a Three runs of the proposed Multi-Objective Result Diver- multi-objective evolutionary optimization technique. sification (MORD) method were submitted to be considered for evaluation, these are summarized in Table 1. In a vi- sual run, HOG features are used to estimate dd , this choice 6. REFERENCES is justified by preliminary experimentation in the develop- [1] T. Arni, P. Clough, M. Sanderson, and M. Grubinger. ment data set. Also, a textual run was submitted in which Overview of the imageclefphoto 2008 photographic the term dd was estimated by the cosine dissimilarity be- retrieval task. In CLEF 2008, volume 5706 of LNCS, tween the bag-of-words (BoW) representation obtained by pages 500–511. Springer, 2009. tags and the textual images descriptions. For the textual [2] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A information character 3-grams were used instead of words fast and elitist multiobjective genetic algorithm: for building the BoW. This type of representation is partic- NSGA-II. IEEE Transactions on Evolutionary ularly helpful for texts dealing with informal language, be- Computation, 6(2):182–197, 2002. cause writing style patterns can be discovered (see e.g., [4]). [3] T. Deselaers, T. Gass, P. Dreuw, and H. Ney. Jointly Finally, a multi-modal run was also submitted where three optimising relevance and diversity in image retrieval. In objectives are simultaneously maximized: ρ and diversity Proc. of ACM Conference on Image and Video terms dvd and dtd , considered for visual and textual runs re- Retrieval, pages 1–8, 2009. spectively. [4] H. J. Escalante, T. Solorio, and M. Montes y Gomez. The average (test-set) performance for a number of mea- Local histograms of character n-grams for authorship sures and for the three runs are reported in Table 2. We attribution. In Proc. of ACL, pages 288–298, 2011. show results using expert and crowd-sourcing ground-truth [5] B. Ionescu, M. Menéndez, H. Müller, and A. Popescu. (a 50 images sample was evaluated and the average over Retrieving diverse social images at mediaeval 2013: three subjects is reported). Regarding expert’s evaluation, Objectives, dataset and evaluation. In MediaEval 2013 it can be seen that performance difference among three runs Workshop, Barcelona, Spain, October 18-19 2013. is roughly the same. Slightly better performance was ob-