=Paper=
{{Paper
|id=Vol-1739/MediaEval_2016_paper_17
|storemode=property
|title=UNED-UV @ Retrieving Diverse Social Images Task
|pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf
|volume=Vol-1739
|dblpUrl=https://dblp.org/rec/conf/mediaeval/GonzalezGGC16
}}
==UNED-UV @ Retrieving Diverse Social Images Task==
A. Castellanos (NLP&IR Group, UNED, Madrid, acastellanos@lsi.uned.es), X. Benavent (Informatics, U. Valencia, xaro.benavent@uv.es), A. García-Serrano (NLP&IR Group, UNED, Madrid, agarcia@lsi.uned.es), E. de Ves (Informatics, U. Valencia, esther.deves@uv.es)

===ABSTRACT===
This paper details the participation of the UNED-UV group at the MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal approach. Several Local Logistic Regression models, which use the visual low-level features, estimate the relevance probability for all the images in the dataset. Then, the images are ranked by selecting the highest probability image at each of the textual clusters. These textual clusters are generated by a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) that detects the latent topics addressed. The images are then diversified according to the detected topics.

===1. INTRODUCTION===
Information retrieval systems have mainly relied on optimizing the result list according to accuracy-based metrics. However, when dealing with image retrieval, systems should be able to offer relevant but also diverse results [1]. We therefore propose a multimodal approach that addresses both relevance and diversity. It uses a relevance feedback algorithm developed by the UV group [3, 6, 7]. This method estimates the similarity probability of all the images in the dataset from visual low-level features by means of several Local Logistic Regression (LLR) models [10]. It has been extensively shown that visual information has a great impact on information retrieval systems [12]. For diversification, we propose to represent an image by the concepts covered by its textual information. This conceptual representation is obtained by means of Formal Concept Analysis (FCA) [13], a data organization technique. In our participation in previous editions of this task, we showed that this approach was able to identify the different topics addressed in the images, allowing the diversification of the results list according to them [5, 4].

===2. SYSTEM DESCRIPTION===
We present a two-step system that ranks a list of retrieved images taking into account the relevance of the retrieved images to the given query and showing as much diversity as possible. The first step is based on a relevance feedback algorithm that estimates the relevance of each image to the query by using the provided visual low-level features of the images [8]. In the next step, we select the highest probability image in each textual FCA cluster or visual k-means cluster to generate the list of similar and diverse images.

====2.1 Relevancy via relevance feedback====
We estimate the relevance of each image to the given query by using a relevance feedback algorithm [7]. The general methodology involves four steps:

1. Reduction of the data dimensionality. The provided low-level visual features [8] are used to generate a feature vector for each image, generically denoted x, in a space of dimension N = 8194. These features are reduced using Principal Component Analysis (PCA). We retain only the first components that account for 80% of the data variability. This reduces the original characteristic space to a new characteristic vector of dimension M < N, with M = 52. One of the advantages of this reduction is that the new transformed components are in decreasing order with respect to the variance explained by the corresponding principal component.

2. Selecting the relevant and non-relevant sets. The user looks at a few screens, each showing some images, and marks some of them as relevant and others as non-relevant (run4). For the automatic runs (run1, run3 and run5), these sets are selected automatically. Relevant images are the first P images of the ranked Flickr list that belong to different textual FCA or visual (k-means) clusters, with P = 5. The goal of selecting relevant images from different clusters is to give the model diverse relevant example images and so improve the diversity in the estimation. We use k-means clustering over the provided visual low-level features in run1 and run5, and textual FCA clusters over the provided textual features [8] in run3. Non-relevant images are selected from other topics at each query.

3. Parameter estimation of the Local Logistic Regression models [10]. The reduced feature vectors (PCA) and the relevant and non-relevant sets are the inputs of several Local Logistic Regression models whose outputs are the probabilities that an image belongs to the relevant set. The feature vector is split dynamically into m groups of non-fixed size. Each group is used to fit a model of the highest possible order, given the input sets and the PCA components.

4. Ranking of the database. The models are evaluated on all the images of the database and each estimated model returns the probability of an image being relevant; as a result, we have a probability vector (p) of dimension m for each individual image. We combine these probabilities into a single one by using a weighted average. The weight (w) of each probability is given by the amount of variance accounted for by the group of components used to fit the corresponding model. Finally, this procedure gives us a score/probability for each image.

====2.2 Clusters for diversity====
The final rank is generated by selecting the highest probability image from each of the different clusters, so as to give as much diversity as possible to the final list. If there are fewer than 50 clusters, a second pass selects the second highest probability image of each cluster. We have generated the clusters for the ranking from textual information in run2 to run5, and k-means clusters from the visual information of the images in run1.

=====2.2.1 Textual clusters with FCA=====
The presented clustering procedure is based on discovering the latent topics addressed by the textual information of the images. To that end, Formal Concept Analysis is used to detect these topics and Hierarchical Agglomerative Clustering (HAC) [11] to group similar images together into the detected topics. Each HAC-based cluster contains the set of images covering a similar topic. Thereafter, the images of each cluster are ranked according to their diversity based on their visual features.

FCA-based Modelling. Formal Concept Analysis (FCA) is a theory of concept formation [13] for the organization of content according to its related features. The basic structure of FCA is the formal context, K := (G, M, I), where G is a set of objects, M a set of attributes related to these objects and I a binary relationship between G and M, denoted by gIm: the object g has the attribute m. From the formal context, a set of formal concepts can be inferred. A formal concept is a pair (A, B) of images A and the features B shared by those images; the concepts are organized in a lattice from the most generic to the most specific one. By applying FCA to the formal context containing the textual information of the images, the images are modelled in terms of formal concepts, which group together the images sharing the same set of features. In order to select only the most representative features, we applied Kullback-Leibler Divergence (KLD) [9] to the textual contents related to the images. This KLD-based selection represents each image by the textual contents that best differentiate an image from the other ones.

HAC-based grouping. From the FCA formal concepts, a set of diverse image groups is created by applying a HAC algorithm [11]. Specifically, we propose a single-linkage hierarchical clustering that groups together similar formal concepts, with the Zeros-Induced index as the cluster similarity [2].

=====2.2.2 Visual clusters=====
The clusters are made by a k-means procedure (k = 18) over the PCA components of the provided visual low-level features [8].

===3. RESULTS===
We submitted five runs (see Table 1): four of them are automatic (run1, run2, run3 and run5) and one, run4, is a human-based run. All of them use our two-step system except run2, which uses only the second step, ranking the images by textual clusters with FCA. Three of them are multimodal runs (run3, run4 and run5) using both textual and visual information.

{| class="wikitable"
|+ Table 1: Description of the runs.
! Run !! Relevant images: text !! Relevant images: visual !! Relevant images: human !! Non-relevant images: other !! Non-relevant images: human !! Ranking clusters: text !! Ranking clusters: visual
|-
| run1 || || X || || X || || || X
|-
| run2 || || || || || || X ||
|-
| run3 || X || || || X || || X ||
|-
| run4 || || || X || || X || X ||
|-
| run5 || || X || || X || || X ||
|}

Results are presented in Table 2. Our best result for both precision and diversification is obtained with the multimodal human-based approach, run4 (F1@20 = 0.4597): the probability of relevance of each image to the query is estimated by an LLR model, and the final list is then ranked with FCA clusters. The second best result, run5 (F1@20 = 0.4562), uses the same approach as run4, but selects the relevant and non-relevant image sets for estimating the LLR models automatically. It is also important to observe that the multimodal runs run4 and run5 outperform the automatic monomodal runs, run1 and run2, as expected.

{| class="wikitable"
|+ Table 2: Official Metrics for Retrieving Diverse Social Images Task. The best result (human-based) is in bold, and the second best result (automatic run) is in italics.
! Run !! P@20 !! CR@20 !! F1@20
|-
| run1 || 0.4180 || 0.3538 || 0.3637
|-
| run2 || 0.5367 || 0.4133 || 0.4425
|-
| run3 || 0.4305 || 0.3544 || 0.3745
|-
| run4 || '''0.5734''' || '''0.4252''' || '''0.4597'''
|-
| run5 || ''0.5602'' || ''0.4179'' || ''0.4562''
|}

===4. CONCLUSIONS===
We presented a multimodal approach that estimates the relevance of the images by a relevance feedback algorithm using the visual features for determining the similarity. To handle image diversification, we apply a concept-based procedure (based on FCA and HAC) to cluster the images according to the latent topics addressed by their textual content. Results show that our multimodal approach works properly for retrieving similar diverse images. Results also show the importance of a proper selection of the relevant and non-relevant sets for the relevance algorithm. A human has a better grasp of the meaning and the diversity of the topics. Our challenge is to make the automatic approach able to select these relevant and non-relevant images as a human being would. The presented results are encouraging.

Acknowledgements. This work has been partially supported by PhD fellowship (FPI-UNED 2014), VOXPOPULI (TIN2013-47090-C3-1-P), MUSACCES (S2015/HUM3494) and DPI2013-47279-C2-1-R projects.

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.

===5. REFERENCES===
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.

[2] F. Alqadah and R. Bhatnagar. Similarity measures in formal concept analysis. Annals of Mathematics and Artificial Intelligence, 61(3):245–256, 2011.

[3] X. Benavent, A. Garcia-Serrano, R. Granados, J. Benavent, and E. de Ves. Multimedia information retrieval based on late semantic fusion approaches: Experiments on a Wikipedia image collection. IEEE Transactions on Multimedia, PP(99):1–1, 2013.

[4] A. Castellanos, X. Benavent, A. García-Serrano, E. de Ves, and J. Cigarrán. UNED-UV @ Retrieving Diverse Social Images Task. In MediaEval Multimedia Benchmark Workshop, CEUR-WS.org, 1436, ISSN 1613-0073, 2015.

[5] A. Castellanos, J. Cigarrán, and A. García-Serrano. UNED @ Retrieving Diverse Social Images Task. In MediaEval Multimedia Benchmark Workshop, CEUR-WS.org, 1263, ISSN 1613-0073, 2014.

[6] E. de Ves, G. Ayala, X. Benavent, J. Domingo, and E. Dura. Modeling user preferences in content-based image retrieval: a novel attempt to bridge the semantic gap. Neurocomputing, 168:829–845, 2015.

[7] E. de Ves, X. Benavent, I. Coma, and G. Ayala. A novel dynamic multi-model relevance feedback procedure for content-based image retrieval. Neurocomputing, pages 99–107, October 2016.

[8] B. Ionescu, A. L. Gînscă, M. Zaharieva, B. Boteanu, M. Lupu, and H. Müller. Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation. In MediaEval 2016 Workshop, October 20-21, Hilversum, Netherlands, 2016.

[9] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.

[10] C. Loader. Local regression and likelihood. New York: Springer-Verlag, 1999.

[11] C. D. Manning, P. Raghavan, and H. Schütze. Hierarchical clustering. pages 377–403. 2008.

[12] S. Rudinac, A. Hanjalic, and M. Larson. Generating visual summaries of geographic areas using community-contributed images. IEEE Transactions on Multimedia, 15(4):921–932, 2013.

[13] R. Wille. Concept lattices and conceptual knowledge systems. Computers & Mathematics with Applications, 23(6):493–515, 1992.
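
The ranking logic described in step 4 of Section 2.1 (weighted average of the per-model probabilities) and in Section 2.2 (one image per cluster, then a second pass if the list is not yet full) can be illustrated with a small sketch. This is not the authors' implementation: the function names `combine_probabilities` and `diversify`, the dictionary-based inputs, and the rule that clusters are visited in order of their best score within each pass are all assumptions made for illustration, since the paper does not specify them.

```python
from collections import defaultdict


def combine_probabilities(probs, weights):
    """Weighted average of the m per-model relevance probabilities of one image.

    `weights` stands for the variance explained by the group of PCA
    components behind each model (hypothetical values in the examples).
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, weights)) / total


def diversify(scores, clusters, limit=50):
    """Build a ranked list by taking the best image of every cluster,
    then the second best of every cluster, and so on, up to `limit` images.

    scores:   dict image_id -> combined relevance score
    clusters: dict image_id -> cluster id (textual FCA/HAC or visual k-means)
    """
    by_cluster = defaultdict(list)
    for image, score in scores.items():
        by_cluster[clusters[image]].append((score, image))

    # Inside each cluster: highest score first (ties broken by image id).
    queues = [sorted(members, key=lambda t: (-t[0], t[1]))
              for members in by_cluster.values()]
    # Assumption: within a pass, clusters are visited by decreasing best score.
    queues.sort(key=lambda q: (-q[0][0], q[0][1]))

    ranked = []
    depth = 0  # 0 = best image of each cluster, 1 = second best, ...
    while len(ranked) < limit and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(ranked) < limit:
                ranked.append(q[depth][1])
        depth += 1
    return ranked
```

For instance, with four images where `img1` and `img2` share a cluster, `diversify` places the best image of every cluster ahead of `img2`, even though `img2` has the second highest score overall. The default `limit=50` mirrors the threshold of 50 mentioned in Section 2.2.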