=Paper=
{{Paper
|id=Vol-1263/paper9
|storemode=property
|title=UNED @ Retrieving Diverse Social Images Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_9.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/CastellanosGR14
}}
==UNED @ Retrieving Diverse Social Images Task==
A. Castellanos, A. García-Serrano, J. Cigarrán

NLP & IR Group, UNED, C/ Juan del Rosal, 16, Madrid, Spain

acastellanos@lsi.uned.es, agarcia@lsi.uned.es, juanci@lsi.uned.es

===ABSTRACT===
This paper summarizes the participation of UNED at the 2014 Retrieving Diverse Social Images Task [3]. We propose a novel approach based on Formal Concept Analysis (FCA) to detect the latent topics related to the images, followed by a Hierarchical Agglomerative Clustering (HAC) step that groups the images according to these latent topics. Diversification is then achieved by offering images from the different topics detected. To detect these latent topics, two kinds of data have been tested: only the information related to the description of the images, and all the textual information related to the images. The results show that our proposal is suitable for search result diversification, achieving results similar to those in the state of the art.

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

===1. INTRODUCTION===
Diversification, in the sense proposed by this task, refers to the creation of a diverse result list given a user query. The rationale is that users are interested not only in accurate results but also in results covering different topics or situations. From the IR point of view, a good definition of the problem and a review of the state of the art can be found in [1]. To address this task we propose a novel approach that represents each image by the concept or concepts it covers. To that end, we use the information related to the images to create a concept-based data representation. This representation is built by applying Formal Concept Analysis, a data organization technique. We expect this representation to make explicit the latent concepts in the images as well as the relationships between them. Based on this FCA representation, a HAC algorithm is proposed for grouping similar images according to the latent concepts detected. Our hypothesis is that the resulting groups can represent the different topics addressed in the images. Image diversification is therefore done by taking one image from each of the HAC-based clusters/topics.

===2. WORK PROPOSAL===
Our proposal takes the information related to the images and uses it to create a concept-based representation. This representation is intended to discover latent concepts in the data and the relationships between them. For that, we propose the application of Formal Concept Analysis as a modelling technique. By means of FCA, the set of images to be diversified is organized according to their latent concepts.

Applying FCA yields a hierarchy that organizes the images into formal concepts according to their shared features. The images still have to be diversified according to this representation. For that, we propose the application of a HAC algorithm to group together the formal concepts that can be considered similar (belonging to the same topic).

After this grouping, each HAC cluster can be considered an image set covering a similar topic. Then, for each cluster, the best image in the group (the one ranked highest in the ranking provided by the task, taken from an IR system) is selected and offered as a result. The final result list is a ranked list of images, ordered according to the provided ranking. As our methodology does not provide any score associated with the results, and a score was required by the organization, a dummy score has been set (1 for the 1st result in the ranking, 0.99 for the second, and so on).

====2.1 Formal Concept Analysis====
Formal Concept Analysis (FCA) is a mathematical theory of concept formation [6, 7], derived from lattice and ordered set theories, that provides a theoretical model to organize formal contexts. A formal context is a set structure K := (G, M, I), where G is a set of objects, M a set of attributes and I a binary relationship between G and M (I ⊆ G × M), denoted by gIm, which is read as: the object g has the attribute m. An example of a formal context is shown in Figure 1a. From the point of view of a content representation system, the formal context can be read as follows: G is the set of items (images) to be represented, M is the set of textual (or other kinds of) features that represent the items, and the binary relationship I can be read as: item g has the feature m.

From the information in the formal context, a set of formal concepts can be inferred. A formal concept is a pair (A, B) of objects and attributes with the following properties. If an attribute b is shared by all the objects in A, then b must be included in B (B = A^I contains all the attributes shared by the objects). Conversely, if an object a is tagged with all the attributes in B, then a must be included in A (A = B^I contains all the objects selected by the attributes). Finally, the whole set of formal concepts can be organized according to an order relationship, from the most generic to the most specific. This order can be proven to be a lattice and, consequently, can be represented as a Hasse diagram, such as the one in Figure 1b.

Figure 1: (a) A sample formal context and (b) the concept lattice associated with the formal context.

====2.2 FCA-based Modelling====
Following the FCA rationale, the information of each of the 123 locations included in the test set is modelled. This modelling yields a set of formal concepts, each grouping together the images that share the same set of features. To select the features used to model the images we propose two different kinds of data: 1) only the description information of the images (description, title and tags), and 2) all the textual information related to the images (description, title, tags, user information and date information).

In order to select only the most representative features, we applied Kullback-Leibler Divergence (KLD) [4] to the textual contents related to the images. Basically, KLD compares the textual content related to a given image with the textual contents of the rest of the images. In this way, KLD allows each image to be represented by the textual contents that best differentiate it from the other ones. We expect this to improve the diversity of the images selected.

====2.3 HAC-based grouping====
The FCA-based modelling provides a hierarchy organizing the obtained set of formal concepts. What remains is the creation of a set of diverse image groups based on this organization. For that, we propose the application of a HAC algorithm [5]. Specifically, we propose a single-linkage hierarchical clustering that groups together similar formal concepts. To set the "similarity" of two formal concepts, we have applied the Zero-Induces index [2]. This index measures the similarity of two formal concepts based on the number of features that they share.

===3. RESULTS===
Table 1 shows the results obtained by our approaches for the official runs. In general, the performance of our approaches is at the same level as that of other approaches in the state of the art, according to last year's results for the same task. The inclusion of all the textual information related to an image seems to slightly improve performance, for both precision-based and diversity-based results. It is reasonable to think that including different kinds of data will improve the diversity of the results. However, as can be seen in the results, the precision-based results are also favoured by the inclusion of more information than the description alone. Moreover, the improvement is not large enough to conclude definitively that this information is a valuable indicator of image diversity.

{| class="wikitable"
|+ Table 1: Official Metrics for Retrieving Diverse Social Images Task
! Approach !! P@20 !! CR@20 !! F-measure
|-
| RUN-2: Description Info || 0.7581 || 0.4325 || 0.5429
|-
| RUN-5: All Textual Info || 0.7772 || 0.4343 || 0.5502
|}

===4. CONCLUSIONS===
In this paper we presented a concept-based modelling (based on FCA) for improving diversity in the retrieval of social images. With FCA we intend to infer the latent topics addressed in the information related to the images. Once the latent concepts have been detected, we apply a HAC approach to put together the most similar ones.

Our hypothesis is that this concept modelling can help in the diversification of the retrieved images. For that, given a query, the system retrieves images trying to cover all the identified topics.

In this work we have also experimented with different kinds of information to describe the images. The results obtained indicate that our proposal is suitable for offering accurate and also diverse results. Regarding the kind of information to use, including as much information as possible about the images seems to be the best choice. Performance is better for both precision-based and diversity-based results, although the improvement in diversity is not very large.

===5. ACKNOWLEDGMENTS===
This work has been partially supported by the Spanish project VOXPOPULI (TIN2013-47090-C3-1-P).

===6. REFERENCES===
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.

[2] F. Alqadah and R. Bhatnagar. Similarity measures in formal concept analysis. Annals of Mathematics and Artificial Intelligence, 61(3):245–256, 2011.

[3] B. Ionescu, A. Popescu, M. Lupu, A. L. Gînsca, and H. Müller. Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation. In Proceedings of MediaEval Benchmarking Initiative for Multimedia Evaluation, 2014.

[4] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.

[5] C. D. Manning, P. Raghavan, and H. Schütze. Hierarchical clustering. pages 377–403. 2008.

[6] R. Wille. Concept lattices and conceptual knowledge systems. Computers & Mathematics with Applications, 23(6):493–515, 1992.

[7] R. Wille. Restructuring Lattice Theory: An Approach Based On Hierarchies Of Concepts, volume 5548 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009.
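As an illustration of the formal concept definition in Section 2.1, the following minimal sketch enumerates the formal concepts of a tiny context by brute force. The image identifiers, tags and function names are hypothetical and are not taken from the paper, and the exhaustive enumeration over object subsets is only practical for toy contexts; it is not claimed to be the algorithm used for the submitted runs.

```python
from itertools import combinations

def common_attributes(objects, context, all_attributes):
    """A^I: the attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return frozenset(shared)

def objects_having(attributes, context):
    """B^I: the objects that carry every attribute in `attributes`."""
    return frozenset(g for g, feats in context.items() if attributes <= feats)

def formal_concepts(context):
    """Return all (extent, intent) pairs with extent^I = intent and intent^I = extent.

    Brute force: close every subset of objects. Exponential in the number
    of objects, so only suitable for small illustrative contexts.
    """
    all_attributes = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((extent, intent))
    return concepts

# Hypothetical formal context: image id -> set of textual features.
toy_context = {
    "img1": {"beach", "sea"},
    "img2": {"beach", "sunset"},
    "img3": {"sea", "sunset", "boat"},
}
toy_concepts = formal_concepts(toy_context)
```

Each returned pair is closed in both directions, so ordering the pairs by extent inclusion gives the lattice that a Hasse diagram such as Figure 1b depicts.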
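The KLD-based feature selection of Section 2.2 can be sketched as follows. The paper does not give the exact formulation, so this is an assumption: each term is scored by its pointwise contribution to the Kullback-Leibler divergence between the term distribution of one image's text and that of the remaining images, with additive smoothing for terms unseen in the rest. The function names, the smoothing constant and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def kld_term_scores(doc_tokens, other_docs_tokens, smoothing=0.5):
    """Score each term of one document by p_doc * log(p_doc / p_rest).

    p_rest is estimated on the other documents with additive smoothing
    (an assumption of this sketch) so that unseen terms do not divide by zero.
    """
    doc_counts = Counter(doc_tokens)
    rest_counts = Counter(t for doc in other_docs_tokens for t in doc)
    vocabulary = set(doc_counts) | set(rest_counts)
    doc_total = sum(doc_counts.values())
    rest_total = sum(rest_counts.values()) + smoothing * len(vocabulary)
    scores = {}
    for term, count in doc_counts.items():
        p_doc = count / doc_total
        p_rest = (rest_counts[term] + smoothing) / rest_total
        scores[term] = p_doc * math.log(p_doc / p_rest)
    return scores

def top_features(doc_tokens, other_docs_tokens, k=3):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = kld_term_scores(doc_tokens, other_docs_tokens)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical tokenized texts for three images of the same location.
target = ["eiffel", "tower", "night", "paris"]
others = [["paris", "louvre", "museum"], ["paris", "seine", "river"]]
sample_scores = kld_term_scores(target, others)
sample_top = top_features(target, others, k=3)
```

In this toy example the term shared by every image ("paris") scores lowest, so the retained features are the ones that set the target image apart from the others.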
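The grouping and selection steps of Sections 2 and 2.3 can be sketched as below, under stated assumptions. Jaccard similarity over concept intents stands in for the Zero-Induces index [2], which is not reproduced here; the single-linkage dendrogram is cut at a fixed similarity threshold, which is equivalent to taking connected components of the "similar enough" graph; and an image selected by several clusters is kept only once. The threshold value, function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Share of common features between two intents (stand-in similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def single_link_clusters(concepts, threshold):
    """Single-linkage clustering cut at `threshold`.

    Two concepts end up in the same cluster if a chain of pairwise
    similarities >= threshold connects them (union-find over the pairs).
    """
    parent = list(range(len(concepts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(concepts)):
        for j in range(i + 1, len(concepts)):
            if jaccard(concepts[i][1], concepts[j][1]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, concept in enumerate(concepts):
        clusters.setdefault(find(i), []).append(concept)
    return list(clusters.values())

def diversify(concepts, rank, threshold=0.5):
    """Pick the best-ranked image per cluster and attach dummy scores.

    `rank` maps image id -> position in the provided ranking (1 = best).
    Scores follow the scheme in the text: 1, 0.99, 0.98, ...
    """
    selected = set()
    for cluster in single_link_clusters(concepts, threshold):
        images = set().union(*(extent for extent, _ in cluster))
        if images:
            selected.add(min(images, key=rank.__getitem__))
    ordered = sorted(selected, key=rank.__getitem__)
    return [(img, round(1.0 - 0.01 * i, 2)) for i, img in enumerate(ordered)]

# Hypothetical (extent, intent) concepts and provided ranking.
sample_concepts = [
    (frozenset({"img1", "img2"}), frozenset({"beach", "sea"})),
    (frozenset({"img2", "img3"}), frozenset({"beach", "sea", "sand"})),
    (frozenset({"img4"}), frozenset({"museum", "art"})),
    (frozenset({"img4", "img5"}), frozenset({"museum"})),
]
sample_rank = {"img1": 3, "img2": 1, "img3": 5, "img4": 2, "img5": 4}
sample_result = diversify(sample_concepts, sample_rank)
```

With this data the two beach concepts and the two museum concepts form one cluster each, and the result list contains the best-ranked image of each topic in the order of the provided ranking.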