=Paper= {{Paper |id=Vol-1263/paper9 |storemode=property |title=UNED @ Retrieving Diverse Social Images Task |pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_9.pdf |volume=Vol-1263 |dblpUrl=https://dblp.org/rec/conf/mediaeval/CastellanosGR14 }} ==UNED @ Retrieving Diverse Social Images Task== https://ceur-ws.org/Vol-1263/mediaeval2014_submission_9.pdf
            UNED @ Retrieving Diverse Social Images Task

                 A. Castellanos                      A. García-Serrano                       J. Cigarrán
             NLP & IR Group, UNED                   NLP & IR Group, UNED               NLP & IR Group, UNED
              C/ Juan del Rosal,16                   C/ Juan del Rosal,16               C/ Juan del Rosal,16
                  Madrid, Spain                          Madrid, Spain                      Madrid, Spain
         acastellanos@lsi.uned.es                  agarcia@lsi.uned.es                 juanci@lsi.uned.es


Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain

ABSTRACT
This paper summarizes the participation of UNED at the 2014 Retrieving Diverse Social Images Task [3]. We propose a novel approach based on Formal Concept Analysis (FCA) to detect the latent topics related to the images, followed by Hierarchical Agglomerative Clustering (HAC) to group the images according to these latent topics. Diversification is then achieved by offering images from the different topics detected. To detect these latent topics, two kinds of data have been tested: only the information related to the description of the images, and all the textual information related to the images. The results show that our proposal is suitable for search result diversification, achieving results similar to those in the state of the art.

1. INTRODUCTION
Diversification, in the sense proposed by this task, refers to the creation of a diverse result list given a user query. The rationale is that users are interested not only in accurate results but also in results covering different topics or situations. From the IR-based point of view, a good definition of the problem and a review of the state of the art can be found in [1]. To address this task we propose a novel approach based on an image representation using the concept(s) covered by the image. For that, we use the information related to the images to create a conceptual-based data representation. The creation of this conceptual representation is tackled by means of Formal Concept Analysis, a data organization technique. We expect this representation to make explicit the latent concepts in the images as well as the relationships between them. Based on this FCA representation, a HAC algorithm is proposed for grouping similar images according to the latent concepts detected. Our hypothesis is that the resulting groups can represent the different topics addressed in the images. The image diversification will therefore be done by taking one image from each of the HAC-based clusters/topics.

2. WORK PROPOSAL
Our proposal is based on taking the information related to the images so as to create a conceptual-based representation. This representation is intended to discover latent concepts in the data and the relationships between them. For that, we propose the application of Formal Concept Analysis as a modelling technique. By means of FCA, the set of images to be diversified will be organized according to their latent concepts.

After the FCA application, a hierarchy organizing the images in formal concepts according to their shared features will be obtained. What remains is to diversify the images according to this representation. For that, we propose the application of a HAC algorithm to group together the formal concepts that can be considered similar (belonging to the same topic).

After this grouping, each HAC cluster can be considered an image set covering a similar topic. Then, for each cluster, the best image in the group (the one ranked highest in the ranking provided by the task, which is taken from an IR system) will be taken and offered as a result. The final result list will be a ranked list of images, ordered according to the provided ranking. As our methodology does not provide any score associated with the results, and one was required by the organizers, a dummy score has been set (1 for the first result in the ranking, 0.99 for the second, and so on).

2.1 Formal Concept Analysis
Formal Concept Analysis (FCA) is a mathematical theory of concept formation [6, 7], derived from lattice and ordered set theory, that provides a theoretical model to organize formal contexts. A formal context is a set structure K := (G, M, I), where G is a set of objects, M a set of attributes and I a binary relationship between G and M (I ⊆ G × M), denoted by gIm, which is read as: the object g has the attribute m. An example of a formal context is shown in Figure 1a. From the point of view of a content representation system, the formal context can be read as follows: G is the set of items (images) to be represented, M is the set of textual (or other kinds of) features representing the items, and the binary relationship I can be read as: item g has the feature m.

From the information in the formal context, a set of formal concepts can be inferred. A formal concept is a pair (A, B) of objects and attributes with the following properties. If an attribute b is shared by every object in A, then b must be included in B (B = A^I is the set of all the attributes shared by the objects in A). Conversely, if an object a is tagged with all the attributes in B, then a must be included in A (i.e. A = B^I is the set of all the objects that have every attribute in B). Finally, the whole set of formal concepts can be
organized according to an order relationship, from the most generic to the most specific. This order can be proven to be a lattice and, consequently, can be represented as a Hasse diagram, such as the one in Figure 1b.

Figure 1: (a) A sample formal context and (b) the concept lattice associated with the formal context

2.2 FCA-based Modelling
Following the FCA rationale, the information of each of the 123 locations included in the test set is modelled. After this modelling, a set of formal concepts, each grouping together the images that share the same set of features, will be obtained. To select the features used to model the images we propose two different kinds of data: 1) only the description information of the images (description, title and tags), and 2) all the textual information related to the images (description, title, tags, user information and date information).

In order to select only the most representative features, we applied Kullback-Leibler Divergence (KLD) [4] to the textual contents related to the images. Basically, KLD compares the textual content related to a given image with the textual contents of the rest of the images. In this way, KLD allows each image to be represented by the textual contents that best differentiate it from the other images. We expect this to improve the diversity of the images selected.

2.3 HAC-based grouping
The FCA-based modelling provides a hierarchy organizing the obtained set of formal concepts. What remains is the creation of a set of diverse image groups based on this organization. For that, we propose the application of a HAC algorithm [5]. Specifically, we propose a single-linkage hierarchical clustering that groups together similar formal concepts. In order to set the "similarity" of two formal concepts, we have applied the zeros-induced index [2]. This index measures the similarity of two formal concepts based on the number of features that they share.

3. RESULTS
Table 1 shows the results obtained by our approaches for the official runs. In general, the performance of our approaches is at the same level as that of other state-of-the-art approaches, according to last year's results for the same task. The inclusion of all the textual information related to an image seems to slightly improve the performance, both for precision-based and for diversity-based results (P@20 rises from 0.7581 to 0.7772 and CR@20 from 0.4325 to 0.4343). It is reasonable to think that including different kinds of data will improve the diversity of the results. However, as can be seen in the results, the precision-based results are also favoured by the inclusion of more information than that related only to the description. Still, this improvement is not enough to conclude definitively that this information is a valuable indicator of image diversity.

Table 1: Official Metrics for Retrieving Diverse Social Images Task

Approach                    P@20      CR@20     F-measure
RUN-2: Description Info     0.7581    0.4325    0.5429
RUN-5: All Textual Info     0.7772    0.4343    0.5502

4. CONCLUSIONS
In this paper we presented a conceptual-based modelling (based on FCA) for improving diversity in the retrieval of social images. With FCA we intend to infer the latent topics addressed in the information related to the images. Once the latent concepts have been detected, we apply a HAC approach in order to put together the most similar ones.

Our hypothesis is that this concept modelling can help in the diversification of the retrieved images. For that, given a query, the system retrieves images trying to cover all the identified topics.

In this work we have also experimented with different kinds of information to describe the images. The results obtained show that our proposal is suitable for offering accurate and also diverse results. Regarding the kind of information to use, including as much information as possible about the images seems to be the best choice. Its performance is better for both precision-based and diversity-based results, although the improvement in diversity is not very large.

5. ACKNOWLEDGMENTS
This work has been partially supported by the Spanish project VOXPOPULI (TIN2013-47090-C3-1-P).

6. REFERENCES
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.
[2] F. Alqadah and R. Bhatnagar. Similarity measures in formal concept analysis. Annals of Mathematics and Artificial Intelligence, 61(3):245–256, 2011.
[3] B. Ionescu, A. Popescu, M. Lupu, A. L. Gînscă, and H. Müller. Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation. In Proceedings of MediaEval Benchmarking Initiative for Multimedia Evaluation, 2014.
[4] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
[5] C. D. Manning, P. Raghavan, and H. Schütze. Hierarchical clustering. In Introduction to Information Retrieval, pages 377–403. Cambridge University Press, 2008.
[6] R. Wille. Concept lattices and conceptual knowledge systems. Computers & Mathematics with Applications, 23(6):493–515, 1992.
[7] R. Wille. Restructuring Lattice Theory: An Approach Based On Hierarchies Of Concepts, volume 5548 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009.
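
As an illustration of the definitions in Section 2.1, the following minimal sketch builds a toy formal context and enumerates its formal concepts with the two derivation operators (A^I and B^I). The image identifiers, the features and the function names are hypothetical, and the brute-force enumeration over object subsets is only meant for very small contexts; it is not the implementation used for the official runs.

```python
from itertools import combinations

# Hypothetical toy context: image id -> set of textual features (the relation I).
CONTEXT = {
    "img1": {"tower", "night"},
    "img2": {"tower", "river"},
    "img3": {"tower", "night", "river"},
    "img4": {"museum"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects):
    """A^I: the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def extent(attributes):
    """B^I: the objects that have every attribute in `attributes`."""
    return frozenset(g for g, feats in CONTEXT.items() if attributes <= feats)


def formal_concepts():
    """Close every subset of objects; each closure (A, B) is a formal concept."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset)            # attributes shared by the subset
            concepts.add((extent(b), b))  # all objects having those attributes
    return concepts


def is_subconcept(c1, c2):
    """Order relation of the lattice: c1 is more specific than (or equal to) c2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for a, b in sorted(formal_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(a), sorted(b))
```

Each printed pair is an extent together with its intent; ordering the pairs with `is_subconcept` gives the generic-to-specific hierarchy that the Hasse diagram depicts.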
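
The KLD-based feature selection of Section 2.2 can be sketched as follows. The paper does not give the exact formulation, so this example assumes one common variant: each term of an image is scored by its contribution p·log(p/q) to the divergence between the term distribution of the image (p) and that of the remaining images (q), and the top-scoring terms are kept. The function name, the smoothing constant `eps` and the cut-off `k` are assumptions made for illustration.

```python
import math
from collections import Counter


def kld_top_terms(image_tokens, other_tokens, k=3, eps=1e-9):
    """Rank the terms of one image by their contribution to KL(image || rest)."""
    p_counts, q_counts = Counter(image_tokens), Counter(other_tokens)
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    scores = {}
    for term, count in p_counts.items():
        p = count / p_total
        q = q_counts[term] / q_total if q_total else 0.0
        # eps avoids a division by zero for terms unseen in the other images
        scores[term] = p * math.log(p / (q + eps))
    # highest contribution first; ties are broken alphabetically
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical tokens for one image and for the rest of the collection.
image = ["eiffel", "tower", "night", "paris"]
rest = ["paris", "tower", "paris", "river", "tower", "paris"]
selected = kld_top_terms(image, rest, k=2)
```

Terms that are frequent in the image but rare elsewhere get the highest scores, which matches the intention of describing each image by what differentiates it from the others.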
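
The grouping and selection steps of Sections 2 and 2.3 can be sketched as below: single-linkage agglomerative clustering of formal concepts, followed by picking the best-ranked image of each cluster and assigning the dummy scores 1, 0.99, and so on. Two points are assumptions rather than the paper's method: the Jaccard overlap of the concept intents is used as a simple stand-in for the zeros-induced index of [2], and the stopping `threshold` is hypothetical, since the cut-off is not stated in the paper. Concepts are represented as (extent, intent) pairs of frozensets.

```python
def jaccard(b1, b2):
    """Stand-in similarity: fraction of attributes two intents have in common."""
    union = b1 | b2
    return len(b1 & b2) / len(union) if union else 1.0


def single_link_clusters(concepts, threshold):
    """Merge the two closest clusters until their similarity drops below `threshold`."""
    clusters = [[c] for c in concepts]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: similarity of the two most similar members
                sim = max(jaccard(a[1], b[1]) for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def diversify(clusters, rank):
    """Take the best-ranked unused image per cluster; attach dummy scores 1, 0.99, ..."""
    picked = []
    for cluster in clusters:
        images = set().union(*(ext for ext, _ in cluster))
        candidates = [img for img in images if img not in picked]
        if candidates:
            picked.append(min(candidates, key=rank.__getitem__))
    picked.sort(key=rank.__getitem__)  # final list follows the provided ranking
    return [(img, round(1 - 0.01 * pos, 2)) for pos, img in enumerate(picked)]


# Hypothetical concepts and provided ranking (1 = best).
concepts = [
    (frozenset({"img1", "img3"}), frozenset({"tower", "night"})),
    (frozenset({"img3"}), frozenset({"tower", "night", "river"})),
    (frozenset({"img4"}), frozenset({"museum"})),
    (frozenset({"img5", "img6"}), frozenset({"museum", "art"})),
]
rank = {"img1": 2, "img3": 1, "img4": 4, "img5": 3, "img6": 5}
result = diversify(single_link_clusters(concepts, threshold=0.5), rank)
```

With these toy values the four concepts collapse into a "tower" cluster and a "museum" cluster, and one image is returned for each, so the result list covers both topics.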