=Paper= {{Paper |id=Vol-1436/Paper22 |storemode=property |title=UNED-UV @ Retrieving Diverse Social Images Task |pdfUrl=https://ceur-ws.org/Vol-1436/Paper22.pdf |volume=Vol-1436 |dblpUrl=https://dblp.org/rec/conf/mediaeval/CastellanosBGVC15 }} ==UNED-UV @ Retrieving Diverse Social Images Task== https://ceur-ws.org/Vol-1436/Paper22.pdf
         UNED-UV @ Retrieving Diverse Social Images Task

A. Castellanos, NLP & IR Group, UNED, C/ Juan del Rosal, 16, Madrid, Spain (acastellanos@lsi.uned.es)
X. Benavent, Department of Informatics, University of Valencia, Valencia, Spain (xaro.benavent@uv.es)
A. García-Serrano, NLP & IR Group, UNED, C/ Juan del Rosal, 16, Madrid, Spain (agarcia@lsi.uned.es)
E. de Ves, Department of Informatics, University of Valencia, Valencia, Spain (esther.deves@uv.es)
J. Cigarrán, NLP & IR Group, UNED, C/ Juan del Rosal, 16, Madrid, Spain (juanci@lsi.uned.es)

ABSTRACT
This paper details the participation of the UNED-UV group in the 2015 Retrieving Diverse Social Images Task. This year, our proposal is based on a multimodal approach.

First, a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) detects the latent topics addressed by the images. The results are then diversified according to these topics.

Second, a Local Logistic Regression model is fitted using the low-level visual features and a set of relevant and non-relevant samples. The model estimates the relevance probability of every image in the database.

1.   INTRODUCTION
Information retrieval systems have commonly been designed to maximize the relevance of the result list (i.e., in terms of accuracy-based metrics). However, retrieval systems, and especially those focused on image diversification, should offer results that are both relevant and diverse. Users are interested not only in accurate results but also in results covering different topics or situations [1].

To address this task, we propose an image representation based on the concept(s) covered by the textual information related to the images. This conceptual representation is built by means of Formal Concept Analysis, a data organization technique. In our participation in the 2014 edition of this task, we showed that this approach was able to identify the different topics addressed in the images. This allowed the result list to be diversified according to those topics [4].

This year we go a step further by presenting a multimedia approach, since our previous approach did not take into account the visual information of the images. It has been extensively shown that visual information has a great impact on information retrieval systems [10]. The visual approach presented here uses a relevance feedback algorithm developed by the UV group and used in previous works [3], [5]. This method estimates the similarity probability of all the images in the database from their low-level visual features by means of a Local Logistic Regression model [8].

2.   SYSTEM DESCRIPTION
Our multimedia system has two sub-systems:
- a Textual-Based Information Retrieval (TBIR) sub-system, which works with textual information and generates clusters for diversity;
- a Content-Based Information Retrieval sub-system, which estimates the relevance of each of the images in the generated clusters.

2.1   Textual-Based Information Retrieval
Our proposal is based on discovering the latent topics addressed by the images by applying Formal Concept Analysis. Hierarchical Agglomerative Clustering (HAC) [9] is then applied to group together similar images according to the detected formal concepts (i.e., those belonging to the same topic). Each HAC-based cluster may be considered as an image set covering a similar topic. Then, for each cluster, the visual features of the images are used to rank the cluster images according to their visual diversity.

2.1.1   FCA-based Modelling
Formal Concept Analysis (FCA) is a theory of concept formation [11] used to organize formal contexts. A formal context is a structure K := (G, M, I), where G is a set of objects, M is a set of attributes related to these objects, and I is a binary relation between G and M. We write gIm when the object g has the attribute m.

From the formal context, a set of formal concepts can be inferred. A formal concept is a pair (A, B), where A is a set of images and B is the set of features shared by all of them. Conversely, A contains every image that has all the features in B. The concepts are organized in a lattice from the most generic to the most specific one.

By applying FCA, the images in the test set are modelled in terms of formal concepts, which group together the images sharing the same set of features. To keep only the most representative features, we applied the Kullback-Leibler Divergence (KLD) [7] to the textual content related to the images. This KLD-based selection represents each image by the textual terms that best differentiate it from the other images.
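The term selection and concept formation of Section 2.1.1 can be illustrated with the following minimal Python sketch. It is not the authors' implementation. Several elements are assumptions for illustration only:
- the per-term score p(t|d) log(p(t|d)/p(t|C)), taken as the KLD contribution of term t in image d against the whole collection C;
- the cut-off k;
- the toy tags;
- the function names kld_select and formal_concepts.

The concept enumeration relies on the standard FCA property that every concept extent is an intersection of attribute extents. This keeps the code short but is only practical for small contexts.

```python
import math
from collections import Counter
from itertools import chain


def kld_select(docs, k):
    """Represent each image by its k most discriminative terms.

    Assumption (not taken from the paper): a term t of image d is scored by its
    KLD contribution p(t|d) * log(p(t|d) / p(t|C)), where C is the whole
    collection; ties are broken alphabetically.
    """
    collection = Counter(chain.from_iterable(docs.values()))
    total = sum(collection.values())
    selected = {}
    for image, tokens in docs.items():
        tf = Counter(tokens)
        n = len(tokens)
        scores = {
            t: (c / n) * math.log((c / n) / (collection[t] / total))
            for t, c in tf.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[image] = set(ranked[:k])
    return selected


def formal_concepts(context):
    """Enumerate every formal concept (A, B) of a context {object: attributes}.

    Every concept extent is an intersection of attribute extents (the full
    object set G being the empty intersection), so we close {G} under
    intersection with each attribute extent and then compute each intent B = A'.
    Exponential in the worst case: fine for a sketch, not for large collections.
    """
    objects = frozenset(context)
    attributes = set().union(*context.values())
    extent_of = {
        m: frozenset(g for g in objects if m in context[g]) for m in attributes
    }
    extents = {objects}
    for m in sorted(attributes):
        extents |= {e & extent_of[m] for e in extents}
    concepts = [
        (e, frozenset(m for m in attributes if e <= extent_of[m]))
        for e in extents
    ]
    # Order from the most generic (largest extent) to the most specific.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[0])))
    return concepts


if __name__ == "__main__":
    # Hypothetical textual metadata (e.g. tags) for three images.
    docs = {
        "img1": ["bridge", "bridge", "river", "night"],
        "img2": ["bridge", "river", "sunset"],
        "img3": ["cathedral", "interior", "night"],
    }
    context = kld_select(docs, k=3)
    for extent, intent in formal_concepts(context):
        print(sorted(extent), sorted(intent))
```

With k = 3 on the toy data, img1 and img2 share the concept ({img1, img2}, {bridge, river}). This is the kind of shared-feature group that the HAC step of Section 2.1.2 then clusters. For realistic collections, an algorithm such as Ganter's NextClosure would be a more scalable way to enumerate the lattice.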

2.1.2   HAC-based grouping
From the FCA formal concepts, a set of diverse image groups is created by applying an HAC algorithm [9]. Specifically, we propose a single-linkage hierarchical clustering that groups together similar formal concepts. The Zero-Induced index is used to measure cluster similarity [2].

2.2   Content-Based Information Retrieval
This sub-system performs content-based image retrieval, modelling the user preferences by means of a relevance feedback algorithm. The general methodology involves five steps:

  1. Reduction of the data dimensionality: The provided low-level visual features [6] are used to generate a feature vector for each image, generically denoted as x, in a space of dimension N = 945. These features are reduced using Principal Component Analysis (PCA), retaining only the first components that account for 80% of the data variability. This reduces the original feature space to a new feature vector of dimension M < N. One advantage of this reduction is that the transformed components are sorted in decreasing order of the variance explained by the corresponding principal component.

  2. Selecting the relevant and non-relevant sets: The user looks at a few screens, each showing some images, and marks some of them as relevant or non-relevant (run5). For the automatic runs (run1 and run3), the relevant and non-relevant images are selected automatically:
     - For the one-topic subset, the relevant images are the images provided by Wikipedia for the given topic.
     - For the multi-topic subset, the relevant images are the first five results.
     - A set of non-relevant images was generated manually, following the non-relevance guidelines given in [6] (photos with regular people as the main subject, photos of riots and protests). For each query, the required non-relevant images are randomly selected from this set.

  3. Parameter estimation of the Local Logistic Regression models [8]: The reduced (PCA) feature vectors and the relevant and non-relevant sets are the inputs of several Local Logistic Regression models. Their outputs are the probabilities of the user assessment, i.e., the probability the user would assign to the image belonging to the relevant set. The feature vector is split dynamically into m groups of non-fixed size. Each group is used to fit a model of higher order, given the input sets and the PCA components.

  4. Ranking of the database: The models are evaluated on all the images in the database, and each estimated model returns a probability of being relevant. As a result, we obtain a probability vector p of dimension m for each image. These probabilities are combined into a single one by means of a weighted average, s = (w_1 p_1 + ... + w_m p_m) / (w_1 + ... + w_m). The weight w_j of each probability p_j is the amount of variance accounted for by the group of components used to fit the corresponding model. This procedure gives a score/probability for each image in the database.

  5. Ranking of the database: The final diversity ranking is generated by selecting the highest-probability image in each of the clusters generated by the TBIR sub-system (run3). If there are fewer than 50 clusters, the second highest-probability image of each cluster is also selected. For the automatic run using only visual information (run1), the clusters are built by a k-means procedure (k = 50) over the PCA components of the visual feature vector.

3.   RESULTS
We submitted four runs:
- run1: automatic, using visual information only (the content-based sub-system of Section 2.2);
- run2: automatic, using textual information only (the textual sub-system of Section 2.1);
- run3: automatic multimedia (both sub-systems, Sections 2.1 and 2.2);
- run5: everything allowed, i.e., textual clusters built with FCA plus the manual relevance feedback algorithm using visual features (both sub-systems, Sections 2.1 and 2.2).

Results are presented in Table 1.

Table 1: Official metrics for the Retrieving Diverse Social Images Task on the one-topic subset, the multi-topic subset and the overall set. The best result for each topic set is marked with *, and the best result among the automatic runs (run1 to run3) with +.

Set          Metric   run1      run2      run3      run5
One-topic    P@20     0.6362    0.7094+   0.7051    0.7645*
             CR@20    0.3704    0.4082+   0.3995    0.4194*
             F1@20    0.4618    0.5068+   0.4988    0.5240*
Multi-topic  P@20     0.7300+   0.6393    0.7207    0.7886*
             CR@20    0.4257    0.4407+   0.4116    0.4491*
             F1@20    0.5130+   0.5025    0.5001    0.5519*
Overall      P@20     0.6835    0.6741    0.7129+   0.7766*
             CR@20    0.3983    0.4246+   0.4056    0.4344*
             F1@20    0.4876    0.5046+   0.4994    0.5380*

Our best results for both precision and diversity are obtained with the human-assisted multimedia approach, run5 (F1@20 = 0.5380 overall). This holds on both subsets: one-topic (F1@20 = 0.5240) and multi-topic (F1@20 = 0.5519).

Among the automatic runs, run2 achieves the best result on the one-topic subset (F1@20 = 0.5068). On the multi-topic subset, run1 obtains the highest precision (P@20 = 0.7300) and the best F1@20 (0.5130), although run2 achieves better diversity (CR@20 = 0.4407).

4.   CONCLUSIONS
We presented a multimodal approach for image diversification. It applies a conceptual modelling based on FCA and HAC to cluster the images according to the latent topics addressed by their textual content. This is combined with a relevance feedback algorithm that uses the visual features to determine similarity.

Results show that the manual version of the multimedia approach works better than the automatic ones. We attribute this to the way the relevant and non-relevant images are chosen to estimate the model: a human better understands the meaning of the topic and therefore selects the most significant images for the model. Our challenge is to enable the automatic approach to select the relevant and non-relevant images as a human would.

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.
This work has been partially supported by the Spanish projects VOXPOPULI (TIN2013-47090-C3-1-P) and DPI2013-47279-C2-1-R.
5.   REFERENCES
 [1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong.
     Diversifying search results. In Proceedings of the
     Second ACM International Conference on Web Search
     and Data Mining, pages 5–14. ACM, 2009.
 [2] F. Alqadah and R. Bhatnagar. Similarity measures in
     formal concept analysis. Annals of Mathematics and
     Artificial Intelligence, 61(3):245–256, 2011.
 [3] X. Benavent, A. García-Serrano, R. Granados,
     J. Benavent, and E. de Ves. Multimedia information
     retrieval based on late semantic fusion approaches:
     Experiments on a Wikipedia image collection.
     IEEE Transactions on Multimedia, 2013.
 [4] A. Castellanos, J. Cigarrán, and A. García-Serrano.
     UNED @ Retrieving Diverse Social Images Task. In
     MediaEval Multimedia Benchmark Workshop,
     CEUR-WS.org, vol. 1263, ISSN 1613-0073, 2014.
 [5] E. de Ves, G. Ayala, X. Benavent, J. Domingo, and
     E. Dura. Modeling user preferences in content-based
     image retrieval: a novel attempt to bridge the
     semantic gap. Neurocomputing, 2015.
 [6] B. Ionescu, A. L. Gînsca, B. Boteanu, A. Popescu,
     M. Lupu, and H. Müller. Retrieving diverse social
     images at MediaEval 2015: Challenge, dataset and
     evaluation. In Working Notes Proceedings of the
     MediaEval 2015 Workshop, 2015.
 [7] S. Kullback and R. A. Leibler. On information and
     sufficiency. The Annals of Mathematical Statistics,
     22(1):79–86, 1951.
 [8] C. Loader. Local Regression and Likelihood.
     Springer-Verlag, New York, 1999.
 [9] C. D. Manning, P. Raghavan, and H. Schütze.
     Hierarchical clustering. In Introduction to
     Information Retrieval, pages 377–403. Cambridge
     University Press, 2008.
[10] S. Rudinac, A. Hanjalic, and M. Larson. Generating
     visual summaries of geographic areas using
     community-contributed images. IEEE Transactions on
     Multimedia, 15(4):921–932, 2013.
[11] R. Wille. Concept lattices and conceptual knowledge
     systems. Computers & Mathematics with Applications,
     23(6):493–515, 1992.