=Paper= {{Paper |id=Vol-1436/Paper22 |storemode=property |title=UNED-UV @ Retrieving Diverse Social Images Task |pdfUrl=https://ceur-ws.org/Vol-1436/Paper22.pdf |volume=Vol-1436 |dblpUrl=https://dblp.org/rec/conf/mediaeval/CastellanosBGVC15 }} ==UNED-UV @ Retrieving Diverse Social Images Task== https://ceur-ws.org/Vol-1436/Paper22.pdf

UNED-UV @ Retrieving Diverse Social Images Task

A. Castellanos
NLP & IR Group, UNED
C/ Juan del Rosal, 16
Madrid, Spain
acastellanos@lsi.uned.es

X. Benavent
Department of Informatics, University of Valencia
Valencia, Spain
xaro.benavent@uv.es

A. García-Serrano
NLP & IR Group, UNED
C/ Juan del Rosal, 16
Madrid, Spain
agarcia@lsi.uned.es

E. de Ves
Department of Informatics, University of Valencia
Valencia, Spain
esther.deves@uv.es

J. Cigarrán
NLP & IR Group, UNED
C/ Juan del Rosal, 16
Madrid, Spain
juanci@lsi.uned.es

ABSTRACT
This paper details the participation of the UNED-UV group at the 2015 Retrieving Diverse Social Images Task. This year, our proposal is based on a multi-modal approach that first applies a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) to detect the latent topics addressed by the images and diversify them according to these topics. Second, a Local Logistic Regression model, which uses the low-level features and some relevant and non-relevant samples, is fitted and estimates the relevance probability for all the images in the database.

1. INTRODUCTION
Information retrieval systems have commonly been based on maximizing the relevance of the result list (i.e., in terms of accuracy-based metrics). However, retrieval systems, and especially those focused on image diversification, should be able to offer results that are relevant but also diverse. Users are not only interested in accurate results but also in results covering different topics or situations [1].

To address this task we propose an image representation using the concept(s) covered by the textual information related to the images. This conceptual representation is built by means of Formal Concept Analysis, a data organization technique. In our participation in the 2014 edition of this task we showed that this approach was able to identify the different topics addressed in the images, allowing the diversification of the result list according to them [4].

This year we go a step further by presenting a multimedia approach, since our previous approach missed the aspects related to the visual information of the images and it has been extensively shown that visual information has a great impact on information retrieval systems [10]. The visual approach presented uses a relevance feedback algorithm developed by the UV group and used in previous works [3], [5]. This method estimates the similarity probability of all the images of the database using the visual low-level features by means of a Local Logistic Regression model [8].

2. SYSTEM DESCRIPTION
Our multimedia system has two sub-systems: a Textual-Based Information Retrieval (TBIR) sub-system that works with textual information and generates clusters for diversity, and a Content-Based Information Retrieval sub-system that estimates the relevance of each of the images of the generated clusters.

2.1 Textual-Based Information Retrieval
Our proposal is based on discovering the latent topics addressed by the images by applying Formal Concept Analysis. A Hierarchical Agglomerative Clustering (HAC) [9] is then applied to group together similar images according to the detected formal concepts (those belonging to the same topic). Each HAC-based cluster may be considered as an image set covering a similar topic. Then, for each cluster, the visual features related to the images are applied to rank the cluster images according to their visual diversity.

2.1.1 FCA-based Modelling
Formal Concept Analysis (FCA) is a theory of concept formation [11] to organize formal contexts. A formal context is a structure K := (G, M, I), where G is a set of objects, M a set of attributes related to these objects and I a binary relationship between G and M, denoted by gIm: the object g has the attribute m. From the formal context, a set of formal concepts can be inferred (i.e., a formal concept is a pair (A, B) of images A and the features B shared by those images) and organized in a lattice from the most generic to the most specific one.

By applying FCA, the images in the test set are modelled in terms of formal concepts, which group together the images sharing the same set of features. In order to select only the most representative features, we applied Kullback-Leibler Divergence (KLD) [7] to the textual contents related to the images. This KLD-based selection represents each image by the textual contents that best differentiate an image from the other ones.
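As an illustration of the definition in Section 2.1.1, the following minimal sketch derives the formal concepts of a tiny formal context. The image ids, the terms, and the helper names (`common_attributes`, `objects_having`, `formal_concepts`) are hypothetical and chosen for illustration only; the brute-force enumeration is an assumption suited to a toy example and is not claimed to be the algorithm used by the authors.

```python
from itertools import combinations

# Toy formal context K = (G, M, I): hypothetical image ids mapped to the
# textual features (attributes) they carry. Not taken from the task data.
context = {
    "img1": {"tower", "night"},
    "img2": {"tower", "river"},
    "img3": {"tower", "night", "river"},
    "img4": {"museum"},
}


def common_attributes(objects, context, all_attributes):
    """Derivation A': attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return shared


def objects_having(attributes, context):
    """Derivation B': objects that carry every attribute in `attributes`."""
    return {g for g, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts (A, B) with A' = B and B' = A.

    Brute force over object subsets; only reasonable for a toy context.
    """
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    # Most generic concepts (largest extent) first, following the lattice order.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0])))


concepts = formal_concepts(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

In this toy context the most generic concept contains all four images and no shared feature, while more specific concepts such as ({img1, img3}, {tower, night}) group images that share a richer set of features.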

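The KLD-based feature selection of Section 2.1.1 can be sketched as follows. The paper does not give the exact formulation, so this sketch assumes a common variant: each term is scored by its contribution P(t|d) * log(P(t|d) / P(t|C)) to the divergence between the term distribution of one image's text (d) and that of the whole collection (C). The documents, the `top_k` cut-off, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tokenised textual content (e.g. titles and tags) per image.
docs = {
    "img1": ["tower", "night", "paris", "photo"],
    "img2": ["tower", "river", "paris", "photo"],
    "img3": ["museum", "paris", "photo", "photo"],
}


def kld_term_scores(doc_tokens, collection_counts, collection_size):
    """Per-term contribution P(t|d) * log(P(t|d) / P(t|C)) to the KL
    divergence between the document and the collection distributions."""
    doc_counts = Counter(doc_tokens)
    doc_size = len(doc_tokens)
    scores = {}
    for term, count in doc_counts.items():
        p_doc = count / doc_size
        p_coll = collection_counts[term] / collection_size
        scores[term] = p_doc * math.log(p_doc / p_coll)
    return scores


def select_terms(docs, top_k=2):
    """Keep, for each image, the top_k terms with the highest KLD score."""
    collection_counts = Counter(t for tokens in docs.values() for t in tokens)
    collection_size = sum(collection_counts.values())
    selected = {}
    for doc_id, tokens in docs.items():
        scores = kld_term_scores(tokens, collection_counts, collection_size)
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[doc_id] = ranked[:top_k]
    return selected


selected = select_terms(docs)
for doc_id, terms in selected.items():
    print(doc_id, terms)
```

Terms that are spread evenly over the collection (here "paris") score zero and are never selected, whereas terms concentrated in a single image rank first. The selected terms would then play the role of the attribute set M in the formal context.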
2.1.2 HAC-based grouping
From the FCA formal concepts, a set of diverse image groups is created by applying a HAC algorithm [9]. Specifically, we propose a single-linkage hierarchical clustering that groups together similar formal concepts, using the Zeros-Induced index to set the cluster similarity [2].

2.2 Content-Based Information Retrieval
This sub-system is concerned with Content-Based Image Retrieval, which models the user preferences by using a relevance feedback algorithm. The general methodology involves five steps:

1. Reduction of the data dimensionality: The provided low-level visual features [6] are used to generate a feature vector associated with each image, generically denoted as x, in a space of dimension N = 945. These features are reduced using Principal Component Analysis (PCA). We retain only the first components, which account for 80% of the data variability, so that the original feature space is reduced to a new feature vector of dimension M < N. One of the advantages of this reduction is that the transformed components are in decreasing order with respect to the variance explained by the corresponding principal component.

2. Selecting the relevant and non-relevant sets: The user looks at a few screens, each showing some images, and marks some of them as relevant or non-relevant (run5). For the automatic runs (run1 and run3), the relevant and non-relevant images are automatically selected. For the one-topic subset, the relevant images are the images given by Wikipedia for the topic; for the multi-topic subset, we generated the relevant images by selecting the first five results. A set of non-relevant images was manually generated following the non-relevance guidelines given in [6] (photos with regular people as the main subject, photos of riots and protests). At each query, the required non-relevant images are randomly selected from this generated non-relevant set.

3. Parameter estimation of the Local Logistic Regression models [8]: The reduced feature vectors (PCA) and the relevant and non-relevant sets are the inputs of several Local Logistic Regression models whose outputs are the probabilities for user assessment, i.e. the probabilities he/she would assign to the fact that the image belongs to the relevant set. The feature vector is split dynamically into m groups of non-fixed size. Each group is used for adjusting the model of higher order, given the input sets and PCA components.

4. Ranking of the database: The models are evaluated on all the images of the database and return the probability of being relevant for each estimated model; as a result, we have a probability vector (p) of dimension m for each individual image. We combine these probabilities into a single one by using a weighted average. The weight (w) for a given probability is the amount of variance accounted for by the group of components used to adjust the model. Finally, this procedure gives us a score/probability for each image in the database.

5. Ranking of the database: The final diversity similarity rank is generated by selecting the highest-probability image in each of the clusters generated by the TBIR sub-system (run3). If there are fewer than 50 clusters, a second selection, of the second-highest-probability images, is done. For the automatic run using only visual information (run1), the clusters are made by a k-means (k = 50) procedure over the PCA components of the visual feature vector.

Table 1: Official Metrics for Retrieving Diverse Social Images Task. Best result for each topic set is in bold, and best result for the automatic runs is in italics. First block results are for the one-topic subset, the second for the multi-topic subset, and the third for the overall set.

Set          Metric  run1    run2    run3    run5
One-topic    P@20    0.6362  0.7094  0.7051  0.7645
One-topic    CR@20   0.3704  0.4082  0.3995  0.4194
One-topic    F1@20   0.4618  0.5068  0.4988  0.5240
Multi-topic  P@20    0.7300  0.6393  0.7207  0.7886
Multi-topic  CR@20   0.4257  0.4407  0.4116  0.4491
Multi-topic  F1@20   0.5130  0.5025  0.5001  0.5519
Overall      P@20    0.6835  0.6741  0.7129  0.7766
Overall      CR@20   0.3983  0.4246  0.4056  0.4344
Overall      F1@20   0.4876  0.5046  0.4994  0.5380

3. RESULTS
We submitted four runs computed as follows: run1, automated using visual information only (uses step 2 presented in Section 2.2); run2, automated using text information only (uses step 1 presented in Section 2.1); run3, automated multimedia (uses steps 1-2 presented in Sections 2.1 and 2.2); and run5, everything allowed: textual clusters with FCA and a manual relevance feedback algorithm using visual features (uses steps 1-2 presented in Sections 2.1 and 2.2). Results are presented in Table 1.

It is interesting to observe that our best results for both precision and diversification are obtained with the multimedia human-based approach, run5, F1@20 = 0.5380, for both subsets: one-topic, F1@20 = 0.5240, and multi-topic, F1@20 = 0.5519. For the automatic runs, the best result is achieved by run2 on the one-topic subset, F1@20 = 0.5068; whereas for the multi-topic subset, run1 gets the highest precision, P@20 = 0.7300, and best performance, F1@20 = 0.5130, but run2 gets better diversification, CR@20 = 0.4407.

4. CONCLUSIONS
We presented a multimodal approach for image diversification applying a conceptual-based modelling (based on FCA and HAC) to cluster the images according to the latent topics addressed by their textual content, joined with a relevance feedback algorithm using the visual features for determining the similarity. Results show that the manual version of the multimedia approach works better than the automatic one. This is due to the way the relevant and non-relevant images are chosen to estimate the model. A human knows the meaning of the topic better; therefore, he/she selects the most significant images for the model. Our challenge is to make the automatic approach able to select the relevant and non-relevant images as a human being would.

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.
This work has been partially supported by VOXPOPULI (TIN2013-47090-C3-1-P) and DPI2013-47279-C2-1-R Spanish projects.

5. REFERENCES
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong.
Diversifying search results. In Proceedings of the
Second ACM International Conference on Web Search
and Data Mining, pages 5–14. ACM, 2009.
[2] F. Alqadah and R. Bhatnagar. Similarity measures in
formal concept analysis. Annals of Mathematics and
Artificial Intelligence, 61(3):245–256, 2011.
[3] X. Benavent, A. Garcia-Serrano, R. Granados,
J. Benavent, and E. de Ves. Multimedia information
retrieval based on late semantic fusion approaches:
Experiments on a Wikipedia image collection.
IEEE Transactions on Multimedia, PP(99):1–1, 2013.
[4] A. Castellanos, J. Cigarrán, and A. García-Serrano.
UNED @ Retrieving Diverse Social Images Task. In
MediaEval Multimedia Benchmark Workshop,
CEUR-WS.org, 1263, ISSN 1613-0073, 2014.
[5] E. de Ves, G. Ayala, X. Benavent, J. Domingo, and
E. Dura. Modeling user preferences in content-based
image retrieval: a novel attempt to bridge the
semantic gap. Neurocomputing, (0):–, 2015.
[6] B. Ionescu, A. L. Gînsca, B. Boteanu, A. Popescu,
M. Lupu, and H. Müller. Retrieving diverse social
images at MediaEval 2015: Challenge, dataset and
evaluation. In Working Notes Proceedings of the
MediaEval 2015 Workshop, 2015.
[7] S. Kullback and R. A. Leibler. On information and
sufficiency. The Annals of Mathematical Statistics,
22(1):79–86, 1951.
[8] C. Loader. Local regression and likelihood. New York:
Springer-Verlag, 1999.
[9] C. D. Manning, P. Raghavan, and H. Schütze.
Hierarchical clustering. pages 377–403. 2008.
[10] S. Rudinac, A. Hanjalic, and M. Larson. Generating
visual summaries of geographic areas using
community-contributed images. IEEE Transactions on
Multimedia, 15(4):921–932, 2013.
[11] R. Wille. Concept lattices and conceptual knowledge
systems. Computers & mathematics with applications,
23(6):493–515, 1992.