=Paper= {{Paper |id=Vol-1263/paper38 |storemode=property |title=Recod @ MediaEval 2014: Diverse Social Images Retrieval |pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_38.pdf |volume=Vol-1263 |dblpUrl=https://dblp.org/rec/conf/mediaeval/CalumbySCPLCT14 }} ==Recod @ MediaEval 2014: Diverse Social Images Retrieval== https://ceur-ws.org/Vol-1263/mediaeval2014_submission_38.pdf

Recod @ MediaEval 2014: Diverse Social Images Retrieval

Rodrigo T. Calumby (1,2), Vinícius P. Santana (2), Felipe S. Cordeiro (2),
Otávio A. B. Penatti (3), Lin T. Li (1), Giovani Chiachia (1), and Ricardo da S. Torres (1)
(1) RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), Campinas, SP – Brazil
(2) Department of Exact Sciences, University of Feira de Santana (UEFS), Feira de Santana, BA – Brazil
(3) SAMSUNG Research Institute Brazil, Campinas, SP – Brazil
[rtcalumby, vpsantana, fscordeiro]@ecomp.uefs.br, o.penatti@samsung.com, [lintzyli, chiachia, rtorres]@ic.unicamp.br

ABSTRACT
This paper presents the results of the first participation of our multi-institutional team in the Retrieving Diverse Social Images Task at MediaEval 2014. In this task we were required to develop a summarization and diversification approach for social photo retrieval. Our approach is based on irrelevant image filtering, image re-ranking, and diversity promotion by clustering. We have used visual and textual features, including image metadata and user credibility information.
Figure 1: Proposed approach overview.
1. INTRODUCTION
Promoting diversity is an effective approach for improving retrieval results and the user search experience. For instance, it has been applied to tackle ambiguous or underspecified queries and to produce summaries. The Retrieving Diverse Social Images Task [2] combines such problems into a challenge on visual summarization for social photo retrieval in a tourism-related context. This paper presents our first efforts on relevance improvement and diversity promotion using image visual features, metadata, and user credibility information.

2. PROPOSED APPROACH
The proposed approach follows the general pipeline presented in Figure 1. First, two filtering steps reduce the number of irrelevant images. Afterwards, re-ranking steps improve image rank positions according to two different relevance aspects. Finally, clustering is performed, followed by the selection of representative and diverse images. Specific combinations of the proposed steps were set for each submitted run (Section 3).

2.1 Filtering
In order to reduce the number of non-relevant images, we applied two filtering strategies: geographic filtering and face filtering. Eliminating non-relevant images allows higher effectiveness in terms of final relevance and boosts the diversification procedure, because fewer non-relevant items remain as candidates for the final diversified list.

The geographic filtering (GeoFilter) takes the reference lat/long of each location and eliminates all images located farther away than a given range. In this case, only geo-tagged images were assessed. According to the results on the development set, a 10 km range limit from the reference point was a good choice.

Since images containing a person or crowds in the foreground are considered non-relevant, we used the face detection module of Face++ (see footnote 1) for filtering. For all images, we computed the following features: a) number of faces; b) biggest face size; c) smallest face size; d) average face size; e) total face size. The size values were computed as a fraction of the image spatial domain.

Our first face-based filtering approach (NumFacesFilter) eliminates all images whose number of faces exceeds a threshold. According to the experiments on the development set, we eliminated all images with more than one face. The second approach (FaceClassifierFilter) used a 1-NN classifier based on the described features, with all development images as training instances. All images classified as non-relevant were eliminated.

2.2 Features
For the textual and multimodal approaches, we evaluated the TF-IDF, BM25, and Cosine measures. All of them were computed using the provided TF, DF, and TF-IDF. On the development set, the best results were achieved using the Cosine measure. To enable the combination with other distance measures, the Cosine similarity values were converted to distances by subtracting them from 1.0. For the visual approaches, besides the provided features, we also extracted two global descriptors (BIC and LAS) [3] and two bag-of-visual-words (BoVW) descriptors, based on dense (6 pixels) or sparse (Harris-Laplace detector) SIFT, with 1000 visual words (randomly selected), soft assignment (σ = 150), and max pooling.
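The conversion from Cosine similarity to a distance can be illustrated with a minimal sketch. It assumes each image's text is represented as a sparse term-weight vector stored in a Python dict; the function names, the handling of empty vectors, and the example weights are illustrative assumptions, not details taken from the task data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption of this sketch: an empty vector is maximally dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Distance obtained by subtracting the similarity from 1.0."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical TF-IDF weights for two images.
img_a = {"cathedral": 0.8, "tower": 0.3}
img_b = {"cathedral": 0.5, "square": 0.4}
d = cosine_distance(img_a, img_b)
```

For non-negative weights the resulting value lies in [0, 1], which makes it directly comparable with other distances normalized to the same range.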
1 http://www.faceplusplus.com - Last accessed on Sept 20, 2014.
Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain
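As a rough illustration of the geographic filtering (GeoFilter) from Section 2.1, the sketch below drops geo-tagged images farther than a range limit from the location's reference point. The paper does not state how distances were computed; the haversine formula, the Earth radius constant, the `lat`/`lon` field names, and the decision to keep images without a geotag are all assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_filter(images, ref_lat, ref_lon, max_km=10.0):
    """Keep images within max_km of the reference point.

    Images without coordinates are kept, reading "only geo-tagged images
    were assessed" as "untagged images are not filtered" (an assumption).
    """
    kept = []
    for img in images:
        lat, lon = img.get("lat"), img.get("lon")
        if lat is None or lon is None:
            kept.append(img)
        elif haversine_km(lat, lon, ref_lat, ref_lon) <= max_km:
            kept.append(img)
    return kept

# Hypothetical image records: one near the reference, one far away, one untagged.
images = [
    {"id": 1, "lat": 0.0, "lon": 0.0},
    {"id": 2, "lat": 1.0, "lon": 0.0},
    {"id": 3},
]
kept_ids = [img["id"] for img in geo_filter(images, 0.0, 0.0)]
```

The default `max_km=10.0` mirrors the 10 km limit chosen on the development set; the filter preserves the original list order, so later re-ranking steps see the surviving images in their initial positions.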
2.3 Re-ranking
Since the original lists may contain redundant and non-relevant items, item positions may not reflect their relevance. Even after the filtering procedures, some non-relevant images may remain. We therefore proposed two re-ranking strategies: visual-based and user credibility-based.

The visual re-ranking used each location's representative images obtained from Wikipedia as queries. The original lists were re-ranked according to their similarity to the representative sets. The visual distance from each image in a list to the corresponding representative set was computed as the minimum distance between the image and each representative image. For multiple feature fusion, we used a smoothed version of the Borda Count algorithm. In our version, the vote (relevance score) for the nth image in the rank was computed as $1 / \sqrt[4]{n+1}$.

As a different re-ranking strategy, we also exploited the user-credibility descriptors provided with the data. We combined a relevance-based score (relScore) with another score based on credibility (credScore). The relScore of each image was computed according to its position in the list, as described for the visual re-ranking. The credScore was computed as the product of three credibility features: visualScore, faceProportion, and tagSpecificity. The final re-ranking score was computed as relScore × credScore.

2.4 Diversification Method
After the filtering and re-ranking procedures, the next step consists of the actual summarization and diversification. We evaluated two diversification methods: MMR [1] and a clustering technique based on kMedoids. Given the superiority of kMedoids over MMR on the development set, we used the kMedoids clustering for the test set runs.

The kMedoids clustering technique is divided into two main steps: medoid definition and cluster construction. Since we were supposed to return 50 representative images, the algorithm was set to create 50 clusters. The initial medoids were defined in an offset fashion. The offset value was computed by dividing the list size by 50. The medoids were then defined as the images in the positions i × offset, with 0 ≤ i < 50. Hence, the initial medoids were picked throughout the list, from the top to the bottom. After the clusters are constructed, the process iterates until no image moves between clusters. At each iteration, the new medoids were defined as the best-connected images (according to the average distance to all images in the cluster). The distance between two images is computed as the average of their distances computed for each feature. Finally, the images in each cluster are ranked according to their positions in the original non-clustered list. The final output list is composed of the most relevant item from each cluster.

3. RUN SETUP
We submitted five runs; their descriptions are presented in Table 1. The features used in each run and each step were selected according to the best results on the development set.

Table 1: Runs Configurations
Run | Filtering | Re-ranking | Diversification
1 | GeoFilter and NumFacesFilter | - | kMedoids (BoVW_max^sparse + HOG)
2 | GeoFilter and NumFacesFilter | - | kMedoids (Cosine)
3 | GeoFilter and NumFacesFilter | Visual re-ranking (CM3x3 + HOG + BIC) | kMedoids (BoVW_max^sparse + HOG + Cosine)
4 | GeoFilter and NumFacesFilter | Visual re-ranking (CM3x3 + HOG + BIC) and Credibility re-ranking | kMedoids (CN3x3)
5 | GeoFilter and FaceClassifierFilter | Visual re-ranking (CM3x3 + HOG + BIC) and Credibility re-ranking | kMedoids (CN3x3)

4. RESULTS AND DISCUSSION
Table 2 presents the official evaluation measures for the five runs. We can see that the best results (for all measures) were achieved when the full proposed pipeline was applied (Runs 4 and 5). Run 2 (purely textual) slightly outperformed Run 1 (purely visual) in terms of diversity. The multimodal combination (Run 3) slightly outperformed Runs 1 and 2 on CR@20 and F1@20. However, when the credibility re-ranking was applied (Run 4), the best results were achieved by the visual approach, with reasonable improvement on all effectiveness measures. Notice that when the face-based filtering used the classifier (Run 5), the results were lower than with the face number threshold (Run 4) but still superior to Runs 1 to 3 on F1@20.

Table 2: Runs Effectiveness - Official Measures
Run | P@20 | CR@20 | F1@20
1 | 0.7130 | 0.4030 | 0.5077
2 | 0.6976 | 0.4139 | 0.5133
3 | 0.7016 | 0.4177 | 0.5168
4 | 0.7598 | 0.4288 | 0.5423
5 | 0.7407 | 0.4076 | 0.5206

5. CONCLUSIONS
We proposed a multimodal approach that uses filtering and re-ranking in conjunction with a clustering technique for diversification. Our best results were achieved with image re-ranking that combines the relevance score with user credibility information. As future work, we would like to evaluate the use of additional information in the re-ranking and diversification steps, as well as more elaborate fusion approaches.

6. ACKNOWLEDGMENTS
We thank UEFS/PROBIC, Samsung Research Institute Brazil, and FAPESP (2013/11359-0) for their support.

7. REFERENCES
[1] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335–336, 1998.
[2] B. Ionescu, A. Popescu, M. Lupu, A. L. Gînscă, and H. Müller. Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation. In MediaEval 2014 Workshop, Barcelona, 2014.
[3] O. A. B. Penatti, E. Valle, and R. da S. Torres. Comparative study of global color and texture descriptors for web image retrieval. J. Vis. Commun. Image Repr., 23(2):359–380, 2012.