
João R. M. Palotti

palotti@ifs.tuwien.ac.at

Mihai Lupu

lupu@ifs.tuwien.ac.at

Navid Rekabsaz

rekabsaz@ifs.tuwien.ac.at

Allan Hanbury

hanbury@ifs.tuwien.ac.at

Vienna University of Technology

2014


This paper describes the efforts of Vienna University of Technology (TUW) in the MediaEval 2014 Retrieving Diverse Social Images challenge. Our approach consisted of three steps: (1) pre-filtering based on Machine Learning, (2) re-ranking based on Word2Vec, and (3) clustering based on an ensemble of clusters. Our best run reached an F@20 of 0.564.

1. INTRODUCTION

Diversification is an interesting problem for the information retrieval community and a challenge for both text and multimedia data. Focusing on image retrieval, the MediaEval 2014 Retrieving Diverse Social Images Task [1] was proposed to foster the development and evaluation of methods for retrieving diverse images of different points of interest.

2. METHODS

2.1 Pre-Filtering

We employed a pre-filtering step to exclude pictures that are likely irrelevant, with the goal of increasing the percentage of relevant images. We studied two approaches: (1) a filtering step based on a simplified version of the experiments of Jain et al. [2], removing images without any views, images geotagged more than 8 kilometers away from the point of interest (POI), and images with a description longer than 2000 characters; (2) a Logistic Regression classifier trained on the whole 2013 and 2014 data, using as features the ones described above as well as the image's license, the time of day (morning, afternoon, night) and the number of times the POI appeared in the title and description of the image.
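The rule-based filter of approach (1) can be sketched as follows. This is only an illustration under several assumptions that the paper does not state: the field names `views`, `lat`, `lon` and `description` are hypothetical, distance is measured as great-circle (haversine) distance, failing any single rule removes the image, and photos without a geotag are not rejected by the distance rule.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def keep_photo(photo, poi_lat, poi_lon, max_km=8.0, max_desc_len=2000):
    """Return True if the photo passes all three rules.

    `photo` is a dict with hypothetical keys: views, lat, lon, description.
    Photos without a geotag (lat or lon is None) skip the distance rule,
    which is an assumption of this sketch.
    """
    # Rule 1: discard photos that nobody has viewed.
    if photo["views"] == 0:
        return False
    # Rule 2: discard photos geotagged too far from the POI.
    if photo["lat"] is not None and photo["lon"] is not None:
        if haversine_km(photo["lat"], photo["lon"], poi_lat, poi_lon) > max_km:
            return False
    # Rule 3: discard photos with an overly long description.
    if len(photo["description"]) > max_desc_len:
        return False
    return True
```

The same three quantities (view count, distance to the POI, description length) could then be reused as classifier features for approach (2).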

2.2 Re-ranking

For re-ordering the results, we used the title, tags and description of the photos. For text pre-processing, we decomposed the terms using a greedy dictionary-based approach. In the next step, we expanded the query using the first sentence of the corresponding Wikipedia article, which helped with place disambiguation. We tested four document similarity methods based on Solr¹, Random Indexing², Galago³ and Word2Vec⁴ [4]. Among these, we obtained the best results with a semantic similarity approach based on Word2Vec.
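The paper does not detail the term decomposition, so the following is only one plausible reading of a greedy dictionary-based approach: repeatedly take the longest dictionary word that is a prefix of the remaining string, which splits concatenated tags such as "eiffeltower". The function name, the `min_len` parameter and the fallback of returning the term unsplit are assumptions of this sketch.

```python
def greedy_decompose(term, dictionary, min_len=2):
    """Split `term` into dictionary words, longest matching prefix first.

    If at some position no dictionary word of at least `min_len`
    characters matches, the whole term is returned unsplit.
    """
    parts = []
    pos = 0
    while pos < len(term):
        # Try the longest candidate first, then shrink.
        for end in range(len(term), pos + min_len - 1, -1):
            if term[pos:end] in dictionary:
                parts.append(term[pos:end])
                pos = end
                break
        else:
            # No prefix matched: give up and keep the original term.
            return [term]
    return parts


# Hypothetical vocabulary, for illustration only.
vocabulary = {"eiffel", "tower", "notre", "dame"}
tokens = greedy_decompose("eiffeltower", vocabulary)
```

Being greedy, this procedure never backtracks, so an early long match can prevent a valid split later in the term; that is the usual trade-off for its simplicity.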

Word2Vec provides vector representations of words learned by a neural network. We trained a model on Wikipedia and then used the vector representations of words to calculate the text similarity between the query and each photo.
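How word vectors were aggregated into a text similarity is not specified here. A common choice, assumed in the sketch below, is to average the word vectors of each text and compare the averages with cosine similarity. The embeddings are toy 3-dimensional values invented for illustration; a real run would load vectors from a trained Word2Vec model.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of tokens that have an embedding; None if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(query_tokens, photo_tokens, embeddings):
    """Cosine similarity between the averaged word vectors of two texts."""
    q = mean_vector(query_tokens, embeddings)
    p = mean_vector(photo_tokens, embeddings)
    if q is None or p is None:
        return 0.0
    return cosine(q, p)


# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "tower": [0.9, 0.1, 0.0],
    "eiffel": [0.8, 0.2, 0.1],
    "paris": [0.7, 0.3, 0.2],
    "cat": [0.0, 0.1, 0.9],
}
query = ["eiffel", "tower"]
photos = {"p1": ["paris", "tower"], "p2": ["cat"]}
# Rank photo ids by decreasing similarity to the query.
ranking = sorted(
    photos,
    key=lambda pid: text_similarity(query, photos[pid], toy_embeddings),
    reverse=True,
)
```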

Apart from the Word2Vec scores, we extracted binary attributes based on Jain et al. [2], as we did in the pre-filtering step, and used Linear Regression, fitted on the development data, to re-rank the results.

2.3 Clustering

We worked on three methods for clustering, all based on similarity measures. They share the idea of creating a similarity graph (potentially complete) in which each vertex represents an image of one point of interest and each edge represents the similarity between two images. Different similarity metrics and different sets of features can be used. Next, we explain each algorithm and how we combined them.

2.3.1 Metis

The first approach, called Metis [3], collapses similar and neighboring vertices, reducing the initial graph to a smaller one (the coarsening step). It then divides the coarsest graph into a pre-defined number of subgraphs, generating the clusters.

2.3.2 Spectral

Spectral clustering [5] can also be seen as a graph partitioning method, which measures both the total dissimilarity between groups and the total similarity within a group. We used the Scikit-learn⁵ implementation of this method.

2.3.3 Hierarchical

Hierarchical clustering [6] is based on the idea of a hierarchy of clusters. A tree is built such that the root gathers all the samples and the leaves are clusters with only one sample. This tree can be built bottom-up or top-down. We used the bottom-up implementation from Scikit-learn.

¹ http://lucene.apache.org/solr/
² https://code.google.com/p/semanticvectors/
³ http://sourceforge.net/p/lemur/galago/
⁴ https://code.google.com/p/word2vec/
⁵ http://scikit-learn.org/

[Table residue: column headers 1 2 3 4 5; row labels Pre-Filtering and Re-Ranking; unlabeled values 0.827 0.903 0.870 0.890 0.837 and 0.282 0.262 0.301 0.297 0.299]

2.3.4 Merging

We found that the clustering methods were unstable, as modifications in the filtering step caused great variation in the clustering step. Therefore, we decided to implement a merging heuristic that takes into account the different points of view of each clustering method and/or feature set, and is potentially more robust than any single algorithm.

Given c different clustering algorithms, f different feature sets, and m distance measures, there are c × f × m possible cluster sets. In our work, we used the 3 algorithms described above, 3 feature sets (HOG, CN, CN3x3; see [1] for details about these features) and 2 distance measures (cosine and Chebyshev), giving us 18 different cluster sets for each POI. We can then compute a frequency matrix that holds, for every pair of documents, the number of cluster sets in which they occur in the same cluster.

Next, we create a re-ranked list of the images from the original list (Flickr ranking) based only on this frequency matrix. To do so, we define a threshold t and a function F on a set of frequencies that together determine when a document should be moved to the re-ranked list. Suppose the re-ranked list contains the documents D_1, ..., D_i and we want to know whether a document D_k in the original list can be moved to the re-ranked list at position i + 1. We compute the frequencies f_1, ..., f_i between D_k and each of D_1, ..., D_i; if F(f_1, ..., f_i) < t, D_k is moved to the re-ranked list, otherwise it is not. After all the elements in the original list have been processed, if any documents remain, the value of t is increased and these documents are reprocessed. The algorithm continues until there are no documents left in the original list. The functions used for F in this work were maximum, minimum and mean, but other measures, such as mode, median or any percentile, could easily be employed as well.
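The merging heuristic can be sketched as below. The initial threshold, the step by which t is increased, and the decision to admit the first document unconditionally (the re-ranked list is empty, so there are no frequencies to test) are not given in the text and are assumptions of this sketch, as are the function names and the representation of a cluster set as a list of labels.

```python
from statistics import mean


def frequency_matrix(cluster_sets, n_docs):
    """freq[i][j] = number of cluster sets placing documents i and j together.

    Each cluster set is a list of length n_docs with one cluster label
    per document.
    """
    freq = [[0] * n_docs for _ in range(n_docs)]
    for labels in cluster_sets:
        for i in range(n_docs):
            for j in range(i + 1, n_docs):
                if labels[i] == labels[j]:
                    freq[i][j] += 1
                    freq[j][i] += 1
    return freq


def merge_rerank(original, freq, F=max, t=1, step=1):
    """Re-rank `original` (document indices in Flickr order) using only `freq`.

    `t` and `step` are assumed values; `step` must be positive so that
    the loop terminates.
    """
    remaining = list(original)
    reranked = []
    while remaining:
        still_waiting = []
        for doc in remaining:
            freqs = [freq[doc][other] for other in reranked]
            # An empty re-ranked list admits the document unconditionally.
            if not freqs or F(freqs) < t:
                reranked.append(doc)
            else:
                still_waiting.append(doc)
        remaining = still_waiting
        t += step  # relax the threshold for the next pass
    return reranked


# Three toy cluster sets over four documents, for illustration only.
cluster_sets = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
freq = frequency_matrix(cluster_sets, 4)
by_max = merge_rerank([0, 1, 2, 3], freq, F=max)
by_mean = merge_rerank([0, 1, 2, 3], freq, F=mean)
```

With F = maximum, a document is held back as long as it co-occurs often with any document already selected, which pushes near-duplicates of top results further down; minimum and mean are progressively more permissive.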

2.4 Credibility

Our approach to credibility was based on Machine Learning (ML): we trained a Logistic Regression classifier to learn whether a document was relevant or not based on the credibility data (using only face proportion, location similarity, upload frequency and bulk proportion). We tested two methods: (1) excluding documents classified as irrelevant, for Run4, and (2) moving documents classified as irrelevant to the bottom of the list, for Run5.
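The two ways of using the classifier output can be illustrated as follows. The sketch assumes the predictions are already available as a mapping from document id to a boolean, and that method (2) preserves the relative order within the relevant and irrelevant groups; neither detail is stated in the text, and the function names are hypothetical.

```python
def exclude_irrelevant(ranked_docs, is_relevant):
    """Method (1): drop every document predicted irrelevant."""
    return [d for d in ranked_docs if is_relevant[d]]


def demote_irrelevant(ranked_docs, is_relevant):
    """Method (2): keep all documents, moving predicted-irrelevant ones
    to the bottom while preserving relative order within each group."""
    relevant = [d for d in ranked_docs if is_relevant[d]]
    irrelevant = [d for d in ranked_docs if not is_relevant[d]]
    return relevant + irrelevant


# Hypothetical ranking and classifier predictions, for illustration only.
ranked = ["a", "b", "c", "d"]
predictions = {"a": True, "b": False, "c": True, "d": False}
filtered = exclude_irrelevant(ranked, predictions)
demoted = demote_irrelevant(ranked, predictions)
```

Method (1) shortens the result list, so a classifier error removes a relevant image entirely, whereas method (2) only lowers its rank.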

3. EXPERIMENTS

We submitted all 5 runs, varying the use of pre-filtering, the re-ranking method, the clustering approach and the use of credibility. Details are shown in Table 1 and the results are shown in Table 2. Based on the development data, we expected Run3 and Run4 to be our best runs, but the results on the test data show that we probably overfitted the development set for the re-ranking and credibility parts. The most positive result was that the cluster ensemble proved to be robust for this task.

4. CONCLUSION

Our experiments show that an ensemble of clusters can be a robust way to diversify results. Unfortunately, our re-ranking method did not work as well on the test set as it did on the development set. Finally, the use of credibility also seems to have overfitted the development data and was not effective on the test set.

ACKNOWLEDGMENTS.

This research was funded by the Austrian Science Fund (FWF) project number I1094-N23 (MUCKE) and project number P25905-N23 (ADmIRE).

[1] Ionescu, Popescu, Lupu, A. L. Gînscă, and Müller. Retrieving Diverse Social Images at MediaEval 2014: Challenge, Dataset and Evaluation. In MediaEval 2014, 2014.

[2] Jain, Hare, Samangooei, Preston, Davies, Dupplaw, and P. H. Lewis. Experiments in diversifying Flickr result sets. In MediaEval 2013, 2013.

[3] Karypis and Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 1998.

[4] Mikolov, Chen, G. Corrado, and Dean. Efficient estimation of word representations in vector space. CoRR, 2013.

[5] Shi and Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 2000.

[6] J. H. Ward. Hierarchical grouping to optimize an objective function. JASA, 1963.