<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TUW @ MediaEval 2015 Retrieving Diverse Social Images Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author"><string-name>Serwah Sabetghadam</string-name></contrib>
        <contrib contrib-type="author"><string-name>João Palotti</string-name></contrib>
        <contrib contrib-type="author"><string-name>Navid Rekabsaz</string-name></contrib>
        <contrib contrib-type="author"><string-name>Mihai Lupu</string-name></contrib>
        <contrib contrib-type="author"><string-name>Allan Hanbury</string-name></contrib>
        <aff id="aff0">
          <institution>Vienna University of Technology</institution>
          , Favoritenstrasse 9-11/188, Vienna,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper describes the contributions of Vienna University of Technology (TUW) to the MediaEval 2015 Retrieving Diverse Social Images challenge. Our approach consists of three phases: (1) a precision-oriented phase, in which we focus only on the relevance of the documents; (2) a recall-oriented phase, in which we focus only on the diversity aspect; and (3) a merging phase, in which we explore ways to find a balance between the relevance and diversity factors. We use two fusion methods for this last part. Our best run reached an F1@20 of 0.582.</p>
      </abstract>
      <kwd-group>
        <kwd>Fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Result diversification has recently attracted much attention in the IR community. Often, the information need expressed by a user cannot be satisfied by displaying items related to only one facet of the query topic. Ideally, an IR system displays pieces of information covering diverse subtopics of the query. The same idea has been used in the Recommender Systems area, where diversification techniques have been shown to increase user satisfaction [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        This paper describes the second participation of our team in the MediaEval Retrieving Diverse Social Images task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We build our solution upon our previous participation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Last year, we obtained good results in both precision and recall, but in separate runs. Therefore, we decided to explore different strategies for better fusing our individual runs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. METHODS</title>
      <p>We leveraged a distinct set of methods for each run. Table 1 shows the combination of methods used in each run.</p>
      <p>
        Based on our experience from the previous year [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we use only textual features (i.e., title, tags, and description) for finding the relevant documents. We extend the usual term-frequency-based methods with a more semantics-based approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We create 400-dimensional word embeddings from the Wikipedia corpus using the Word2Vec method [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We calculate the similarity between the query and the text documents (the concatenation of title, tags, and description) using the SimGreedy method [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
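<p>To make the relevance step concrete, the following is a minimal sketch of a SimGreedy-style similarity. The tiny 3-dimensional embeddings are invented stand-ins for the 400-dimensional Wikipedia Word2Vec vectors, and the greedy one-to-one token matching is a simplification of the method in [8], not the authors' exact implementation.</p>

```python
import math

# Toy 3-d embeddings standing in for the 400-d Wikipedia Word2Vec
# vectors used in the paper; the vectors below are made up.
EMB = {
    "church":    [0.9, 0.1, 0.0],
    "cathedral": [0.8, 0.2, 0.1],
    "tower":     [0.2, 0.9, 0.1],
    "old":       [0.1, 0.2, 0.9],
    "historic":  [0.2, 0.1, 0.8],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def sim_greedy_dir(src, dst):
    """Greedily match each source token to its best unused target token
    and average the match scores (one direction of SimGreedy)."""
    free = [t for t in dst if t in EMB]
    scores = []
    for t in (t for t in src if t in EMB):
        if not free:
            break
        best = max(free, key=lambda u: cos(EMB[t], EMB[u]))
        scores.append(cos(EMB[t], EMB[best]))
        free.remove(best)
    return sum(scores) / len(scores) if scores else 0.0

def sim_greedy(query, doc):
    # Symmetric SimGreedy: average of the two directions.
    return 0.5 * (sim_greedy_dir(query, doc) + sim_greedy_dir(doc, query))

q = ["old", "church"]
d = ["historic", "cathedral", "tower"]
print(round(sim_greedy(q, d), 3))
```

In the real system the document text is the concatenation of title, tags, and description, and the scores rank the Flickr result list by relevance.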
      <p>
        To find diverse images, we experiment with different clustering methods. From [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] we learned that an approach based on an ensemble of clusterings can perform better than a single clustering method. We also learned that a pre-filtering step can remove irrelevant images that would otherwise harm the clustering process. Here we briefly comment on these two aspects:
      </p>
      <p>
        Pre-Filtering: We use hand-coded rules previously shown
to perform well in this task, to exclude probably irrelevant
pictures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We exclude pictures based on three rules:
without any views, geo-tagged 8km away from the POI, or with
description length greater than 2000 characters.
      </p>
      <p>Clustering solution: The basic idea is that, given a clustering algorithm A, a feature set F that describes an image, and a distance measure Di, we can create a cluster set C = (A, F, Di). For example, C1 can be the result of applying K-Means (A) to the color histograms of the images (F), based on the cosine distance (Di): C1 = (KMeans, ColorHistogram, Cosine).</p>
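<p>A cluster set C = (A, F, Di) is naturally represented as a configuration tuple. The sketch below shows this representation; the trivial "clustering" function is a toy stand-in for a real library call, not the actual K-Means used in the runs.</p>

```python
from collections import namedtuple

# A cluster set C = (A, F, Di): algorithm, feature set, distance measure.
ClusterConfig = namedtuple("ClusterConfig", ["algorithm", "features", "distance"])

def run_config(cfg, feature_vectors):
    """Toy 'clustering': bucket items by whether their first feature
    value lies above the mean. A real system would dispatch on
    cfg.algorithm / cfg.distance to an actual clustering implementation."""
    mean = sum(v[0] for v in feature_vectors) / len(feature_vectors)
    return [0 if v[0] <= mean else 1 for v in feature_vectors]

c1 = ClusterConfig("KMeans", "ColorHistogram", "Cosine")
vectors = [[0.1], [0.2], [0.8], [0.9]]  # pretend color-histogram features
labels = run_config(c1, vectors)
print(labels)  # → [0, 0, 1, 1]
```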
      <p>
        A common strategy used by a number of teams in 2013 was to go through the clusters in C1 one by one and pick the "best" image from each cluster to form the final ranked list. We noticed that small differences, for example using C2 = (KMeans, NeuralNetworkFeatures, Cosine) instead, could have a large impact on the clusters formed, consequently strongly influencing the final ranked list. As described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], our solution is to use the development set to learn which clustering algorithms, feature sets, and distance measures work best. After that, we combine the results of different Cs and count the frequency with which any two images end up in the same cluster. Based on this simple frequency, we rerank the initial Flickr list (Run 1) or the list generated by the relevance method described above (Run 3).
      </p>
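<p>The ensemble step above can be sketched as follows: count how often each image pair co-occurs in the same cluster across several cluster sets, then greedily rerank so that each picked image has low co-occurrence with those already picked. The toy labels and the greedy tie-breaking rule are illustrative assumptions, not the exact procedure of [7].</p>

```python
from collections import Counter
from itertools import combinations

def cooccurrence(cluster_sets):
    """Count, over all cluster sets, how often each image pair (i, j)
    lands in the same cluster."""
    pair_count = Counter()
    for labels in cluster_sets:
        for i, j in combinations(range(len(labels)), 2):
            if labels[i] == labels[j]:
                pair_count[(i, j)] += 1
    return pair_count

def rerank(initial_order, cluster_sets):
    """Greedy rerank: repeatedly pick the image with the lowest maximum
    co-occurrence count against everything picked so far; ties keep the
    initial (e.g. Flickr) order."""
    pairs = cooccurrence(cluster_sets)
    sim = lambda a, b: pairs[(min(a, b), max(a, b))]
    remaining = list(initial_order)
    ranked = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda x: (max(sim(x, y) for y in ranked),
                                            initial_order.index(x)))
        ranked.append(nxt)
        remaining.remove(nxt)
    return ranked

# Three cluster sets over four images; images 0 and 1 always co-occur,
# so image 1 is pushed to the end of the reranked list.
sets_ = [[0, 0, 1, 2], [0, 0, 1, 1], [0, 0, 2, 1]]
print(rerank([0, 1, 2, 3], sets_))  # → [0, 2, 3, 1]
```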
      <p>
        Atrey et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] surveyed methods for fusing multiple modalities. In their view, there are three categories of fusion methods: rule-based methods, classification-based methods, and estimation-based methods. Our approach to combining relevance and diversity results is inspired by these fusion methods. We leverage the weighted linear method from the first category and Bayesian inference from the second.
      </p>
      <p>[Table 2: per-run results, including the linear and Bayesian fusion runs, on the 2015 development and test sets.]</p>
      <p>
        Weighted Linear: We use the optimization technique proposed by Deselaers et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] based on weighted linear fusion. Having the relevance of each document to the query (R) and the diversification measure for each set of documents (D), we formulate diversification as an optimization problem in which one tries to maximize the linear combination of these two values.
      </p>
      <p>U(S|q) = w · R(S|q) + (1 − w) · D(S)   (1)
where U denotes the score of the selected set S with respect to the query q, and w is a parameter that controls the relative importance of relevance and diversity. The parameter w is tuned on the development set.</p>
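<p>Equation (1) is a one-liner in code. The sketch below picks the better of two candidate sets under the development-set weight w = 0.2 reported in our experiments; the R and D scores themselves are toy values.</p>

```python
# Sketch of Eq. (1): U(S|q) = w * R(S|q) + (1 - w) * D(S).
# w = 0.2 mirrors the weighting tuned on the development set;
# the candidate (R, D) pairs below are made up for illustration.
def utility(relevance, diversity, w=0.2):
    return w * relevance + (1 - w) * diversity

candidates = {"setA": (0.9, 0.3), "setB": (0.6, 0.7)}  # (R, D) per set
best = max(candidates, key=lambda s: utility(*candidates[s]))
print(best)  # setB: 0.2*0.6 + 0.8*0.7 = 0.68 beats setA's 0.42
```

Note how the low relevance weight lets the more diverse set win even against a much more relevant one.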
      <p>
        Bayesian Inference: In this method, the information is combined based on the rules of probability theory [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The probability of a diversification hypothesis H is:
      </p>
      <p>P(H|R, D) = 1/2 · P(D|H)^wd · P(R|H)^wr   (2)
where wd and wr are the weights given to the diversity and relevance results.</p>
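<p>Equation (2) can be sketched as a weighted product of the two likelihoods. The weights below (wd = 0.8, wr = 0.2) mirror the 0.2 R + 0.8 D weighting from our development tests; the probabilities are toy values, not outputs of the actual runs.</p>

```python
# Sketch of Eq. (2): P(H|R,D) = 1/2 * P(D|H)^wd * P(R|H)^wr.
# The constant 1/2 is the normalization from the equation; the input
# likelihoods below are illustrative, not the paper's values.
def bayes_fuse(p_d, p_r, wd=0.8, wr=0.2):
    return 0.5 * (p_d ** wd) * (p_r ** wr)

# Two candidate images: one strong on diversity, one strong on relevance.
scores = [bayes_fuse(d, r) for d, r in [(0.9, 0.4), (0.5, 0.9)]]
ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranked)  # the diversity-heavy weighting favors the first candidate
```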
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENTS</title>
      <p>
        We submitted 5 runs, varying the use of relevance results, pre-filtering, clustering algorithms, and fusion methods. Details of the run configurations are shown in Table 1. Run 1 is based on pure diversity results using image features. Run 2 uses only text information, applying Word2Vec-based semantic similarity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In Run 3, the input to the diversity algorithm is the ranked result list of Run 2; in this run we leverage both text and image similarity when clustering the images. In Runs 4 and 5 we apply the two fusion methods, weighted linear and Bayesian inference, to the results of Run 1 (diversity) and Run 2 (relevance).
      </p>
      <p>Based on our development tests, we expected Runs 1 and 5 to achieve the best results according to the F1 measure (Table 2). However, on the test set, Run 3 obtains the best values, with an F1@10 of 0.43 and an F1@20 of 0.57. One reason could be the multi-concept queries in the test set. This shows that using the semantic text similarity results (Run 2) as input to the clustering algorithms (Run 3) improved the F1 measure by 4%. We obtain the best precision (0.82) with Run 2, which is purely based on text similarity.</p>
      <p>
        In this year's experiments, we added two runs based on the fusion of relevance and diversity results. In the development tests we reached the optimum weighting of 0.2·R + 0.8·D for both methods. Although the Bayesian inference approach obtained better results in the development tests, on the test data weighted linear fusion ranks second on the F1@20 measure. This confirms the score-combination approach used by Deselaers et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, Bayesian inference is usually applied to classification results, which may explain why in our case the linear combination performed better on the test data.
      </p>
      <p>In Table 3 we show separate results for single-concept and multi-concept topics. We observe the same ordering of results here: Run 3 keeps the best F1@20 value and Run 2 the highest P@20.</p>
    </sec>
    <sec id="sec-4">
      <title>4. CONCLUSION</title>
      <p>Our experiments show that the cluster ensemble fed with relevance results (Run 3) provides robust results for this task. The input to this run was our relevance ranking based on semantic text similarity, which demonstrates that combining text similarity with the diversity approach leads to a higher F1@20 value. This year we added two fusion methods, weighted linear and Bayesian inference. Their results were indistinguishable on the development set, but weighted linear fusion outperformed the Bayesian approach on the test set.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Atrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Saddik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kankanhalli</surname>
          </string-name>
          .
          <article-title>Multimodal fusion for multimedia analysis: a survey</article-title>
          .
          <source>Multimedia systems</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dreuw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>Jointly optimising relevance and diversity in image retrieval</article-title>
          .
          <source>In Proceedings of the ACM international conference on image and video retrieval</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ginsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and H. Muller.
          <source>Retrieving Diverse Social Images at MediaEval</source>
          <year>2015</year>
          :
          <article-title>Challenge, Dataset and</article-title>
          <string-name>
            <surname>Evaluation.</surname>
          </string-name>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Preston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          .
          <article-title>Experiments in diversifying flickr result sets</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Yih</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>Multisensor fusion and integration: approaches, applications, and future research directions</article-title>
          .
          <source>Sensors Journal, IEEE</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>TUW @ retrieving diverse social images task 2014</article-title>
          . In MediaEval,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bierig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          .
          <article-title>On the use of statistical semantics for metadata-based social image retrieval</article-title>
          .
          <source>In CBMI</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Vee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shanmugasundaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Yahia</surname>
          </string-name>
          .
          <article-title>Efficient computation of diverse query results</article-title>
          .
          <source>In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.-N.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <article-title>Improving recommendation lists through topic diversification</article-title>
          .
          <source>In Proceedings of the 14th international conference on World Wide Web</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>