=Paper=
{{Paper
|id=Vol-1436/Paper40
|storemode=property
|title=TUW @ MediaEval 2015 Retrieving Diverse Social Images Task
|pdfUrl=https://ceur-ws.org/Vol-1436/Paper40.pdf
|volume=Vol-1436
|dblpUrl=https://dblp.org/rec/conf/mediaeval/SabetghadamPRLH15
}}
==TUW @ MediaEval 2015 Retrieving Diverse Social Images Task==
Serwah Sabetghadam, João Palotti, Navid Rekabsaz, Mihai Lupu, Allan Hanbury
Vienna University of Technology, Favoritenstrasse 9-11/188, Vienna, Austria
{sabetghadam, palotti, rekabsaz, lupu, hanbury}@ifs.tuwien.ac.at

ABSTRACT
This paper describes the contributions of Vienna University of Technology (TUW) to the MediaEval 2015 Retrieving Diverse Social Images challenge. Our approach consists of three phases: (1) a precision-oriented phase, in which we focus only on the relevance of the documents; (2) a recall-oriented phase, in which we focus only on the diversity aspect; and (3) a merging phase, in which we explore ways to balance the relevance and diversity factors, using two fusion methods. Our best run reached an F1@20 of 0.582.

1. INTRODUCTION
Result diversification has recently attracted much attention in the IR community. Often, the information need of a user cannot be satisfied by displaying items related to only one facet of the query topic. Ideally, an IR system displays pieces of information covering diverse subtopics of the query. The same idea has been applied in the Recommender Systems area, where diversification techniques have been shown to increase user satisfaction [9, 10].
This paper describes the second participation of our team in the MediaEval Retrieving Diverse Social Images task [3]. We build our solution upon our previous participation [7]. Last year, we obtained good results in both precision and recall, but in separate runs. Therefore, this year we explored different strategies for better fusing our individual runs.

2. METHODS
We leveraged a distinct set of methods for each run. The combination used for each run is shown in Table 1.

2.1 Relevancy
Based on our experience from the previous year [7], we use only textual features (i.e. title, tags, description) to find the relevant documents. We extend the usual term-frequency-based methods to a more semantic approach [8]. We create 400-dimensional word embeddings from the Wikipedia corpus using the Word2Vec method [6], and calculate the similarity between the query and the text documents (the concatenation of title, tags, and description) using the SimGreedy method [8].
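The greedy embedding-based similarity of Section 2.1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy 3-dimensional vectors stand in for the 400-dimensional Word2Vec embeddings, and the unweighted average of greedy per-term matches is one plausible simplification of the SimGreedy method of [8], not its exact formulation.

```python
import numpy as np

# Toy word embeddings standing in for the 400-dimensional Word2Vec
# vectors trained on Wikipedia (all vectors and words are illustrative).
EMBEDDINGS = {
    "castle":   np.array([0.9, 0.1, 0.0]),
    "fortress": np.array([0.8, 0.2, 0.1]),
    "beach":    np.array([0.0, 0.9, 0.4]),
    "vienna":   np.array([0.1, 0.0, 0.9]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sim_greedy(query_terms, doc_terms):
    """Greedy one-directional similarity: each query term is matched
    to its most similar document term; the matches are averaged."""
    scores = []
    for q in query_terms:
        best = max(cosine(EMBEDDINGS[q], EMBEDDINGS[d]) for d in doc_terms)
        scores.append(best)
    return sum(scores) / len(scores)

# The "document" is the concatenation of title, tags, and description.
doc = ["fortress", "vienna"]
print(sim_greedy(["castle"], doc))  # "castle" matches "fortress" closely
```

A query term with no near neighbour in the document text (e.g. "beach" against the document above) receives a low greedy match, which is what pushes irrelevant documents down the ranked list.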
2.2 Diversity
To find diverse images, we experiment with different clustering methods. From [7] we learned that an approach based on an ensemble of clusterings can perform better than a single clustering method. We also learned that a pre-filtering step can remove irrelevant images that would otherwise harm the clustering. Here we briefly comment on these two aspects:
Pre-Filtering: We use hand-coded rules previously shown to perform well in this task to exclude probably irrelevant pictures [4]. We exclude pictures based on three rules: no views, geo-tagged more than 8 km away from the POI, or a description longer than 2000 characters.
Clustering solution: The basic idea is that, given a clustering algorithm A, a feature set F describing an image, and a distance measure Di, we can create a cluster set C = (A, F, Di). For example, C1 can be the result of applying K-Means (A) on the color histograms of the images (F) with the cosine distance (Di): C1 = (K-Means, ColorHistogram, Cosine).
A common strategy used by a number of teams in 2013 was to go through the clusters of C1 one by one and pick the "best" image from each cluster to form the final ranked list. We noticed that small differences, for example using C2 = (K-Means, NeuralNetworkFeatures, Cosine) instead, could have a large impact on the clusters formed, and consequently strongly influence the final ranked list. As described in [7], our solution is to use the development set to learn the best clustering algorithms, feature sets, and distance measures. We then combine the results of the different Cs and count how frequently any two images end up in the same cluster. Based on this simple frequency, we re-rank the initial Flickr list (Run 1) or the list generated by the algorithm of Section 2.1 (Run 3).
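The cluster-ensemble idea can be sketched as follows. The two toy clusterings, the image ids, and the greedy lowest-co-occurrence re-ranking heuristic are illustrative assumptions, not the exact procedure of [7]; the point is only how pairwise same-cluster counts across an ensemble can drive diversification.

```python
from itertools import combinations
from collections import Counter

def co_occurrence(clusterings):
    """Count, over several clusterings, how often each pair of images
    lands in the same cluster. clusterings: list of {image: label} dicts."""
    counts = Counter()
    for labels in clusterings:
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return counts

def rerank(initial_list, clusterings):
    """Greedy re-ranking: prefer images that rarely share a cluster
    with the images already selected; ties keep the initial order."""
    counts = co_occurrence(clusterings)
    selected = []
    remaining = list(initial_list)
    while remaining:
        best = min(remaining, key=lambda img: sum(
            counts[tuple(sorted((img, s)))] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Two toy clusterings, e.g. one on color histograms, one on CNN features.
c1 = {"a": 0, "b": 0, "c": 1, "d": 1}
c2 = {"a": 0, "b": 0, "c": 0, "d": 1}
print(rerank(["a", "b", "c", "d"], [c1, c2]))  # ['a', 'd', 'b', 'c']
```

Image "b" always shares a cluster with the top-ranked "a", so it is demoted below "d", which never co-occurs with "a": exactly the effect the ensemble frequency is meant to achieve.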
2.3 Fusion
Atrey et al. [1] surveyed methods for fusing multiple modalities. In their view, there are three categories of fusion methods: rule-based, classification-based, and estimation-based. Our approach to combining the relevancy and diversity results is inspired by these: we leverage the weighted linear method from the first category and Bayesian inference from the second.
Weighted Linear: We use the optimization technique proposed by Deselaers et al. [2] based on weighted linear fusion. Given the relevance of each document to the query (R) and the diversification measure for each set of documents (D), we formulate diversification as an optimization problem that maximizes the linear combination of these two values:

U(S|q) = w · R(S|q) + (1 − w) · D(S)    (1)

where U denotes the score of the selected set S with regard to the query q, and w is a parameter controlling the relative importance of relevance and diversity. The parameter w is tuned on the development set.

Copyright is held by the author/owner(s). MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany

Table 1: Official runs setup. The features used for clustering are a combination of CN3x3 and CNN in all runs [3]. Relevancy is based on the Word2Vec method [6] (see Section 2.1); the diversity approach is presented in Section 2.2; the fusion mechanisms in Section 2.3.

Run | Type | Relevancy | Pre-Filtering | Clustering | Fusion
1 | image | - | Based on [4] | X | -
2 | text | X | - | - | -
3 | text, image | X | - | X | -
4 | text, image | X | - | X | Linear Fusion
5 | text, image | X | Based on [4] | X | Bayesian Fusion

Table 2: Results for the development and test set at various cut-off points. The official metric is F1@20.
2015 Development Set
Run | P@10 | CR@10 | F1@10 | P@20 | CR@20 | F1@20
1 | 0.8137 | 0.2873 | 0.4189 | 0.7817 | 0.4713 | 0.5806
2 | 0.8203 | 0.2188 | 0.3389 | 0.7980 | 0.3485 | 0.4766
3 | 0.7804 | 0.2748 | 0.4014 | 0.7546 | 0.4531 | 0.5583
4 | 0.8157 | 0.2867 | 0.4184 | 0.7782 | 0.4616 | 0.5741
5 | 0.8137 | 0.2873 | 0.4189 | 0.7804 | 0.4706 | 0.5796

2015 Test Set
Run | P@10 | CR@10 | F1@10 | P@20 | CR@20 | F1@20
1 | 0.7201 | 0.2908 | 0.4025 | 0.7058 | 0.4705 | 0.5487
2 | 0.7842 | 0.2576 | 0.3728 | 0.7687 | 0.3914 | 0.4968
3 | 0.7633 | 0.3163 | 0.4309 | 0.7309 | 0.4963 | 0.5727
4 | 0.7345 | 0.3005 | 0.4128 | 0.7291 | 0.4767 | 0.5601
5 | 0.7216 | 0.2906 | 0.4026 | 0.7076 | 0.4702 | 0.5492

Table 3: Results for single- and multi-concept queries; the best run according to the official metric is Run 3.

2015 Test Set - Single-concept queries
Run | P@10 | CR@10 | F1@10 | P@20 | CR@20 | F1@20
1 | 0.7232 | 0.2904 | 0.4045 | 0.6942 | 0.4807 | 0.5530
2 | 0.8188 | 0.2589 | 0.3772 | 0.8080 | 0.4038 | 0.5202
3 | 0.7928 | 0.3237 | 0.4443 | 0.7326 | 0.5037 | 0.5802
4 | 0.7507 | 0.3062 | 0.4206 | 0.7355 | 0.4798 | 0.5664
5 | 0.7261 | 0.2907 | 0.4053 | 0.6935 | 0.4788 | 0.5515

2015 Test Set - Multi-concept queries
Run | P@10 | CR@10 | F1@10 | P@20 | CR@20 | F1@20
1 | 0.7171 | 0.2912 | 0.4005 | 0.7171 | 0.4605 | 0.5445
2 | 0.7500 | 0.2563 | 0.3684 | 0.7300 | 0.3793 | 0.4738
3 | 0.7343 | 0.3091 | 0.4177 | 0.7293 | 0.4890 | 0.5654
4 | 0.7186 | 0.2948 | 0.4052 | 0.7229 | 0.4737 | 0.5540
5 | 0.7171 | 0.2905 | 0.4000 | 0.7214 | 0.4617 | 0.5469
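As a concrete illustration of the weighted linear fusion of Eq. (1), the sketch below applies a per-item simplification of the set-level objective. The scores and image ids are toy values, and w = 0.2 mirrors the devset optimum of 0.2·R + 0.8·D reported in Section 3.

```python
# Per-item sketch of Eq. (1): U = w·R + (1 − w)·D.
# Scores and ids are toy values; w is tuned on the development set.

def linear_fusion(relevance, diversity, w=0.2):
    """Combine per-candidate relevance and diversity scores."""
    return {doc: w * relevance[doc] + (1 - w) * diversity[doc]
            for doc in relevance}

relevance = {"img1": 0.9, "img2": 0.6, "img3": 0.4}
diversity = {"img1": 0.1, "img2": 0.8, "img3": 0.9}
fused = linear_fusion(relevance, diversity)
print(sorted(fused, key=fused.get, reverse=True))  # ['img3', 'img2', 'img1']
```

With w = 0.2, a highly relevant but redundant image ("img1") drops below images that add diversity, which is the intended trade-off of the merging phase.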
Bayesian Inference: In this method, the information is combined based on the rules of probability theory [5]. The probability of a hypothesis H of diversification is:

P(H|R, D) = (1/2) · P(D|H)^wd · P(R|H)^wr    (2)

where wd and wr are the weights given to the diversity and relevancy results.

3. EXPERIMENTS
We submitted five runs, varying the use of relevancy results, pre-filtering, clustering algorithms, and fusion methods. Details of the run configurations are shown in Table 1. Run 1 is based on pure diversity results using image features. Run 2 uses only text information, applying Word2Vec-based semantic similarity [8]. In Run 3, the input of the diversity algorithm is the ranked result of Run 2; in this run we leverage both modalities, text and image similarity, for clustering the images. In Runs 4 and 5 we apply the two fusion methods, weighted linear and Bayesian inference, to the results of Run 1 (diversity) and Run 2 (relevancy).
Based on our development tests, we expected Runs 1 and 5 to achieve the better results according to the F1 measure (Table 2). However, on the test set, Run 3 obtains the best values, with an F1@10 of 0.43 and an F1@20 of 0.57. One reason could be the multi-concept queries in the test set. This shows that using the semantic text similarity results (Run 2) as input to the clustering algorithms (Run 3) improved the F1 measure by 4%. The best precision (0.82) is obtained by Run 2, which is purely based on text similarity.
In this year's experiments we added two runs based on the fusion of relevancy and diversity results. In the development tests we reached an optimum weighting of 0.2·R + 0.8·D for both methods. Although the Bayesian inference approach obtained better results in the development tests, on the test data the weighted linear fusion takes second place in F1@20. This is consistent with the score combination used by Deselaers et al. [2]. Bayesian inference is usually applied to classification results, which may explain why in our case the linear combination performed better on the test data.
In Table 3 we show separate results for single- and multi-concept topics. We observe the same ordering of the runs: Run 3 keeps the best F1@20 and Run 2 the highest P@20.
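The Bayesian fusion of Eq. (2) can be sketched the same way. Up to the constant factor 1/2, P(H|R, D) is proportional to P(D|H)^wd · P(R|H)^wr, so the constant does not change the ranking. The likelihood values and image ids below are toy assumptions; wr = 0.2 and wd = 0.8 mirror the devset weighting reported above.

```python
# Sketch of Eq. (2): P(H|R,D) = (1/2) · P(D|H)^wd · P(R|H)^wr.
# Likelihoods and ids are toy values chosen for illustration.

def bayesian_fusion(p_r, p_d, wr=0.2, wd=0.8):
    """Combine per-image relevance and diversity likelihoods."""
    return {img: 0.5 * (p_d[img] ** wd) * (p_r[img] ** wr) for img in p_r}

p_r = {"img1": 0.9, "img2": 0.6}   # P(R|H): relevance likelihoods
p_d = {"img1": 0.1, "img2": 0.8}   # P(D|H): diversity likelihoods
scores = bayesian_fusion(p_r, p_d)
print(max(scores, key=scores.get))  # prints "img2": diversity dominates at wd = 0.8
```

Unlike the linear rule, the product form penalizes any image whose diversity likelihood is near zero regardless of its relevance, one reason the two fusions can rank candidates differently.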
4. CONCLUSION
Our experiments show that the cluster ensemble with relevancy results as input (Run 3) provides robust results for this task. The input to this run was our relevancy ranking based on semantic text similarity. This demonstrates that combining text similarity with the diversity approach leads to a higher F1@20 value. This year we also added two fusion methods, weighted linear and Bayesian inference. Their results were indistinguishable on the development set, but the weighted linear fusion outperformed the Bayesian one on the test set.

5. REFERENCES
[1] P. K. Atrey, M. A. Hossain, A. El Saddik, and M. S. Kankanhalli. Multimodal fusion for multimedia analysis: a survey. Multimedia Systems, 2010.
[2] T. Deselaers, T. Gass, P. Dreuw, and H. Ney. Jointly optimising relevance and diversity in image retrieval. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009.
[3] B. Ionescu, A. Ginsca, B. Boteanu, A. Popescu, M. Lupu, and H. Müller. Retrieving Diverse Social Images at MediaEval 2015: Challenge, Dataset and Evaluation. 2015.
[4] N. Jain, J. Hare, S. Samangooei, J. Preston, J. Davies, D. Dupplaw, and P. H. Lewis. Experiments in diversifying Flickr result sets. In MediaEval 2013, 2013.
[5] R. C. Luo, C.-C. Yih, and K. L. Su. Multisensor fusion and integration: approaches, applications, and future research directions. IEEE Sensors Journal, 2002.
[6] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[7] J. Palotti, N. Rekabsaz, M. Lupu, and A. Hanbury. TUW @ Retrieving Diverse Social Images Task 2014. In MediaEval, 2014.
[8] N. Rekabsaz, R. Bierig, B. Ionescu, A. Hanbury, and M. Lupu. On the use of statistical semantics for metadata-based social image retrieval. In CBMI, 2014.
[9] E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. A. Yahia. Efficient computation of diverse query results. In Data Engineering, 2008 (ICDE 2008), IEEE 24th International Conference on, 2008.
[10] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, 2005.