=Paper=
{{Paper
|id=Vol-1739/MediaEval_2016_paper_12
|storemode=property
|title=An Adaptive Clustering Approach for the Diversification of Image Retrieval Results
|pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_12.pdf
|volume=Vol-1739
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Zaharieva16
}}
==An Adaptive Clustering Approach for the Diversification of Image Retrieval Results==
An Adaptive Clustering Approach for the Diversification of Image Retrieval Results Maia Zaharieva∗ Interactive Media Systems Group, Vienna University of Technology, Austria Multimedia Information Systems Group, University of Vienna, Austria maia.zaharieva@tuwien.ac.at ABSTRACT In this paper, we explore the application of an adaptive clus- tering approach for the diversification of image retrieval re- sults in the context of the MediaEval 2016 Retrieving Di- verse Social Images Task. The proposed approach exploits available textual descriptions, the visual content of the im- ages, and a set of common clustering techniques to select (a) sailing boats on sea (b) trees reflected in water the best combination for each image query individually and in an unsupervised manner. Figure 1: Retrieval results for two image queries. Each line corresponds to a desired image groupings. 1. INTRODUCTION The immense amount on publicly available media con- to efficiently describe the retrieval results in terms of diver- tent commonly challenges end users in making use of the sification of the final image set. Figure 1 shows examples for broad variety of accessible data. As a result, a lot of recent desired image groupings for two queries: sailing boats on sea research focuses on the optimization of retrieval results in and trees reflected in water. While the results of the first terms of improved relevance estimation and increased diver- query indicate potential relevance of the overall composi- sification [3, 6, 13, 21]. The MediaEval Retrieving Diverse tion and edge-based descriptors, the second query suggests Social Images task fosters the development and comparabil- the use of color-based descriptors. On the contrary, color ity of algorithms in this context [12]. The goal of the task information is less meaningful for the first query since the in 2016 is to refine a set of images retrieved from Flickr as retrieved images exhibit common color settings. Similarly, result of a general (and often multi-topic) query. edges do not provide enough discriminative power to support Increasing the relevance commonly leads to decreased di- the building of the desired image groupings for the second versity of the underling image set and vice versa. The mag- query. Therefore, in our study we consider a set of commonly nitude of this reciprocal effect is difficult to estimate for dif- employed visual- and text-based descriptors to represent the ferent data settings. Therefore, in order to exploit the real image data. Next to the two convolutional neural network potential of a clustering-based approach for data diversifica- (CNN)-based descriptors provided by the organizers [12], we tion, we consider all images as relevant for the given query. additionally employ the first 36 coefficients of the discrete To address a general image retrieval scenario with arbitrary cosine transform (DCT) [1], intensity histogram (IH) [10], queries, we only consider commonly available textual infor- KANSEI shape descriptor [15], and six MPEG-7 visual de- mation (title, description, tags) and the visual content of scriptors [4, 18]: color layout (CL), color structure (CS), the images. The proposed approach does not make any as- edge histogram (EH), homogeneous texture (HT), region- sumptions about the initial image query or data characteris- based shape (RS), and scalable color (SC). As text-based tics. Moreover, the approach autonomously selects the best features we consider the well-established term frequency- combination of image descriptions and clustering approach inverse document frequency (TF-IDF) [14]. We compute the for each query individually and, thus, it is fully adaptive for TF-IDF vector for each image using the available textual de- different queries and data settings. Preliminary experiments scription (title, tags, and descriptions) in combination and demonstrate the generalization ability of the proposed ap- individually. The textual descriptions are first preprocessed proach and both its potentials and limitations. to increase their expressiveness, i.e., we remove potential occurrences of the corresponding user name, web links, and 2. APPROACH stopwords and we additionally stem all remaining terms. The fundamental assumption behind the proposed ap- The unsupervised detection of existing groupings in a data- proach is that different queries require for different features set is commonly performed by means of a clustering algo- rithm. However, the choice of a clustering approach is not a ∗ This work has been partly funded by the Vienna Science trivial decision. Different clustering approaches commonly and Technology Fund (WWTF) through project ICT12-010. address different data characteristics. Furthermore, poten- tial clustering parameters usually require for an additional Copyright is held by the author/owner(s). MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, parameter tuning process. In this context, clustering in- Netherlands. ternal validation indices are commonly applied in order to Table 1: Optimal solution: visual features. Num- Table 2: Experiments on the development dataset. bers in the cells correspond to topic IDs. Approach P@20 CR@20 F1@20 Baseline: Flickr 0.6979 0.3717 0.4674 P Feature AP EM KM XM CNN ad 58 11, 25 29 38, 51 6 Optimal solution: CNN gen 4, 33, 39, 59 17, 54 6 visual features 0.8179 0.6575 0.7122 DCT 48 19, 60, 67 50 5 text features 0.8136 0.6453 0.7043 IH 45 2 28, 53 6, 65 6 visual + text 0.8250 0.6634 0.7186 KANSEI shape 9, 20, 26, 36, 46 13, 57 7 Best performing fixed settings: MPEG7 CL 22, 47, 55 5, 31, 61 12, 18, 21 9 MPEG7 CS 15 7, 62, 70 4 visual features 0.6657 0.4453 0.5237 MPEG7 EH 32, 49, 68 14, 34, 40, 42, 52 8 text features 0.6636 0.4274 0.5045 MPEG7 HT 1, 44 8, 27, 63 24 6 visual + text 0.6657 0.4453 0.5237 MPEG7 RS 23, 69 10, 41 35, 64, 66 7 Proposed adaptive approach: MPEG7 SC P 3, 43, 56 16, 30, 37 6 visual features 0.6500 0.4398 0.5123 3 25 26 16 70 text features 0.6729 0.4230 0.5029 visual+text 0.6286 0.4061 0.4803 assess the quality of a clustering solution in terms of com- pactness and/or separability of the detected clusters [16]. Table 3: MediaEval 2016 benchmark results. We exploit the performance of several, broadly employed validation indices covering different aspects of a clustering Run Configuration P@20 CR@20 F1@20 solutions: 1) compactness in terms of sum of squares and run 1 Adaptive, visual features 0.5141 0.4024 0.4292 run 2 Adaptive, text features 0.5406 0.4130 0.4463 C-index [11], 2) separability by means of single linkage dis- run 3 Adaptive, visual+text features 0.5430 0.4130 0.4471 tance between two clusters, 3) combination of compactness run 4 Fixed settings, visual features 0.4969 0.3603 0.4006 and separability in terms of Calinski-Harabasz [5], Davies- Bouldin [7], and Silhouette [20], and 4) consistency compar- ison measures by means of Gamma and Tau indices [2]. We Additionally, the achieved performance in terms of F1@20- employ the clustering validation measures to select the best score significantly outperforms the baseline defined by the combination of image feature and clustering algorithm for original Flickr result. The best performing clustering solu- each query individually. As clustering methods we inves- tion considers the DCT feature in combination with the KM tigate two model-based approaches: Affinity Propagation clustering approach and the C validation index. This result (AP) [9] and expectation maximization (EM) [8], and two is notably lower than the achievable performance. Addition- partitional approaches: k-means (KM) [17] and X-means ally, the combination is selected based on experiments on a (XM) [19]. The clustering methods were selected for their single dataset and, thus, overfitting the data. Experiments efficiency. Additionally, the only parameter tuning concerns with the performance of the clustering validation indices in- the potential specification of the expected number of clus- dicated the C-index as the best performing validation mea- ters, k. In this case, we perform clustering for various set- sure stressing the importance of compactness for the final tings, k = {5, 10, 20, 30, 40, 50}, and consider each clustering clustering solution. The results achieved using the proposed solution individually. The final selection of a clustering so- adaptive approach in combination with the C-index are pre- lution for a given query is based on the quality assessment sented in Table 2. The results are slightly lower than the of all possible combinations between the considered cluster- best performing fixed setting yet not overfitting the data. ing approaches and the employed image features. The final Table 3 summarizes the results on the test set. Runs 1 − selection of images from the clusters follows a Round-Robin 3 employ the adaptive approach while run 4 exploits the approach according to the Flickr-provided relevance scores. best performing fixed settings on the development set as We start by selecting the image with the best relevance score proof-of-concept for its overfitting. The proposed approach from each cluster. These images, sorted in ascending order, adapts well to the new dataset indicated by the difference in constitute the m highest ranked results, where m is the num- the selected features and clustering approaches. The fixed ber of detected clusters. The selected images are removed selection of image feature and clustering approach in run 4 from their clusters and the selection process is repeated until performs lowest as expected. Overall, the results are lower the required number of retrieved results is achieved. in comparison to the development dataset. However, this can only be evaluated in relation to the baseline which is not available for the test set yet. 3. EXPERIMENTAL RESULTS In our first experiment we investigate whether or not dif- ferent datasets require for different features. For this pur- 4. CONCLUSION pose we perform clustering on the development dataset us- In this paper we presented an initial study of the applica- ing all possible combinations between the considered clus- bility of an unsupervised and adaptive clustering-based ap- tering approaches (including potential clustering configura- proach for the diversification of image retrieval results. The tions) and the employed visual- and text-based features. results indicate that different image queries favor different The best clustering solution is selected using the ground image representations and different clustering methods. The truth information in terms of highest F1@20-score. Due to optimal solution for a dataset achieves an outstanding per- space limitations Table 1 shows an overview of the optimal formance. However, the considered validation indices could clustering combinations for each topic (query) using the vi- not reflect the optimal solution. This might be due to the sual features only. The achieved results are presented in Ta- fact that different clustering solutions require for different ble 2 (see Optimal solution). The balanced distribution of validations. Our future work will exploit the potential of the clustering solutions across the employed visual features combining clustering validation indices in order to approach indicate that different image queries favor different features. the best achievable performance. 5. REFERENCES A. Yamada. Color and texture descriptors. IEEE [1] N. Ahmed, T. Natarajan, and K. R. Rao. Discrete Transactions on Circuits and Systems for Video cosine transform. IEEE Transactions on Computers, Technology, 11(6):703–715, Jun 2001. C-23(1):90–93, 1974. [19] D. Pelleg and A. W. Moore. X-means: Extending [2] F. B. Baker and L. J. Hubert. Measuring the power of k-means with efficient estimation of the number of hierarchical cluster analysis. Journal of the American clusters. In International Conference on Machine Statistical Association, 70(349):31–38, 1975. Learning (ICML), pages 727–734, 2000. [3] G. Boato, D.-T. Dang-Nguyen, O. Muratov, [20] P. Rousseeuw. Silhouettes: A graphical aid to the N. Alajlan, and F. G. B. De Natale. Exploiting visual interpretation and validation of cluster analysis. saliency for increasing diversity of image retrieval Journal of Computational and Applied Mathematics, results. Multimedia Tools and Applications, 20(1):53–65, 1987. 75(10):5581–5602, 2016. [21] E. Spyromitros-Xioufis, S. Papadopoulos, A. L. [4] M. Bober. Mpeg-7 visual shape descriptors. IEEE Ginsca, A. Popescu, Y. Kompatsiaris, and I. Vlahavas. Transactions on Circuits and Systems for Video Improving diversity in image search via supervised Technology, 11(6):716–719, Jun 2001. relevance scoring. In ACM International Conference [5] T. Calinski and J. Harabasz. A dendrite method for on Multimedia Retrieval, pages 323–330, 2015. cluster analysis. Communications in Statistics, 3(1):1–27, 1974. [6] D. T. Dang-Nguyen, L. Piras, G. Giacinto, G. Boato, and F. G. B. D. Natale. A hybrid approach for retrieving diverse social images of landmarks. In IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, June 2015. [7] D. L. Davies and D. W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227, 1979. [8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977. [9] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315(5814):972–976, 2007. [10] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice-Hall, Inc., 3rd edition, 2006. [11] L. J. Hubert and J. R. Levin. A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin, 83(6):1072–1080, 1978. [12] B. Ionescu, A. L. Ginsca, M. Zaharieva, B. Boteanu, M. Lupu, and H. Müller. Retrieving diverse social images at MediaEval 2016: Challenge, dataset and evaluation. In MediaEval 2016 Multimedia Benchmark Workshop, 2016. [13] B. Ionescu, A. Popescu, A.-L. Radu, and H. Müller. Result diversification in social image retrieval: a benchmarking framework. Multimedia Tools and Applications, 75(2):1301–1331, 2016. [14] K. S. Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21, 1972. [15] H. Kobayashi, Y. Okouchi, and S. Ota. Image retrieval system using KANSEI features. In Pacific Rim International Conference on Artificial Intelligence: Topics in Artificial Intelligence, pages 626–635, 1998. [16] Y. Liu, Z. Li, H. Xiong, X. Gao, and J. Wu. Understanding of internal clustering validation measures. In IEEE International Conference on Data Mining (ICDM), pages 911–916, 2010. [17] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982. [18] B. Manjunath, J.-R. Ohm, V. Vasudevan, and