AN APPLICATION OF ANT COLONY OPTIMIZATION TO IMAGE CLUSTERING Tomas Piatrik and Ebroul Izquierdo Multimedia & Vision Research Group, Queen Mary University of London ABSTRACT are based on specific and different visual cues, representing various aspects of the content. The aim for the machine is to Content-based image retrieval can be dramatically improved learn associations between complex combinations of low- by providing a good initial clustering of visual data. The level features and semantic concepts. In the literature, the problem of image clustering is that most current algorithms problem of the selection of important features is solved by are not able to identify individual clusters that exist in feature selection techniques [2]. These techniques derive a different feature subspaces. In this paper, we propose a set of important features for the classification and clustering novel approach for subspace clustering based on Ant analysis, however, they can only find one relevant feature Colony Optimization and its learning mechanism. The subset for all clusters. Therefore, instead of examining the proposed algorithm breaks the assumption that all of the dataset as a whole, subspace clustering algorithms localize clusters in a dataset are found in the same set of dimensions their search and are able to uncover clusters that exist in by assigning weights to features according to the local multiple, possibly overlapping subspaces. correlations of data along each dimension. Experiment In this paper we propose a novel feature selection results on real image datasets show the need for feature approach for clustering based on Ant Colony Optimization selection in clustering and the benefits of selecting features (ACO). Proposed method clusters the data and assigns locally. weights to features according to the local correlations of data along each dimension. In the next section we present a background on the proposed clustering technique. The evaluation using two image datasets are presented and some 1. INTRODUCTION conclusions are discussed in Section 3. In the past decade, there has been a rapid growth in the 2. CLUSTERING USING ANTS consumption of digital media, specifically, video and images. As the use of digital media increases, effective Recent developments in the heuristic optimization retrieval and management techniques become more techniques are concentrated on natural and biological important. A fundamental step towards content based image systems. The ACO is one of the most popular optimization annotation and retrieval is classification. The traditional techniques in the area of swarm intelligence [3]. The real supervised classification techniques assume prior power of ants resides in their colony brain and pheromone- knowledge to be available for all the thematic classes that driven communication within the colony. are present in the considered dataset. On the contrary, In our proposal, The ACO model plays its part in unsupervised classification does not require predefined assigning both images and feature weights to a cluster and knowledge about the data. The main task of classification each ant is giving its own clustering solution [4]. Each ant with unsupervised learning, commonly known as clustering, will assign each image xi , 1 ≤ i ≤ n to the cluster is to partition a given data set into groups (clusters) such that the data points in a cluster are more similar to each π u , 1 ≤ u ≤ k , with the probability P( xi ,π u ) . For each centroid other than points in different clusters [1]. The clustering of c u , and for each feature Fl , new feature weights are images to semantic meaningful classes involves many image computed according to intra-cluster similarities. For each attributes including a particular combination of colour, texture or shape features. The problem is that most current image xi , 1 ≤ i ≤ n , new generalized centroids are image clustering algorithms typically consider all features, recomputed and the assigned pheromone to each solution is or dimensions, of the data in an attempt to learn as much as incremented according to quality of the clustering. A widely possible about the objects. However, different low-level adopted definition of optimal clustering is a partitioning that image descriptors and similarity measures are not designed the intra cluster similarity is minimized while the inter to be combined naturally and straightforwardly in a cluster similarity is maximized. The algorithm stops when meaningful way. The low-level descriptors used in this all ants achieve the same clustering results. work have different scales in different dimensions and they (a) (b) Fig. 1. Several representative images for different categories taken from the a) Corel Image dataset, b) Flickr dataset 3. EXPERIMENTAL EVALUATION AND algorithm less dependent on the initial parameters; hence it CONCLUSION makes it more stable. Furthermore, our experiment results on real-world image datasets show the need for local feature In this section we evaluate the proposed algorithm using selection. In our future work we will investigate the issue of real-world image datasets. We compare our proposed automatically finding the optimal number of clusters. subspace clustering algorithm based on ACO (denoted by 4. ACKNOWLEDGMENT SC-ACO) with PROCLUS, and global feature selection based on K-Means (denoted by GFS-K-Means), and K- The research leading to this paper was supported by the Means without feature selection. For visual representation European Commission under contract FP6-027026, of images, following low-level features (descriptors) are Knowledge Space semantic inference for automatic used in this paper: Colour Layout (CLD), Colour Structure annotation and retrieval of multimedia content – K-Space. (CSD), Dominant Colour (DCD), Edge Histogram (EHD) and Grey Level Co-occurrence Matrix (GLC). First dataset 5. REFERENCES was obtained from The Corel Image database and includes 600 images divided to 6 categories, each consist of 100 [1] Xu, R., and Wunch, D. 2005. Survey of Clustering Algorithms. images. Second dataset consist of 500 images taken from IEEE Trans. Neural Network, Vol.6, No.3. 645-678 Flickr which are segmented into regions and manually annotated [5]. [2] Molina, L. C., Belanche, L., and Nebot, A. 2002. Feature Selection Algorithms: A Survey and Experimental Evaluation. Average Error Rates Technical Report LSI-02-62-R (Universitat Politècnica de Method Catalunya, Barcelona, Spain, 2002) Corel set Flickr set [3] Colorni, A., Dorigo, M., Maniezzo, V. 1991. Distributed SC-ACO 0.35±0.02 0.25±0.03 optimization by ant colonies. In Proceedings of ECAL'91 PROCLUS 0.37±0.05 0.3±0.06 European Conference on Artificial Life (Amsterdam, Netherlands GFS-K-Means 0.47±0.07 0.45±0.07 1991). Elsevier Publishing, 134-142 K-Means 0.52±0.09 0.41±0.08 [4] Piatrik, T. and Izquierdo, E. 2006. Image Classification using Table 1. Average error rates for clustering results an Ant Colony Optimization approach. In Proceedings of 1st Representative samples of images from both datasets are International Conference on Semantic and Digital Media depicted in Figure 1. From experimental results in Table 1 Technologies (Athens, Greece, December 6-8, 2006) can be seen that our proposed algorithm performs the best [5] Papadopoulos, T. G., Chandramouli, K., Mezaris, V., for both datasets. All tested methods depend on Kompatsiaris, I., Izquierdo, E., and Strintzis, M. G. 2008 A initialization of centroids what causes unstable clustering. Comparative Study of Classification Techniques for Knowledge- From experimental results on both synthetic and real Assisted Image Analysis. In Proc. 9th International Workshop on datasets can be seen that the ACO makes the clustering Image Analysis for Multimedia Interactive services.