=Paper=
{{Paper
|id=Vol-1177/CLEF2011wn-ImageCLEF-FariaEt2011
|storemode=property
|title=RECOD at ImageCLEF 2011: Medical Modality Classification using Genetic Programming
|pdfUrl=https://ceur-ws.org/Vol-1177/CLEF2011wn-ImageCLEF-FariaEt2011.pdf
|volume=Vol-1177
}}
==RECOD at ImageCLEF 2011: Medical Modality Classification using Genetic Programming==
RECOD at ImageCLEF 2011: Medical Modality Classification using Genetic Programming Fábio Augusto Faria, Rodrigo Tripodi Calumby, and Ricardo da Silva Torres Institute of Computing - University of Campinas Campinas, São Paulo, Brazil {ffaria,tripodi,rtorres}@ic.unicamp.br http://www.ic.unicamp.br Abstract. This paper describes the participation of the RECOD group on the ImageCLEF 2011 Medical Modality Classification sub-task. We present an approach based on genetic programming and kNN for image classification. In our approach the genetic programming is used for the learning of good functions for the combination of similarities obtained from a set of global descriptors for different visual evidences such as color, texture, and shape. For each class of the dataset a combination function was learned and used as a kNN classifier. Final classification results were generated by a majority voting scheme with the voting functions from each class. Preliminary experiments have shown a good effectiveness of the approach and its potential for improvements. Keywords: Genetic Programming, Medical Images, Image Classifica- tion, Pattern Recognition 1 Introduction Currently with the advance of the technology of acquisition and storage of images along with the popularization of the Internet use we can notice the emerging of big image collections. Furthermore the increase of storage and processing power makes possible the use of intelligent computer systems for image manipulation based on machine learning. Machine learning techniques are used, for instance, on image processing, pattern analysis and recognition, and image retrieval, and classification [1]. In the medical field, the classification task is conducted for several types of images (e.g., x-ray, CT, and endoscopy) [2, 7, 4]. In [2], learning techniques are used for the detection and classification of benign and malign tumors and in [3] it is used to determine the gestational age of newborns. Moreover intelligent image classification approaches are used in several different fields such as diseases diagnosis [20], biometrics [6, 14], biology [23], and remote sensing [18]. In the image classification task, descriptors can be used to extract visual features from the images. A visual descriptor can be described by two func- tions; (1) an extraction algorithm that encodes visual characteristics (e.g., color, texture, and shape) into a feature vector; (2) a similarity measure (distance 2 Fábio Augusto Faria, Rodrigo Tripodi Calumby, and Ricardo da Silva Torres function) that computes the similarity of two images using their respective fea- ture vectors. Hence different descriptors potentially describes different features for distinct visual evidences or even for the same evidence. For instance, the BIC [19] and JAC [24] descriptors characterize the color evidence by different means. One descriptor may be more effective for a dataset than another one, but no descriptor is perfect for all collections. Therefore the use of learning techniques to combine different visual evidences has the objective of increasing the effectiveness of the classifiers by taking ad- vantage of the power of different but potentially complementary descriptors. In this work we propose the use of Genetic Programming as a learning tech- nique together with kNN classifiers (GP+kNN) for image classification tasks. This technique has been used in several application such as data mining, signal processing, image retrieval and classification [12, 5, 8, 26, 17]. The remaining of this text is organized as follows: Section 2 briefly describes the genetic programming technique. Section 3 presents our proposed approach for image classification using genetic programming. In Section 4 we present ex- perimental setup and results for the ImageCLEF 2011 Medical Modality Classi- fication. Finally, Section 5 presents our conclusions and future work. 2 Genetic Programming Genetic Programming [13] is a machine learning technique based on biology evo- lutionary concepts, such as natural selection and survival. In this context each potential solution is seen as an individual evolving in a population. Evolution- ary transformations are iteratively applied on the populations thus simulating sequential generations of individuals. Hence the individuals evolve by genetic transformations such as reproduction, mutation and crossover. In general, GP individuals are encoded in a tree structure that represent programs that will evolve throughout the solution space towards a good solution for the problem. A fitness function is used to evaluate the individuals, and then it is possible to measure the quality of the solutions found in the evolutionary process. A basic evolution algorithm for genetic programming is presented in Algorithm 1. Algorithm 1 Basic GP evolution algorithm. 1: Generate initial population of individuals 2: for N generations do 3: Calculate the fitness of each individual 4: Select the individuals to genetic operations 5: Apply reproduction 6: Apply crossover 7: Apply mutation 8: end for Modality Classification using Genetic Programming 3 At first the initial population is generated, usually in random fashion (line 1). The iterative process starts for the evolution of the population through N gen- erations (line 2). For each generation the fitness of the individuals are computed (line 3) and the genetic operators are then applied to the selected individuals. In order to evolve the population some individuals are properly selected (line 4) and they are subjected to genetic operators. The operators are responsible to introduce variability on the solutions making the moving through the solution space possible and consequently the discovery of better solutions. The reproduc- tion operator just copies an individual to next generation (line 5). The crossover operator takes two individuals and exchange sub-trees from them creating two new individuals (line 6). The mutation operator just selects a random sub-tree from an individual and exchanges it for a new randomly generated sub-tree (line 7). In our approach tree-based GP individuals are used to encode candidate functions that combine similarity measures obtained from different visual de- scriptors. In these trees inner nodes are arithmetic operators and the leaves are input nodes for similarity values computed using visual descriptors. Figure 1 presents an example combination function δDP G that combines four different visual descriptors di . (a) (b) Fig. 1. (a) Individual represented as a tree (b) Corresponding similarity function. Details of our GP-based classification approach are presented in Section 3. 3 GP+kNN Classification Method The approach for visual image classification of this paper was adapted from the one originally proposed in [25] for textual documents. Our approach is presented in Algorithm 2. At the beginning the classifier is trained by the search of the best GP indi- viduals (Ici ), which means the search for the best similarity functions for each class ci using the training set (D) and the validation set (V) (lines 1 to 9). For 4 Fábio Augusto Faria, Rodrigo Tripodi Calumby, and Ricardo da Silva Torres Algorithm 2 Approach for image classification using GP and kNN classifiers Require: Training set (D), validation set (V), and test set (T ), number of classes (Nc ), number of generations (Ngen ) 1: for all i | 1