=Paper= {{Paper |id=Vol-1177/CLEF2011wn-ImageCLEF-FariaEt2011 |storemode=property |title=RECOD at ImageCLEF 2011: Medical Modality Classification using Genetic Programming |pdfUrl=https://ceur-ws.org/Vol-1177/CLEF2011wn-ImageCLEF-FariaEt2011.pdf |volume=Vol-1177 }} ==RECOD at ImageCLEF 2011: Medical Modality Classification using Genetic Programming== https://ceur-ws.org/Vol-1177/CLEF2011wn-ImageCLEF-FariaEt2011.pdf

RECOD at ImageCLEF 2011: Medical Modality
Classification using Genetic Programming

Fábio Augusto Faria, Rodrigo Tripodi Calumby, and Ricardo da Silva Torres

Institute of Computing - University of Campinas
Campinas, São Paulo, Brazil
{ffaria,tripodi,rtorres}@ic.unicamp.br
http://www.ic.unicamp.br

Abstract. This paper describes the participation of the RECOD group
on the ImageCLEF 2011 Medical Modality Classification sub-task. We
present an approach based on genetic programming and kNN for image
classification. In our approach the genetic programming is used for the
learning of good functions for the combination of similarities obtained
from a set of global descriptors for different visual evidences such as color,
texture, and shape. For each class of the dataset a combination function
was learned and used as a kNN classifier. Final classification results were
generated by a majority voting scheme with the voting functions from
each class. Preliminary experiments have shown a good effectiveness of
the approach and its potential for improvements.

Keywords: Genetic Programming, Medical Images, Image Classifica-
tion, Pattern Recognition

1 Introduction
Currently with the advance of the technology of acquisition and storage of images
along with the popularization of the Internet use we can notice the emerging of
big image collections. Furthermore the increase of storage and processing power
makes possible the use of intelligent computer systems for image manipulation
based on machine learning. Machine learning techniques are used, for instance,
on image processing, pattern analysis and recognition, and image retrieval, and
classification [1].
In the medical field, the classification task is conducted for several types
of images (e.g., x-ray, CT, and endoscopy) [2, 7, 4]. In [2], learning techniques
are used for the detection and classification of benign and malign tumors and
in [3] it is used to determine the gestational age of newborns. Moreover intelligent
image classification approaches are used in several different fields such as diseases
diagnosis [20], biometrics [6, 14], biology [23], and remote sensing [18].
In the image classification task, descriptors can be used to extract visual
features from the images. A visual descriptor can be described by two func-
tions; (1) an extraction algorithm that encodes visual characteristics (e.g., color,
texture, and shape) into a feature vector; (2) a similarity measure (distance
2 Fábio Augusto Faria, Rodrigo Tripodi Calumby, and Ricardo da Silva Torres

function) that computes the similarity of two images using their respective fea-
ture vectors. Hence different descriptors potentially describes different features
for distinct visual evidences or even for the same evidence. For instance, the
BIC [19] and JAC [24] descriptors characterize the color evidence by different
means. One descriptor may be more effective for a dataset than another one,
but no descriptor is perfect for all collections.
Therefore the use of learning techniques to combine different visual evidences
has the objective of increasing the effectiveness of the classifiers by taking ad-
vantage of the power of different but potentially complementary descriptors.
In this work we propose the use of Genetic Programming as a learning tech-
nique together with kNN classifiers (GP+kNN) for image classification tasks.
This technique has been used in several application such as data mining, signal
processing, image retrieval and classification [12, 5, 8, 26, 17].
The remaining of this text is organized as follows: Section 2 briefly describes
the genetic programming technique. Section 3 presents our proposed approach
for image classification using genetic programming. In Section 4 we present ex-
perimental setup and results for the ImageCLEF 2011 Medical Modality Classi-
fication. Finally, Section 5 presents our conclusions and future work.

2 Genetic Programming

Genetic Programming [13] is a machine learning technique based on biology evo-
lutionary concepts, such as natural selection and survival. In this context each
potential solution is seen as an individual evolving in a population. Evolution-
ary transformations are iteratively applied on the populations thus simulating
sequential generations of individuals. Hence the individuals evolve by genetic
transformations such as reproduction, mutation and crossover.
In general, GP individuals are encoded in a tree structure that represent
programs that will evolve throughout the solution space towards a good solution
for the problem. A fitness function is used to evaluate the individuals, and then
it is possible to measure the quality of the solutions found in the evolutionary
process. A basic evolution algorithm for genetic programming is presented in
Algorithm 1.

Algorithm 1 Basic GP evolution algorithm.
1: Generate initial population of individuals
2: for N generations do
3: Calculate the fitness of each individual
4: Select the individuals to genetic operations
5: Apply reproduction
6: Apply crossover
7: Apply mutation
8: end for
Modality Classification using Genetic Programming 3

At first the initial population is generated, usually in random fashion (line 1).
The iterative process starts for the evolution of the population through N gen-
erations (line 2). For each generation the fitness of the individuals are computed
(line 3) and the genetic operators are then applied to the selected individuals.
In order to evolve the population some individuals are properly selected (line 4)
and they are subjected to genetic operators. The operators are responsible to
introduce variability on the solutions making the moving through the solution
space possible and consequently the discovery of better solutions. The reproduc-
tion operator just copies an individual to next generation (line 5). The crossover
operator takes two individuals and exchange sub-trees from them creating two
new individuals (line 6). The mutation operator just selects a random sub-tree
from an individual and exchanges it for a new randomly generated sub-tree (line
7).
In our approach tree-based GP individuals are used to encode candidate
functions that combine similarity measures obtained from different visual de-
scriptors. In these trees inner nodes are arithmetic operators and the leaves are
input nodes for similarity values computed using visual descriptors. Figure 1
presents an example combination function δDP G that combines four different
visual descriptors di .

(a) (b)

Fig. 1. (a) Individual represented as a tree (b) Corresponding similarity function.

Details of our GP-based classification approach are presented in Section 3.

3 GP+kNN Classification Method
The approach for visual image classification of this paper was adapted from the
one originally proposed in [25] for textual documents. Our approach is presented
in Algorithm 2.
At the beginning the classifier is trained by the search of the best GP indi-
viduals (Ici ), which means the search for the best similarity functions for each
class ci using the training set (D) and the validation set (V) (lines 1 to 9). For
4 Fábio Augusto Faria, Rodrigo Tripodi Calumby, and Ricardo da Silva Torres

Algorithm 2 Approach for image classification using GP and kNN classifiers
Require: Training set (D), validation set (V), and test set (T ), number of classes
(Nc ), number of generations (Ngen )
1: for all i | 1