=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-ImageCLEF-TommasiEt2007
|storemode=property
|title=CLEF2007: Image Annotation Task: an SVM-based Cue Integration Approach
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-TommasiEt2007.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/TommasiOC07a
}}
==CLEF2007: Image Annotation Task: an SVM-based Cue Integration Approach==
Tatiana Tommasi, Francesco Orabona, and Barbara Caputo
IDIAP Research Institute, Centre Du Parc, Av. des Pres-Beudin 20, P. O. Box 592, CH-1920 Martigny, Switzerland
{ttommasi, forabona, bcaputo}@idiap.ch

Abstract

This paper presents the algorithms and results of our participation in the medical image annotation task of ImageCLEFmed 2007. As a general strategy, we proposed a multi-cue approach where images are represented by both global and local descriptors, so as to capture different types of information. These cues are combined during the classification step following two alternative SVM-based strategies. The first algorithm, called Discriminative Accumulation Scheme (DAS), trains an SVM for each feature type and considers as output of each classifier the distance from the separating hyperplane. The final decision is taken on a linear combination of these distances: in this way cues are accumulated, so even when both are misled the final result can still be correct. The second algorithm uses a new Mercer kernel that accepts different feature types as input while keeping them separated. In this way, cues are selected and weighted, for each class, in a statistically optimal fashion. We call this approach Multi Cue Kernel (MCK). We submitted several runs, testing the performance of the single-cue SVMs and of the two cue integration methods. Our team was called BLOOM (BLanceflOr-tOMed.im2) after the names of our sponsors. The DAS algorithm obtained a score of 29.9, which ranked fifth among all submissions. We submitted two versions of the MCK algorithm, one using the one-vs-all multiclass extension of SVMs and the other using the one-vs-one extension. They scored respectively 26.85 and 27.54, ranking first and second among all submissions.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Database Applications—Image databases; I.5 [Pattern Recognition]: I.5.2 Design Methodology

General Terms

Measurement, Performance, Experimentation

Keywords

Automatic Image Annotation, Cue Integration, Support Vector Machines, Kernel Methods

1 Introduction

The amount of medical image data produced nowadays is constantly growing, with average-sized radiology departments producing several terabytes of data annually. The cost of manually annotating these images is very high; furthermore, manual classification induces errors in the tag assignment, which means that part of the available knowledge is no longer accessible to physicians [5]. This calls for automatic annotation algorithms able to perform the task reliably, and benchmark evaluations are thus extremely useful for boosting advances in the field. The ImageCLEFmed annotation task was established in 2005; in 2007 it provided participants with 11000 training and development images, spread across 116 classes. The task consisted in assigning the correct label to 1000 test images. For further information on the annotation task of ImageCLEF 2007 we refer the reader to [6]. This paper describes the algorithms submitted by the BLOOM (BLanceflOr-tOMed.im2) team, at its first participation in the CLEF benchmark competition.
In order to achieve robustness, a crucial property for a reliable automatic system, we opted for a multi-cue approach, using raw pixels as global descriptors and SIFT features as local descriptors. The two feature types were combined using two different SVM-based integration schemes. The first is the Discriminative Accumulation Scheme (DAS), first proposed in [7]. For each feature type, an SVM is trained and its output consists of the distance from the separating hyperplane. Then, the decision function is built as a linear combination of the distances, with weighting coefficients determined via cross validation. We submitted a run using this method (BLOOM-BLOOM DAS) that obtained a score of 29.9, ranking fifth among all submissions. The second integration scheme consists in designing a new Mercer kernel, able to take as input different feature types for each image. We call it the Multi Cue Kernel (MCK); the main advantage of this approach is that features are selected and weighted during the SVM training, thus the final solution is optimal in the sense that it minimizes the structural risk. We submitted two runs using this algorithm, the first (BLOOM-BLOOM MCK oa) using the one-vs-all multiclass extension of SVM, the second (BLOOM-BLOOM MCK oo) using instead the one-vs-one extension. These two runs ranked first and second among all submissions, with scores of 26.85 and 27.54 respectively. These results confirm the overall effectiveness of using multiple cues for automatic image annotation.

The rest of the paper is organized as follows: Section 2 describes the two types of feature descriptors we used at the single-cue stage. Section 3 gives details on the two alternative SVM-based cue integration approaches. Section 4 reports the experimental procedure adopted and the results obtained, with a detailed discussion of the performance of each algorithm. The paper concludes with a summary discussion.

2 Single Cue Image Annotation

The aim of the automatic image annotation task is to classify images into a set of classes. In particular, the classes were defined in relation to the four independent axes of modality, body orientation, body region, and biological system examined, according to the IRMA code [3]. The labels are hierarchical; therefore, errors in the annotation are counted depending on the level of the hierarchy at which the error is made and on the number of possible choices at that level. For each image the error ranges from 0, if the image is correctly classified, to 1, if the predicted label is completely wrong. It is also possible to assign a "don't know" label, in which case the score is 0.5. The strategy we propose is to extract a set of features from each image and then to use a Support Vector Machine (SVM) to classify the images. We have explored a local approach, using SIFT descriptors, and a global approach, using the raw pixels.
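For concreteness, here is a toy sketch of this kind of hierarchical error counting. The exact official scheme is defined in [6]; the per-level weights below (a mistake at depth d of an axis with `levels` positions costs (levels - d)/levels, averaged over the four axes) and the `dont_know` token are our illustrative assumptions, not the official definition.

```python
# Toy illustration of hierarchical error counting on IRMA-style codes.
# The weights are assumptions for illustration; see [6] for the
# official ImageCLEFmed 2007 scoring scheme.

def axis_error(predicted: str, true: str) -> float:
    """Error for one IRMA axis, e.g. predicted='127', true='120'."""
    levels = len(true)
    for d, (p, t) in enumerate(zip(predicted, true)):
        if p != t:                          # first wrong position
            return (levels - d) / levels    # coarser mistakes cost more
    return 0.0

def image_error(predicted: str, true: str) -> float:
    """Average the per-axis errors; a 'don't know' answer scores 0.5."""
    if predicted == "dont_know":
        return 0.5
    pred_axes, true_axes = predicted.split("-"), true.split("-")
    return sum(axis_error(p, t)
               for p, t in zip(pred_axes, true_axes)) / len(true_axes)

print(image_error("1121-120-330-700", "1121-120-330-700"))  # 0.0, correct
print(image_error("1121-127-330-700", "1121-120-330-700"))  # small, fine-level error
```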
2.1 Feature Extraction

We explored the idea of "bag of words" for classification, a concept common to many state-of-the-art approaches in image classification. It is based on the idea that it is possible to transform the images into a set of prespecified visual words, and to classify the images using the statistics of appearance of each word as feature vectors. Most of these systems are based on the SIFT descriptor [4]. The basic idea of SIFT is to describe an area of an image in a way that is robust to noise, illumination, scale, translation and rotation changes. The SIFT points are selected in the image as local maxima of the scale-space; in this sense the SIFT points are intrinsically easy to track. Despite the usefulness of SIFT, there is no reason to believe that these points are the most informative for a classification task. This has been pointed out by different works and systematically verified by [8]. In that work it is shown that a dense random sampling of the SIFT points is always superior to any strategy based on interest point detectors. Moreover, due to the low contrast of the radiographs, it would be difficult to use any interest point detector. So in our approach we densely sampled each input image, extracting a SIFT descriptor at each point. Another modification we made is based on the fact that rotation invariance could be useless for the ImageCLEF classification task, as the various structures present in the radiographs are likely to always appear with the same orientation. Moreover, the scale is not likely to change much between images of the same class, so we extracted the SIFT descriptors at only one octave, the one that gave us the best classification performance. In this sense we have decoupled the detection of a SIFT keypoint from the description of the point itself.

To keep the complexity of the description of each image low, and at the same time to retain as much information as possible, we matched each extracted SIFT descriptor against a number of template SIFTs. These template SIFTs form our vocabulary of visual words. It is built using a standard K-means algorithm, with K equal to 500, on a random collection of SIFTs extracted from the training images. Various vocabulary sizes were tested with no significant differences, so we chose the smallest one with good recognition performance. Note that in this phase the testing images can also be used, because the process does not use the labels and is unsupervised. At this point each image could be described with the raw counts of each visual word. To add some kind of spatial information to our features we divided the images into four subimages, collecting the histograms separately for each subimage. In this way the dimension of the input space is multiplied by four, but in our tests we gained about 3% in classification performance. We extracted 1500 SIFT descriptors in each subimage: such dense sampling adds robustness to the histograms. See Figures 1 and 2 for an example.

[Figure 1: (a) Radiographic image divided in 4 subimages and (b) corresponding counts of the visual words.]

[Figure 2: Difference between random sampling and interest point detection. In (a) the four most frequent visual words in the image are drawn, each with a different color. In (b) the result of standard SIFT extraction in the same octave used in (a).]

Another approach that we explored was the simplest possible global description method: the raw pixels. The images were resized to 32x32 pixels, regardless of the original dimension, and normalized to have sum equal to one; then the 1024 raw pixel values were used as input features. This approach is at the same time a baseline for the classification system and a useful "companion" method to boost the performance of the SIFT-based classifier (see Section 2.2).
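The following is a minimal sketch of the two single-cue representations described above, using OpenCV's SIFT and scikit-learn's K-means. The grid step and the keypoint size are our assumptions; the paper fixes a single octave and drops rotation invariance, which plain `cv2.SIFT` does not expose directly, so this is an approximation of the described pipeline rather than the authors' exact implementation.

```python
# Sketch of the bag-of-words (dense SIFT) and raw-pixel cues.
# Images are assumed to be uint8 grayscale numpy arrays.

import cv2
import numpy as np
from sklearn.cluster import KMeans

K = 500          # vocabulary size used in the paper
GRID_STEP = 8    # dense sampling step (assumption)
KP_SIZE = 16     # single keypoint scale (assumption)

def dense_sift(img: np.ndarray) -> np.ndarray:
    """SIFT descriptors computed on a dense grid of keypoints."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), KP_SIZE)
           for y in range(0, img.shape[0], GRID_STEP)
           for x in range(0, img.shape[1], GRID_STEP)]
    _, desc = sift.compute(img, kps)
    return desc  # shape: (n_points, 128)

def build_vocabulary(images) -> KMeans:
    """K-means over a pool of descriptors; unsupervised, no labels used."""
    pool = np.vstack([dense_sift(im) for im in images])
    return KMeans(n_clusters=K, n_init=3).fit(pool)

def bow_histogram(img: np.ndarray, vocab: KMeans) -> np.ndarray:
    """One K-bin visual-word histogram per quadrant, concatenated."""
    h, w = img.shape
    hists = []
    for quad in (img[:h//2, :w//2], img[:h//2, w//2:],
                 img[h//2:, :w//2], img[h//2:, w//2:]):
        words = vocab.predict(dense_sift(quad))
        hist, _ = np.histogram(words, bins=np.arange(K + 1))
        hists.append(hist)
    return np.concatenate(hists).astype(float)  # length 4*K = 2000

def raw_pixel_feature(img: np.ndarray) -> np.ndarray:
    """Global cue: 32x32 resize, flattened and normalized to sum one."""
    small = cv2.resize(img, (32, 32)).astype(float).ravel()
    return small / small.sum()
```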
2.2 Classification

For the classification step we used an SVM with an exponential χ² kernel, for both the local and global approaches:

K(X, Y) = \exp\left( -\gamma \sum_{i=1}^{N} \frac{(X_i - Y_i)^2}{X_i + Y_i} \right).    (1)

The parameter γ was tuned through cross validation (see Section 4). This kernel has been successfully applied to histogram comparison and it has been demonstrated to be positive definite [2], thus it is a valid kernel. Even though the labels are hierarchical, we chose to use standard multi-class approaches. This choice is motivated by the finding that, with our features, the recognition rate was lower using an axis-wise classification. This could be due to the fact that each super-class has a variability so high that the chosen features are not able to model it, while they can model the small sub-classes very well. In particular, we tested both the one-vs-one and one-vs-all multi-class extensions of SVM.
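A minimal sketch of Eq. (1) and its use through a precomputed Gram matrix follows. The γ and C values are placeholders (the values actually used per run are listed in Table 2), and note that scikit-learn's `SVC` decomposes multi-class problems one-vs-one internally, so the one-vs-all variant would need an explicit wrapper such as `OneVsRestClassifier`.

```python
# Exponential chi-square kernel, Eq. (1), with a precomputed-kernel SVM.

import numpy as np
from sklearn.svm import SVC

def exp_chi2_kernel(X, Y, gamma=0.5):
    """K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    eps = 1e-10  # avoids division by zero on empty histogram bins
    # Vectorized for clarity; not memory-efficient for very large sets.
    d = ((X[:, None, :] - Y[None, :, :]) ** 2
         / (X[:, None, :] + Y[None, :, :] + eps)).sum(axis=2)
    return np.exp(-gamma * d)

# Usage sketch: train_feats/test_feats are the BoW histograms above.
# clf = SVC(C=10, kernel="precomputed")
# clf.fit(exp_chi2_kernel(train_feats, train_feats), train_labels)
# pred = clf.predict(exp_chi2_kernel(test_feats, train_feats))
```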
3 Multi Cue Annotation

Due to the fundamental difference in how local and global features are computed, it is reasonable to suppose that the two representations provide different kinds of information. Thus, we expect that by combining them through an integration scheme we should achieve better performance, namely higher classification accuracy and higher robustness. In the computer vision and pattern recognition literature several authors have suggested methods to combine information derived from different cues (for a review on the topic we refer the reader to [9]). Some of them are based on building new representations, but this technique does not solve the robustness problem, because if one of the cues gives misleading information it is quite probable that the new feature vector will be adversely affected. Moreover, the dimension of such a feature vector would increase as the number of cues grows, implying longer learning and recognition times, greater memory requirements and possibly curse-of-dimensionality effects. The strategy we follow in this paper is to use integration schemes, thus keeping the feature descriptors separated and fusing them at a mid or high level. In the rest of the section we describe the two alternative integration schemes we used in the ImageCLEF competition. The first, the Discriminative Accumulation Scheme (DAS, [7]), is a high-level integration scheme, meaning that each single cue first generates a set of hypotheses on the correct label of the test image, and then those hypotheses are combined to obtain a final output. This method is described in Section 3.1. The second, the Multi Cue Kernel (MCK), is a mid-level integration scheme, meaning that the different feature descriptors are kept separated but are combined in a single classifier generating the final hypothesis. This algorithm is described in Section 3.2.

3.1 Discriminative Accumulation Scheme

The Discriminative Accumulation Scheme is an integration scheme for multiple cues that does not neglect any cue's contribution. It is based on a weak coupling method called accumulation. The main idea of this method is that information from different cues can be summed together. Suppose we are given M object classes and, for each class, a set of N_j training images \{I_i^j\}_{i=1}^{N_j}, j = 1, \dots, M. For each image, we extract a set of P different cues:

T_p = T_p(I_i^j), \quad p = 1, \dots, P    (2)

so that for each object class j we have P new training sets \{T_p(I_i^j)\}_{i=1}^{N_j}, j = 1, \dots, M, p = 1, \dots, P. For each of them we train an SVM. Kernel functions may differ from cue to cue, and model parameters can be estimated during the training step via cross validation. Given a test image \hat{I} and assuming M ≥ 2, for each single-cue SVM we compute the distance from the separating hyperplane:

D_j(p) = \sum_{i=1}^{m_j^p} \alpha_{ij}^p \, y_{ij} \, K_p\!\left(T_p(I_{ij}), T_p(\hat{I})\right) + b_j^p .    (3)

After collecting all the distances \{D_j(p)\}_{p=1}^{P} for all the classes j = 1, \dots, M and all the cues p = 1, \dots, P, we classify the image \hat{I} using the linear combination:

j^* = \arg\max_{j=1,\dots,M} \left\{ \sum_{p=1}^{P} a_p D_j(p) \right\}, \quad a_p \in \mathbb{R}^+ .    (4)

The coefficients \{a_p\}_{p=1}^{P} are evaluated via cross validation during the training step.

3.2 Multi Cue Kernel

DAS can be defined as a high-level integration scheme, as fusion is performed as a post-processing step after the single-cue classification stage. As an alternative, we developed a mid-level integration scheme based on multi-class SVM with a Multi Cue Kernel K_{MC}. This new kernel combines different features extracted from images; it is a Mercer kernel, as positively weighted linear combinations of Mercer kernels are Mercer kernels themselves [1]:

K_{MC}\left(\{T_p(I_i)\}_p, \{T_p(I)\}_p\right) = \sum_{p=1}^{P} a_p K_p\left(T_p(I_i), T_p(I)\right).    (5)

In this way it is possible to perform only one classification step, identifying the best weighting factors a_p while optimizing the other kernel parameters. Another advantage of this approach is that it makes it possible to work with both the one-vs-all and the one-vs-one SVM extensions to the multiclass problem.
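The sketch below illustrates both schemes on top of precomputed Gram matrices, one per cue, reusing `exp_chi2_kernel` from Section 2.2. The weights and C values are placeholders to be set by cross validation. One caveat: scikit-learn's `SVC` is one-vs-one internally, and its `decision_function_shape="ovr"` output is derived from the pairwise votes, so it only approximates the true one-vs-all distances of Eq. (3).

```python
# Sketches of DAS (Eqs. 3-4) and MCK (Eq. 5) with precomputed kernels.

import numpy as np
from sklearn.svm import SVC

def das_predict(gram_train, gram_test, y_train, weights, C=10):
    """DAS: per-cue SVMs, then a weighted sum of hyperplane distances."""
    acc = 0.0
    for G_tr, G_te, a_p in zip(gram_train, gram_test, weights):
        svm = SVC(C=C, kernel="precomputed",
                  decision_function_shape="ovr").fit(G_tr, y_train)
        acc = acc + a_p * svm.decision_function(G_te)  # (n_test, M)
    return svm.classes_[np.argmax(acc, axis=1)]        # Eq. (4)

def mck_predict(gram_train, gram_test, y_train, weights, C=5):
    """MCK: a single SVM on the weighted sum of the per-cue Gram
    matrices, which is again a Mercer kernel (Eq. 5)."""
    G_tr = sum(a_p * G for a_p, G in zip(weights, gram_train))
    G_te = sum(a_p * G for a_p, G in zip(weights, gram_test))
    svm = SVC(C=C, kernel="precomputed").fit(G_tr, y_train)
    return svm.predict(G_te)

# Usage sketch with the two cues of this paper:
# gram_train = [exp_chi2_kernel(sift_tr, sift_tr, 0.5),
#               exp_chi2_kernel(pix_tr, pix_tr, 5.0)]
# gram_test  = [exp_chi2_kernel(sift_te, sift_tr, 0.5),
#               exp_chi2_kernel(pix_te, pix_tr, 5.0)]
# labels = mck_predict(gram_train, gram_test, y_tr, weights=[0.8, 0.2])
```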
4 Experiments

Our experiments started by evaluating the performance of the local and global features separately, before testing our integration methods. Two sets of experiments using single-cue SVMs were run to select the best kernel parameters through cross validation. The original dataset was divided in three parts: training, validation and testing. We merged them together and extracted 5 random and disjoint train/test splits of 10000/1000 images. We considered as the best parameters the ones giving the best average score on the 5 splits. Note that, according to the method used for the score evaluation, the best average score does not necessarily correspond to the best recognition rate. Besides yielding the optimal parameters, these experiments showed that the SIFT features outperform the raw pixel ones. This was predictable, since the results of last year's ImageCLEF competition showed that local features are generally more informative than global features for the annotation task.

Then we adopted the same experimental setup for DAS and MCK. In particular, for DAS we used the distances from the separating hyperplanes associated with the best results of the previous step, so the cross validation was used only to search for the best weights for cue integration. On the other hand, for MCK the cross validation was applied to look for the best kernel parameters and the best feature weights at the same time. In both cases the weights could vary from 0 to 1.
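A minimal sketch of this weight search follows, reusing `image_error` from Section 2 and `mck_predict` from Section 3. The grid step over the weights and the shape of the `splits` structure are our assumptions; the score is the sum of per-image errors over a 1000-image test split, averaged over the 5 splits, with lower being better.

```python
# Sketch of cross-validated weight selection over the 5 random splits.

import numpy as np

def mean_score(predict_fn, splits, weights):
    """Average challenge score of an integration method over the splits."""
    scores = []
    for gram_tr, gram_te, y_tr, y_te in splits:
        pred = predict_fn(gram_tr, gram_te, y_tr, weights)
        scores.append(sum(image_error(p, t) for p, t in zip(pred, y_te)))
    return np.mean(scores)

def search_weights(predict_fn, splits, step=0.05):
    """Grid search over (a_sift, a_pixel), both weights in [0, 1]."""
    best_w, best_s = None, np.inf
    for a_sift in np.arange(0.0, 1.0 + step, step):
        for a_pixel in np.arange(0.0, 1.0 + step, step):
            s = mean_score(predict_fn, splits, [a_sift, a_pixel])
            if s < best_s:
                best_w, best_s = (a_sift, a_pixel), s
    return best_w, best_s

# best_weights, _ = search_weights(mck_predict, cv_splits)
```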
Finally, we used the results of the previous phases to run our submission experiment on the 1000 unlabeled images of the challenge test set, using all the 11000 images of the original dataset for training. The rank, name and score of our submitted runs, together with the score gain with respect to the best run of the other participants, are listed in Table 1. Our two runs based on the MCK algorithm ranked first and second among all submissions, confirming the effectiveness of using multiple cues for automatic image annotation. It is interesting to note that even if DAS has a higher recognition rate, its score is worse than that obtained using the SIFT features alone. This could be due to the fact that, when the label predicted by the global approach (the raw pixels) is wrong, the true label is far from the top of the decision ranking.

Rank  Name                  Score          Gain            Rec. rate
1     BLOOM-BLOOM MCK oa    26.8470167911  4.0828086669    89.7%
2     BLOOM-BLOOM MCK oo    27.5449911826  3.3848342754    89.0%
3     BLOOM-BLOOM SIFT oo   28.7301320009  2.1996934571    88.4%
4     BLOOM-BLOOM SIFT oa   29.45575794    1.474067518     88.5%
5     BLOOM-BLOOM DAS       29.9033537771  1.0264716809    88.9%
28    BLOOM-BLOOM PIXEL oa  68.2130545639  -37.2832291059  79.9%
29    BLOOM-BLOOM PIXEL oo  72.410704904   -41.4808794460  79.2%

Table 1: Ranking of our submitted runs: name, score and gain with respect to the best run of the other participants.

Table 2 summarizes the parameters used for our runs and the number of support vectors obtained. As expected, the best feature weight (see (4) and (5)) for SIFT is higher than that for raw pixels for all the integration methods. The number of support vectors for the MCK run using the one-vs-all multiclass SVM extension (MCK oa) is slightly higher than that of the single cue SIFT oa, but lower than that of PIXEL oa. For the MCK run using the one-vs-one multiclass SVM extension (MCK oo), the number of support vectors is even lower than that of both the single cues SIFT oo and PIXEL oo. These results show that combining two features with the MCK algorithm can simplify the classification problem. For DAS we counted the support vectors by summing those of SIFT oa and PIXEL oa, counting only once the support vectors associated with training images shared by the two single cues. The number of support vectors for DAS exceeds that obtained for both MCK oa and MCK oo, showing a higher complexity of the classification problem.

Rank  Name                  γ_sift  γ_pixel  C   a_sift  a_pixel  #SV
1     BLOOM-BLOOM MCK oa    0.5     5        5   0.80    0.20     7916
2     BLOOM-BLOOM MCK oo    0.1     1.5      20  0.90    0.10     7037
3     BLOOM-BLOOM SIFT oo   0.05    -        40  -       -        7173
4     BLOOM-BLOOM SIFT oa   0.25    -        10  -       -        7704
5     BLOOM-BLOOM DAS       0.25    5        10  0.76    0.24     9090
28    BLOOM-BLOOM PIXEL oa  -       5        10  -       -        8329
29    BLOOM-BLOOM PIXEL oo  -       3        20  -       -        7381

Table 2: Best parameters obtained by cross validation and used for the classification, together with the number of support vectors for each of our submitted runs.

Table 3 shows in detail some examples of classification results. The first, second and third columns contain examples of images misclassified by one of the two cues but correctly classified by DAS and MCK oa. The fourth column shows an example of an image misclassified by both cues and by DAS, but correctly classified by MCK oa. It is interesting to note that combining local and global features can be useful to recognize images even when they are compromised by the presence of artifacts, which for medical images can be prostheses or reference labels put on the acquisition screen.

          Image 1  Image 2  Image 3  Image 4
PIXEL oa  11°      1°       12°      5°
SIFT oa   1°       2°       2°       5°
DAS       1°       1°       1°       2°
MCK oa    1°       1°       1°       1°

Table 3: Examples of images misclassified by one or both cues and correctly classified by DAS or MCK. The values correspond to the decision rank.

A deeper analysis of our results can be done by considering the performance of the single-cue, discriminative accumulation and multi-cue kernel approaches for each class. In Table 4 the number of images correctly recognized by each method is listed for each class; it is possible to note that in a few cases PIXEL oa outperforms SIFT oa, and to observe where MCK oa outperforms both SIFT oa and DAS.

[Table 4: Performance of the single-cue, discriminative accumulation and multi-cue kernel approaches for each class: number of correctly recognized test images per IRMA class for SIFT oa, PIXEL oa, DAS and MCK oa, together with the total number of test images per class.]

The difference between our approaches can be better evaluated by considering the confusion matrices, shown as images in Figure 3. We ordered the classes following the way they are listed in Table 4 and used a colormap spanning counts from zero to five, to let the misclassified images stand out. It is clear that our methods differ principally in how the wrong images are labeled. The more a matrix presents values scattered off the diagonal, and the farther they are from it, the worse the method.

[Figure 3: Confusion matrices for (a) SIFT oa, (b) PIXEL oa, (c) DAS and (d) MCK oa. Classes are ordered as listed in Table 4; the colormap spans counts from zero to five so that misclassified images stand out. All positions containing five or more images appear dark red.]
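A small sketch of how such confusion matrices can be built and rendered follows; the use of matplotlib and the `class_order` argument (the class list of Table 4) are our assumptions. Capping the colormap at five saturates any cell with five or more images to dark red, as in Figure 3.

```python
# Sketch of the confusion-matrix visualization of Figure 3.

import numpy as np
import matplotlib.pyplot as plt

def confusion_matrix(y_true, y_pred, class_order):
    """Count matrix with rows/columns ordered as in class_order."""
    idx = {c: i for i, c in enumerate(class_order)}
    cm = np.zeros((len(class_order), len(class_order)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm

def plot_confusion(cm, title):
    plt.imshow(cm, vmin=0, vmax=5, cmap="jet")  # >=5 saturates to dark red
    plt.title(title)
    plt.colorbar()
    plt.show()
```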
5 Conclusions

This paper presented a discriminative multi-cue approach to medical image annotation. We combined global and local information using two alternative fusion strategies, the discriminative accumulation scheme [7] and the multi cue kernel. This last method gave the best performance, obtaining a score of 26.85, which ranked first among all submissions. This work can be extended in many ways. First, we would like to use various types of local and global descriptors, so as to select the best features for the task. Second, we would like to add shape descriptors to our fusion scheme, which should result in better performance. Finally, our algorithm does not at the moment exploit the natural hierarchical structure of the data, but we believe that this information is crucial for achieving significant improvements in performance. Future work will explore these directions.

Acknowledgments

This work was supported by the ToMed.IM2 project (B. C. and F. O.), under the umbrella of the Swiss National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management (IM2, www.im2.ch), and by the Blanceflor Boncompagni Ludovisi foundation (T. T., www.blanceflor.se). The support is gratefully acknowledged.

References

[1] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods). Cambridge University Press, 2000.

[2] C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):214–225, 2004.

[3] T. M. Lehmann, H. Schubert, D. Keysers, M. Kohnen, and B. B. Wein. The IRMA code for unique classification of medical images. In Proceedings of SPIE Medical Imaging, volume 5033, pages 440–451, May 2003.

[4] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision (ICCV), volume 2, pages 1150–1157, Washington, DC, USA, 1999. IEEE Computer Society.
[5] M. O. Güld, M. Kohnen, D. Keysers, H. Schubert, B. B. Wein, J. Bredno, and T. M. Lehmann. Quality of DICOM header information for image categorization. In Proceedings of SPIE Medical Imaging, volume 4685, pages 280–287, 2002.

[6] H. Müller, T. Deselaers, E. Kim, J. Kalpathy-Cramer, T. M. Deserno, P. Clough, and W. Hersh. Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[7] M. E. Nilsback and B. Caputo. Cue integration through discriminative accumulation. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2004.

[8] E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. In Proceedings of the European Conference on Computer Vision, 2006.

[9] R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3):21–45, 2006.