Subfigure and Multi-Label Classification using a Fine-Tuned Convolutional Neural Network

Ashnil Kumar1, David Lyndon1, Jinman Kim1, and Dagan Feng1,2
1 School of Information Technologies, University of Sydney, Australia
2 Med-X Research Institute, Shanghai Jiao Tong University, China
ashnil.kumar@sydney.edu.au

Abstract. This paper describes the submission of the BMET group to the Subfigure Classification and Multi-Label Classification tasks of the ImageCLEF 2016 medical subtrack. Our method creates a new optimised feature extractor by using medical images to fine-tune a CNN that has been pre-trained on general image data. Our classification method shows promising results in both the subfigure classification and multi-label classification subtasks.

Key words: convolutional neural network, fine-tuning, subfigure classification, multi-label classification

1 Introduction

This paper describes the submission of the BMET group to two of the ImageCLEF 2016 Medical Tasks: Subfigure Classification and Multi-Label Classification [1, 2]. A primary challenge of these tasks is to automatically extract relevant representations (content and semantics) from the image data that allow easy differentiation of different modalities [3]. Previous attempts combined a vast range of image-derived features that were sampled both globally over the whole image and locally over several different sub-patches [4, 5]. These features were designed by humans to represent some characteristic of the underlying image data, e.g., textures, colours, binary patterns.

Convolutional neural networks (CNNs) were used to optimise feature extraction for the ImageCLEF Medical Tasks in 2015 [6]. In our prior work, we designed a new CNN for both modality classification [7] and x-ray body region identification [8]. Choi [9] used generic features learned by a CNN from a large, well-labelled natural image dataset. However, the size of the challenge dataset limited the ability to learn the best image features.

In this paper, we describe a method for modality classification that uses a smaller medical image dataset to fine-tune (optimise) a CNN that was pre-trained on a large natural image dataset. This method allows us to adapt or adjust the generic features learned from natural images to be more specific for the medical imaging modalities in the ImageCLEF datasets. We apply our method to the Subfigure Classification and Multi-Label Classification tasks.

2 Materials

We used the Subfigure Classification training dataset (6776 images, 30 classes) to fine-tune our CNN. The test datasets consisted of 4166 images for Subfigure Classification and 1084 images for Multi-Label Classification. A full description of the datasets can be found in the ImageCLEF 2016 overview papers [1, 2].

3 Methods

We used the well-established AlexNet architecture [10] pre-trained (initialised) on the ImageNet natural image dataset (1000 classes, > 1 million samples) [11]. We fine-tuned the initial AlexNet filter weights (derived from natural images) for 100 epochs using back-propagation so that they were more appropriate for the 30 classes in the dataset. Dropout was used to avoid overfitting.

We increased the robustness of our algorithm to translation and orientation using a 24-fold data augmentation scheme. We generated 6 crops (original, top-left, top-right, bottom-left, bottom-right, centre) and 4 reflections (no flip, x-axis, y-axis, and both axes) of each training sample; 90% of the augmented dataset was used for fine-tuning and 10% for validation. A sketch of this augmentation scheme is given below.
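The following MATLAB sketch illustrates the 24-fold augmentation (6 crops x 4 reflections) described above. The function name augment24 and the cropSize parameter are our own illustrative choices, not from the paper; the sketch assumes the input image is at least cropSize pixels in each dimension.

function variants = augment24(im, cropSize)
% AUGMENT24  Generate the 24 augmented variants of one image:
% 6 crops (whole image resized, 4 corners, centre) x 4 reflections.
[h, w, ~] = size(im);
c = cropSize;
crops = {
    imresize(im, [c c])                                 % whole image, resized
    im(1:c, 1:c, :)                                     % top-left
    im(1:c, w-c+1:w, :)                                 % top-right
    im(h-c+1:h, 1:c, :)                                 % bottom-left
    im(h-c+1:h, w-c+1:w, :)                             % bottom-right
    im(floor((h-c)/2)+(1:c), floor((w-c)/2)+(1:c), :)   % centre
};
variants = cell(1, 24);
k = 1;
for i = 1:numel(crops)
    base = crops{i};
    variants{k} = base;                   k = k + 1;    % no flip
    variants{k} = flipud(base);           k = k + 1;    % x-axis reflection
    variants{k} = fliplr(base);           k = k + 1;    % y-axis reflection
    variants{k} = flipud(fliplr(base));   k = k + 1;    % both axes
end
end

For AlexNet-sized inputs, cropSize would match the network's input resolution; each variant can then be fed to the fine-tuning procedure.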
The fine-tuned CNN produced a 4096-dimensional feature vector for each input image. To improve efficiency, we reduced the dimensionality using Principal Component Analysis (PCA) [12] to select the principal components that explained 99% of the variation in the data (1453 dimensions).

We trained a multi-class support vector machine (SVM) using the PCA-reduced features extracted from all 24 augmented variations of the training dataset. During classification, we generated the feature vectors for each test image and its 5 crops, and used the SVM to obtain the posterior probability and per-class score that each crop depicted a particular modality. When using per-class SVM scores, we linearly scaled them to the range [0, 1] to reduce the impact of very large outlier scores. We investigated several different schemes to determine the class of an input image, as described in our runs.

We implemented our method in MATLAB, using the MatConvNet library [13] for our implementation of CNN fine-tuning. For our experiments we used the pre-trained AlexNet provided as a part of MatConvNet. Sketches of the feature extraction, dimensionality reduction, and classification steps are given below.
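The sketch below shows feature extraction with MatConvNet followed by the PCA reduction. It is a minimal illustration under stated assumptions, not the paper's exact code: the model file name, the use of vl_simplenn with the final two layers truncated to expose the 4096-dimensional fc7-level activations, and all variable names are our assumptions; the fine-tuning loop itself is omitted. PCA uses the pca function from the Statistics and Machine Learning Toolbox.

% Load the (fine-tuned) AlexNet and truncate it at the fc7 level.
run matconvnet/matlab/vl_setupnn             % path to MatConvNet is an assumption
net = load('imagenet-matconvnet-alex.mat');  % model file name is an assumption
net.layers = net.layers(1:end-2);            % drop the final layers; output is 4096-d

% Extract a 4096-d feature vector for one image 'im'.
sz   = net.meta.normalization.imageSize(1:2);
im_  = single(imresize(im, sz));
im_  = bsxfun(@minus, im_, net.meta.normalization.averageImage); % mean image may be 1x1x3 or full-size
res  = vl_simplenn(net, im_);
feat = squeeze(res(end).x)';                 % 1 x 4096 row vector

% With 'feats' as an N x 4096 matrix over all augmented training images,
% keep the principal components that explain 99% of the variance
% (1453 dimensions in the paper's experiments).
[coeff, score, ~, ~, explained, mu] = pca(feats);
k = find(cumsum(explained) >= 99, 1);
trainFeats = score(:, 1:k);
% Project test-time features onto the same basis:
% testFeats = bsxfun(@minus, testRaw, mu) * coeff(:, 1:k);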
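The paper does not name its SVM implementation; one plausible MATLAB route (an assumption on our part) is fitcecoc from the Statistics and Machine Learning Toolbox with linear SVM learners and Platt-scaled posteriors. The sketch also shows the crop fusion used in run SC1 of the next section: averaging the posterior probabilities over a test image's crops.

% Multi-class SVM on PCA-reduced features (fitcecoc is an assumed choice).
tmpl = templateSVM('KernelFunction', 'linear');
svmModel = fitcecoc(trainFeats, trainLabels, ...
                    'Learners', tmpl, 'FitPosterior', true);

% Run SC1: mean posterior probability over the crops of one test image.
% cropFeats is a 6 x k matrix (one PCA-reduced feature vector per crop).
[~, ~, ~, posterior] = predict(svmModel, cropFeats);  % 6 x numClasses
meanPosterior = mean(posterior, 1);
[~, idx] = max(meanPosterior);
predictedClass = svmModel.ClassNames(idx);

The other fusion schemes (mean or maximum of scaled per-class scores, majority vote, maximum posterior) replace the final three lines with the corresponding aggregation.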
4 Runs

We submitted 5 runs to the Subfigure Classification (SC) task and 2 runs to the Multi-Label Classification (ML) task.

SC1 Mean SVM posterior probability across all crops.
SC2 Mean of the per-class SVM scores, scaled jointly across all crops.
SC3 Mean of the per-class SVM scores, scaled separately for each crop.
SC4 Majority class across all crops; the per-class scores were not scaled.
SC5 Maximum SVM posterior probability across all crops.
ML1 For each crop, the label was the modality with the highest SVM score.
ML2 For each crop, the label was the modality with the highest posterior probability.

Table 1: Subfigure Classification results.

Run  Type    Correctness (%)
SC1  visual  77.55
SC2  visual  77.53
SC3  visual  77.50
SC4  visual  77.26
SC5  visual  76.38

Table 2: Multi-Label Classification results.

Run  Hamming Loss  F-Measure
ML1  0.0131        0.295
ML2  0.0135        0.320

5 Results and Discussion

Table 1 shows the results of our Subfigure Classification runs. The best outcome (SC1) came from averaging the posterior probabilities calculated from classifying each crop. Table 2 shows the results of our Multi-Label Classification runs. The low Hamming Loss indicates that our runs had very few incorrectly predicted labels.

6 Conclusions

We presented a method for subfigure modality classification and multi-label classification that used a fine-tuned CNN as a feature extractor. We expect improved results through the use of deeper CNNs such as Deep Residual Networks [14].

Acknowledgments. We gratefully acknowledge the support of NVIDIA Corporation through the donation of the Tesla K40 GPU used for this research. This work was supported in part by ARC grants. This work was also supported by the Faculty of Engineering and Information Technologies, The University of Sydney, under the Faculty Research Cluster program.

References

1. Villegas, M., Müller, H., García Seco de Herrera, A., Schaer, R., Bromuri, S., Gilbert, A., Piras, L., Wang, J., Yan, F., Ramisa, A., Dellandrea, E., Gaizauskas, R., Mikolajczyk, K., Puigcerver, J., Toselli, A.H., Sánchez, J.A., Vidal, E.: General overview of ImageCLEF at the CLEF 2016 labs. Lecture Notes in Computer Science (2016)
2. García Seco de Herrera, A., Schaer, R., Bromuri, S., Müller, H.: Overview of the ImageCLEF 2016 medical task. In: CLEF 2016 Working Notes (2016)
3. Kumar, A., Kim, J., Cai, W., Feng, D.: Content-based medical image retrieval: a survey of applications to multidimensional and multimodality data. Journal of Digital Imaging 26(6) (2013) 1025–1039
4. Pelka, O., Friedrich, C.M.: FHDO biomedical computer science group at medical classification task of ImageCLEF 2015. In: CLEF 2015 Working Notes. Volume 1391 (2015)
5. Abedini, M., Cao, L., Codella, N., Connell, J.H., Garnavi, R., Geva, A., Merler, M., Nguyen, Q.B., Pankanti, S.U., Smith, J.R., Sun, X., Tzadok, A.: IBM research at ImageCLEF 2013 medical tasks. In: CLEF 2013 Working Notes. Volume 1179 (2013)
6. Villegas, M., Müller, H., Gilbert, A., Piras, L., Wang, J., Mikolajczyk, K., García Seco de Herrera, A., Bromuri, S., Amin, M.A., Mohammed, M.K., Acar, B., Uskudarli, S., Marvasti, N.B., Aldana, J.F., del Mar Roldán García, M.: General overview of ImageCLEF at the CLEF 2015 labs. Lecture Notes in Computer Science (2015)
7. Lyndon, D., Kumar, A., Kim, J., Leong, P.H.W., Feng, D.: Convolutional neural networks for medical classification. In: CLEF 2015 Working Notes. Volume 1391 (2015)
8. Lyndon, D., Kumar, A., Kim, J., Leong, P.H.W., Feng, D.: Convolutional neural networks for medical clustering. In: CLEF 2015 Working Notes. Volume 1391 (2015)
9. Choi, S.: X-ray image body part clustering using deep convolutional neural network: SNUMedinfo at ImageCLEF 2015 medical clustering task. In: CLEF 2015 Working Notes (2015)
10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., Weinberger, K., eds.: Advances in Neural Information Processing Systems 25. Curran Associates, Inc. (2012) 1097–1105
11. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3) (2015) 211–252
12. Jolliffe, I.: Principal Component Analysis. Wiley Online Library (2002)
13. Vedaldi, A., Lenc, K.: MatConvNet: convolutional neural networks for MATLAB. In: Proceedings of the ACM International Conference on Multimedia (2015) 689–692
14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)