Feature Visualisation of Classification of Diabetic Retinopathy Using a Convolutional Neural Network

Harry Pratt1, Frans Coenen3*, Simon P. Harding1,2, Deborah M. Broadbent2, Yalin Zheng1,2

1 Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, L7 8BX
2 St. Paul's Eye Unit, Royal Liverpool University Hospital, Liverpool, L7 8XP
3 Department of Computer Science, University of Liverpool, Liverpool, L69 3BX

sghpratt@liverpool.ac.uk, coenen@liverpool.ac.uk, sharding@liverpool.ac.uk, dbroadbe@liverpool.ac.uk, yzheng@liverpool.ac.uk

* Contact Author. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Convolutional Neural Networks (CNNs) have been demonstrated to achieve state-of-the-art results on complex computer vision tasks, including medical image diagnosis of Diabetic Retinopathy (DR). CNNs are powerful because they determine relevant image features automatically. However, the current inability to demonstrate what these features are has led to CNNs being considered 'black box' methods whose results should not be trusted. This paper presents a method for identifying the learned features of a CNN and applies it in the context of the diagnosis of DR in fundus images using the well-known DenseNet. We train the CNN to diagnose and determine the severity of DR and then successfully extract feature maps from the CNN which identify the regions and features of the images that have contributed most strongly to the CNN prediction. This feature extraction process has great potential, particularly for encouraging confidence in CNN approaches from users and clinicians, and can aid in the further development of CNN methods. There is also potential for determining previously unidentified features which may contribute to a classification.

1 Introduction

Convolutional Neural Networks (CNNs), a deep learning approach to image classification, can offer extremely fast classification predictions based on learning relevant features. These features are learned within the network structure itself, from labelled images that the network has 'seen'. Recently, CNNs have been used to improve accuracy on a wide range of computer vision tasks [Krizhevsky et al., 2012]. This has extended to automated medical image diagnosis, for example the classification of Diabetic Retinopathy (DR) severity from colour fundus images [Pratt et al., 2016; Gulshan et al., 2016]. The CNNs presented in these papers have learned features of DR in order to determine the level of DR severity within a fundus image using clinically labelled images.

However, the DR classification predictions presented in these papers do not offer any insight into the reasoning behind the CNN model predictions. Although the CNN models have learned from ground truths based on a clinical grading framework, the methods do not present the features that the CNN has learned in order to arrive at the prediction. DR feature extraction from fundus images typically involves manual algorithms [Ravishankar et al., 2009; ManojKumar et al., 2015] which are applied before the classification process commences; the extracted features then correspond to a predicted severity of the disease. In the case of CNN models we wish to implement the reverse procedure: by dissecting the CNN model we wish to determine which features have led to the prediction.

Feature extraction is a vital part of the grading of DR because the manual processes used by clinicians are typically feature based, for example the process prescribed in [ETDRS Study Group, 1991]. Deep learning is widely perceived in the clinical community to be a black box. Consequently, it is unclear to clinicians whether the feature based framework used in manual grading is the same as the classification framework produced by the CNN, and as a result there is a lack of trust in the ability of deep learning.

In [Zhou et al., 2015], Class Activation Maps (CAMs) were presented as a method of determining the regions within a CNN input image which have contributed most towards the classification. In the case of disease classification this offers insight into the areas of the image containing features of the disease under consideration. The severity of DR within a fundus image directly relates to the location of certain features [ETDRS Study Group, 1991]. These features, their locations and how they relate to DR classification are presented in Table 1. The idea of saliency maps was presented in [Simonyan et al., 2013]. Saliency maps offer a method of determining the most significant pixels involved in the classification prediction of an image.

This paper aims to open the CNN black box in order to make CNNs more transparent in the context of feature based prediction of DR. Deep learning classification methods do not justify their prediction values. This paper presents a novel method of extending CNN black box prediction models so that they become feature based models. By determining the learned features and their locations we explore how the CNN reached its prediction and how this corresponds to manual feature based grading.
2 Method

Initially, a CNN was trained on fundus images to predict DR severity. Once the model had been trained, its parameters remained immutable throughout the rest of the process. The trained model was then used to produce prediction values, saliency maps and CAMs for unseen test images. Attention maps and other techniques would produce similar results to class activation maps and saliency maps if applied to the CNN; the two selected methods were used because they complement each other and highlight features within the image in different manners. For evaluation, these were compared to the clinical ground truth and the features identified within the images.

2.1 Dataset

The dataset used for training and evaluation was from Kaggle [Kaggle, 2016]. It is a large set of 88,702 high-resolution retinal fundus images: 78,076 for training and 10,626 for testing. A clinician has graded the level of DR using five classes: no DR, mild DR, moderate DR, severe DR and proliferative DR. The images were provided by EyePACS [EyePacs, 2018] from a diabetic screening process. Example images from the dataset are given in Figure 1.

2.2 Convolutional Neural Network Training

The adopted CNN architecture was the well-known DenseNet [Huang et al., 2016], shown in Figure 2. The DenseNet weights were initialised with pre-trained ImageNet weights, and a learning rate of 0.0003 was used with the Adam optimiser, with training performed on an NVIDIA K40 GPU using the Keras [Keras, 2019] library. Training was undertaken until the categorical cross entropy loss plateaued on the test data.
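This training setup is not published as code with the paper, but it can be illustrated with a minimal Keras sketch. The DenseNet-121 variant, the 224x224 input size, the batch size, the epoch limit and the directory layout ("train/" and "test/" with one sub-folder per grade) are assumptions made purely for illustration; the ImageNet initialisation, Adam optimiser, learning rate of 0.0003 and categorical cross entropy loss follow the description above.

```python
# Minimal training sketch (not the authors' code). Assumed: DenseNet-121,
# 224x224 inputs, batch size 32, images arranged in one sub-folder per DR grade
# under the hypothetical directories "train/" and "test/".
from tensorflow import keras  # the paper used the Keras library; tf.keras exposes the same API

NUM_CLASSES = 5  # no, mild, moderate, severe and proliferative DR

# DenseNet backbone with ImageNet weights; global average pooling after the
# final convolution block is the structure later required for the CAMs.
backbone = keras.applications.DenseNet121(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(backbone.output)
model = keras.Model(backbone.input, outputs)

model.compile(optimizer=keras.optimizers.Adam(3e-4),  # learning rate 0.0003
              loss="categorical_crossentropy",
              metrics=["accuracy"])

datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory("train/", target_size=(224, 224),
                                        batch_size=32, class_mode="categorical")
test_gen = datagen.flow_from_directory("test/", target_size=(224, 224),
                                       batch_size=32, class_mode="categorical")

# Stop once the test-set cross entropy plateaus, standing in for the manual
# stopping criterion described above.
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)
model.fit(train_gen, validation_data=test_gen, epochs=100, callbacks=[stop])
model.save("densenet_dr.h5")
```

Note that global average pooling followed by a single softmax layer is exactly the structure required by the CAM definition in Section 2.3.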
2.3 Class Activation Maps

In this section we define the procedure for producing Class Activation Maps (CAMs). CAMs require global average pooling after the final convolution layer in the CNN; the pooling provides the localisation for region detection. Applying the trained CNN to a test image activates weights in the output layer depending on which nodes have been activated. These weights can be projected back onto the convolutional feature maps in order to identify regions of importance for a certain class. Hence, to compute the class activation map of an input image we compute a weighted sum of the feature maps of the last convolutional layer. CAMs are defined as follows:

• Let the input image $I$ with coordinates $(x, y)$ be $I(x, y)$.
• Let $f_k(x, y)$ be the activation of node $k$ in the last convolutional layer.
• The result of global average pooling is $F_k = \sum_{(x,y)} f_k(x, y)$.
• For class $c$ the softmax input is $S_c = \sum_k w_k^c F_k$, where $w_k^c$ is the weight for node $k$ and class $c$.
• The softmax output, the probability, is given as $P_c = e^{S_c} / \sum_{c'} e^{S_{c'}}$.
• The weighted sum of the feature maps, the CAM, is defined as
  $$\mathrm{CAM}_c(x, y) = \sum_k w_k^c f_k(x, y). \qquad (1)$$

Therefore, it is clear that the CAM for class $c$, $\mathrm{CAM}_c$, directly relates to the prediction value $S_c$ of that class. The weights $w$ in the definitions of $\mathrm{CAM}_c$ and $S_c$ remain constant from the trained CNN. This indicates the direct importance of the activation $f_k(x, y)$ at pixel $(x, y)$ to the prediction of class $c$ within the CAM of an image. Therefore, for our CNN trained for DR severity, CAMs are an effective method for determining the region of pixels relating to the disease severity prediction. This process is shown in Figure 2.
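To make Equation (1) concrete, the sketch below computes a CAM for one test image using the model from the previous sketch. The helper name and the way the last convolutional layer is located are our assumptions, not the authors' code; the sketch relies on the global-average-pooling plus single softmax layer structure described above.

```python
# Sketch of Equation (1): a weighted sum of the last convolutional feature maps
# using the softmax weights of the chosen class. Assumes the `model` built in
# the previous sketch; `image` is one preprocessed array of shape (h, w, 3).
import numpy as np
from tensorflow import keras

def class_activation_map(model, image, class_idx=None):
    """Return a coarse CAM (H x W array) and the class it was computed for."""
    # The last layer that still has spatial dimensions is the final conv block.
    last_conv = next(l for l in reversed(model.layers) if len(l.output.shape) == 4)
    # A model that yields both the feature maps f_k(x, y) and the class scores.
    grabber = keras.Model(model.input, [last_conv.output, model.output])
    feats, preds = grabber.predict(image[np.newaxis])       # feats: (1, H, W, K)
    feats, preds = feats[0], preds[0]
    if class_idx is None:
        class_idx = int(np.argmax(preds))                   # predicted class c
    # w_k^c: the column of the final softmax layer's kernel for class c.
    w_c = model.layers[-1].get_weights()[0][:, class_idx]   # shape (K,)
    cam = feats @ w_c                                       # sum_k w_k^c f_k(x, y)
    return cam / (np.abs(cam).max() + 1e-8), class_idx      # normalised for display
```

The resulting map has the spatial resolution of the last convolutional layer, so it is upsampled (for example by bilinear interpolation) to the input image size before being overlaid on the fundus image, as in Figures 3 and 5.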
2.4 Saliency Maps

The idea of saliency maps is to compute the gradient of the output class score with respect to the input image. This tells us how the output category value changes with respect to a small change in the input image pixels; as with CAMs, the trained weights remain unchanged. Positive values in the gradient tell us that a change to that pixel will increase the output class value. Hence, the larger the positive gradient, the more the classification relies on that pixel. Visualising all of the gradients, which have the same shape as the input image, produces a saliency map which highlights the salient pixels that contribute the most towards the output class. Saliency maps are described as follows:

• Let the input image be defined as $I$.
• Let $S_c(I)$ be the class score function for the image.
• We want to rank each pixel $(x, y)$ based on its influence on $S_c$.
• $S_c$ is a highly non-linear function in a CNN, so $S_c$ is approximated with a first-order Taylor expansion in the neighbourhood of the pixel: $S_c(I(x, y)) \approx w^\top I(x, y) + b$,
• where $w$ is the derivative of $S_c$ with respect to the image $I$ at the point $(x, y)$:
  $$w = \left.\frac{\partial S_c}{\partial I}\right|_{I(x, y)}.$$

The computation of an image-specific saliency map for a single class is extremely quick, since it only requires a single back-propagation pass. Saliency maps differ from CAMs in that they look at how changes in the input image affect the class prediction, as opposed to combining feature maps in order to determine the most strongly filtered region of an image.
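A saliency map can be sketched in a few lines using automatic differentiation. The snippet below uses TensorFlow's GradientTape rather than whatever tooling the authors used, and for brevity it differentiates the softmax output rather than the pre-softmax score of [Simonyan et al., 2013]; both choices are assumptions made for illustration.

```python
# Sketch of an image-specific saliency map: one backward pass of the class
# score with respect to the input pixels. Assumes the trained `model` from the
# earlier sketches; `image` is a single preprocessed array of shape (h, w, 3).
import numpy as np
import tensorflow as tf

def saliency_map(model, image, class_idx=None):
    """Return an (h, w) map of |dS_c/dI| and the class it was computed for."""
    x = tf.convert_to_tensor(image[np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)                        # track gradients w.r.t. the input image
        preds = model(x, training=False)
        if class_idx is None:
            class_idx = int(tf.argmax(preds[0]))
        score = preds[0, class_idx]          # class score S_c (softmax output here)
    grads = tape.gradient(score, x)[0]       # same shape as the input image
    sal = tf.reduce_max(tf.abs(grads), axis=-1).numpy()  # max over colour channels
    return sal / (sal.max() + 1e-8), class_idx
```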
No Retinopathy:
• No apparent retinopathy

Mild:
• Haemorrhages/Microaneurysms only < 2A
• < 6 Cotton Wool Spots in the absence of other features
• < 6 Cotton Wool Spots with Haemorrhages/Microaneurysms < 2A
• Single venous loop

Moderate:
• Haemorrhages/Microaneurysms ≥ 2A in 1-3 quadrants
• ≥ 6 Cotton Wool Spots
• 1 quadrant Venous Beading/Looping/Reduplication
• Intraretinal microvascular abnormalities < 8A

Severe:
• 4 quadrants Haemorrhages/Microaneurysms ≥ 2A
• 2-4 quadrants Venous Beading/Looping/Reduplication
• 1 quadrant Intraretinal microvascular abnormalities ≥ 8A

Early Proliferative:
• Neovascularisation of disc < 10A alone
• Neovascularisation Elsewhere < 1/2 disc area (DA) alone
• Neovascularisation Elsewhere ≥ 1/2 DA and no preretinal/vitreous haemorrhage

High-risk Proliferative:
• Neovascularisation of disc ≥ 1/3 DA (10A) alone
• Neovascularisation Elsewhere ≥ 1/2 DA and preretinal/vitreous haemorrhage
• Vitreous haemorrhage precluding adequate view of fundus
• Traction retinal detachment (TRD)

Stable treated:
• Neovascularisation of disc/elsewhere has inactivated
• Fibrovascular proliferation of disc/elsewhere

Table 1: Clinical diagnosis of DR based on various feature types with different contributions to classification. One feature in each list is required for the equivalent DR grading. 2A, 8A and 10A refer to 'standard photographs' from ETDRS [ETDRS Study Group, 1991].

Figure 1: Fundus images from the Kaggle dataset; (a) no DR, (b) mild DR, (c) moderate DR, (d) severe DR, (e) proliferative DR. Note: there is little obvious difference between (a), (b) and (c), yet it is important to distinguish these. CAMs and saliency maps should detect features such as haemorrhages/microaneurysms around the vessels.

Figure 2: Top: the DenseNet architecture used for training. Bottom: the combination of the trained weights w from the final layer and the feature maps of the last convolution layer to produce the CAM. The feature maps vary depending on the input image.

3 Results

The purpose of this paper is to give an insight into how qualitative features can be derived and presented; quantitative results have been widely discussed in the literature. However, to indicate the level of quantitative performance underlying this qualitative output, the quantitative results must be stated: the multi-class DenseNet model achieved a quadratic weighted kappa of 0.81 on the test data for the multi-class problem.
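The quadratic weighted kappa quoted above measures agreement between the model's predicted grade and the clinical grade, penalising disagreements by the squared distance between severity levels. It can be computed, for example, with scikit-learn; the grade arrays below are hypothetical placeholders.

```python
# Quadratic weighted kappa between clinical grades and model predictions.
# The arrays here are made-up placeholders for the real test-set labels.
import numpy as np
from sklearn.metrics import cohen_kappa_score

y_true = np.array([0, 1, 2, 4, 3, 0, 2])   # clinician grades (0 = no DR ... 4 = proliferative)
y_pred = np.array([0, 1, 1, 4, 4, 0, 2])   # model predictions for the same images

kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"quadratic weighted kappa: {kappa:.2f}")
```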
CAMs from test images, with an example result for each class of DR, are presented in Figure 3. The colour range is from red to green: the closer a region is to red, the more that region has contributed towards the prediction. Similarly, in the saliency maps, the lighter the pixel, the more that pixel has contributed to the classification of the image.

The CAMs of each class of DR demonstrate the links between the severity ground truth and the input image that the CNN has divulged through the training process. As seen in Figure 3, the regions leading to a classification of no, mild or moderate DR relate to the main vessel structure and tend to avoid the macula (the centre of the retina). Initial signs of disease stem from the vessels in the form of haemorrhages, microaneurysms or abnormal vessels, as presented in Table 1. Furthermore, it was also clear from the test image CAMs that the severe and proliferative classifications look more towards the macula. This is shown in the severe and proliferative cases in Figure 3. This corresponds to the clinical classification process, as severe disease requires haemorrhages or microaneurysms and venous beading/looping/reduplication throughout the retina. However, the saliency maps for the proliferative case rarely took the optic disc region into consideration in the classification prediction. This suggests that the CNN model is excluding an important marker for proliferative retinopathy: neovascularisation of the disc.

The saliency maps provide insight into the features that have been detected through training on the ground truth and input images. Figure 3 demonstrates that in the early stages of retinopathy the CNN looks along the vessel structure and looks for deviations from normal vessel structure. This is shown by the lightest pixels being the vessels in the saliency maps for the no DR and mild DR cases. Haemorrhages and microaneurysms from the early stage of the disease tend to lie around the vessel structure, and abnormal vessels are a key distinction between no DR and mild or moderate DR. It is also apparent that the saliency maps in the moderate class have "light" pixels spread around the retina, as the CNN looks for features in more than one region of the retina, which is key to the moderate classification.

In the saliency maps for the severe and proliferative classifications we can see identification of features relating to clinical diagnosis. In the severe DR saliency map in Figure 4 we can identify the microaneurysms and cotton wool spots. The microaneurysms in different regions of the retina relate directly to the severe DR classification. Similarly, in the proliferative saliency map in Figure 4 the lighter pixels correspond to features that the CNN has identified. The laser spots produced through treatment to the eye remain dark in the saliency map, and therefore the CNN is, correctly, not treating these as a feature of disease. An example of this is shown in Figure 4.

Figure 3: Left to right: no DR, mild DR, moderate DR, severe DR and proliferative DR. Top to bottom: original Kaggle image, Class Activation Map of the preprocessed image and saliency map of the preprocessed image.

Figure 4: Top to bottom: severe and proliferative DR. Left to right: original image with expert labelled features and saliency map with matching region overlay. Rectangles denote laser spots, circles denote haemorrhages or microaneurysms, squares denote neovascularisation elsewhere, curved squares denote cotton wool spots and curved rectangles denote venous reduplication.

4 Discussion

The visualisation techniques presented in this paper demonstrate that CNN models are achieving some success in replicating the clinical process undertaken during diagnosis of fundus images. Similar features are being detected and similar regions are being related to the appropriate classes. However, in order to fully determine whether the CNN has learned a similar classification process, we would require fundus images annotated with every single feature present in the image and saliency maps annotated to the same criteria. Furthermore, the CNN is only told the severity of the image, not the combination of features involved, so it may be deemed unfair to expect the CNN to learn the precise mechanism that was used to determine the ground truth, especially as grader agreement is often variable; complex structures of DR can become subjective when based on such minute features.

The presented method also reveals features of disease severity that are missed in the automated procedure and therefore indicates where the CNN needs to be improved, such as the detection of neovascularisation of the disc. This could be used to determine a general set of features that CNNs struggle to detect. During training, image preprocessing techniques could be used to make the missed features more apparent within the image to aid CNN learning.

The methodologies have been validated on images from the Liverpool Diabetic Eye Screening Program (LDESP) in order to test their ability to generalise to other datasets. Figures 5 and 6 demonstrate the ability of the class activation maps and saliency maps to generalise to unseen data. Numerous features are identified in multiple fundus images from the same eye, including images that are not macula centred.

Figure 5: Left: fundus images from the Liverpool Diabetic Eye Screening Program (LDESP). Middle: saliency map from the trained DenseNet multi-class DR model overlaid on the original fundus image. Right: CAMs from the trained DenseNet multi-class DR model overlaid on the original image.

Figure 6: Top: four montaged fundus images. Bottom: saliency map from the trained DenseNet multi-class DR model overlaid on the original fundus image.

5 Conclusion

In conclusion, we have demonstrated that the correlation between CNN predictions and manual grading of DR can be visualised through the use of Class Activation Maps (CAMs) and saliency maps. These methods provide a useful tool to determine whether deep learning classification models relate accurately to clinical diagnosis procedures. The presented methods could also be used in the screening process to reduce the time a clinician spends looking for features within a fundus image. CAMs present a good method for 'flagging' regions of disease, whereas saliency maps present a solution for feature detection.

Acknowledgment

H. Pratt would like to acknowledge everyone in the CRiA imaging team at the Institute of Ageing and Chronic Disease at the University of Liverpool. He would also like to thank the Fight for Sight charity for PhD funding and NVIDIA for providing an NVIDIA K40 GPU.
References

[ETDRS Study Group, 1991] ETDRS Study Group. Grading diabetic retinopathy from stereoscopic color fundus photographs: an extension of the modified Airlie House classification. ETDRS report number 10. Ophthalmology, 98(5):786–806, 1991.

[EyePacs, 2018] EyePACS. A free platform for retinopathy screening. http://www.eyepacs.com/, 2018. [Online; accessed 30/05/2018].

[Gulshan et al., 2016] V. Gulshan, L. Peng, M. Coram, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, 2016.

[Huang et al., 2016] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. CoRR, abs/1608.06993, 2016.

[Kaggle, 2016] Kaggle. Kaggle: Platform for predictive modelling and analytics competitions, 2016.

[Keras, 2019] Keras. Keras: Deep learning library for Theano and TensorFlow, 2019.

[Krizhevsky et al., 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.

[ManojKumar et al., 2015] S. B. ManojKumar, R. Manjunath, and H. S. Sheshadri. Feature extraction from the fundus images for the diagnosis of diabetic retinopathy. In 2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), pages 240–245, December 2015.

[Pratt et al., 2016] Harry Pratt, Frans Coenen, Deborah M. Broadbent, Simon P. Harding, and Yalin Zheng. Convolutional neural networks for diabetic retinopathy. Procedia Computer Science, 90:200–205, 2016. 20th Conference on Medical Image Understanding and Analysis (MIUA 2016).

[Ravishankar et al., 2009] S. Ravishankar, A. Jain, and A. Mittal. Automated feature extraction for early detection of diabetic retinopathy in fundus images. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 210–217, June 2009.

[Simonyan et al., 2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.

[Zhou et al., 2015] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. CoRR, abs/1512.04150, 2015.