Feature Visualisation of Classification of Diabetic Retinopathy Using a Convolutional Neural Network

Harry Pratt1, Frans Coenen3*, Simon P. Harding1,2, Deborah M. Broadbent2, Yalin Zheng1,2

1 Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, L7 8BX
2 St. Paul's Eye Unit, Royal Liverpool University Hospital, Liverpool, L7 8XP
3 Department of Computer Science, University of Liverpool, Liverpool, L69 3BX

sghpratt@liverpool.ac.uk, coenen@liverpool.ac.uk, sharding@liverpool.ac.uk, dbroadbe@liverpool.ac.uk, yzheng@liverpool.ac.uk

* Contact Author. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Convolutional Neural Networks (CNNs) have been demonstrated to achieve state-of-the-art results on complex computer vision tasks, including medical image diagnosis of Diabetic Retinopathy (DR). CNNs are powerful because they determine relevant image features automatically. However, the current inability to demonstrate what these features are has led to CNNs being considered 'black box' methods whose results should not be trusted. This paper presents a method for identifying the learned features of a CNN and applies it in the context of the diagnosis of DR in fundus images using the well-known DenseNet. We train the CNN to diagnose and determine the severity of DR and then successfully extract feature maps from the CNN which identify the regions and features of the images that have contributed most strongly to the CNN prediction. This feature extraction process has great potential, particularly for encouraging confidence in CNN approaches from users and clinicians, and can aid in the further development of CNN methods. There is also potential for determining previously unidentified features which may contribute to a classification.

1 Introduction

Convolutional Neural Networks (CNNs), a deep learning approach to image classification, can offer extremely fast classification predictions based on learning relevant features. These features are learned within the network structure itself, from labelled images that the network has 'seen'. Recently, CNNs have been used to improve accuracy on a wide range of computer vision tasks [Krizhevsky et al., 2012]. This has extended to automated medical image diagnosis, for example the classification of Diabetic Retinopathy (DR) severity from colour fundus images [Pratt et al., 2016; Gulshan et al., 2016]. The CNNs presented in these papers have learned features of DR in order to determine the level of DR severity within a fundus image using clinically labelled images.

However, the DR classification predictions presented in these papers do not offer any insight into the reasoning behind the CNN model predictions. Although the CNN models have learned from ground truths based on a clinical grading framework, the methods do not present the features that the CNN has learned in order to arrive at the prediction. DR feature extraction from fundus images typically involves manual algorithms [Ravishankar et al., 2009; ManojKumar et al., 2015] which are applied before the classification process commences; the extracted features then correspond to a predicted severity of the disease. In the case of CNN models we wish to implement the reverse procedure: by dissecting the CNN model we wish to determine which features have led to the prediction.

Feature extraction is a vital part of the grading of DR because the manual processes used by clinicians are typically feature based, for example the process prescribed in [ETDRS Study Group, 1991]. Deep learning is widely perceived in the clinical community to be a black box. Consequently, it is unclear to clinicians whether the feature based framework used in manual grading is the same as the classification framework produced by the CNN, and as a result there is a lack of trust in the ability of deep learning.

In [Zhou et al., 2015], Class Activation Maps (CAMs) were presented as a method of determining the regions within a CNN input image which have contributed most towards the classification. In the case of disease classification this offers insight into the areas of the image containing features of the disease under consideration. The severity of DR within a fundus image directly relates to the location of certain features [ETDRS Study Group, 1991]. These features, their locations and how they relate to DR classification are presented in Table 1. The idea of saliency maps was presented in [Simonyan et al., 2013]. Saliency maps offer a method of determining the most significant pixels involved in the classification prediction of an image.

This paper aims to open the CNN black box in order to make CNNs more transparent in the context of feature based prediction of DR. Deep learning classification methods do not justify their prediction values. This paper presents a novel method of extending CNN black box prediction models so that they become feature based models. By determining the learned features and their locations we explore how the CNN reached its prediction and how this corresponds to manual feature based grading.
2 Method

Initially, a CNN was trained on fundus images to predict DR severity. Once the model had been trained, its parameters remained immutable throughout the rest of the process. The trained model was then used to produce prediction values, saliency maps and CAMs for unseen test images. Attention maps and other techniques would produce similar results to class activation maps and saliency maps if applied to the CNN; the two selected methods were used because they complement each other and highlight features within the image in different manners. For evaluation, these were compared to the clinical ground truth and the features identified within the images.

2.1 Dataset

The dataset used for training and evaluation was from Kaggle [Kaggle, 2016]. It is a large set of 88,702 high-resolution retinal fundus images: 78,076 for training and 10,626 for testing. A clinician has graded the level of DR using five classes: no DR, mild DR, moderate DR, severe DR and proliferative DR. The images were provided by EyePACS [EyePacs, 2018] from a diabetic screening process. Example images from the dataset are given in Figure 1.

2.2 Convolutional Neural Network Training

The adopted CNN architecture was the well-known DenseNet [Huang et al., 2016], shown in Figure 2. The DenseNet weights were initialised with pre-trained ImageNet weights, and a learning rate of 0.0003 was used with the Adam optimiser, with training performed on an NVIDIA K40 GPU using the Keras [Keras, 2019] library. Training was undertaken until the categorical cross entropy loss plateaued on the test data.
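This training setup is not published as code with the paper, but it can be illustrated with a minimal Keras sketch. The DenseNet-121 variant, the 224x224 input size, the batch size, the epoch limit and the directory layout ("train/" and "test/" with one sub-folder per grade) are assumptions made purely for illustration; the ImageNet initialisation, Adam optimiser, learning rate of 0.0003 and categorical cross entropy loss follow the description above.

```python
# Minimal training sketch (not the authors' code). Assumed: DenseNet-121,
# 224x224 inputs, batch size 32, images arranged in one sub-folder per DR grade
# under the hypothetical directories "train/" and "test/".
from tensorflow import keras  # the paper used the Keras library; tf.keras exposes the same API

NUM_CLASSES = 5  # no, mild, moderate, severe and proliferative DR

# DenseNet backbone with ImageNet weights; global average pooling after the
# final convolution block is the structure later required for the CAMs.
backbone = keras.applications.DenseNet121(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(backbone.output)
model = keras.Model(backbone.input, outputs)

model.compile(optimizer=keras.optimizers.Adam(3e-4),  # learning rate 0.0003
              loss="categorical_crossentropy",
              metrics=["accuracy"])

datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory("train/", target_size=(224, 224),
                                        batch_size=32, class_mode="categorical")
test_gen = datagen.flow_from_directory("test/", target_size=(224, 224),
                                       batch_size=32, class_mode="categorical")

# Stop once the test-set cross entropy plateaus, standing in for the manual
# stopping criterion described above.
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)
model.fit(train_gen, validation_data=test_gen, epochs=100, callbacks=[stop])
model.save("densenet_dr.h5")
```

Note that global average pooling followed by a single softmax layer is exactly the structure required by the CAM definition in Section 2.3.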
2.3 Class Activation Maps

In this section we define the procedure for producing Class Activation Maps (CAMs). CAMs require global average pooling after the final convolution layer in the CNN; the pooling provides the localisation for region detection. Applying the trained CNN to a test image activates weights in the output layer depending on which nodes have been activated. These weights can be projected back onto the convolutional feature maps in order to identify regions of importance for a certain class. Hence, to compute the class activation map of an input image we compute a weighted sum of the feature maps of the last convolutional layer. CAMs are defined as follows:

• Let the input image $I$ with coordinates $(x, y)$ be $I(x, y)$.
• Let $f_k(x, y)$ be the activation of node $k$ in the last convolutional layer.
• The result of global average pooling is $F_k = \sum_{(x,y)} f_k(x, y)$.
• For class $c$ the softmax input is $S_c = \sum_k w_k^c F_k$, where $w_k^c$ is the weight for node $k$ and class $c$.
• The softmax output, the probability, is given as $P_c = e^{S_c} / \sum_{c'} e^{S_{c'}}$.
• The weighted sum of the feature maps, the CAM, is defined as
  $$\mathrm{CAM}_c(x, y) = \sum_k w_k^c f_k(x, y). \qquad (1)$$

Therefore, it is clear that the CAM for class $c$, $\mathrm{CAM}_c$, directly relates to the prediction value $S_c$ of that class. The weights $w$ in the definitions of $\mathrm{CAM}_c$ and $S_c$ remain constant from the trained CNN. This indicates the direct importance of the activation $f_k(x, y)$ at pixel $(x, y)$ to the prediction of class $c$ within the CAM of an image. Therefore, for our CNN trained for DR severity, CAMs are an effective method for determining the region of pixels relating to the disease severity prediction. This process is shown in Figure 2.
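To make Equation (1) concrete, the sketch below computes a CAM for one test image using the model from the previous sketch. The helper name and the way the last convolutional layer is located are our assumptions, not the authors' code; the sketch relies on the global-average-pooling plus single softmax layer structure described above.

```python
# Sketch of Equation (1): a weighted sum of the last convolutional feature maps
# using the softmax weights of the chosen class. Assumes the `model` built in
# the previous sketch; `image` is one preprocessed array of shape (h, w, 3).
import numpy as np
from tensorflow import keras

def class_activation_map(model, image, class_idx=None):
    """Return a coarse CAM (H x W array) and the class it was computed for."""
    # The last layer that still has spatial dimensions is the final conv block.
    last_conv = next(l for l in reversed(model.layers) if len(l.output.shape) == 4)
    # A model that yields both the feature maps f_k(x, y) and the class scores.
    grabber = keras.Model(model.input, [last_conv.output, model.output])
    feats, preds = grabber.predict(image[np.newaxis])       # feats: (1, H, W, K)
    feats, preds = feats[0], preds[0]
    if class_idx is None:
        class_idx = int(np.argmax(preds))                   # predicted class c
    # w_k^c: the column of the final softmax layer's kernel for class c.
    w_c = model.layers[-1].get_weights()[0][:, class_idx]   # shape (K,)
    cam = feats @ w_c                                       # sum_k w_k^c f_k(x, y)
    return cam / (np.abs(cam).max() + 1e-8), class_idx      # normalised for display
```

The resulting map has the spatial resolution of the last convolutional layer, so it is upsampled (for example by bilinear interpolation) to the input image size before being overlaid on the fundus image, as in Figures 3 and 5.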
2.4 Saliency Maps

The idea of saliency maps is to compute the gradient of the output class score with respect to the input image. This tells us how the output category value changes with respect to a small change in the input image pixels; as with CAMs, the trained weights remain unchanged. Positive values in the gradient tell us that a change to that pixel will increase the output class value. Hence, the larger the positive gradient, the more the classification relies on that pixel. Visualising all of the gradients, which have the same shape as the input image, produces a saliency map which highlights the salient pixels that contribute the most towards the output class. Saliency maps are described as follows:

• Let the input image be defined as $I$.
• Let $S_c(I)$ be the class score function for the image.
• We want to rank each pixel $(x, y)$ based on its influence on $S_c$.
• $S_c$ is a highly non-linear function in a CNN, so $S_c$ is approximated with a first-order Taylor expansion in the neighbourhood of the pixel: $S_c(I(x, y)) \approx w^\top I(x, y) + b$,
• where $w$ is the derivative of $S_c$ with respect to the image $I$ at the point $(x, y)$:
  $$w = \left.\frac{\partial S_c}{\partial I}\right|_{I(x, y)}.$$

The computation of an image-specific saliency map for a single class is extremely quick, since it only requires a single back-propagation pass. Saliency maps differ from CAMs in that they look at how changes in the input image affect the class prediction, as opposed to combining feature maps in order to determine the most strongly filtered region of an image.
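A saliency map can be sketched in a few lines using automatic differentiation. The snippet below uses TensorFlow's GradientTape rather than whatever tooling the authors used, and for brevity it differentiates the softmax output rather than the pre-softmax score of [Simonyan et al., 2013]; both choices are assumptions made for illustration.

```python
# Sketch of an image-specific saliency map: one backward pass of the class
# score with respect to the input pixels. Assumes the trained `model` from the
# earlier sketches; `image` is a single preprocessed array of shape (h, w, 3).
import numpy as np
import tensorflow as tf

def saliency_map(model, image, class_idx=None):
    """Return an (h, w) map of |dS_c/dI| and the class it was computed for."""
    x = tf.convert_to_tensor(image[np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)                        # track gradients w.r.t. the input image
        preds = model(x, training=False)
        if class_idx is None:
            class_idx = int(tf.argmax(preds[0]))
        score = preds[0, class_idx]          # class score S_c (softmax output here)
    grads = tape.gradient(score, x)[0]       # same shape as the input image
    sal = tf.reduce_max(tf.abs(grads), axis=-1).numpy()  # max over colour channels
    return sal / (sal.max() + 1e-8), class_idx
```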
No Retinopathy:
• No apparent retinopathy

Mild:
• Haemorrhages/Microaneurysms only < 2A
• < 6 Cotton Wool Spots in the absence of other features
• < 6 Cotton Wool Spots with Haemorrhages/Microaneurysms < 2A
• Single venous loop

Moderate:
• Haemorrhages/Microaneurysms ≥ 2A in 1-3 quadrants
• ≥ 6 Cotton Wool Spots
• 1 quadrant Venous Beading/Looping/Reduplication
• Intraretinal microvascular abnormalities < 8A

Severe:
• 4 quadrants Haemorrhages/Microaneurysms ≥ 2A
• 2-4 quadrants Venous Beading/Looping/Reduplication
• 1 quadrant Intraretinal microvascular abnormalities ≥ 8A

Early Proliferative:
• Neovascularisation of disc < 10A alone
• Neovascularisation Elsewhere < 1/2 disc area (DA) alone
• Neovascularisation Elsewhere ≥ 1/2 DA and no preretinal/vitreous haemorrhage

High-risk Proliferative:
• Neovascularisation of disc ≥ 1/3 DA (10A) alone
• Neovascularisation Elsewhere ≥ 1/2 DA and preretinal/vitreous haemorrhage
• Vitreous haemorrhage precluding adequate view of fundus
• Traction retinal detachment (TRD)

Stable treated:
• Neovascularisation of disc/elsewhere has inactivated
• Fibrovascular proliferation of disc/elsewhere

Table 1: Clinical diagnosis of DR based on various feature types with different contributions to classification. One feature in each list is required for the equivalent DR grading. 2A, 8A and 10A refer to 'standard photographs' from ETDRS [ETDRS Study Group, 1991].

Figure 1: Fundus images from the Kaggle dataset; (a) no DR, (b) mild DR, (c) moderate DR, (d) severe DR, (e) proliferative DR. Note: there is little obvious difference between (a), (b) and (c), yet it is important to distinguish these. CAMs and saliency maps should detect features such as haemorrhages/microaneurysms around the vessels.

Figure 2: Top: the DenseNet architecture used for training. Bottom: the combination of the trained weights w from the final layer and the feature maps of the last convolution layer to produce the CAM. The feature maps vary depending on the input image.

3 Results

The purpose of this paper is to give an insight into how qualitative features can be derived and presented; quantitative results have been widely discussed in the literature. However, to indicate the level of quantitative performance underlying this qualitative output, the quantitative results must be stated: the multi-class DenseNet model achieved a quadratic weighted kappa of 0.81 on the test data for the multi-class problem.
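The quadratic weighted kappa quoted above measures agreement between the model's predicted grade and the clinical grade, penalising disagreements by the squared distance between severity levels. It can be computed, for example, with scikit-learn; the grade arrays below are hypothetical placeholders.

```python
# Quadratic weighted kappa between clinical grades and model predictions.
# The arrays here are made-up placeholders for the real test-set labels.
import numpy as np
from sklearn.metrics import cohen_kappa_score

y_true = np.array([0, 1, 2, 4, 3, 0, 2])   # clinician grades (0 = no DR ... 4 = proliferative)
y_pred = np.array([0, 1, 1, 4, 4, 0, 2])   # model predictions for the same images

kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"quadratic weighted kappa: {kappa:.2f}")
```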
CAMs from test images, with an example result for each class of DR, are presented in Figure 3. The colour range is from red to green: the closer a region is to red, the more that region has contributed towards the prediction. Similarly, in the saliency maps, the lighter the pixel, the more that pixel has contributed to the classification of the image.

The CAMs of each class of DR demonstrate the links between the severity ground truth and the input image that the CNN has divulged through the training process. As seen in Figure 3, the regions leading to a classification of no, mild or moderate DR relate to the main vessel structure and tend to avoid the macula (the centre of the retina). Initial signs of disease stem from the vessels in the form of haemorrhages, microaneurysms or abnormal vessels, as presented in Table 1. Furthermore, it was also clear from the test image CAMs that the severe and proliferative classifications look more towards the macula. This is shown in the severe and proliferative cases in Figure 3. This corresponds to the clinical classification process, as severe disease requires haemorrhages or microaneurysms and venous beading/looping/reduplication throughout the retina. However, the saliency maps for the proliferative case rarely took the optic disc region into consideration in the classification prediction. This suggests that the CNN model is excluding an important marker for proliferative retinopathy: neovascularisation of the disc.

The saliency maps provide insight into the features that have been detected through training on the ground truth and input images. Figure 3 demonstrates that in the early stages of retinopathy the CNN looks along the vessel structure and looks for deviations from normal vessel structure. This is shown by the lightest pixels being the vessels in the saliency maps for the no DR and mild DR cases. Haemorrhages and microaneurysms from the early stage of the disease tend to lie around the vessel structure, and abnormal vessels are a key distinction between no DR and mild or moderate DR. It is also apparent that the saliency maps in the moderate class have "light" pixels spread around the retina, as the CNN looks for features in more than one region of the retina, which is key to the moderate classification.

In the saliency maps for the severe and proliferative classifications we can see identification of features relating to clinical diagnosis. In the severe DR saliency map in Figure 4 we can identify the microaneurysms and cotton wool spots. The microaneurysms in different regions of the retina relate directly to the severe DR classification. Similarly, in the proliferative saliency map in Figure 4 the lighter pixels correspond to features that the CNN has identified. The laser spots produced through treatment to the eye remain dark in the saliency map, and therefore the CNN is, correctly, not treating these as a feature of disease. An example of this is shown in Figure 4.

Figure 3: Left to right: no DR, mild DR, moderate DR, severe DR and proliferative DR. Top to bottom: original Kaggle image, Class Activation Map of the preprocessed image and saliency map of the preprocessed image.

Figure 4: Top to bottom: severe and proliferative DR. Left to right: original image with expert labelled features and saliency map with matching region overlay. Rectangles denote laser spots, circles denote haemorrhages or microaneurysms, squares denote neovascularisation elsewhere, curved squares denote cotton wool spots and curved rectangles denote venous reduplication.

4 Discussion

The visualisation techniques presented in this paper demonstrate that CNN models are achieving some success in replicating the clinical process undertaken during diagnosis of fundus images. Similar features are being detected and similar regions are being related to the appropriate classes. However, in order to fully determine whether the CNN has learned a similar classification process, we would require fundus images annotated with every single feature present in the image and saliency maps annotated to the same criteria. Furthermore, the CNN is only told the severity of the image, not the combination of features involved, so it may be deemed unfair to expect the CNN to learn the precise mechanism that was used to determine the ground truth, especially as grader agreement is often variable; complex structures of DR can become subjective when based on such minute features.

The presented method also reveals features of disease severity that are missed in the automated procedure and therefore indicates where the CNN needs to be improved, such as the detection of neovascularisation of the disc. This could be used to determine a general set of features that CNNs struggle to detect. During training, image preprocessing techniques could be used to make the missed features more apparent within the image to aid CNN learning.

The methodologies have been validated on images from the Liverpool Diabetic Eye Screening Program (LDESP) in order to test their ability to generalise to other datasets. Figures 5 and 6 demonstrate the ability of the class activation maps and saliency maps to generalise to unseen data. Numerous features are identified in multiple fundus images from the same eye, including images that are not macula centred.

Figure 5: Left: fundus images from the Liverpool Diabetic Eye Screening Program (LDESP). Middle: saliency map from the trained DenseNet multi-class DR model overlaid on the original fundus image. Right: CAMs from the trained DenseNet multi-class DR model overlaid on the original image.

Figure 6: Top: four montaged fundus images. Bottom: saliency map from the trained DenseNet multi-class DR model overlaid on the original fundus image.

5 Conclusion

In conclusion, we have demonstrated that the correlation between CNN predictions and manual grading of DR can be visualised through the use of Class Activation Maps (CAMs) and saliency maps. These methods provide a useful tool to determine whether deep learning classification models relate accurately to clinical diagnosis procedures. The presented methods could also be used in the screening process to reduce the time a clinician spends looking for features within a fundus image. CAMs present a good method for 'flagging' regions of disease, whereas saliency maps present a solution for feature detection.

Acknowledgment

H. Pratt would like to acknowledge everyone in the CRiA imaging team at the Institute of Ageing and Chronic Disease at the University of Liverpool. He would also like to thank the Fight for Sight charity for PhD funding and NVIDIA for providing an NVIDIA K40 GPU.
References

[ETDRS Study Group, 1991] ETDRS Study Group. Grading diabetic retinopathy from stereoscopic color fundus photographs: an extension of the modified Airlie House classification. ETDRS report number 10. Ophthalmology, 98(5):786–806, 1991.

[EyePacs, 2018] EyePACS. A free platform for retinopathy screening. http://www.eyepacs.com/, 2018. [Online; accessed 30/05/2018].

[Gulshan et al., 2016] V. Gulshan, L. Peng, M. Coram, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, 2016.

[Huang et al., 2016] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. CoRR, abs/1608.06993, 2016.

[Kaggle, 2016] Kaggle. Kaggle: Platform for predictive modelling and analytics competitions, 2016.

[Keras, 2019] Keras. Keras: Deep learning library for Theano and TensorFlow, 2019.

[Krizhevsky et al., 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.

[ManojKumar et al., 2015] S. B. ManojKumar, R. Manjunath, and H. S. Sheshadri. Feature extraction from the fundus images for the diagnosis of diabetic retinopathy. In 2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), pages 240–245, December 2015.

[Pratt et al., 2016] Harry Pratt, Frans Coenen, Deborah M. Broadbent, Simon P. Harding, and Yalin Zheng. Convolutional neural networks for diabetic retinopathy. Procedia Computer Science, 90:200–205, 2016. 20th Conference on Medical Image Understanding and Analysis (MIUA 2016).

[Ravishankar et al., 2009] S. Ravishankar, A. Jain, and A. Mittal. Automated feature extraction for early detection of diabetic retinopathy in fundus images. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 210–217, June 2009.

[Simonyan et al., 2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.

[Zhou et al., 2015] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. CoRR, abs/1512.04150, 2015.