<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explainable Diabetic Retinopathy Classification Based on Neural-Symbolic Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Se-In Jang</string-name>
          <email>sjang7@mgh.harvard.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michaël J.A. Girard</string-name>
          <email>mgirard@ophthalmic.engineering</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandre H. Thiery</string-name>
          <email>a.h.thiery@nus.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Statistics and Data Science, National University of Singapore</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Duke-NUS Medical School</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Gordon Center for Medical Imaging, Massachusetts General Hospital and Harvard Medical School</institution>
          ,
          <addr-line>Boston</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute for Molecular and Clinical Ophthalmology</institution>
          ,
          <addr-line>Basel</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Ophthalmic Engineering and Innovation Laboratory, Singapore Eye Research Institute</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose an explainable diabetic retinopathy (ExplainDR) classification model based on neural-symbolic learning. To gain explainability, a high-level symbolic representation should be considered in decision making. Specifically, we introduce a human-readable symbolic representation that follows a taxonomy style of diabetic retinopathy characteristics related to eye health conditions. We then include the human-readable features obtained from this symbolic representation in the disease prediction. Experimental results on the IDRiD diabetic retinopathy classification dataset show that our proposed ExplainDR method exhibits promising performance compared to state-of-the-art methods, while also providing interpretability and explainability.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Diabetic Retinopathy (DR) is one of the leading causes of vision loss affecting the working-age
population worldwide [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thanks to the success of deep learning, convolutional neural
networks (CNNs) based deep learning approaches have been recently applied to DR
classification problems [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. Most of the research effort on CNN-based DR classification
methods has been devoted to designing robust neural architectures (e.g. ResNet and DenseNet)
for enhanced classification accuracy [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Although deep-learning-based DR classification
approaches have demonstrated excellent performance, understanding their decision-making
process remains a challenge because of the black-box nature of deep learning methods. This
lack of explainability has hindered the adoption of deep-learning-based methods in clinical
practice.
      </p>
      <p>To gain confidence that the developed deep learning methods are robust, researchers have
designed and used visually interpretable tools. For instance, gradient-weighted class activation
mapping (Grad-CAM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a popular approach that can highlight suspected lesions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
However, most of these post-processing tools generate images (e.g. attention maps) that can only be
interpreted by expert ophthalmologists. To circumvent this issue, in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a capsule network [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
was adopted to encode visually interpretable feature scores for X-ray images in a human-level
representation – importantly, these scores can also be interpreted by radiologists. However,
this approach could not be considered an explainable model per se, since a taxonomy of
characteristics or attributes (such as the eyes, nose, and mouth that can be used to define a given
face) was not involved in the decision-making process [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Grading criteria for DR severity.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Severity Grade</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>No DR</td><td>No visible sign of abnormalities</td></tr>
            <tr><td>Mild NPDR*</td><td>Presence of MAs only</td></tr>
            <tr><td>Moderate NPDR</td><td>More than just MAs but less than severe NPDR</td></tr>
            <tr><td>Severe NPDR</td><td>&gt; 20 intraretinal HEs, venous beading, intraretinal microvascular abnormalities, no signs of PDR</td></tr>
            <tr><td>PDR**</td><td>Neovascularization, vitreous/pre-retinal HE</td></tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <p>*NPDR: Non-Proliferative DR, **PDR: Proliferative DR</p>
        </table-wrap-foot>
      </table-wrap>
      <p>
        In order to achieve interpretability and completeness for an explainable DR classification
model, we have to understand how DR severity is defined clinically. Table 1 summarizes grading
criteria for DR severity. Clinically, DR is diagnosed based on the presence of one or more
retinal lesions such as Microaneurysms (MA), Hemorrhages (HE), Soft Exudates (SE) and Hard
Exudates (EX) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In addition, Diabetic Macular Edema (DME) severity is also assessed based
on the presence of EXs in the macula region [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Neural-symbolic learning [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] is a suitable approach to produce computational tools for
integrated machine learning and reasoning for explainability [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Neural-symbolic learning
uses deep neural networks to generate high-level symbolic representation that humans can
understand. Logical operations are then conducted using symbolic representation for decision
making. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], a neural-symbolic learning system for visual question answering was presented
to find an answer from a structural scene representation. This system encoded an image
into a compact symbolic representation and then performed symbolic program execution with
logical operations designed manually for reasoning. However, because of this manual
design, updating the logic to improve performance is not easy, since the logical rules must
remain consistent with one another.
      </p>
      <p>In this paper, we propose an explainable diabetic retinopathy (ExplainDR) classification model
based on neural-symbolic learning to generate a human-readable symbolic representation. The
proposed symbolic representation follows a taxonomy style of diabetic retinopathy
characteristics, covering abnormalities such as MA, HE, SE and EX, and is obtained via a deep neural
network for segmentation. The proposed human-readable feature representation is meant to be directly
interpretable by both ophthalmologists and patients.</p>
      <p>
        In this paper, we aim to develop a neural-symbolic AI approach to accurately diagnose
DR. Such an approach may be of clinical value, because we first generate high-level symbolic
representations that are subsequently used to make a DR diagnosis. In other words, our approach
has the advantage to remain easily interpretable by both clinicians and patients. The algorithm
was tested on the IDRiD dataset [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and relies on both its lesion segmentation annotations and its disease
severity gradings.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Works</title>
      <sec id="sec-3-1">
        <title>2.1. Visually interpretable deep learning models</title>
        <p>
          In order to improve on black-box deep learning models, visually interpretable tools
[
          <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19, 20, 21, 22</xref>
          ] for map generation (e.g. attention maps) have been recently applied to DR
problems. In [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], an attention network was used as a clustering method to generate an
attention map that can highlight the suspected lesions. This can also be achieved with Class
Activation Mapping (CAM) [
          <xref ref-type="bibr" rid="ref19 ref24">19, 24</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], a regression based activation map was developed
to include severity level information in the generated saliency map. In [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a Grad-CAM method
that can evaluate the suspected lesions without requiring architectural changes or re-training
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], was adopted with different CNN architectures to improve visual interpretability. In
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], a combination of lower-layer and higher-layer saliency maps was developed to accurately
locate the lesions. Although the above methods could provide clinical value, they still could not
explain why and how the developed models could visually localize the suspected lesions.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Neural-Symbolic Learning</title>
        <p>
          The goal of neural-symbolic learning is to provide a coherent, unifying view for logic and
connectionism to contribute to the modelling and understanding of cognition and, thereby,
behavior [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Neural-symbolic learning includes neural network implementations of logic, logical
characterisations of neural network systems, and hybrid learning systems that profitably combine
symbolic and connectionist approaches to artificial intelligence.
Deep neural networks can learn from complex input data such as images, audio and text to generate
high-level representations, which are useful in decision making [27]. A logic network placed on top of
a deep neural network to learn the relations among those abstractions can then help a system
explain itself. In [28], DeepProbLog was developed by combining end-to-end learning
with reasoning, where the outputs of the neural networks are applied as inputs to ProbLog [29].
In [30], a neural-symbolic framework called logical neural networks (LNN) was designed to
simultaneously provide the key properties of both neural networks (learning) and symbolic logic
(knowledge and reasoning). LNN considers every neuron to have a meaning as a component
of a formula in a weighted real-valued logic. In LNN, a 1-to-1 correspondence between
neurons and the elements of logical formulae was presented, based on the observation that the
weights of neurons can act like AND or OR operations. Based on this idea, LNN achieves a
differentiable model that can minimize a logical loss function for the refutation of logical contradictions.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Explainable Diabetic Retinopathy Classification</title>
      <p>In this section, we propose an explainable diabetic retinopathy (ExplainDR) classification method
based on neural-symbolic learning. Fig. 1 illustrates an overview of the proposed ExplainDR
method. Our proposed neural-symbolic learning method includes a U-Net segmentation network
[31] used to generate a high-level symbolic representation, and a fully connected network (FCN)
that learns from the generated symbolic representation to predict the decision, instead of relying on
hand-designed logical operations [32]. The U-Net segmentation network extracts a higher-level representation
in a symbolic space than the pixel-level representation. To produce the high-level symbolic
representation in a taxonomy style, we train the U-Net segmentation network using four
segmentation labels, namely Microaneurysms (MA), Hemorrhages (HE), Soft Exudates (SE) and
Hard Exudates (EX), which are the main factors in deciding DR severity. Based on the four
output images I_i, 1 ≤ i ≤ 4, produced by the segmentation network for each eye condition (i.e.
i = 1 for MA), we extract a human-readable feature vector as a symbolic representation using a
quantization technique. This feature vector counts the segmented regions in each segmentation
output image I_i by setting
S_i = { r_j }_{j=1}^{n_i},   (1)
where S_i is the set of the segmented regions r_j in I_i and n_i is the number of segmented regions
within each set S_i. The human-readable feature vector is then given by
x = [ |S_1|, |S_2|, |S_3|, |S_4| ] ∈ ℕ^4,   (2)
where |S_i| is the number of segmented regions in S_i. The FCN is then trained on the
human-readable feature vector instead of performing logical operations, to avoid the effort of
designing considerable logic combinations for decision making.</p>
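      <p>For illustration, the quantization step described above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions (4-connectivity, toy masks, and all function names are ours, not the authors' implementation): it counts the connected foreground regions in each of the four binary segmentation outputs and stacks the counts into the feature vector of Equation (2).</p>

```python
from collections import deque

def count_regions(mask):
    """Count 4-connected foreground regions in a binary mask (list of 0/1 rows)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                regions += 1
                queue = deque([(y, x)])  # flood-fill so the region is counted once
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions

def feature_vector(masks):
    """Equation (2): one region count per condition (MA, HE, SE, EX)."""
    return [count_regions(m) for m in masks]

# Toy 4x4 masks standing in for the U-Net outputs (i = 1 for MA, etc.).
ma = [[1, 1, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 0, 1],
      [1, 0, 0, 0]]
empty = [[0] * 4 for _ in range(4)]
print(feature_vector([ma, empty, empty, empty]))  # [3, 0, 0, 0]
```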
      <p>For instance, from an unseen test image, the human-readable feature vector is obtained from
each segmented output through the trained segmentation network. Based on the trained FCN,
the decision prediction is performed using the human-readable feature vector. We then generate
explanation by combining the human-readable feature vector and the predicted decision as
follows:
• The DR diagnosis of “image 1” is “moderate NPDR” because there are 33 MA, 13 HE, 5 SE
and 27 EX regions, respectively.
• The DR diagnosis of “image 2” is “mild NPDR” because there are 20 MA, 5 HE, 1 SE and 3
EX regions, respectively.</p>
      <p>Additionally, similar to other interpretable DR methods, the visually interpretable images (i.e.
segmented images) are also provided. Therefore, we achieve an explainable DR classification
method, which includes human-readable symbolic representation in the decision making process,
whereas typical AI black-box models only address pixel-level representations.</p>
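      <p>The explanation sentences above are a direct combination of the predicted grade and the feature vector, so they can be produced with a simple template. The following sketch (the function name and signature are our assumptions) mirrors the wording of the examples:</p>

```python
def explain(image_id, grade, counts):
    """Combine the predicted grade with the 4-dim feature vector
    (MA, HE, SE, EX region counts) into a human-readable sentence."""
    ma, he, se, ex = counts
    return (f'The DR diagnosis of "{image_id}" is "{grade}" because there are '
            f"{ma} MA, {he} HE, {se} SE and {ex} EX regions, respectively.")

print(explain("image 1", "moderate NPDR", [33, 13, 5, 27]))
```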
      <sec id="sec-4-1">
        <title>3.1. Extension of the symbolic representation</title>
        <p>Our proposed human-readable feature vector consists of a simple symbolic representation with
only four dimensions, one per eye condition (i.e. MA, HE, SE and EX). In order to
improve this simple symbolic representation, we propose to consider the sizes of the segmented
lesions for a better symbolic representation, while removing false or noisy segmented lesions.
Each segmented lesion r_j is classified into one of three subsets, small (s), medium (m) or large (l), as
follows:
S_i^s = { r_j : τ_0 &lt; a_j ≤ τ_1, ∀j },
S_i^m = { r_j : τ_1 &lt; a_j ≤ τ_2, ∀j },   (3)
S_i^l = { r_j : τ_2 &lt; a_j ≤ τ_3, ∀j },
where the size a_j is given by the number of connected pixels in each segmented lesion r_j, and
τ_k is a threshold that experimentally defines the small, medium and large sizes of the segmented
lesions. The improved human-readable feature vector is then given by:
x = [ |S_1^s|, |S_1^m|, |S_1^l|, … , |S_4^s|, |S_4^m|, |S_4^l| ] ∈ ℕ^12.   (4)</p>
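        <p>A sketch of this size-binned extension, using the thresholds reported in Section 4.1 (τ_0 = 10, τ_1 = 500, τ_2 = 1000, τ_3 = 10000); the function names and the toy region sizes are our assumptions. Regions with sizes outside (τ_0, τ_3] are discarded as false or noisy segmentations:</p>

```python
# Thresholds from Section 4.1: tau_0 = 10, tau_1 = 500, tau_2 = 1000, tau_3 = 10000.
TAUS = (10, 500, 1000, 10000)

def size_bins(region_sizes, taus=TAUS):
    """Equation (3): bin one condition's region sizes (connected-pixel counts)
    into small/medium/large; sizes outside (tau_0, tau_3] are treated as noise."""
    t0, t1, t2, t3 = taus
    small = sum(1 for a in region_sizes if t0 < a <= t1)
    medium = sum(1 for a in region_sizes if t1 < a <= t2)
    large = sum(1 for a in region_sizes if t2 < a <= t3)
    return [small, medium, large]

def extended_feature_vector(sizes_per_condition):
    """Equation (4): concatenate the s/m/l counts of MA, HE, SE, EX (12 dims)."""
    vec = []
    for sizes in sizes_per_condition:
        vec.extend(size_bins(sizes))
    return vec

# Toy region sizes (pixels) for MA, HE, SE, EX; 3 and 12000 fall outside all bins.
print(extended_feature_vector([[40, 40, 3], [600, 2000], [], [12000]]))
# [2, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
```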
        <p>We note that the extended human-readable feature vector still follows a taxonomy style that
can offer a logical explanation according to the different sizes of the segmented lesions within
each eye condition.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiments</title>
      <sec id="sec-5-1">
        <title>4.1. Experimental settings</title>
        <p>
          In our experiment, we use the Indian Diabetic Retinopathy Image Dataset (IDRiD)1 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], since
this is the only public dataset that provides both lesion segmentation and disease severity
gradings. The images have a resolution of 4288 × 2848 pixels. Each image is resized to
1024 × 1024 pixels. The lesion segmentation dataset includes four labels: Microaneurysms (MA),
Hemorrhages (HE), Soft Exudates (SE) and Hard Exudates (EX). The severity
grading dataset provides five labels for diabetic retinopathy (DR): no DR, mild NPDR, moderate
NPDR, severe NPDR and PDR. Additionally, three labels for diabetic macular
edema (DME) are given: no EX, and presence of EX outside or within the macula center. The
lesion segmentation dataset has 187 training images and 95 test images (282 images in total).
The severity grading dataset provides 413 training images and 103 test images (516
images in total).
        </p>
        <p>
          In the IDRiD challenge [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a specific accuracy evaluation metric was used: a prediction is counted as correct only when
the following condition is satisfied:
(y_DR == ŷ_DR) and (y_DME == ŷ_DME),   (5)
where y is a true label and ŷ is a predicted label, for DR and DME respectively. In Equation (3), the thresholds
are experimentally set to τ_0 = 10, τ_1 = 500, τ_2 = 1000 and τ_3 = 10000, respectively.
        </p>
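        <p>The joint evaluation metric of Equation (5) can be sketched as follows (a hypothetical helper, not challenge code): a sample counts as correct only when both the DR prediction and the DME prediction match their ground-truth labels.</p>

```python
def joint_accuracy(y_dr, y_dme, pred_dr, pred_dme):
    """Equation (5): a sample is correct only when BOTH the DR grade and the
    DME grade are predicted correctly."""
    correct = sum(
        1
        for t_dr, t_dme, p_dr, p_dme in zip(y_dr, y_dme, pred_dr, pred_dme)
        if t_dr == p_dr and t_dme == p_dme
    )
    return correct / len(y_dr)

# Toy labels for 4 samples: sample 2 misses DME, sample 3 misses DR.
print(joint_accuracy([0, 2, 4, 1], [0, 1, 2, 0],
                     [0, 2, 3, 1], [0, 2, 2, 0]))  # 0.5
```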
        <p>
          In the segmentation network, the ResNet34 structure [33] is used with the Adam optimizer,
a batch size of 2, a learning rate of 0.0001 and a dropout probability of 0.1, for 20 epochs
with early stopping. The data augmentation for the segmentation network includes random
flipping, gamma contrast with a range of (0.5, 1.5), and contrast-limited adaptive histogram
equalization. The FCN layers are given by: [12, 25, 50, 75, 100, 75, 50, 25, 12]. In the FCN
layers, the Adam optimizer is adopted with a batch size of 16, a learning rate of 0.01 and a
dropout probability of 0.1 for 20 epochs with early stopping. The segmentation network is
first trained using the lesion segmentation training set. The FCN layers are then trained using
the proposed symbolic feature vectors obtained from the severity grading training set via the
trained segmentation network. We split the training sets into 80% for training and 20% for
validation.
        </p>
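        <p>As a quick sanity check on the scale of the FCN described above, the following sketch (ours, assuming one dense layer between each pair of consecutive widths, biases included) counts its parameters; the network is tiny because its input is the 12-dimensional symbolic feature vector rather than an image.</p>

```python
# Layer widths from Section 4.1; the first width matches the 12-dim feature vector.
WIDTHS = [12, 25, 50, 75, 100, 75, 50, 25, 12]

def param_count(widths):
    """Weights plus biases of a dense layer between each pair of consecutive widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

print(param_count(WIDTHS))  # 26012
```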
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Results</title>
        <p>In order to observe the effect of our proposed ExplainDR method, we conduct an ablation study
to evaluate the extension of the human-readable feature vector. We compare the proposed
ExplainDR method with the state-of-the-art methods using the IDRiD dataset. Fig. 2 qualitatively
shows the segmentation results for eye conditions such as MA, HE, SE and EX using six images
from the severity grading dataset. According to small, medium and large (sml) size regions
of each eye condition, the 6 extracted human-readable feature vectors for each image are as
follows:
(1) smlMA: 37, 0, 0, smlHE: 26, 2, 2, smlSE: 0, 0, 0, smlEX: 197, 5, 3
(2) smlMA: 59, 0, 0, smlHE: 54, 4, 4, smlSE: 0, 0, 0, smlEX: 96, 2, 0
(3) smlMA: 25, 0, 0, smlHE: 27, 4, 1, smlSE: 10, 0, 0, smlEX: 90, 0, 1
(4) smlMA: 8, 0, 0, smlHE: 9, 0, 0, smlSE: 0, 0, 0, smlEX: 122, 3, 1
(5) smlMA: 4, 0, 0, smlHE: 6, 0, 0, smlSE: 0, 0, 0, smlEX: 1, 0, 0
(6) smlMA: 1, 0, 0, smlHE: 5, 0, 0, smlSE: 2, 1, 1, smlEX: 0, 0, 0
The explanation along with the predicted decision using the human-readable features are
generated as follows:
(1) The image 1 is classified as severe NPDR because 37 small MAs, 26 small HEs, 2 medium</p>
        <p>HEs, 2 large HEs, 197 small EXs, 5 medium EXs and 3 large EXs are detected.
(2) The image 2 is classified as PDR because 59 small MAs, 54 small HEs, 4 medium HEs, 4 large</p>
        <p>HEs, 96 small EXs, and 2 medium EXs are detected.
(3) ...
(6) The image 6 is classified as mild NPDR because 1 small MA, 5 small HEs, 2 small SEs, 1
medium SE and 1 large SE are detected.
Here, we note that the above explanations can be compared to the severity grading criteria
shown in Table 1 by summing all the numbers of the small, medium and large size regions
for each eye condition. This helps non-experts to analyze the generated explanations for
self-diagnosis.</p>
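        <p>The summation mentioned above — collapsing each small/medium/large triplet back into one total per eye condition so that an explanation can be checked against the criteria in Table 1 — can be sketched as follows (the helper name is ours):</p>

```python
def totals_per_condition(extended_vec):
    """Sum each consecutive (small, medium, large) triplet of the 12-dim vector
    into one total per condition (MA, HE, SE, EX), for comparison with Table 1."""
    return [sum(extended_vec[i:i + 3]) for i in range(0, len(extended_vec), 3)]

# Feature vector (1) above: smlMA 37,0,0; smlHE 26,2,2; smlSE 0,0,0; smlEX 197,5,3.
print(totals_per_condition([37, 0, 0, 26, 2, 2, 0, 0, 0, 197, 5, 3]))  # [37, 30, 0, 205]
```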
        <p>
          To observe the impact of symbolic feature extension of the proposed ExplainDR method,
Table 2 shows an ablation study for: 1) ExplainDR with 4 dimensions of the simple symbolic
features and 2) ExplainDR with 12 dimensions of the extended symbolic features. The extension
of the symbolic representation outperforms the simple symbolic representation, since its
more detailed categorization provides a more discriminative representation.
For performance comparison,
Table 3 summarizes the accuracy of the proposed ExplainDR method and the
state-of-the-art methods [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The proposed method, without utilizing any external dataset (e.g. Kaggle2,
Messidor3 and DiaretDB14), shows the second-best performance in the leaderboard on the IDRiD dataset,
while providing interpretable images and texts. In contrast, the state-of-the-art methods that use
external datasets report accuracy without any explanation.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>This paper presented an explainable diabetic retinopathy (ExplainDR) classification method
based on neural-symbolic learning which generated a high-level symbolic representation via a
segmentation network. The generated symbolic representation was extended according to the
sizes of the segmented lesions to produce a more discriminative symbolic representation. The DR
severity was predicted by a fully connected network trained on the extended
symbolic representation. We qualitatively showed that our proposed symbolic representation
was human-readable, following a taxonomy style associated with the eye health conditions, and that
it provides an explanation with the reasons for the DR severity. The proposed ExplainDR method showed
promising performance compared to the state-of-the-art methods in terms of classification accuracy on
the IDRiD dataset, while also providing interpretability and explainability.</p>
      <p>The limitations of our work are: 1) the accuracy and explainability of the
proposed ExplainDR are affected by the quality of the segmentation results; 2) different decision
outputs can be observed due to the stochastic nature of the learning (e.g. in the FCN); and 3) an
enhanced design is needed to adopt other datasets when annotations for lesion segmentation
and DR classification are not available together. Our future works accordingly are: 1) a study of the
effect of the segmentation performance; 2) the use of least-squares based methods as a deterministic
learning approach instead of the stochastic learning approach; and 3) a study of the adoption of other
datasets without lesion segmentation annotations.
2https://www.kaggle.com/c/diabetic-retinopathy-detection
3https://www.adcis.net/en/third-party/messidor
4http://www2.it.lut.fi/project/imageret/diaretdb1</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the anonymous reviewers for their constructive comments
to improve this paper. SI would like to thank Prof. Alexandre H. Thiery for his numerous
supports. AHT acknowledges support from the Singapore Ministry of Education Tier 1 grant
(R155-000-228-114).</p>
      <p>[27] A. Garcez, M. Gori, L. Lamb, L. Serafini, M. Spranger, S. Tran, Neural-symbolic computing:
An effective methodology for principled integration of machine learning and reasoning,
Journal of Applied Logics 6 (2019) 611–632.
[28] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, L. De Raedt, DeepProbLog: Neural
probabilistic logic programming, Advances in Neural Information Processing Systems 31
(2018) 3749–3759.
[29] L. De Raedt, A. Kimmig, H. Toivonen, ProbLog: A probabilistic Prolog and its application
in link discovery, in: IJCAI, volume 7, Hyderabad, 2007, pp. 2462–2467.
[30] R. Riegel, A. Gray, F. Luus, N. Khan, N. Makondo, I. Y. Akhalwaya, H. Qian, R. Fagin,
F. Barahona, U. Sharma, et al., Logical neural networks, arXiv preprint arXiv:2006.13155
(2020).
[31] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image
segmentation, in: International Conference on Medical Image Computing and
Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[32] G. G. Towell, J. W. Shavlik, Knowledge-based artificial neural networks, Artificial
Intelligence 70 (1994) 119–165.
[33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp.
770–778.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy screening update</article-title>
          ,
          <source>Clinical diabetes 27</source>
          (
          <year>2009</year>
          )
          <fpage>140</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>U.</given-names>
            <surname>Schmidt-Erfurth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sadeghipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Gerendas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Waldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bogunović</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in retina</article-title>
          ,
          <source>Progress in retinal and eye research 67</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. S. W.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Pasquale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S. W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmetterer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Y.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence and deep learning in ophthalmology</article-title>
          ,
          <source>British Journal of Ophthalmology</source>
          <volume>103</volume>
          (
          <year>2019</year>
          )
          <fpage>167</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Varadarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Burlina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmetterer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Pasquale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Bressler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Webster</surname>
          </string-name>
          , et al.,
          <article-title>Deep learning in ophthalmology: the technical and clinical considerations</article-title>
          ,
          <source>Progress in retinal and eye research 72</source>
          (
          <year>2019</year>
          )
          <fpage>100759</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Pratt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Coenen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Broadbent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Harding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Convolutional neural networks for diabetic retinopathy</article-title>
          ,
          <source>Procedia computer science 90</source>
          (
          <year>2016</year>
          )
          <fpage>200</fpage>
          -
          <lpage>205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Lesion detection and grading of diabetic retinopathy via two-stages deep convolutional neural networks</article-title>
          ,
          <source>in: International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedantam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <article-title>Grad-CAM: Visual explanations from deep networks via gradient-based localization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chetoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Akhloufi</surname>
          </string-name>
          ,
          <article-title>Explainable diabetic retinopathy using EfficientNet</article-title>
          ,
          <source>in: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine &amp; Biology Society (EMBC)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1966</fpage>
          -
          <lpage>1969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>LaLonde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Torigian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Bagci</surname>
          </string-name>
          ,
          <article-title>Encoding visual attributes in capsules for explainable medical diagnoses</article-title>
          ,
          <source>in: International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sabour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Frosst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Dynamic routing between capsules</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3859</fpage>
          -
          <lpage>3869</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunning</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (XAI)</article-title>
          ,
          <source>Defense Advanced Research Projects Agency (DARPA), nd Web 2</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Yau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kawasaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Lamoureux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kowalski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Dekker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grauslund</surname>
          </string-name>
          , et al.,
          <article-title>Global prevalence and major risk factors of diabetic retinopathy</article-title>
          ,
          <source>Diabetes care 35</source>
          (
          <year>2012</year>
          )
          <fpage>556</fpage>
          -
          <lpage>564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Decencière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cazuguel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cochener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordonez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Massin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Erginay</surname>
          </string-name>
          , et al.,
          <article-title>Feedback on a publicly distributed image database: the messidor database</article-title>
          ,
          <source>Image Analysis &amp; Stereology</source>
          <volume>33</volume>
          (
          <year>2014</year>
          )
          <fpage>231</fpage>
          -
          <lpage>234</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Porwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pachade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kokare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Deshmukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Son</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          , et al.,
          <article-title>Idrid: Diabetic retinopathy-segmentation and grading challenge</article-title>
          ,
          <source>Medical image analysis 59</source>
          (
          <year>2020</year>
          )
          <fpage>101561</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Besold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Foldiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Icard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kuhnberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miikkulainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <article-title>Neural-symbolic learning and reasoning: Contributions and challenges</article-title>
          ,
          <source>in: AAAI Spring Symposium Series</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>03</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. d.</given-names>
            <surname>Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <article-title>Neurosymbolic AI: the 3rd wave</article-title>
          ,
          <source>arXiv preprint arXiv:2012.05876</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Besold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. d.</given-names>
            <surname>Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-U.</given-names>
            <surname>Kuhnberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lowd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M. V.</given-names>
            <surname>Lima</surname>
          </string-name>
          , et al.,
          <article-title>Neural-symbolic learning and reasoning: A survey and interpretation</article-title>
          ,
          <source>arXiv preprint arXiv:1711.03902</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kohli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <article-title>Neural-symbolic vqa: disentangling reasoning from vision and language understanding</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Neural Information Processing Systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1039</fpage>
          -
          <lpage>1050</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lapedriza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <article-title>Learning deep features for discriminative localization</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>2921</fpage>
          -
          <lpage>2929</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: Proceedings of the 31st international conference on neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4768</fpage>
          -
          <lpage>4777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Axiomatic attribution for deep networks</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3319</fpage>
          -
          <lpage>3328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Smilkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Thorat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Viégas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wattenberg</surname>
          </string-name>
          ,
          <article-title>Smoothgrad: removing noise by adding noise</article-title>
          ,
          <source>arXiv preprint arXiv:1706.03825</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Zoom-in-net: Deep mining lesions for diabetic retinopathy detection</article-title>
          ,
          <source>in: International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <article-title>An interpretable ensemble deep learning model for diabetic retinopathy disease classification</article-title>
          ,
          <source>in: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>2045</fpage>
          -
          <lpage>2048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy detection via deep convolutional networks for discriminative localization and visual explanation</article-title>
          ,
          <source>arXiv preprint arXiv:1703.10757</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>ELLG: Explainable lesion learning and generation for diabetic retinopathy detection</article-title>
          ,
          <source>in: International Joint Conferences on Artificial Intelligence Workshop on Disease Computational Modeling</source>
          , International Joint Conferences on Artificial Intelligence.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>