<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uncertainty Quantification in Chest X-Ray Image Classification using Bayesian Deep Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yumin Liu</string-name>
          <email>yuminliu@ece.neu.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claire Zhao</string-name>
          <email>claire.zhao@philips.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Rubin</string-name>
          <email>jonathan.rubin@philips.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Deep neural networks (DNNs) have proven their effectiveness on numerous tasks. However, research into the reliability of DNNs lags behind their successful applications and needs further investigation. In addition to prediction, it is also important to evaluate how confident a DNN is about its predictions, especially when those predictions are being used within medical applications. In this paper, we quantify the uncertainty of DNNs for the task of Chest X-Ray (CXR) image classification. We investigate uncertainties of several commonly used DNN architectures including ResNet, ResNeXt, DenseNet and SENet. We then propose an uncertainty-based evaluation strategy that retains subsets of held-out test data ordered via uncertainty quantification. We analyze the impact of this strategy on the classifier performance. We also examine the impact of setting uncertainty thresholds on the performance. Results show that utilizing uncertainty information may improve DNN performance for some metrics and observations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Neural networks have been very successful in many fields such as
natural language processing [
        <xref ref-type="bibr" rid="ref23 ref41">41, 23</xref>
        ], computer vision [
        <xref ref-type="bibr" rid="ref18 ref8">18, 8</xref>
        ], speech
recognition [
        <xref ref-type="bibr" rid="ref15 ref5">15, 5</xref>
        ], machine translation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], control systems [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], autonomous
driving [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and so on. However, there is much less research
available on how reliable neural network predictions are. A common
criticism of neural networks is that they are black boxes that can
perform very well on many tasks yet lack interpretability. At the
same time, it is very important to ensure the reliability of a system
used in high-risk fields, including stock-market analysis,
self-driving cars and medical imaging [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. With the rapid development of
machine learning and artificial intelligence, especially deep learning,
these methods are finding more and more applications in health care,
including disease diagnosis [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], drug discovery [
        <xref ref-type="bibr" rid="ref25 ref30">25, 30</xref>
        ] and medical
imaging [
        <xref ref-type="bibr" rid="ref16 ref33 ref7">7, 16, 33</xref>
        ]. Rather than just being told a final result by a
machine learning algorithm, stakeholders (doctors, physicians,
radiologists, etc.) would like to know how “confident” a neural network
model is, so that they can take different actions according to
different confidence levels. For example, in a medical image classification
scenario, a neural network model is applied to detect whether a
patient has a certain type of lung pathology by classifying the patient’s chest
X-ray images. Ideally, physicians could trust
the result of the neural network if it is highly confident (low
uncertainty) about its prediction. Conversely, if the neural network
gives a prediction with low confidence (high uncertainty), then
the prediction cannot be trusted and the patient’s scan should be
further examined by a radiologist. Such a mechanism is
beneficial because many X-ray images are acquired every day but
radiologist resources are limited. It can help prioritize X-ray images for
radiologists to examine, direct more attention to low-confidence
instances and support treatment recommendations for high-confidence
instances.
      </p>
      <p>
        Neural network-based deep learning algorithms are also getting
popular for medical X-ray image processing [
        <xref ref-type="bibr" rid="ref1 ref27 ref35">27, 1, 35</xref>
        ]. It is
necessary to examine the uncertainty of neural network models in medical
X-ray image processing. The confidence of a prediction by a machine
learning method can be measured by the uncertainty of the method
outputs. A typical way to estimate uncertainty is through Bayesian
learning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which treats the model parameters as random
variables, estimates the posterior distribution of the
parameters during training, and marginalizes out the parameters to obtain the
distribution of the prediction during inference. Bayesian learning is
well developed in traditional (non-neural-network) machine learning
frameworks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORKS</title>
      <p>
        In recent years, Bayesian learning and the estimation of prediction
uncertainty have gained increasing attention in the context of neural
networks, owing to the wide application of deep neural networks in many
areas [
        <xref ref-type="bibr" rid="ref11 ref12 ref12 ref13 ref14 ref22 ref24 ref26 ref3 ref31 ref32 ref32 ref40">11, 3, 12, 13, 22, 14, 32, 24, 40, 26, 12, 31, 32</xref>
        ].
      </p>
      <p>
        The authors in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] introduced a method called “Bayes By
Backprop” to learn the posterior distribution over the weights of neural
networks and obtain weight uncertainty. Essentially, this method assumes
that the weights come from a multivariate Gaussian distribution and
updates the mean and covariance of the Gaussian, rather than the weight
samples, during training. During inference the network weights are
drawn from the learned distribution. This method is mathematically
grounded, backpropagation-compatible and can learn the distribution
of network weights directly, but it cannot utilize pre-trained models
and requires a corresponding model to be built for every neural network
architecture. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] reformulated dropout in neural networks as
approximate Bayesian inference in deep Gaussian processes and thus can
estimate uncertainty in neural networks with dropout layers. This
method requires dropout layers applied before every weight layer.
During inference, the dropout layers with random 0-1s drawn from
Bernoulli distribution mask out some weights and only use a subset
of the weights learned during training phase to make a prediction. In
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], the authors further proposed that there are two types of
uncertainties and they showed the benefits of explicitly formulating these
two uncertainties separately. The first type is called aleatoric
uncertainty (or data uncertainty), which is due to the noise in the data and
cannot be eliminated, while the other type is called epistemic
uncertainty (or model uncertainty), which accounts for uncertainty in the
model and can be eliminated given enough data. The network
architectures have to be modified to add extra outputs in order to model
these uncertainties. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] adopted this typing of uncertainty, but
modified the formulation of aleatoric and epistemic uncertainty to avoid
the requirement of extra outputs.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] proposed a method called “Stochastic Weight Averaging
Gaussian (SWAG)” to approximate the posterior distribution over the
weights of neural networks as a Gaussian distribution by utilizing
information in Stochastic Gradient Descent (SGD). This method has
an advantage in that it can be applied to almost all existing neural
networks without modifying their original architectures and can
directly leverage pre-trained models. [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] also decomposed predictive
uncertainty in deep learning into two components and modeled them
separately. They showed that quantifying the uncertainty can help to
improve the predictive performance in medical image
super-resolution. [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] investigated the relationship between uncertain labels in
CheXpert [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and Chest X-ray14 [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] data sets and the estimated
uncertainty for the corresponding instances using a Bayesian neural
network, and suggested that utilizing uncertain labels helped prevent
over-confidence for ambiguous instances.
      </p>
      <p>Despite the above works on Bayesian deep neural network
learning and uncertainty quantification, few works have evaluated
the effects of uncertainty-based evaluation strategies for medical
image classification. To the best of our knowledge, we are the first
to apply uncertainty quantification strategies to chest X-ray image
classification using deep neural networks and to evaluate their impact
on performance. The main contributions of this paper are:</p>
      <p>We apply uncertainty quantification to five deep neural network
models for chest X-ray image classification and analyze their
performance.</p>
      <p>We investigate the impact that uncertainty information has on
classification task performance by evaluating subsets of held-out test
data ordered via uncertainty quantification.</p>
    </sec>
    <sec id="sec-3">
      <title>METHOD</title>
      <p>
        In this section, we will introduce the basic ideas of Bayesian Neural
Networks and one of its approximations – SWAG [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], which is used
in this paper. We also describe the uncertainty quantification method
used in this paper.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Bayesian Neural Network</title>
      <p>In ordinary deterministic neural networks, we obtain a point
estimate of the network weights $w$, which are regarded as fixed values
and do not change after training. During inference, for each
input $x_i$ we get one deterministic prediction $p(y_i \mid x_i) = p(y_i \mid x_i, w)$
without any uncertainty information.</p>
      <p>In the Bayesian neural network setting, in addition to the
target prediction, we also want the uncertainty of the
prediction. To do so, we regard the neural network weights as random
variables that follow some distribution and try to estimate
the posterior distribution of the network weights given the training
data during training. We then integrate out the weights to get the
distribution over the prediction during inference. From the
predictive distribution we can further calculate the prediction output and the
corresponding uncertainty. More specifically, let $D = \{(X, Y)\}$
and $w$ be the training data and the weights of a neural network,
respectively. Ordinary deterministic neural network methods obtain a
point estimate of $w$ by either maximum likelihood estimation (MLE),
$\hat{w} = \arg\max_w p(D \mid w)$, or maximum a posteriori (MAP) estimation,
$\hat{w} = \arg\max_w p(w \mid D)$, where $p(w \mid D) = \frac{p(w)\, p(D \mid w)}{p(D)} \propto p(w)\, p(D \mid w)$.
The weights $\hat{w}$ are fixed after training and used for inference on new
data. In Bayesian learning, we estimate the posterior distribution
$p(w \mid D)$ during training and marginalize out $w$ during inference
to get a probability distribution of the prediction:</p>
      <p>$$p(y \mid x, D) = \mathbb{E}_{w \sim p(w \mid D)}[p(y \mid x, w)] = \int p(y \mid x, w)\, p(w \mid D)\, dw \tag{1}$$</p>
      <p>After obtaining $p(y \mid x)$, we can calculate the statistical moments of
the predicted variable and regard the first and second moments (i.e.,
mean and variance) as the prediction and uncertainty, respectively.</p>
      <p>However, in practice there are two major difficulties. The first one
is that $p(D) = \int p(w)\, p(D \mid w)\, dw$ is usually intractable and thus
we cannot get the exact $p(w \mid D)$. The second is that Eq. (1) is also
usually intractable for neural networks. One common approach to
deal with the first difficulty is to use a simpler form of distribution
$q(w \mid \theta)$ with hyperparameters $\theta$ to approximate $p(w \mid D)$ by
minimizing the Kullback-Leibler (KL) divergence between $q(w \mid \theta)$ and
$p(w \mid D)$. This turns the problem into an easier optimization
problem:</p>
      <p>$$\theta^{*} = \arg\min_{\theta} \mathrm{KL}[q(w \mid \theta) \,\|\, p(w \mid D)] = \arg\min_{\theta} \int q(w \mid \theta) \log \frac{q(w \mid \theta)}{p(w \mid D)}\, dw \tag{2}$$</p>
      <p>For the second difficulty, the usual approach is to use sampling to
estimate Eq. (1), which becomes</p>
      <p>$$p(y \mid x) \approx \mathbb{E}_{w \sim q(w \mid \theta)}[p(y \mid x, w)] \approx \frac{1}{T} \sum_{i=1}^{T} p(y \mid x, w^{(i)}) \tag{3}$$</p>
      <p>where $w^{(i)} \sim q(w \mid \theta)$.</p>
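      <p>The sampling estimate in Eq. (3) can be illustrated with a minimal sketch. The one-weight logistic model, the Gaussian used as the approximate posterior, and all function names below are assumptions made for illustration only; they are not the networks or the posterior used in this paper.</p>
      <preformat><![CDATA[
```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w):
    """Toy stand-in for p(y = 1 | x, w): a one-weight logistic model."""
    return sigmoid(w * x)


def mc_predictive(x, weight_samples):
    """Eq. (3): average p(y | x, w_i) over the sampled weights."""
    probs = [predict(x, w) for w in weight_samples]
    return sum(probs) / len(probs)


# Hypothetical approximate posterior: Gaussian with mean 1.0 and std 0.5.
rng = random.Random(0)
samples = [rng.gauss(1.0, 0.5) for _ in range(1000)]

p_mc = mc_predictive(2.0, samples)   # Monte Carlo predictive probability
p_point = predict(2.0, 1.0)          # plug-in prediction at the posterior mean
```
]]></preformat>
      <p>With a single weight sample the estimate coincides with the ordinary deterministic prediction; with many samples it averages over the weight uncertainty.</p>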
      <p>
        Different methods have been proposed to approximate the
posterior over the weights or to draw samples of $w$ [
        <xref ref-type="bibr" rid="ref12 ref13 ref26 ref3">26, 3, 12, 13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Stochastic Weight Averaging Gaussian (SWAG)</title>
      <p>
        The basic idea of SWAG [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] is to regard the weights of the
neural network as random variables and estimate their statistical moments
through training with SGD. These moments are then used to fit a
multivariate Gaussian that approximates the posterior distribution of the weights.
After the original training process, in which we obtain the optimal weights,
we continue to train the model on the same training data with
SGD and collect $T$ samples of the weights $w_1, w_2, \ldots, w_t, \ldots, w_T$. The
mean of those samples is $\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t$. The mean of the squares
is $\overline{w^2} = \frac{1}{T} \sum_{t=1}^{T} w_t^2$, and we define a diagonal matrix
$\Sigma_{\mathrm{diag}} = \mathrm{diag}(\overline{w^2} - \bar{w}^2)$ and a deviation matrix $R = [R_1, \ldots, R_t, \ldots, R_T]$
whose columns are $R_t = w_t - \bar{w}_t$, where $\bar{w}_t = \frac{1}{t} \sum_{j=1}^{t} w_j$ is the running
average of the first $t$ weight samples. In the
original paper, the authors used the last $K$ columns of $R$ to get
a low-rank approximation of $R$. The $K$-rank approximation is
$\widehat{R} = [R_{T-K+1}, \ldots, R_T]$. Then the mean and covariance matrix
of the fitted Gaussian are given by:
$$w_{\mathrm{SWA}} = \bar{w} \tag{4}$$
$$\Sigma_{\mathrm{SWA}} = \frac{1}{2} \Sigma_{\mathrm{diag}} + \frac{1}{2(K-1)} \widehat{R} \widehat{R}^{\top} \tag{5}$$
      </p>
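      <p>The moment computation described above can be sketched as follows. This is an illustration under simplifying assumptions: the weights are flattened into short Python lists, the function name swag_moments is hypothetical, and the three two-dimensional snapshots are made-up values rather than real network weights.</p>
      <preformat><![CDATA[
```python
def swag_moments(weight_samples, K):
    """Return (w_swa, sigma_diag, r_hat) from T weight snapshots.

    weight_samples: list of T equal-length lists, one per SGD snapshot.
    r_hat holds the last K deviation columns R_t = w_t - running_mean_t
    (K >= 1 is assumed).
    """
    T = len(weight_samples)
    dim = len(weight_samples[0])
    running_sum = [0.0] * dim
    sq_sum = [0.0] * dim
    deviations = []
    for t, w in enumerate(weight_samples, start=1):
        for j in range(dim):
            running_sum[j] += w[j]
            sq_sum[j] += w[j] ** 2
        running_mean = [s / t for s in running_sum]            # mean of first t samples
        deviations.append([w[j] - running_mean[j] for j in range(dim)])
    w_swa = [s / T for s in running_sum]                       # Eq. (4)
    # Diagonal of Sigma_diag: mean of squares minus squared mean.
    sigma_diag = [sq_sum[j] / T - w_swa[j] ** 2 for j in range(dim)]
    return w_swa, sigma_diag, deviations[-K:]


# Made-up snapshots of a two-parameter model.
w_swa, sigma_diag, r_hat = swag_moments(
    [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]], K=2
)
```
]]></preformat>
      <p>Keeping only running sums means the full set of snapshots never has to be stored, apart from the last K deviation columns.</p>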
      <p>During inference, for each input (image) $x_i$, we sample the weights
from the Gaussian, $w_s \sim \mathcal{N}(w_{\mathrm{SWA}}, \Sigma_{\mathrm{SWA}})$, update the batch
norm statistics by performing one epoch of forward passes, and then obtain
the sample prediction $p(\hat{y}_{is} \mid x_i) = p(y_i \mid x_i, w_s)$. Repeating
the procedure $S$ times gives $S$ predictions $\hat{y}_{i1}, \hat{y}_{i2}, \ldots, \hat{y}_{is},
\ldots, \hat{y}_{iS}$ for the same input $x_i$. From these $S$ predictions we
obtain the final prediction and its uncertainty. For a regression problem, the
final prediction is $\hat{y}_i = \frac{1}{S} \sum_{s=1}^{S} \hat{y}_{is}$.</p>
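      <p>Drawing one weight sample from the fitted Gaussian does not require forming the full covariance matrix. The sketch below uses the standard construction for a covariance that is the sum of a diagonal term and a low-rank term, as in Eq. (5). The function name, the flattened list representation and the example numbers are assumptions for illustration, and the batch norm update step is not shown.</p>
      <preformat><![CDATA[
```python
import math
import random


def sample_swag_weights(w_swa, sigma_diag, r_hat_cols, rng):
    """Draw one sample from N(w_swa, 0.5 * diag + R R^T / (2 (K - 1))).

    r_hat_cols holds the K columns of the deviation matrix (K >= 2).
    """
    K = len(r_hat_cols)
    dim = len(w_swa)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # noise for the diagonal part
    z2 = [rng.gauss(0.0, 1.0) for _ in range(K)]     # noise for the low-rank part
    low_rank_scale = 1.0 / math.sqrt(2.0 * (K - 1))
    sample = []
    for j in range(dim):
        # Clamp tiny negative variances caused by rounding.
        diag_term = math.sqrt(max(sigma_diag[j], 0.0) / 2.0) * z1[j]
        low_rank_term = low_rank_scale * sum(
            r_hat_cols[k][j] * z2[k] for k in range(K)
        )
        sample.append(w_swa[j] + diag_term + low_rank_term)
    return sample


# Made-up moments for a two-parameter model; S = 10 weight samples.
rng = random.Random(0)
draws = [
    sample_swag_weights(
        [3.0, 2.0], [8.0 / 3.0, 8.0 / 3.0], [[1.0, 1.0], [2.0, 2.0]], rng
    )
    for _ in range(10)
]
```
]]></preformat>
      <p>Each entry of draws would be loaded into the network for one of the S forward passes.</p>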
    </sec>
    <sec id="sec-6">
      <title>Uncertainty Quantification</title>
      <p>
        Several methods have been proposed to quantify uncertainty in
classification [
        <xref ref-type="bibr" rid="ref22 ref24">24, 22</xref>
        ]. Here we adopt the method proposed by [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] since
it does not require extra output and does not need to modify the
network architectures.
      </p>
      <p>For a classification problem with $C$ classes, denote by
$p_s \triangleq [p_{s1}, p_{s2}, \ldots, p_{sC}] = p(y \mid x, s)$, $s \in \{1, 2, \ldots, S\}$, the
softmax output (or sigmoid output in the binary case, $C = 2$) of the
neural network for the $s$-th of $S$ repeated passes of the same input $x$. The
predicted “probability” is the average of those $S$ sample outputs,
$\bar{p} = \frac{1}{S} \sum_{s=1}^{S} p_s$, and the predicted class label index is $\hat{y} = \arg\max_c \bar{p}_c$. The
aleatoric uncertainty $U_a$ and the epistemic uncertainty $U_e$ are
$U_a = \frac{1}{S} \sum_{s=1}^{S} [\mathrm{diag}(p_s) - p_s p_s^{\top}]$ and $U_e = \frac{1}{S} \sum_{s=1}^{S} (p_s - \bar{p})(p_s - \bar{p})^{\top}$. The
total uncertainty is $U_{\mathrm{total}} = U_a + U_e$. For binary classification, the
sigmoid output is a scalar and the uncertainty equations reduce
to</p>
      <p>$$U_a = \frac{1}{S} \sum_{s=1}^{S} p_s (1 - p_s) \tag{6}$$</p>
      <p>$$U_e = \frac{1}{S} \sum_{s=1}^{S} (p_s - \bar{p})^2 \tag{7}$$</p>
      <p>where $\bar{p} = \frac{1}{S} \sum_{s=1}^{S} p_s$ and $p_s = p(y = 1 \mid x, s) = 1 - p(y = 0 \mid x, s)$. The predicted label is:</p>
      <p>$$\hat{y} = 1 \ \text{if}\ \bar{p} \geq 0.5, \qquad \hat{y} = 0 \ \text{if}\ \bar{p} &lt; 0.5 \tag{8}$$</p>
      <p>In this way, we can get uncertainties for all the instances.</p>
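      <p>For the binary case, the quantities in Eq. (6), (7) and (8) can be computed directly from the S sampled sigmoid outputs of one input. The sketch below is a minimal illustration; the function name and the two example lists of probabilities are made up.</p>
      <preformat><![CDATA[
```python
def binary_uncertainty(probs):
    """Return (y_hat, U_a, U_e, U_total) from S sampled sigmoid outputs."""
    S = len(probs)
    p_bar = sum(probs) / S
    u_a = sum(p * (1.0 - p) for p in probs) / S       # aleatoric, Eq. (6)
    u_e = sum((p - p_bar) ** 2 for p in probs) / S    # epistemic, Eq. (7)
    y_hat = 1 if p_bar >= 0.5 else 0                  # predicted label, Eq. (8)
    return y_hat, u_a, u_e, u_a + u_e


# Samples that agree: no epistemic uncertainty, only aleatoric.
agree = binary_uncertainty([0.9, 0.9, 0.9, 0.9])
# Samples that disagree: the epistemic term becomes large.
disagree = binary_uncertainty([0.1, 0.9, 0.2, 0.8])
```
]]></preformat>
      <p>As a sanity check on an implementation, note that in this scalar case the two terms always sum to $\bar{p}(1 - \bar{p})$, which follows from expanding Eq. (6) and Eq. (7).</p>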
    </sec>
    <sec id="sec-7">
      <title>Transfer Learning</title>
      <p>Transfer learning is a widely used technique for improving the
performance of deep neural networks in image classification. Here we
also benefit from transfer learning by loading neural
network models pre-trained on the ImageNet (http://image-net.org)
dataset. An advantage of the SWAG method is that
it does not require modifying the architecture of the original
neural networks, so we can fully utilize models
pre-trained on ImageNet to speed up training and get
better predictions. In the initialization stage, we download the
pre-trained model parameters and use them to initialize the models to be
trained.</p>
    </sec>
    <sec id="sec-8">
      <title>Procedure</title>
      <p>
        Basically we follow the method in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] to approximate the Bayesian
neural network and the formulas in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] to quantify the uncertainty of
the models. The overall algorithm for SWAG and uncertainty
quantification is shown in Algorithm 1. We initialize each model with the
corresponding pre-trained model and then fine-tune it using
chest X-ray images and observation labels. After that we perform the
SWAG algorithm by continuing training with Stochastic Gradient
Descent for $T$ epochs and calculating the statistics $\bar{w}$, $\overline{w^2}$,
$\Sigma_{\mathrm{diag}}$ and $\widehat{R}$,
from which we can get $w_{\mathrm{SWA}}$ and $\Sigma_{\mathrm{SWA}}$ using Eq. (4) and (5). Then
we fit a multivariate Gaussian using $w_{\mathrm{SWA}}$ as mean and $\Sigma_{\mathrm{SWA}}$ as
covariance to obtain an approximate distribution over the neural
network weights. When making a prediction, an input chest X-ray image
is fed into the network $S$ times, each time with a new
set of weights sampled from the Gaussian distribution. The $S$
output probabilities are used to calculate the final predicted label $\hat{y}_i$ and
the uncertainty $U_{\mathrm{total}} = U_a + U_e$. Note that, after
drawing sample weights, the batch norm statistics need to
be updated for models that use batch normalization. This can be
achieved by running one epoch over part or all of the training set $D$.
A more detailed justification of this step is given in the original
paper [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p>Algorithm 1 Uncertainty Quantification</p>
      <sec id="sec-8-1">
        <title>1: Input:</title>
        <p>$D = \{(X, Y)\}$ / $X_i$: training / evaluating chest X-ray images
and corresponding observation labels</p>
      </sec>
      <sec id="sec-8-2">
        <title>2: Initialization:</title>
        <p>load neural network (NN) models pre-trained on ImageNet</p>
      </sec>
      <sec id="sec-8-3">
        <title>3: Training:</title>
        <p>Fine-tune NN models using the CheXpert dataset</p>
      </sec>
      <sec id="sec-8-4">
        <title>4: Perform SWAG:</title>
      </sec>
      <sec id="sec-8-5">
        <title>Continue training with SGD</title>
        <p>i) train NN models using SGD for some epochs with $D$
ii) save statistics of the weights for those epochs
iii) calculate $w_{\mathrm{SWA}}$ and $\Sigma_{\mathrm{SWA}}$ using Eq. (4) and (5)
iv) fit a Gaussian using $w_{\mathrm{SWA}}$ as mean and $\Sigma_{\mathrm{SWA}}$ as
covariance
Prediction:
for $s$ from 1 to $S$
draw weights $w_s \sim \mathcal{N}(w_{\mathrm{SWA}}, \Sigma_{\mathrm{SWA}})$
update batch norm statistics using $D$
$p(y_{is} \mid X_i) = p(y_{is} \mid X_i, w_s)$
end for</p>
      </sec>
      <sec id="sec-8-6">
        <title>5: Calculate Outputs:</title>
        <p>$p(y_i \mid X_i) = \frac{1}{S} \sum_{s=1}^{S} p(y_{is} \mid X_i)$</p>
        <p>Calculate $\hat{y}_i$, $U_a$ and $U_e$ using Eq. (8), (6) and (7).</p>
        <p>$U_{\mathrm{total}} = U_a + U_e$
6: Return:</p>
        <p>
          $\hat{y}_i$, $U_a$, $U_e$, $U_{\mathrm{total}}$
        </p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>DATASET</title>
      <p>
        We perform experiments using the CheXpert data set [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. CheXpert
is a large chest X-ray dataset released by researchers at Stanford
University. This dataset consists of 224,316 chest radiographs of 65,240
patients. Each data instance contains a chest X-ray image and a
vector label describing the presence of 14 observations (pathologies) as
positive, negative, or uncertain. The labels were extracted from
radiology reports using natural language processing approaches. For
our experiments we focus on 5 observations, namely Cardiomegaly,
Edema, Atelectasis, Consolidation and Pleural Effusion. As [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] pointed out, these 5 observations were selected based on their clinical
importance and prevalence in the dataset; the same 5 observations were
also used there to evaluate the labeling approaches. A
sample image for each observation is shown in Figure 1.
      </p>
      <p>
        The original dataset consists of a training set and a validation set;
we do not have access to the test set. The labels for the training set were
generated by an automated rule-based labeler that extracts
information from radiology reports. This was done by the Stanford research
group who released the dataset. There are three possible values for
the label of an instance for a given observation: 1, 0 and -1. 1
means the observation is positive (present), 0 means negative
(absent), and -1 means it is uncertain whether the observation
is present. The labels for the validation set were determined by the
majority vote of three board-certified radiologists and contain only
positive (1) or negative (0) values. The original paper [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
investigated several ways to deal with the uncertain labels (-1),
such as treating them as positive (1), as negative (0), the same as
the majority class, or as a separate class. They found that the
best way to handle the uncertain labels differs between
observations, and they gave the replacement for each of the 5 observations
mentioned above. Based on the results from [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and for simplicity, we
replace the uncertain labels with 0 or 1 for different observations.
      </p>
      <p>Specifically, the uncertain labels of cardiomegaly, consolidation
and pleural effusion are replaced with 0, while those of edema and
atelectasis are replaced with 1. The problem therefore becomes a multi-label binary
image classification problem. The predicted result is a five-dimensional
vector whose elements are 1 or 0, where 1 means that the
network predicts the presence of the corresponding observation and 0
means that it predicts its absence. We follow the official training set / validation set split given
by the data set provider. After removing invalid instances, we get a
total of 223,414 instances for training and 234 instances for
validation. We first initialize each neural network’s parameters with the
corresponding downloaded pre-trained model parameters, then
train the network on the training set and test its
performance on the validation set. We use the original training set as
the training set and the original validation set as the evaluation set in our
experiments.</p>
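      <p>The label replacement described above amounts to a per-observation lookup. The following sketch assumes each instance is represented as a dictionary from observation name to a label in {1, 0, -1}; the dictionary layout and the names UNCERTAIN_REPLACEMENT and resolve_uncertain are illustrative assumptions, not the data format of the CheXpert release.</p>
      <preformat><![CDATA[
```python
# Value substituted for the uncertain label (-1), per observation,
# following the replacement rule stated in the text.
UNCERTAIN_REPLACEMENT = {
    "Cardiomegaly": 0,
    "Consolidation": 0,
    "Pleural Effusion": 0,
    "Edema": 1,
    "Atelectasis": 1,
}


def resolve_uncertain(labels):
    """Map {observation: label in {1, 0, -1}} to purely binary labels."""
    return {
        name: UNCERTAIN_REPLACEMENT[name] if value == -1 else value
        for name, value in labels.items()
    }


example = resolve_uncertain(
    {
        "Cardiomegaly": -1,
        "Edema": -1,
        "Atelectasis": 0,
        "Consolidation": 1,
        "Pleural Effusion": -1,
    }
)
```
]]></preformat>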
      <p>In Figure 2 we show the patient statistics of the 5 observations
after replacing the uncertain labels in the training set. The prevalence
is the ratio of the number of positive instances to the total
number of instances. From the figure we can see that all five observations
are imbalanced, with prevalence under 50%. In addition, there is
a gap between the prevalence in the training and evaluation sets for all
observations, which will probably affect the performance of the neural
network models.</p>
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENT</title>
      <p>
        In this section, we perform experiments and present the
results of uncertainty quantification and the uncertainty-based strategy on five
different neural network models implemented in PyTorch. These
neural networks are DenseNet [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] with 121 layers (denoted as
DenseNet121), DenseNet with 201 layers (denoted as DenseNet201),
ResNet [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] with 152 layers (denoted as ResNet152), ResNeXt [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]
with 101 layers (denoted as ResNeXt101) and Squeeze-and-Excitation
network [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] with 154 layers (denoted as SENet154).
      </p>
      <p>Figure 2. Panels: (a) Prevalence of observations; (b) Gender proportion; (c) Training set age histogram; (d) Validation set age histogram.</p>
      <p>
        ResNet uses
skip connections to mitigate the vanishing gradient problem and
was the winner of ILSVRC 2015 [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] and COCO 2015 (http://cocodataset.org)
competitions. ResNeXt is a variant of ResNet
and won 2nd place in the ILSVRC 2016 classification task. DenseNet
extends the concept of skip connections by connecting the output of each
layer to all subsequent layers, forming “dense” skip
connections. DenseNet further alleviates the vanishing gradient
problem, reduces the number of parameters and reuses intermediate features,
and has been widely used since it was proposed. SENet uses
squeeze-and-excitation blocks to model image channel interdependencies and won
the ILSVRC 2017 competition for the classification task.
      </p>
      <p>All networks are trained as binary classifiers for multi-label
classification instead of training separate models for each class.</p>
      <p>The pipeline of the experiment is shown in Figure 4. We use a
PyTorch implementation. The neural network models and
pre-trained parameters are from torchvision (except SENet154, which
is from pretrainedmodels, https://github.com/Cadene/pretrained-models.pytorch).</p>
      <p>In our experiment we set the number of sampled weights $T = 5$,
the number of columns of the deviation matrix $K = 10$ and the
number of repeated prediction samples $S = 10$. During training,
we use the Adam optimizer with weight decay regularization and the
ReduceLROnPlateau learning rate scheduler. The initial learning rate is
$1 \times 10^{-5}$ and the weight decay coefficient is 0.005. The maximum
number of fine-tuning epochs is 50. The original chest X-ray
images are resized and randomly cropped to $256 \times 256$ (except for
SENet154, which has a fixed input size of $224 \times 224$). We stop
fine-tuning the model when the AUC (explained below) does not increase
for 10 consecutive epochs and save the model with the best AUC as
the optimal trained model.</p>
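      <p>The stopping rule described above (stop after 10 consecutive epochs without an AUC increase, keep the best model) can be sketched independently of any training framework. The function name, the zero-based epoch indexing and the example AUC values are assumptions for illustration; the actual training loop is not shown in this paper.</p>
      <preformat><![CDATA[
```python
def best_epoch_with_patience(auc_per_epoch, patience=10, max_epochs=50):
    """Return (best_epoch, best_auc, stopped_epoch), epochs counted from 0.

    Training stops once `patience` consecutive epochs have passed
    without the validation AUC exceeding the best value seen so far.
    """
    best_auc = float("-inf")
    best_epoch = -1
    stopped = -1
    for epoch, value in enumerate(auc_per_epoch[:max_epochs]):
        stopped = epoch
        if value > best_auc:
            # In a real loop, the model checkpoint would be saved here.
            best_auc, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_auc, stopped


# Made-up AUC curve; a shorter patience is used to keep the example small.
result = best_epoch_with_patience(
    [0.70, 0.75, 0.74, 0.73, 0.72, 0.90], patience=3
)
```
]]></preformat>
      <p>In this made-up curve the loop stops at epoch 4, so the later value 0.90 is never reached; this is the usual trade-off of patience-based stopping.</p>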
      <p>We use four metrics to evaluate classification
performance: area under the curve (AUC), sensitivity, specificity and precision.
These metrics are widely used in the machine learning and medical
communities. The AUC is often used to measure the quality of a
classifier and is defined as the area under the Receiver Operating
Characteristic (ROC) curve, which plots the sensitivity against the false
positive rate. The sensitivity (also called true positive rate or recall) is
the ratio of the number of correctly predicted positive instances to
the total number of positive instances. The specificity is
the ratio of the number of correctly predicted negative instances to
the total number of negative instances. The precision is
the ratio of the number of correctly predicted positive instances
to the number of instances that are predicted as positive.</p>
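      <p>The four metrics defined above can be computed as in the following sketch. It is an illustration only: the function names and the six example instances are made up, the AUC is computed through its equivalent pairwise-ranking form (the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as one half), and undefined ratios are returned as NaN by assumption.</p>
      <preformat><![CDATA[
```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and precision from binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up evaluation set of six instances.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```
]]></preformat>
      <p>The pairwise form is quadratic in the number of instances, which is acceptable for an evaluation set of a few hundred images but would be replaced by a sorting-based computation for larger sets.</p>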
    </sec>
    <sec id="sec-11">
      <title>Without Strategy</title>
      <p>First we compare the AUC of the original deterministic
neural networks with the AUC of the corresponding neural networks after
performing SWAG but before applying any uncertainty strategy. The
results are shown in Table 1. The “Average” column is the average
over all 5 observations. Bold font indicates better performance.
For edema and pleural effusion, the original neural network performs
better than SWAG for most of the networks. For cardiomegaly,
consolidation and atelectasis, the results are mixed. This may be
because edema and pleural effusion are harder to detect and more
sensitive to perturbation of the network weights. On the whole, the SWAG
algorithm does not outperform the original neural network. This
may be explained by the fact that SWAG uses a Gaussian to
approximate the distribution over the optimal weights and then draws
sample weights from the approximated Gaussian, which may
deviate from the optimal weights if the approximation is inaccurate.
Therefore we need to adopt some strategy to prevent the performance
from deteriorating. The benefit is that we obtain an uncertainty
estimate for each prediction while keeping similar or even better
prediction results.</p>
      <p>Next we use the uncertainty quantification information to
determine whether performance can be improved. One strategy is to sort
instances by uncertainty in ascending order, then
keep the instances with lower uncertainty and
discard the rest. In clinical practice, the discarded instances could be
flagged for further evaluation by a physician.</p>
      <p>Ideally, we would expect the metrics to decrease as
data coverage increases, as shown in Figure 6. The horizontal axis
“Data coverage” is the percentage of instances being considered. For
example, a data coverage of 20% means that only the 20% of
instances with the lowest uncertainty (the most confident ones) are taken
into consideration and the rest are discarded.</p>
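      <p>The coverage-based strategy can be sketched as follows: sort instances by uncertainty, keep the least uncertain fraction, and evaluate a metric on the retained subset. The function names, the rounding rule for the number of retained instances, the use of fraction correct as the example metric and the five example instances are all assumptions for illustration; any of the four metrics used in this paper could be passed in instead.</p>
      <preformat><![CDATA[
```python
def retain_by_coverage(uncertainties, coverage):
    """Indices of the least-uncertain `coverage` fraction of instances."""
    n_keep = max(1, round(coverage * len(uncertainties)))
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i])
    return order[:n_keep]


def evaluate_at_coverage(y_true, y_pred, uncertainties, coverage, metric):
    """Apply `metric` to the instances retained at the given coverage."""
    kept = retain_by_coverage(uncertainties, coverage)
    return metric([y_true[i] for i in kept], [y_pred[i] for i in kept])


def fraction_correct(y_true, y_pred):
    """Example metric: share of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up example: the only wrong prediction is also the most uncertain one.
y_true = [1, 0, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1]
uncertainty = [0.24, 0.05, 0.10, 0.15, 0.20]
curve = {
    c: evaluate_at_coverage(y_true, y_pred, uncertainty, c, fraction_correct)
    for c in (0.4, 0.8, 1.0)
}
```
]]></preformat>
      <p>In this made-up example the metric drops only at full coverage, which is the idealized decreasing trend described above.</p>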
      <p>Figure 3 compares the
four metrics (AUC, sensitivity, specificity and precision) between
the original deterministic networks and the Bayesian neural networks
with the uncertainty strategy. The solid lines are the Bayesian neural
networks with the uncertainty strategy, while the dashed lines are the
original deterministic networks without any uncertainty strategy.
Different colors represent different observations.</p>
      <p>From Figure 3 we can see that for edema and pleural effusion, the
AUC decreases as the coverage increases and stays above the
corresponding original AUC until around 45% and 90% coverage,
respectively. This means that applying the uncertainty strategy can improve
the AUC for these two observations. The highest AUC gain can be 8%
and 6% for edema and pleural effusion, respectively. We also observe
a similar trend in sensitivity, specificity and precision for both edema
and pleural effusion. Three observations (cardiomegaly, atelectasis
and consolidation) have low sensitivity because most of the predictions are
negative. In contrast, their specificity is high.</p>
      <p>The highest gains from applying the uncertainty strategy are shown
in Table 2. The effect of the uncertainty strategy on the five
observations with the DenseNet201 model is summarized in
Table 3, where four symbols denote helpful, not helpful,
mixed behavior and missing value, respectively. For edema and
pleural effusion, applying the uncertainty strategy is beneficial for
all four metrics. For the other observations, however, it shows
no benefit or only limited benefit for some metrics. The reason
for this varied behavior needs further
investigation. Similarly, we summarize the effect of applying uncertainty
strategy for different neural network architectures and the results are
shown in Table 4 to Table 7. From the tables we can see that applying</p>
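      <p>One way to obtain a "highest gain" entry is to take the largest difference between a metric curve over coverage and the corresponding baseline value. The sketch below assumes that definition; the function name highest_gain, the handling of undefined values as NaN, and all numbers are illustrative and are not results from the paper.</p>
      <preformat>
```python
import numpy as np

def highest_gain(curve, baseline):
    """Largest value of (curve - baseline) and its position, ignoring NaN entries."""
    gains = np.asarray(curve, dtype=float) - baseline
    if np.all(np.isnan(gains)):
        return float("nan"), None  # metric undefined at every coverage level
    idx = int(np.nanargmax(gains))
    return float(gains[idx]), idx

# Illustrative numbers only (not results from the paper).
coverages = [0.2, 0.4, 0.6, 0.8, 1.0]
curves = {
    "auc": [0.95, 0.93, 0.90, 0.88, 0.86],
    "precision": [float("nan"), 0.70, 0.66, 0.60, 0.58],
}
baselines = {"auc": 0.87, "precision": 0.62}

summary = {}
for metric, curve in curves.items():
    gain, idx = highest_gain(curve, baselines[metric])
    # Store the gain together with the coverage level at which it occurs.
    summary[metric] = (gain, None if idx is None else coverages[idx])
    print(metric, summary[metric])
```
      </preformat>
      <p>A metric that is undefined at every coverage level (for example, precision when no positive predictions are retained) yields NaN here, which would correspond to a missing-value entry in a summary table.</p>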
      <p>Although several metrics benefit substantially from the uncertainty
strategy for some observations (e.g., pleural effusion), the strategy
does not improve performance on these metrics for some other
observations, and in some cases it even degrades performance. The reasons
may vary and need more investigation. For example,
the neural network weight distribution approximated
by the SWAG algorithm may not capture the true distribution, or
the uncertainty quantification formulas may be inappropriate.</p>
    </sec>
    <sec id="sec-12">
      <title>With Absolute Threshold Strategy</title>
      <p>We also plot the total uncertainty distribution for each observation, as
shown in Figure 5. From the figure we can see that for cardiomegaly,
the estimated uncertainty tends to be smaller, while for edema,
atelectasis and pleural effusion, the proportion of larger estimated
uncertainty is higher. Consolidation has a relatively even distribution
of estimated uncertainty. This suggests that edema, atelectasis and
pleural effusion are more likely to be affected by setting an
uncertainty threshold. Combining this finding with the results in Table
2, we set thresholds for both edema and pleural effusion to check
the influence on metric performance. We only consider the instances
whose estimated uncertainty is smaller than the threshold when computing
the performance metrics. We vary the threshold from 0.2 to 0.24 in
steps of 0.01; the results are shown in Figure 7. The black dashed
line is the average metric value of the original deterministic neural
network, the thin solid colored lines are the metric values for each
observation, and the thick brown line is the average metric value
over all five observations after applying the threshold only to edema and
pleural effusion. Comparing the thick brown line with the black dashed
line, we can see that the average specificity and precision
improve while the average AUC and sensitivity remain roughly the
same. This means that applying an uncertainty threshold to edema and
pleural effusion is beneficial.</p>
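      <p>The threshold sweep described above can be sketched for a single observation as follows. The function name metrics_under_threshold and the toy data are assumptions made for illustration; the averaging over all five observations used in Figure 7 is not reproduced here.</p>
      <preformat>
```python
import numpy as np

def metrics_under_threshold(y_true, y_pred, uncertainty, threshold):
    """Specificity and precision over instances whose uncertainty is below `threshold`."""
    keep = threshold > np.asarray(uncertainty, dtype=float)
    y_true = np.asarray(y_true).astype(bool)[keep]
    y_pred = np.asarray(y_pred).astype(bool)[keep]
    tp = int(np.sum(np.logical_and(y_true, y_pred)))
    fp = int(np.sum(np.logical_and(np.logical_not(y_true), y_pred)))
    tn = int(np.sum(np.logical_and(np.logical_not(y_true), np.logical_not(y_pred))))
    return {
        "threshold": float(threshold),
        "retained": float(np.mean(keep)),  # fraction of instances kept
        "specificity": tn / (tn + fp) if tn + fp > 0 else float("nan"),
        "precision": tp / (tp + fp) if tp + fp > 0 else float("nan"),
    }

# Toy data (assumed, not from the paper): 8 instances for one observation.
uncertainty = np.array([0.10, 0.15, 0.195, 0.205, 0.215, 0.225, 0.235, 0.30])
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])

# Thresholds 0.20, 0.21, ..., 0.24; linspace avoids floating-point drift at the endpoint.
thresholds = np.linspace(0.20, 0.24, 5)
sweep = [metrics_under_threshold(y_true, y_pred, uncertainty, t) for t in thresholds]
for row in sweep:
    print(row)
```
      </preformat>
      <p>Reporting the retained fraction next to each metric makes the trade-off explicit: a stricter threshold can raise specificity and precision, but only by setting more instances aside for review.</p>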
    </sec>
    <sec id="sec-13">
      <title>CONCLUSION</title>
      <p>In this paper we investigate uncertainty quantification in medical
image classification using Bayesian deep neural networks. We train five
different deep neural network models on the CheXpert X-ray image
data for five clinical observations and quantify the model uncertainty.
We then analyze the performance of the networks with
and without the uncertainty strategy. The results show that
uncertainty quantification and the uncertainty strategy improve several
performance metrics for some observations. This suggests that uncertainty
quantification is helpful in medical image classification using neural
networks. However, the results also show that in some cases the strategy
is not helpful, or can even degrade performance. Further
analysis may be needed to examine this phenomenon.</p>
    </sec>
  </body>
  <back>
  </back>
</article>