<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Comparison of Uncertainty Estimation Approaches in Deep Learning Components for Autonomous Vehicle Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Arnez</string-name>
          <email>fabio.arnez@cea.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huascar Espinoza</string-name>
          <email>huascar.espinoza@cea.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ansgar Radermacher</string-name>
          <email>ansgar.radermacher@cea.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>François Terrier</string-name>
          <email>francois.terrier@cea.fr</email>
        </contrib>
        <aff>CEA LIST, Gif-sur-Yvette, France</aff>
      </contrib-group>
      <abstract>
        <p>A key factor for ensuring safety in Autonomous Vehicles (AVs) is to avoid any abnormal behaviors under undesirable and unpredicted circumstances. As AVs increasingly rely on Deep Neural Networks (DNNs) to perform safety-critical tasks, different methods for uncertainty quantification have recently been proposed to measure the inevitable sources of error in data and models. However, uncertainty quantification in DNNs is still a challenging task. These methods require a higher computational load and a larger memory footprint, and they introduce extra latency, which can be prohibitive in safety-critical applications. In this paper, we provide a brief comparative survey of methods for uncertainty quantification in DNNs, along with existing metrics to evaluate uncertainty predictions. We are particularly interested in understanding the advantages and downsides of each method for specific AV tasks and types of uncertainty sources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the last decade, Deep Neural Networks (DNNs) have
achieved great advances in real-world applications such as
Autonomous Vehicles (AVs), where they perform complex tasks such as
object detection and tracking or vehicle control. Despite the
substantial performance improvements introduced by DNNs,
they still have significant safety shortcomings due to their
complexity, opacity and lack of interpretability [McAllister
et al., 2017]. In particular, DNNs are brittle to operational
domain shift and even to small data corruptions or perturbations
[Kuutti et al., 2020]. This impedes ensuring the
reliability of DNN models, which is a precondition for
safety-critical systems to comply with automotive
industry safety standards and to avoid jeopardizing human lives.</p>
      <p>A concrete safety problem is to detect abnormal situations
arising from uncertain environment conditions and DNN-specific
unpredictability. These situations are difficult to analyze
during system development in a way that allows them to be
properly mitigated in real time. Indeed, even if a
DNN model achieves great performance on a validation set
drawn from its operational environment, it is currently impossible
to test it and provide the same performance guarantees for all
the possible environment configurations the system could
encounter in the real world [Kuutti et al., 2020]. A common
practice to overcome this problem is runtime
monitoring of DNN components, so that safety can be ensured even if
the component was not fully validated at design time [Henne
et al., 2020; Koopman et al., 2019]. A central aspect of enabling
DNN monitoring is a runtime treatment of the
uncertainties associated with a DNN’s predictions [McAllister et al.,
2017; Koopman et al., 2019].</p>
      <p>In this paper, we review common uncertainty estimation
methods for DNNs and compare their performance and
benefits for different AV tasks. Since prediction probability
scores in DNNs do not provide a true representation of uncertainty
[Mohseni et al., 2019], these methods offer a potential
solution for runtime DNN confidence prediction and the
detection of Out-of-Distribution (OOD) samples. However, these
methods still demand a high computational load, introduce
extra latency, and require a larger memory footprint. We
compare these factors since they can represent a major
impediment in safety-critical applications with tight time constraints
and limited computation hardware. We also briefly survey
uncertainty metrics that evaluate the performance
of quantification methods, as another critical factor in ensuring
safety in AV systems.</p>
      <p>The remainder of the paper is structured as follows.
Section 2 describes the sources of uncertainty in deep learning
for AVs. Section 3 presents a comparison of recent works
in AV tasks that include uncertainty estimation methods for
DNNs. It provides a brief review of common uncertainty
estimation methods in deep learning as well as metrics for
predictive uncertainty evaluation in classification and regression
tasks. Section 4 discusses the open challenges and possible
directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
    </sec>
    <sec id="sec-3">
      <title>Sources of Uncertainty in Deep Learning for Autonomous Vehicles</title>
      <p>Autonomous vehicles have to deal with dynamic,
nonstationary and highly unpredictable operational
environments. Taking into account all the details of the
operational environment at design time is an intractable task.
Instead, the operational environment is constrained so
that it considers only a subset of all possible situations that
the system can encounter in operation. This process is known
as Operational Design Domain (ODD) adoption [Koopman
and Fratrik, 2019], and safety requirements are built on
top of the ODD specification.</p>
      <sec id="sec-3-1">
        <title>Uncertainty Factors</title>
        <p>Given the constrained operational environment within the
system ODD, ensuring safety in an AV requires identifying
unfamiliar contexts by modeling the AV’s uncertainty
[McAllister et al., 2017]. However, many factors, not
only those related to the environment, affect the system
performance by introducing some degree of uncertainty.
[Czarnecki and Salay, 2018] identify a set of factors that contribute
to uncertainty in the perception function of an AV and thereby
affect its performance. From this set, we pay
special attention to sensor properties, model uncertainty,
situation and scenario coverage, and operational domain
uncertainty. In the context of DNNs, the first two
factors can be modeled by using uncertainty estimation methods,
while the last two correspond to some degree of dataset shift
(i.e., a violation of the assumption that training and testing data
are independent and identically distributed) and to
Out-of-Distribution (OOD) samples [Quionero-Candela et al., 2009;
Mohseni et al., 2019].</p>
        <p>Sensor properties such as range, resolution, noise
characteristics, and calibration influence the amount of information
in the samples delivered to a machine learning model during
training or testing. Consequently, the effect of these
properties is captured as noise and ambiguities inherent to the
obtained samples. This type of noise in the data is known as
Aleatoric uncertainty, and it represents the inability to
sense all the details of the environment completely [Kendall
and Gal, 2017; Lee et al., 2019b; Gustafsson et al., 2019].
Aleatoric uncertainty can be further classified into
homoscedastic uncertainty (uncertainty that remains constant
for different samples) and heteroscedastic uncertainty
(uncertainty that can vary between samples).</p>
        <p>Model uncertainty is often referred to as Epistemic
uncertainty, and it accounts for uncertainty in the model parameters.
This type of uncertainty captures the ignorance of the model
that results from a dataset that does not represent the ODD
well or that is not sufficiently large [Kendall and Gal, 2017;
Lee et al., 2019b]. Epistemic uncertainty is expected to
increase in unknown situations (e.g., ODD environmental conditions,
such as weather or lighting, that differ from those seen during training),
and it can be explained away by incorporating more data.</p>
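        <p>The distinction drawn above between aleatoric and epistemic uncertainty can be illustrated with a small simulation. The Python sketch below (standard library only) fits a straight line to noisy one-dimensional data with a fixed noise level, i.e., homoscedastic aleatoric uncertainty with true variance 0.25. As a stand-in for epistemic uncertainty, it uses the spread of predictions at a query point outside the training range across models refitted on bootstrap resamples; this proxy, the toy data, and the helper names are illustrative assumptions, not methods taken from the cited works. As the training set grows, the epistemic spread shrinks, whereas the estimated noise variance stays close to 0.25.</p>
        <preformat preformat-type="code">
```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def simulate(n, noise_std=0.5, n_boot=50, x_query=2.0, seed=0):
    """Return (epistemic_std, aleatoric_var) for a training set of size n."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Ground truth y = 1.5 x + 0.3 with constant (homoscedastic) Gaussian noise.
    ys = [1.5 * x + 0.3 + rng.gauss(0.0, noise_std) for x in xs]

    # Epistemic proxy: disagreement between bootstrap-refitted models
    # at x_query, which lies outside the training range [-1, 1].
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * x_query + b)
    epistemic_std = statistics.stdev(preds)

    # Aleatoric estimate: mean squared residual of the fit on all data.
    a, b = fit_line(xs, ys)
    aleatoric_var = statistics.fmean(
        (y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return epistemic_std, aleatoric_var


for n in (10, 100, 1000):
    epi, ale = simulate(n)
    print(f"n={n:4d}  epistemic std={epi:.3f}  aleatoric var={ale:.3f}")
```
        </preformat>
        <p>Only the epistemic column is expected to decrease markedly with n; the aleatoric estimate converges to the fixed noise variance, which no amount of additional data can remove.</p>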
        <p>Situation and scenario coverage relates to the degree to
which situations and scenarios from an ODD are reflected in the
training and operation stages, while operational domain
uncertainty refers to a discrepancy between the ODD situations and
scenarios present at training time and those encountered during
operation (e.g., scenarios from two different ODDs) [Czarnecki
and Salay, 2018]. In both cases, uncertainty can be reduced
by incorporating more data or by adjusting the ODD
specification. However, it is extremely important to detect
OOD samples (i.e., outliers), especially those that
have not been seen before, since they can lead to highly
confident predictions that are wrong, i.e., the unknown-unknowns
[Bansal and Weld, 2018].</p>
        <p>Similarly to the cases presented above, the
automotive industry standard ISO/PAS 21448, or SOTIF (Safety
Of The Intended Functionality) [ISO, 2019], provides a
process to identify unknown and potentially unsafe scenarios and to
minimize risk by recognizing performance limitations
of sensors and algorithms, as well as user misuse. Unsafe scenarios
can be further classified into known-unsafe (e.g., out-of-ODD
samples) or unknown-unsafe (e.g., OOD samples) scenarios. Once an
unknown-unsafe scenario or situation is identified, it becomes
a known-unsafe scenario that can be mitigated at design time
[Rau et al., ; Mohseni et al., 2019].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Uncertainty Estimation Methods for DNNs</title>
      <p>In recent years, many probabilistic deep learning methods
have been proposed to obtain an uncertainty measure from
an approximation of the (highly multi-modal) predictive
distribution, as well as methods for calibrating the outputs of
DNNs. In general, there are two approaches to computing DNN
predictive uncertainty: sampling-based and
sampling-free methods. Sampling-based methods draw
multiple predictive samples for the same input and derive an
uncertainty estimate from their statistics (e.g., their variance). Sampling-free
methods require only a single predictive output. These methods
are further discussed in Section 3.</p>
    </sec>
    <sec id="sec-5">
      <title>Neural Network Calibration</title>
      <p>Confidence calibration represents the degree to which a
model’s predicted probability estimates the true correctness
likelihood [Guo et al., 2017]. Under ideal circumstances, we
expect the normalized outputs of a DNN (i.e., softmax
outputs) to correspond to the true correctness likelihood [Guo
et al., 2017]. From a frequentist perspective, miscalibration can be
viewed as a discrepancy between local confidence
(or uncertainty) predictions and the expected performance in
the long run [Hubschneider et al., 2019; Lakshminarayanan
et al., 2017]. For example, we expect a class predicted
with probability p to be correct 100p% of the time, i.e., out of 100
samples predicted with confidence 0.9, we expect 90
correct predictions. DNNs can be calibrated by using
Temperature Scaling, a simple post-processing technique [Guo et
al., 2017], or, more recently, Dirichlet calibration [Kull et
al., 2019]. For a regression setting, [Kuleshov et al., 2018;
Hubschneider et al., 2019] formalize the calibration notion
for continuous variables, in which a p% confidence interval
should contain the true outcome p% of the time.</p>
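      <p>As a minimal illustration of Temperature Scaling [Guo et al., 2017], the Python sketch below divides all logits by a single scalar T and selects T on a held-out validation set by minimizing the NLL. Because every logit is divided by the same positive T, the predicted class, and therefore the accuracy, is unchanged; only the confidence is adjusted. The coarse grid search stands in for the gradient-based optimization usually employed, and the validation logits and labels are hypothetical toy values.</p>
      <preformat preformat-type="code">
```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of the logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(all_logits, labels, temperature):
    """Average negative log-likelihood of the true labels after scaling."""
    return sum(-math.log(softmax(z, temperature)[y])
               for z, y in zip(all_logits, labels)) / len(labels)


def fit_temperature(all_logits, labels):
    """Grid search for the temperature T that minimizes validation NLL."""
    grid = [0.05 * k for k in range(1, 201)]  # candidate T values in (0, 10]
    return min(grid, key=lambda t: mean_nll(all_logits, labels, t))


# Hypothetical validation logits (3 classes) from an overconfident model:
# the second sample of class 1 and of class 2 are misclassified.
val_logits = [[8.0, 1.0, 0.0], [7.5, 0.5, 1.0], [0.0, 9.0, 1.0],
              [6.0, 5.5, 0.0], [1.0, 0.0, 8.0], [7.0, 0.0, 6.5]]
val_labels = [0, 0, 1, 1, 2, 2]

t_star = fit_temperature(val_logits, val_labels)
print(f"T* = {t_star:.2f}")
print(f"NLL at T=1: {mean_nll(val_logits, val_labels, 1.0):.3f}, "
      f"at T*: {mean_nll(val_logits, val_labels, t_star):.3f}")
```
      </preformat>
      <p>For this deliberately overconfident toy set the selected T is larger than 1, i.e., the probabilities are softened; a value below 1 would instead sharpen an underconfident model.</p>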
      <p>Despite the improvements achieved with calibration
methods, they cannot be seen as a complete solution to the
uncertainty estimation problem, since calibration is performed
relative to a validation dataset [Kull et al., 2019; Ashukha et al.,
2020] (i.e., calibration methods rely on in-distribution
samples to learn a calibration map). In the presence of OOD
samples, a model calibrated in this way is no longer guaranteed to be calibrated. This limits the
usefulness of calibration techniques to scenarios where very large
training datasets are available.</p>
      <sec id="sec-5-1">
        <title>Comparison of Uncertainty Estimation Methods in the AV Domain</title>
        <p>In this section, we compare and analyze some common
uncertainty estimation methods in terms of out-of-the-box
calibration of the predictions (i.e., without post-hoc calibration),
computational budget, memory footprint, and the changes to
the DNN required to apply each method (architecture, loss
function, and others). For each application, we have chosen the works that are,
to the best of our knowledge, the most representative. Some
of the listed works introduce improvements by combining
several methods. This is summarized in
Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Methods Limited to Aleatoric Uncertainty</title>
      <p>The first four methods listed in Table 1 deal exclusively with
aleatoric uncertainty. In classification tasks, uncertainty is
usually represented by the normalized logits at the output layer
(e.g., the softmax output), which can be interpreted as a
probability distribution related to aleatoric uncertainty [Gustafsson
et al., 2019]. Unfortunately, normalized outputs interpreted as
probability distributions fail to capture model uncertainty, and this
very often results in overconfident predictions that are wrong
[Guo et al., 2017], especially in the presence of dataset shift.
To overcome these problems of the softmax, [Gast and Roth, 2018]
propose to use a Dirichlet distribution instead.</p>
      <p>In a regression configuration, deep learning models do not
have an uncertainty representation by default. To obtain a probabilistic
representation, the outputs of a DNN are used to parameterize a probability
distribution (e.g., Gaussian or Laplace). This modification of the architecture
allows DNNs to learn aleatoric uncertainty from the data itself by using the
heteroscedastic loss and maximum likelihood estimation [Kendall and
Gal, 2017; Ilg et al., 2018]. Similarly, in the heteroscedastic
version of classification, [Kendall and Gal, 2017] place
a Gaussian distribution over the output logits (i.e., each logit
with its respective variance) before the softmax layer is
applied. An alternative approach replaces the input, output and
activation functions of a DNN with probability distributions
[Gast and Roth, 2018]. Using Assumed Density Filtering (ADF), this method
propagates a fixed input uncertainty through the DNN to its output.</p>
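      <p>A minimal sketch of the heteroscedastic regression loss, assuming a Gaussian likelihood: for each input the network is taken to output a mean μ and a log-variance s = log σ², the parameterization used by [Kendall and Gal, 2017] for numerical stability. Up to an additive constant, the per-sample NLL is ½ exp(−s)(y − μ)² + ½ s, so a large residual can be tolerated by predicting a large variance, at the cost of the ½ s penalty. The plain-Python function below only evaluates the loss on toy values; in practice it would be minimized with an automatic-differentiation framework.</p>
      <preformat preformat-type="code">
```python
import math


def heteroscedastic_nll(y_true, mu, log_var):
    """Mean Gaussian NLL with a per-sample predicted variance.

    log_var[i] = log(sigma_i ** 2); the constant 0.5 * log(2 * pi) is dropped.
    """
    total = 0.0
    for y, m, s in zip(y_true, mu, log_var):
        total += 0.5 * math.exp(-s) * (y - m) ** 2 + 0.5 * s
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0]
mu = [1.1, 2.0, 4.0]  # the last prediction is off by 1.0
# Same residuals, two variance predictions: unit variance everywhere vs.
# small variance on the accurate samples and unit variance on the bad one.
print(heteroscedastic_nll(y_true, mu, [0.0, 0.0, 0.0]))
print(heteroscedastic_nll(y_true, mu, [-4.0, -4.0, 0.0]))
```
      </preformat>
      <p>Setting the derivative with respect to s to zero gives σ² = (y − μ)², i.e., the loss is minimized when the predicted variance matches the squared residual, which is how the network can learn input-dependent noise without any variance labels.</p>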
    </sec>
    <sec id="sec-7">
      <title>Bayesian Neural Networks</title>
      <p>Bayesian Neural Networks (BNNs) aim to learn a
distribution over the weights instead of point estimates. In this way,
we look for the posterior distribution of the weights given the
data, p(w|D), by applying Bayes’ theorem to the data
likelihood p(D|w) and a chosen prior distribution over the weights p(w):</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)} = \frac{p(D \mid w)\, p(w)}{\int p(D \mid w)\, p(w)\, dw}</tex-math>
      </disp-formula>
      <p>Given the posterior distribution p(w|D), we
obtain the predictive distribution for a new input x*
by marginalizing over the model parameters:</p>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math>p(y^* \mid x^*, D) = \int p(y^* \mid x^*, w)\, p(w \mid D)\, dw</tex-math>
      </disp-formula>
      <p>Instead of relying on only one configuration of the weights,
we use every possible configuration of the weights (all
possible models), weighted by the posterior over the parameters, to
make a prediction, i.e., <inline-formula><tex-math>p(y^* \mid x^*, D) = \mathbb{E}_{p(w \mid D)}\big[p(y^* \mid x^*, w)\big]</tex-math></inline-formula>.
This represents the Bayesian Model Average (BMA) and
accounts for epistemic uncertainty [Wilson and Izmailov, 2020;
Gal, 2016; Blundell et al., 2015].</p>
      <p>Unfortunately, the integrals in (1) and (2) are intractable.
Thus, we must build a distribution that approximates the true
posterior distribution over the weights, q(w) ≈ p(w|D). Two
main paradigms exist to build q(w): Markov Chain Monte
Carlo (MCMC) and Variational Inference (VI) methods. In
the former, the gold standard is Hamiltonian Monte Carlo
(HMC), and other methods like Stochastic Gradient MCMC
(SG-MCMC) have been explored. However, MCMC
methods are generally hard to scale to large DNNs due to the
high-dimensional and multi-modal posterior distribution
[Gustafsson et al., 2019]. In the latter case, VI methods
approximate the posterior over the weights with a
simpler distribution q<sub>θ</sub>(w) (e.g., a Gaussian) parameterized by θ.
The parameters θ of q<sub>θ</sub>(w) are found by minimizing the
KL divergence between q<sub>θ</sub>(w) and p(w|D).</p>
      <p>A particularly scalable and easy-to-implement sampling-based
method for approximate VI is Monte Carlo Dropout (MCD)
[Gal and Ghahramani, 2016]. In this method, dropout
regularization is also applied at test time, so that q<sub>θ</sub>(w) is defined by
Bernoulli-distributed dropout masks over the weights. Dropout may be applied only in some
of the deeper layers of the DNN to better model high-level
features and to avoid slow training [Mukhoti and Gal, 2018;
Kendall et al., 2015]. Dropout probabilities can be set
manually, or the network can tune the dropout rates during training
[Gal et al., 2017].</p>
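      <p>The sampling procedure of MCD can be sketched without a deep learning framework. The Python example below assumes a tiny, hypothetical 3-16-1 ReLU network whose weights are random stand-ins for trained ones. Dropout on the hidden layer stays active at prediction time, the network is evaluated T times (n_samples in the code) on the same input, and the sample mean and variance of the outputs serve as Monte Carlo estimates of the predictive mean and of the epistemic spread. In a real framework, one would instead keep the existing dropout layers stochastic during inference.</p>
      <preformat preformat-type="code">
```python
import random
import statistics

# Hypothetical "trained" weights of a tiny 3-16-1 ReLU network
# (drawn at random here purely for illustration).
_init = random.Random(42)
IN_DIM, HIDDEN = 3, 16
W1 = [[_init.gauss(0.0, 0.5) for _ in range(IN_DIM)] for _ in range(HIDDEN)]
W2 = [_init.gauss(0.0, 0.5) for _ in range(HIDDEN)]


def stochastic_forward(x, p_drop, mask_rng):
    """One forward pass with (inverted) dropout kept active on the hidden layer."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    keep = 1.0 - p_drop
    # Each hidden unit survives with probability `keep` (Bernoulli mask) and is
    # rescaled by 1/keep so its expected value matches the deterministic net.
    masked = [h / keep if mask_rng.random() >= p_drop else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(W2, masked))


def mc_dropout_predict(x, n_samples=100, p_drop=0.2, seed=0):
    """Monte Carlo estimate of the predictive mean and variance from T passes."""
    mask_rng = random.Random(seed)
    samples = [stochastic_forward(x, p_drop, mask_rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


mean, var = mc_dropout_predict([1.0, -0.5, 2.0], n_samples=100)
print(f"predictive mean = {mean:.3f}, variance across passes = {var:.3f}")
```
      </preformat>
      <p>With p_drop = 0 every pass is identical and the variance collapses to zero, a quick sanity check of the sampling loop; increasing n_samples reduces the Monte Carlo error of both estimates at a proportional cost in latency, which is exactly the runtime concern raised for MCD in safety-critical settings.</p>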
      <p>All the MCD-related methods listed in Table 1 refer to this
approximation of BNNs. As the performance comparison criteria
show, the need to take multiple
forward passes (output samples) for the same input to
approximate the distribution in Equation 2 represents a major
impediment for safety-critical applications with tight time
constraints and limited computation hardware.</p>
      <p>To obtain a representation of both types of uncertainty
(aleatoric and epistemic), the methods presented in Section
3.1 have been used in combination with MCD. For example,
in a regression configuration, a set of T samples is taken
from the predictions of a DNN whose outputs parameterize a
distribution: <inline-formula><tex-math>\{\hat{y}_t, \hat{\sigma}_t\}_{t=1}^{T}</tex-math></inline-formula>. However, since aleatoric
uncertainty is learned from the data itself (by using the
heteroscedastic loss), this approach could produce wrong
uncertainty estimates for samples that contain a higher level of
uncertainty than that observed during training. Another
approach, presented in [Loquercio et al., 2020], applies MCD to
take samples from a DNN whose input, output and
activation functions are replaced by probability distributions
according to [Gast and Roth, 2018]. This method permits
uncertainty propagation from the input to the output of the DNN
using ADF (e.g., sensor noise can be propagated to the output
of the DNN). This is an appealing method for AV
applications, where sensor properties are commonly known.
Interestingly, the authors show that this method can be applied to
already trained DNNs and is architecture-agnostic.</p>
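      <p>For the combination of MCD with a heteroscedastic output described above, a common way to turn the T sampled outputs into a single predictive variance, following [Kendall and Gal, 2017], is to average the predicted variances σ̂<sub>t</sub><sup>2</sup> (aleatoric part) and add the variance of the predicted means ŷ<sub>t</sub> across passes (epistemic part). The Python sketch below assumes the T outputs, a mean and a variance per pass, have already been collected; the sample values are hypothetical.</p>
      <preformat preformat-type="code">
```python
import statistics


def mc_variance_decomposition(means, variances):
    """Combine T sampled (mean, variance) outputs into one predictive estimate.

    means[t], variances[t]: outputs of the t-th stochastic forward pass.
    Returns (predictive_mean, aleatoric, epistemic, total_variance).
    """
    predictive_mean = statistics.fmean(means)
    aleatoric = statistics.fmean(variances)   # average predicted noise level
    epistemic = statistics.pvariance(means)   # disagreement between passes
    return predictive_mean, aleatoric, epistemic, aleatoric + epistemic


# Hypothetical outputs of T = 5 MC-dropout passes for a single input.
means = [2.1, 1.9, 2.4, 2.0, 2.1]
variances = [0.30, 0.28, 0.35, 0.31, 0.29]
print(mc_variance_decomposition(means, variances))
```
      </preformat>
      <p>Here the epistemic term (about 0.028) is small relative to the aleatoric one (about 0.306): the passes largely agree, and most of the predicted uncertainty is attributed to data noise.</p>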
    </sec>
    <sec id="sec-8">
      <title>Deep Ensembles</title>
      <p>A Deep Ensemble (DE) is another sampling-based method, in
which M DNNs are trained to obtain the predictive
distribution p(y|x) [Lakshminarayanan et al., 2017]. Each DNN
learns a set of parameters w that are point estimates,
starting from a different random initialization, so that the
minimization is repeated M times. In an ensemble, predictions are
averaged and can be considered as an equally weighted mixture model:</p>
      <disp-formula id="eq3">
        <label>(3)</label>
        <tex-math>p(y \mid x) = \frac{1}{M} \sum_{i=1}^{M} p(y \mid x, \hat{w}^{(i)}), \qquad \{\hat{w}^{(i)}\}_{i=1}^{M}</tex-math>
      </disp-formula>
      <p>For classification, Equation (3) corresponds to an
average of the softmax probabilities. For regression, the
members’ outputs that parameterize a probability distribution are combined
into a mixture: its mean is the average of the member means, and its variance is
the average of the member variances plus the variance of the member means. In this
manner, both types of uncertainty (aleatoric and epistemic)
can be easily captured. Although DE is considered a
non-Bayesian method, expression (3) represents an
approximation of (2), since <inline-formula><tex-math>\{\hat{w}^{(i)}\}_{i=1}^{M}</tex-math></inline-formula> can be seen as samples taken
from a distribution that approximates the true posterior, by
exploring different modes of p(w|D) [Fort et al., 2019;
Wilson and Izmailov, 2020].</p>
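      <p>For classification, Eq. (3) reduces to averaging the members’ softmax vectors. The Python sketch below uses hypothetical probability vectors from M = 3 ensemble members. It shows that when individually confident members disagree, the averaged (mixture) prediction is much less confident than any single member, which is how the ensemble expresses epistemic uncertainty.</p>
      <preformat preformat-type="code">
```python
def ensemble_average(member_probs):
    """Equally weighted mixture of M categorical predictions (Eq. 3)."""
    m = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / m for k in range(n_classes)]


agree = [[0.90, 0.05, 0.05], [0.92, 0.04, 0.04], [0.88, 0.06, 0.06]]
disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

print(max(ensemble_average(agree)))     # stays close to 0.9
print(max(ensemble_average(disagree)))  # drops to about 1/3
```
      </preformat>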
      <p>As presented in Table 1, the DE method tends to
outperform approximate Bayesian inference methods like MCD in
both uncertainty estimates and accuracy [Gustafsson et al.,
2019]. A recent work by [Snoek et al., 2019] also shows
that DE is more robust to dataset shift. These works suggest
that DE should be considered the new standard method for
predictive distributions and uncertainty estimation. However,
DE has some drawbacks, especially if the target application
is safety-critical. DE requires a higher
computational load and a larger memory footprint, as shown in
Table 1: at both training and test time, the number of
parameters and the inference time scale linearly with M. To
mitigate this problem, [Osband et al., 2016] propose a fused
version of ensembles with multiple heads. All the heads share
the convolutional layers (feature extractors), and each head is
trained using bootstrap samples.</p>
    </sec>
    <sec id="sec-9">
      <title>Mixture Density Networks</title>
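      <p>Mixture Density Networks (MDNs) [Bishop, 1994] are a
sampling-free method for regression tasks, where the aim is to
train a DNN that predicts the parameters of a Gaussian
Mixture Model (GMM) given an input x. A GMM is formed by a
weighted sum of K Gaussians that models the conditional
distribution:</p>
      <disp-formula id="eq4">
        <label>(4)</label>
        <tex-math>p(y \mid x) = \sum_{i=1}^{K} \pi_i(x)\, \mathcal{N}\big(y \mid \mu_i(x), \sigma_i^2(x)\big)</tex-math>
      </disp-formula>
      <p>where π<sub>i</sub>(x), μ<sub>i</sub>(x) and σ<sub>i</sub><sup>2</sup>(x) represent the set of parameters of
the GMM as a function of the input x for the K mixture components (the mixture
weights π<sub>i</sub>(x) are non-negative and sum to one). For
training, the Negative Log-Likelihood (NLL) is used as the loss
function.</p>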
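      <p>The MDN training objective of Eq. (4) can be sketched for a scalar target as follows. The function assumes that, for one input x, the network has produced unnormalized mixture logits, component means, and log standard deviations for K components; mapping these through a softmax and an exponential, so that the weights sum to one and the scales are positive, is a common parameterization assumed here rather than one prescribed by [Bishop, 1994]. The log-sum-exp trick keeps the NLL numerically stable when component densities are tiny.</p>
      <preformat preformat-type="code">
```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mdn_nll(y, logits, means, log_stds):
    """Negative log-likelihood of scalar y under a K-component 1-D GMM.

    Mixture weights pi_i = softmax(logits)_i and sigma_i = exp(log_stds[i]),
    so any raw network outputs map to valid mixture parameters.
    """
    log_norm = log_sum_exp(logits)
    log_terms = []
    for z, mu, log_s in zip(logits, means, log_stds):
        sigma = math.exp(log_s)
        log_normal = (-0.5 * ((y - mu) / sigma) ** 2
                      - log_s - 0.5 * math.log(2.0 * math.pi))
        log_terms.append((z - log_norm) + log_normal)  # log(pi_i * N_i)
    return -log_sum_exp(log_terms)


# Hypothetical MDN outputs for one input: two equally weighted, separated modes.
logits, means, log_stds = [0.0, 0.0], [-2.0, 2.0], [math.log(0.5)] * 2
print(mdn_nll(2.0, logits, means, log_stds))  # target on a mode: low NLL
print(mdn_nll(0.0, logits, means, log_stds))  # target between modes: high NLL
```
      </preformat>
      <p>A target lying on one of the two modes receives a low NLL, whereas one lying between them, where the mixture has little mass, is penalized heavily; this is what lets an MDN represent multi-modal outputs.</p>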
      <p>By using the law of total variance, [Choi et al., 2018]
formalized the acquisition of aleatoric and epistemic uncertainty
in MDNs. As a first step, the expectation of the GMM is
obtained as a weighted sum of the mixture component means:
<inline-formula><tex-math>\mathbb{E}[y \mid x] = \sum_{i=1}^{K} \pi_i(x)\, \mu_i(x)</tex-math></inline-formula>. The predicted
variance is composed of the weighted sum of the variances
and the weighted variance of the means:</p>
      <disp-formula id="eq5">
        <label>(5)</label>
        <tex-math>\mathbb{V}[y \mid x] = \sum_{i=1}^{K} \pi_i(x)\, \sigma_i^2(x) + \sum_{i=1}^{K} \pi_i(x) \Big( \mu_i(x) - \sum_{j=1}^{K} \pi_j(x)\, \mu_j(x) \Big)^2</tex-math>
      </disp-formula>
      <p>where the first term represents the aleatoric uncertainty and
the second term represents the epistemic uncertainty. We
refer the reader to [Choi et al., 2018] for more details about
uncertainty acquisition in MDNs.</p>
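      <p>Eq. (5) maps directly onto a few lines of code for a one-dimensional target. The Python sketch below takes hypothetical mixture parameters (π<sub>i</sub>, μ<sub>i</sub>, σ<sub>i</sub><sup>2</sup>) for a single input, as an MDN would output them, and returns the mixture mean together with the aleatoric and epistemic terms; the weights are assumed to be already normalized.</p>
      <preformat preformat-type="code">
```python
def mdn_uncertainty(pis, mus, variances):
    """Law-of-total-variance split for a 1-D Gaussian mixture (Eq. 5).

    pis: mixture weights summing to 1; mus, variances: component moments.
    Returns (mean, aleatoric, epistemic).
    """
    if abs(sum(pis) - 1.0) > 1e-6:
        raise ValueError("mixture weights must sum to one")
    mean = sum(p * m for p, m in zip(pis, mus))
    # First term of Eq. (5): weighted average of the component variances.
    aleatoric = sum(p * v for p, v in zip(pis, variances))
    # Second term of Eq. (5): weighted spread of the component means.
    epistemic = sum(p * (m - mean) ** 2 for p, m in zip(pis, mus))
    return mean, aleatoric, epistemic


# One dominant component: almost all variance is aleatoric.
print(mdn_uncertainty([0.98, 0.02], [1.0, 1.2], [0.4, 0.4]))
# Two equally weighted, distant modes: the epistemic term dominates.
print(mdn_uncertainty([0.5, 0.5], [-2.0, 2.0], [0.1, 0.1]))
```
      </preformat>
      <p>In the second call the components are far apart, so the epistemic term (4.0) dwarfs the aleatoric one (0.1), in line with the interpretation of Eq. (5).</p>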
      <p>As pointed out in Table 1, compared with the methods described
before, the sampling-free nature of this method reduces the computational
load and memory footprint, and it permits modeling complex distributions.
These characteristics are attractive
for real-time applications. However, MDNs suffer from
numerical instability in high-dimensional problems and from mode
collapse when using regularization techniques [Makansi et
al., 2019].</p>
    </sec>
    <sec id="sec-10">
      <title>Quality Metrics for Uncertainty Estimation</title>
      <p>In this section, we discuss common metrics for evaluating the
quality of uncertainty estimation.</p>
      <p>Classification Metrics. Several quantities can be used to
represent uncertainty in classification settings: the variation ratio and
information-theoretic measures such as predictive entropy and mutual
information [Gal, 2016]. The variation ratio is a measure of
dispersion, mutual information captures model confidence, and
predictive entropy accounts for both epistemic and aleatoric
uncertainty [Mukhoti and Gal, 2018; Michelmore et al., 2018;
Phan et al., 2019]. [Mukhoti and Gal, 2018] propose specific
performance metrics for semantic segmentation to evaluate
Bayesian models. Since there is no ground truth for
uncertainty estimates, [Snoek et al., 2019; Lakshminarayanan et
al., 2017] argue for proper scoring rules such as the NLL and the Brier
score. The NLL depends on the predictive uncertainty and is
commonly evaluated on a held-out set; however, it can
over-emphasize tail probabilities. The Brier score measures the
accuracy of predictive probabilities as the sum of squared
differences between the predicted probability vector and the one-hot target;
nonetheless, it is insensitive to the predicted probabilities of infrequent
events. Other evaluation metrics that are independent of the absolute score
values are the Area Under the Receiver Operating
Characteristic curve (AUROC), the Area Under the Precision-Recall Curve (AUPRC),
and the Area Under the Risk-Coverage curve (AURC) [Hendrycks and
Gimpel, 2016; Ding et al., 2019].</p>
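      <p>The quantities above can be computed from T sampled softmax vectors for one input (e.g., MCD passes or ensemble members). The Python sketch below follows the standard definitions in [Gal, 2016]: predictive entropy of the averaged prediction, mutual information as predictive entropy minus the average entropy of the individual samples, and the variation ratio as one minus the fraction of samples that vote for the modal class. It also reports the NLL and the (unnormalized, multi-class) Brier score for a known label. The probability vectors are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


def uncertainty_summary(samples, label):
    """Uncertainty measures from T sampled softmax vectors for one input."""
    t = len(samples)
    n_classes = len(samples[0])
    mean_p = [sum(s[k] for s in samples) / t for k in range(n_classes)]

    predictive_entropy = entropy(mean_p)                      # total
    expected_entropy = sum(entropy(s) for s in samples) / t   # aleatoric part
    mutual_information = predictive_entropy - expected_entropy  # epistemic part

    # Variation ratio: 1 - (frequency of the modal predicted class) / T.
    votes = Counter(max(range(n_classes), key=s.__getitem__) for s in samples)
    variation_ratio = 1.0 - votes.most_common(1)[0][1] / t

    nll = -math.log(mean_p[label])
    brier = sum((mean_p[k] - (1.0 if k == label else 0.0)) ** 2
                for k in range(n_classes))
    return {"entropy": predictive_entropy, "mutual_info": mutual_information,
            "variation_ratio": variation_ratio, "nll": nll, "brier": brier}


agreeing = [[0.9, 0.05, 0.05]] * 4
conflicting = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05],
               [0.9, 0.05, 0.05], [0.05, 0.05, 0.9]]
print(uncertainty_summary(agreeing, label=0))
print(uncertainty_summary(conflicting, label=0))
```
      </preformat>
      <p>When all samples agree, mutual information and the variation ratio are zero even though the entropy is not, so the remaining uncertainty is attributed to the data; disagreement between samples raises all three measures.</p>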
      <p>Regression Metrics. Similarly, in regression tasks, the NLL
is a proper scoring rule for a likelihood that follows a
Gaussian distribution [Lakshminarayanan et al., 2017; Kendall and
Gal, 2017]. Furthermore, [Ilg et al., 2018] introduce a
relative measure for uncertainty estimation, the Area Under the
Sparsification Error (AUSE) curve. Predictions are progressively removed
in order of decreasing predicted uncertainty, and the error of the remaining
predictions, e.g., the Root Mean Squared Error (RMSE), is compared with that of
an oracle that removes predictions in order of decreasing true error [Gustafsson et
al., 2019].</p>
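      <p>The sparsification procedure behind AUSE can be sketched as follows, assuming per-sample absolute errors and predicted uncertainties are available. The uncertainty-based curve removes samples in order of decreasing predicted uncertainty, the oracle curve removes them in order of decreasing true error, and AUSE is the area between the two. This minimal Python version uses the RMSE of the remaining samples, evenly spaced removal fractions, and a simple rectangle rule; these are illustrative choices, not necessarily the exact protocol of [Ilg et al., 2018].</p>
      <preformat preformat-type="code">
```python
import math


def sparsification_curve(abs_errors, scores, steps=10):
    """RMSE of the samples kept after removing the highest-scoring fractions."""
    n = len(abs_errors)
    # Sample indices sorted from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    curve = []
    for k in range(steps):  # remove 0, 1/steps, ..., (steps-1)/steps
        kept = [abs_errors[i] for i in order[int(n * k / steps):]]
        curve.append(math.sqrt(sum(e * e for e in kept) / len(kept)))
    return curve


def ause(abs_errors, uncertainties, steps=10):
    """Area between the uncertainty-based and the oracle sparsification curves."""
    by_uncertainty = sparsification_curve(abs_errors, uncertainties, steps)
    oracle = sparsification_curve(abs_errors, abs_errors, steps)
    return sum(u - o for u, o in zip(by_uncertainty, oracle)) / steps


abs_errors = [0.1, 0.4, 0.2, 1.5, 0.05, 0.9, 0.3, 0.6, 0.15, 1.1]
informative = [e + 0.01 for e in abs_errors]  # ranks samples like the error
uninformative = [1.0] * len(abs_errors)       # constant, carries no ranking
print(ause(abs_errors, informative))    # 0.0: identical to the oracle
print(ause(abs_errors, uninformative))  # positive
```
      </preformat>
      <p>Because the oracle always keeps the subset with the smallest squared errors, AUSE is never negative, and it reaches zero only when the predicted uncertainty ranks samples exactly as the true error does; it is therefore a relative measure that ignores the absolute scale of the predicted variances.</p>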
      <p>Calibration Metrics. For classification tasks, common
quality metrics are the Expected Calibration Error (ECE) and the
Maximum Calibration Error (MCE) [Guo et al., 2017]. The
former measures the expected difference between accuracy
and confidence; the latter identifies the largest
discrepancy between accuracy and confidence, which is of
particular interest in safety-critical applications. For a regression
configuration, [Kuleshov et al., 2018] use the calibration error as
a metric that represents the weighted sum of squared
differences between the expected and observed (empirical)
confidence levels; correspondingly, in [Gustafsson et al., 2019], the
authors propose to use the Area Under the Calibration Error
curve (AUCE) as an absolute measure of uncertainty. The
aforementioned authors use reliability diagrams (i.e.,
calibration plots) to get a visual representation of model
calibration. Despite their drawbacks with OOD samples,
calibration plots and measures are used extensively to compare the
predictive quality of uncertainty estimation methods.</p>
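      <p>ECE and MCE can be computed by grouping predictions into equal-width confidence bins, as in [Guo et al., 2017]: ECE is the bin-size-weighted average of |accuracy − confidence| over the bins, and MCE is its maximum. The Python sketch below assumes that top-label confidences and correctness flags are already available; the choice of 10 bins and the toy inputs are illustrative assumptions.</p>
      <preformat preformat-type="code">
```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins on [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))

    n = len(confidences)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(1.0 for _, ok in members if ok) / len(members)
        avg_conf = sum(c for c, _ in members) / len(members)
        gap = abs(acc - avg_conf)
        ece += len(members) / n * gap   # weighted by bin population
        mce = max(mce, gap)             # worst-case bin
    return ece, mce


# Ten predictions made with confidence 0.9, of which only 6 are correct:
# an overconfident model (accuracy 0.6 against confidence 0.9).
confs = [0.9] * 10
hits = [True] * 6 + [False] * 4
print(calibration_errors(confs, hits))
```
      </preformat>
      <p>With a single populated bin, ECE and MCE coincide (both about 0.3 here); they diverge when a few sparsely populated bins are badly miscalibrated, which is precisely the case MCE is meant to expose.</p>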
    </sec>
    <sec id="sec-11">
      <title>Considerations per AV Task Type</title>
      <p>In the context of AVs, a broad variety of uncertainty estimation
methods have been applied to (end-to-end) steering angle
prediction. Some works capture only epistemic
uncertainty by using MCD [Michelmore et al., 2018;
Michelmore et al., 2019]. However, usually both types of
uncertainty are captured [Lee et al., 2019b; Lee et al., 2019c;
Lee et al., 2019a], either by using the method proposed by [Kendall
and Gal, 2017] or by using DE, bootstrap ensembles, or
MDNs. The calibration plots presented in [Hubschneider et
al., 2019] show that MCD has better out-of-the-box
calibration than bootstrap ensembles or MDNs; the latter two
methods are overconfident in their predictions. In this
particular task, safety mechanisms that are triggered when
uncertainty estimates exceed a given or learned threshold have been
proposed in order to improve vehicle safety [Michelmore et al., 2018;
Michelmore et al., 2019; Lee et al., 2019b].</p>
      <p>Under the modular pipeline paradigm for AV control,
probabilistic modeling has mainly been applied to perception
tasks such as 3D object detection from Lidar, semantic
segmentation, and depth estimation. For 3D object detection from Lidar
point clouds, [Feng et al., 2018] estimate aleatoric and
epistemic uncertainty using the methods proposed by [Kendall
and Gal, 2017]. However, epistemic uncertainty estimation
with MCD introduces a high computational cost. A later
work by [Feng et al., 2019b] leverages aleatoric
uncertainties to greatly improve detection performance and to reduce the
computational load compared with MCD. In [Feng et al., 2019a], the
authors show that predictions for classification and regression
are miscalibrated, and they propose methods to recalibrate the
DNNs and produce better uncertainty estimates.</p>
      <p>For semantic segmentation, [Phan et al., 2019; Mukhoti
and Gal, 2018; Gustafsson et al., 2019] model aleatoric
uncertainty from the softmax output and epistemic uncertainty
by using MCD or ensembles. Common uncertainty metrics
in this case are predictive entropy and mutual information
[Mukhoti and Gal, 2018]. For depth estimation, [Gustafsson
et al., 2019] compare DE with heteroscedastic regression
combined with MCD [Kendall and Gal, 2017]. In both
tasks (semantic segmentation and depth estimation),
DE achieves better performance and calibration than MCD
variants [Gustafsson et al., 2019]. However, the
computational cost of DE at training and test time grows linearly with the
number of ensemble members. Similarly, for traffic sign recognition,
DE exhibits the best-calibrated outputs, but in this case MCD
in combination with softmax also produces well-calibrated
outputs close to those of DE [Henne et al., 2020].</p>
      <p>For optical flow, [Gast and Roth, 2018] capture aleatoric
uncertainty by replacing the input, output and activation
functions with probability distributions. This method allows
propagating a fixed value of uncertainty at the input to the output
of the DNN. [Ilg et al., 2018] present an alternative approach,
where DE and bootstrap ensembles were used to obtain the
predictive uncertainty.</p>
      <p>For future prediction, [Makansi et al., 2019] propose an
improvement to MDNs to predict the multi-modal
distribution of a vehicle’s future positions. This method
has two stages: a sampling network and a fitting network. The
former receives the current position of the vehicle as
input and outputs a fixed number of hypotheses for future
positions. The latter fits a mixture distribution to the
hypotheses estimated by the first network. This improvement
helps to avoid mode collapse in MDNs; however, high-dimensional
outputs remain challenging for this approach.</p>
      <sec id="sec-11-1">
        <title>Conclusions</title>
        <p>We presented a comparative survey of uncertainty
estimation methods for both classification and regression tasks in
the AV domain, together with a general comparative
analysis of these methods. This analysis shows that DE
has become the gold standard for uncertainty quantification in
many AV tasks thanks to its high-quality uncertainty
predictions and its robustness to OOD samples. However, its high
computational load and large memory footprint can hinder its
use in safety-critical applications that have hardware
limitations or tight time constraints. Here, sampling-free methods
are an interesting avenue for future research. New lightweight approaches
that are robust to OOD samples should be explored in the
AV domain to produce good-quality uncertainty estimates.
We also observed that predictions from these methods are often
uncalibrated (overconfident or underconfident) and that calibration methods
are usually applied only to classification tasks. We encourage the application
of calibration methods to regression tasks as well, by using the
methods proposed by [Kuleshov et al., 2018], instead of
limiting the assessment of predictions to reliability
diagrams. We also suggest studying and comparing uncertainty
estimation methods under dataset shift conditions to assess
their robustness. For future work, we plan to incorporate
uncertainty information into the Responsibility-Sensitive Safety
model [Shalev-Shwartz et al., 2017]. This would generalize the
approach of [Salay et al., 2020] by considering component
uncertainty from different AV subsystems and propagating it
through them. These subsystems could include DNNs, e.g.,
for planning and control.</p>
      </sec>
      <sec id="sec-11-2">
        <title>Acknowledgments</title>
        <p>This work has received funding from the COMP4DRONES
project, under Joint Undertaking (JU) grant agreement
No 826610. The JU receives support from the European
Union’s Horizon 2020 research and innovation programme
and from Spain, Austria, Belgium, Czech Republic, France,
Italy, Latvia, and the Netherlands.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Ashukha et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Arsenii</given-names>
            <surname>Ashukha</surname>
          </string-name>
          , Alexander Lyzhov, Dmitry Molchanov, and
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Vetrov</surname>
          </string-name>
          .
          <article-title>Pitfalls of in-domain uncertainty estimation and ensembling in deep learning</article-title>
          .
          <source>arXiv preprint arXiv:2002.06470</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bansal and Weld,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Gagan</given-names>
            <surname>Bansal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel S</given-names>
            <surname>Weld</surname>
          </string-name>
          .
          <article-title>A coverage-based utility model for identifying unknown unknowns</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Bishop,
          <year>1994</year>
          ]
          <string-name>
            <given-names>Christopher M</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <article-title>Mixture density networks</article-title>
          .
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Blundell et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Charles</given-names>
            <surname>Blundell</surname>
          </string-name>
          , Julien Cornebise, Koray Kavukcuoglu, and
          <string-name>
            <given-names>Daan</given-names>
            <surname>Wierstra</surname>
          </string-name>
          .
          <article-title>Weight uncertainty in neural networks</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          , pages
          <fpage>1613</fpage>
          -
          <lpage>1622</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Choi et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Sungjoon</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kyungjae</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sungbin</given-names>
            <surname>Lim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Songhwai</given-names>
            <surname>Oh</surname>
          </string-name>
          .
          <article-title>Uncertainty-aware learning from demonstration using density networks with sampling-free variance modeling</article-title>
          .
          <source>In 2018 IEEE International Conference on Robotics and Automation (ICRA)</source>
          , pages
          <fpage>6915</fpage>
          -
          <lpage>6922</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Czarnecki and Salay,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Krzysztof</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rick</given-names>
            <surname>Salay</surname>
          </string-name>
          .
          <article-title>Towards a framework to manage perceptual uncertainty for safe automated driving</article-title>
          .
          <source>In International Conference on Computer Safety, Reliability, and Security</source>
          , pages
          <fpage>439</fpage>
          -
          <lpage>445</lpage>
          . Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Ding et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Yukun</given-names>
            <surname>Ding</surname>
          </string-name>
          , Jinglan Liu, Jinjun Xiong, and
          <string-name>
            <given-names>Yiyu</given-names>
            <surname>Shi</surname>
          </string-name>
          .
          <article-title>Evaluation of neural network uncertainty estimation with application to resource-constrained platforms</article-title>
          .
          <source>arXiv preprint arXiv:1903.02050</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Feng et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Di</given-names>
            <surname>Feng</surname>
          </string-name>
          , Lars Rosenbaum, and
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Dietmayer</surname>
          </string-name>
          .
          <article-title>Towards safe autonomous driving: Capture uncertainty in the deep neural network for LiDAR 3D vehicle detection</article-title>
          .
          <source>In 2018 21st International Conference on Intelligent Transportation Systems (ITSC)</source>
          , pages
          <fpage>3266</fpage>
          -
          <lpage>3273</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Feng et al., 2019a]
          <string-name>
            <given-names>Di</given-names>
            <surname>Feng</surname>
          </string-name>
          , Lars Rosenbaum, Claudius Glaeser, Fabian Timm, and
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Dietmayer</surname>
          </string-name>
          .
          <article-title>Can we trust you? On calibration of a probabilistic object detector for autonomous driving</article-title>
          .
          <source>arXiv preprint arXiv:1909.12358</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Feng et al., 2019b]
          <string-name>
            <given-names>Di</given-names>
            <surname>Feng</surname>
          </string-name>
          , Lars Rosenbaum, Fabian Timm, and
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Dietmayer</surname>
          </string-name>
          .
          <article-title>Leveraging heteroscedastic aleatoric uncertainties for robust real-time LiDAR 3D object detection</article-title>
          .
          <source>In 2019 IEEE Intelligent Vehicles Symposium (IV)</source>
          , pages
          <fpage>1280</fpage>
          -
          <lpage>1287</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Fort et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Stanislav</given-names>
            <surname>Fort</surname>
          </string-name>
          , Huiyi Hu, and
          <string-name>
            <given-names>Balaji</given-names>
            <surname>Lakshminarayanan</surname>
          </string-name>
          .
          <article-title>Deep ensembles: A loss landscape perspective</article-title>
          .
          <source>arXiv preprint arXiv:1912.02757</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Gal and Ghahramani,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zoubin</given-names>
            <surname>Ghahramani</surname>
          </string-name>
          .
          <article-title>Dropout as a Bayesian approximation: Representing model uncertainty in deep learning</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          , pages
          <fpage>1050</fpage>
          -
          <lpage>1059</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Gal et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          , Jiri Hron, and
          <string-name>
            <given-names>Alex</given-names>
            <surname>Kendall</surname>
          </string-name>
          .
          <article-title>Concrete dropout</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>3581</fpage>
          -
          <lpage>3590</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Gal,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          .
          <article-title>Uncertainty in deep learning</article-title>
          . PhD thesis, University of Cambridge,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Gast and Roth,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Gast</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Lightweight probabilistic deep networks</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>3369</fpage>
          -
          <lpage>3378</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Guo et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Chuan</given-names>
            <surname>Guo</surname>
          </string-name>
          , Geoff Pleiss,
          <string-name>
            <given-names>Yu</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kilian Q</given-names>
            <surname>Weinberger</surname>
          </string-name>
          .
          <article-title>On calibration of modern neural networks</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning</source>
          , volume
          <volume>70</volume>
          , pages
          <fpage>1321</fpage>
          -
          <lpage>1330</lpage>
          . JMLR.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Gustafsson et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Fredrik K</given-names>
            <surname>Gustafsson</surname>
          </string-name>
          , Martin Danelljan, and Thomas B Schön.
          <article-title>Evaluating scalable Bayesian deep learning methods for robust computer vision</article-title>
          .
          <source>arXiv preprint arXiv:1906.01620</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Hendrycks and Gimpel,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Gimpel</surname>
          </string-name>
          .
          <article-title>A baseline for detecting misclassified and out-of-distribution examples in neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1610.02136</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Henne et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Maximilian</given-names>
            <surname>Henne</surname>
          </string-name>
          , Adrian Schwaiger, Karsten Roscher, and
          <string-name>
            <given-names>Gereon</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>Benchmarking uncertainty estimation methods for deep learning with safety-related metrics</article-title>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Hubschneider et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Hubschneider</surname>
          </string-name>
          , Robin Hutmacher, and
          <string-name>
            <given-names>J Marius</given-names>
            <surname>Zöllner</surname>
          </string-name>
          .
          <article-title>Calibrating uncertainty models for steering angle estimation</article-title>
          .
          <source>In 2019 IEEE Intelligent Transportation Systems Conference (ITSC)</source>
          , pages
          <fpage>1511</fpage>
          -
          <lpage>1518</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Ilg et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Eddy</given-names>
            <surname>Ilg</surname>
          </string-name>
          , Ozgun Cicek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, and Thomas Brox.
          <article-title>Uncertainty estimates and multi-hypotheses networks for optical flow</article-title>
          .
          <source>In Proceedings of the European Conference on Computer Vision (ECCV)</source>
          , pages
          <fpage>652</fpage>
          -
          <lpage>667</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [ISO,
          <year>2019</year>
          ]
          <collab>ISO</collab>
          .
          <article-title>ISO/PAS 21448: Road vehicles - Safety of the intended functionality</article-title>
          .
          <source>International Organization for Standardization</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Kendall and Gal,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Kendall</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          .
          <article-title>What uncertainties do we need in Bayesian deep learning for computer vision?</article-title>
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>5574</fpage>
          -
          <lpage>5584</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Kendall et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Kendall</surname>
          </string-name>
          , Vijay Badrinarayanan, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Cipolla</surname>
          </string-name>
          .
          <article-title>Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding</article-title>
          .
          <source>arXiv preprint arXiv:1511.02680</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Koopman and Fratrik,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Philip</given-names>
            <surname>Koopman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Fratrik</surname>
          </string-name>
          .
          <article-title>How many operational design domains, objects, and events?</article-title>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Koopman et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Philip</given-names>
            <surname>Koopman</surname>
          </string-name>
          , Beth Osyk, and
          <string-name>
            <given-names>Jack</given-names>
            <surname>Weast</surname>
          </string-name>
          .
          <article-title>Autonomous vehicles meet the physical world: RSS, variability, uncertainty, and proving safety</article-title>
          .
          <source>In International Conference on Computer Safety, Reliability, and Security</source>
          , pages
          <fpage>245</fpage>
          -
          <lpage>253</lpage>
          . Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Kuleshov et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Volodymyr</given-names>
            <surname>Kuleshov</surname>
          </string-name>
          , Nathan Fenner, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ermon</surname>
          </string-name>
          .
          <article-title>Accurate uncertainties for deep learning using calibrated regression</article-title>
          .
          <source>arXiv preprint arXiv:1807.00263</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Kull et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Meelis</given-names>
            <surname>Kull</surname>
          </string-name>
          , Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Flach</surname>
          </string-name>
          .
          <article-title>Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>12295</fpage>
          -
          <lpage>12305</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Kuutti et al.,
          <year>2020</year>
          ] Sampo Kuutti, Richard Bowden, Yaochu Jin, Phil Barber, and
          <string-name>
            <given-names>Saber</given-names>
            <surname>Fallah</surname>
          </string-name>
          .
          <article-title>A survey of deep learning applications to autonomous vehicle control</article-title>
          .
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Lakshminarayanan et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Balaji</given-names>
            <surname>Lakshminarayanan</surname>
          </string-name>
          , Alexander Pritzel, and
          <string-name>
            <given-names>Charles</given-names>
            <surname>Blundell</surname>
          </string-name>
          .
          <article-title>Simple and scalable predictive uncertainty estimation using deep ensembles</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>6402</fpage>
          -
          <lpage>6413</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Lee et al., 2019a]
          <string-name>
            <given-names>Keuntaek</given-names>
            <surname>Lee</surname>
          </string-name>
          , Gabriel Nakajima An, Viacheslav Zakharov, and
          <string-name>
            <given-names>Evangelos A</given-names>
            <surname>Theodorou</surname>
          </string-name>
          .
          <article-title>Perceptual attention-based predictive control</article-title>
          .
          <source>arXiv preprint arXiv:1904.11898</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Lee et al., 2019b]
          <string-name>
            <given-names>Keuntaek</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kamil</given-names>
            <surname>Saigol</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Evangelos A</given-names>
            <surname>Theodorou</surname>
          </string-name>
          .
          <article-title>Early failure detection of deep end-to-end control policy by reinforcement learning</article-title>
          .
          <source>In 2019 International Conference on Robotics and Automation (ICRA)</source>
          , pages
          <fpage>8543</fpage>
          -
          <lpage>8549</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [Lee et al., 2019c]
          <string-name>
            <given-names>Keuntaek</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ziyi</given-names>
            <surname>Wang</surname>
          </string-name>
          , Bogdan Vlahov, Harleen Brar, and
          <string-name>
            <given-names>Evangelos A</given-names>
            <surname>Theodorou</surname>
          </string-name>
          .
          <article-title>Ensemble Bayesian decision making with redundant deep perceptual control policies</article-title>
          .
          <source>In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)</source>
          , pages
          <fpage>831</fpage>
          -
          <lpage>837</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [Loquercio et al.,
          <year>2020</year>
          ] Antonio Loquercio, Mattia Segu, and
          <string-name>
            <given-names>Davide</given-names>
            <surname>Scaramuzza</surname>
          </string-name>
          .
          <article-title>A general framework for uncertainty estimation in deep learning</article-title>
          .
          <source>IEEE Robotics and Automation Letters</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>3153</fpage>
          -
          <lpage>3160</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [Makansi et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Osama</given-names>
            <surname>Makansi</surname>
          </string-name>
          , Eddy Ilg, Ozgun Cicek, and Thomas Brox.
          <article-title>Overcoming limitations of mixture density networks: A sampling and fitting framework for multimodal future prediction</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>7144</fpage>
          -
          <lpage>7153</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [McAllister et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Rowan</given-names>
            <surname>McAllister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          , Alex Kendall, Mark Van Der Wilk, Amar Shah, Roberto Cipolla, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Weller</surname>
          </string-name>
          .
          <article-title>Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning</article-title>
          .
          <source>International Joint Conferences on Artificial Intelligence</source>
          , Inc.,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [Michelmore et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Rhiannon</given-names>
            <surname>Michelmore</surname>
          </string-name>
          , Marta Kwiatkowska, and
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          .
          <article-title>Evaluating uncertainty quantification in end-to-end autonomous driving control</article-title>
          .
          <source>arXiv preprint arXiv:1811.06817</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [Michelmore et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Rhiannon</given-names>
            <surname>Michelmore</surname>
          </string-name>
          , Matthew Wicker, Luca Laurenti, Luca Cardelli, Yarin Gal, and
          <string-name>
            <given-names>Marta</given-names>
            <surname>Kwiatkowska</surname>
          </string-name>
          .
          <article-title>Uncertainty quantification with statistical guarantees in end-to-end autonomous driving control</article-title>
          .
          <source>arXiv preprint arXiv:1909.09884</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [Mohseni et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Sina</given-names>
            <surname>Mohseni</surname>
          </string-name>
          , Mandar Pitale,
          <string-name>
            <given-names>Vasu</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhangyang</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Practical solutions for machine learning safety in autonomous vehicles</article-title>
          .
          <source>arXiv preprint arXiv:1912.09630</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [Mukhoti and Gal,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Jishnu</given-names>
            <surname>Mukhoti</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          .
          <article-title>Evaluating Bayesian deep learning methods for semantic segmentation</article-title>
          .
          <source>arXiv preprint arXiv:1811.12709</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [Osband et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Osband</surname>
          </string-name>
          , Charles Blundell, Alexander Pritzel, and Benjamin Van Roy.
          <article-title>Deep exploration via bootstrapped DQN</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>4026</fpage>
          -
          <lpage>4034</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [Phan et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Buu</given-names>
            <surname>Phan</surname>
          </string-name>
          , Samin Khan, Rick Salay, and
          <string-name>
            <given-names>Krzysztof</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          .
          <article-title>Bayesian uncertainty quantification with synthetic data</article-title>
          .
          <source>In International Conference on Computer Safety, Reliability, and Security</source>
          , pages
          <fpage>378</fpage>
          -
          <lpage>390</lpage>
          . Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [Quiñonero-Candela et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Joaquin</given-names>
            <surname>Quiñonero-Candela</surname>
          </string-name>
          , Masashi Sugiyama, Anton Schwaighofer, and
          <string-name>
            <given-names>Neil D</given-names>
            <surname>Lawrence</surname>
          </string-name>
          .
          <article-title>Dataset shift in machine learning</article-title>
          . The MIT Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [Salay et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Rick</given-names>
            <surname>Salay</surname>
          </string-name>
          , Krzysztof Czarnecki, Maria Soledad Elli, Ignacio J Alvarez, Sean Sedwards, and
          <string-name>
            <given-names>Jack</given-names>
            <surname>Weast</surname>
          </string-name>
          .
          <article-title>PURSS: Towards perceptual uncertainty aware responsibility sensitive safety with ML</article-title>
          .
          <source>In SafeAI@AAAI</source>
          , pages
          <fpage>91</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [
          <string-name>
            <surname>Shalev-Shwartz</surname>
          </string-name>
          et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Shai</given-names>
            <surname>Shalev-Shwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shaked</given-names>
            <surname>Shammah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Amnon</given-names>
            <surname>Shashua</surname>
          </string-name>
          .
          <article-title>On a formal model of safe and scalable self-driving cars</article-title>
          .
          <source>arXiv preprint arXiv:1708.06374</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [Snoek et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Jasper</given-names>
            <surname>Snoek</surname>
          </string-name>
          , Yaniv Ovadia, Emily Fertig, Balaji Lakshminarayanan,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Nowozin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Sculley</surname>
          </string-name>
          , Joshua Dillon, Jie Ren, and
          <string-name>
            <given-names>Zachary</given-names>
            <surname>Nado</surname>
          </string-name>
          .
          <article-title>Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>13969</fpage>
          -
          <lpage>13980</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [Wilson and Izmailov,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Andrew Gordon</given-names>
            <surname>Wilson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Izmailov</surname>
          </string-name>
          .
          <article-title>Bayesian deep learning and a probabilistic perspective of generalization</article-title>
          .
          <source>arXiv preprint arXiv:2002.08791</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>