Out-of-Distribution Detection Using Deep Neural Network Latent Space Uncertainty

Fabio Arnez¹*, Ansgar Radermacher¹ and François Terrier¹
¹ Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France

Abstract
As automated systems increasingly incorporate deep neural networks (DNNs) to perform safety-critical tasks, confidence representation and uncertainty estimation in DNN predictions have become useful and essential to represent DNN ignorance. Predictive uncertainty has often been used to identify samples that can lead to wrong predictions with high confidence, i.e., for Out-of-Distribution (OoD) detection. However, predictive uncertainty estimation at the output of a DNN might fail for OoD detection in computer vision tasks such as semantic segmentation due to the lack of information about semantic structures and contexts. We propose using the DNN uncertainty from intermediate latent representations to overcome this problem. Our experiments show promising results in OoD detection for the semantic segmentation task.

Keywords
Uncertainty Estimation, Latent Space, Out-of-Distribution Detection, Semantic Segmentation, Automated Vehicle

The 37th AAAI Conference on Artificial Intelligence: SafeAI 2023 workshop, February 07–14, 2023, Washington, DC, USA
* Corresponding author: fabio.arnez@cea.fr (F. Arnez); ansgar.radermacher@cea.fr (A. Radermacher); francois.terrier@cea.fr (F. Terrier). ORCID: 0000-0003-0367-3035 (F. Arnez)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

In the last decade, Deep Neural Networks (DNNs) have witnessed great advances in real-world applications like Autonomous Vehicles (AVs), where they perform complex tasks such as object detection and tracking or vehicle control. Despite this progress, DNNs still have significant safety shortcomings due to their complexity, opacity, and lack of interpretability. Moreover, it is well known that DNN models behave unpredictably under dataset shift [1]. Deep Learning (DL) models carry training and data biases that directly impact model predictions and performance. This impedes ensuring the reliability of DNN models, which is a precondition for safety-critical systems to comply with industry safety standards and avoid jeopardizing human lives [2].

As highly automated systems (e.g., autonomous vehicles or autonomous mobile robots) increasingly rely on DNNs to perform safety-critical tasks, different methods have been proposed to represent confidence in DNN predictions. One way to represent DNN confidence is to capture the uncertainty associated with a prediction for a given input sample. Capturing information about "what the model does not know" is not only useful but essential in safety-critical tasks.

Bayesian Neural Networks (BNNs) and existing Bayesian approximate inference methods (Deep Ensembles, Monte-Carlo Dropout, etc.) offer a principled approach to model and quantify uncertainties in DNNs. However, quantifying uncertainty is challenging, since we do not have access to ground-truth uncertainty estimates, i.e., we do not have a clear definition of what a good uncertainty estimate is. Moreover, computer vision tasks can add an extra level of complexity, since tasks such as semantic segmentation require a pixel-level understanding of an image. In this case, a Bayesian Deep Learning model for semantic segmentation classifies each pixel in the input image and generates an uncertainty estimate for each classified pixel.

In semantic segmentation, uncertainty estimation has been used for Out-of-Distribution (OoD) detection under the assumption that samples far away from the training distribution (anomalous or OoD samples) yield higher predictive uncertainty than samples observed in the training data [3]. Approaches that use BNNs are able to capture aleatoric and epistemic uncertainties in the form of uncertainty maps (Figure 1, top) but still fail to detect anomalies accurately. BNN methods for semantic segmentation are prone to yield false-positive predictions, as well as mismatches between anomaly instances and uncertain areas, caused by the lack of information on semantic structures and contexts [4, 5], as presented in Figure 1, middle.

Recently, embedding density estimation methods have been proposed that draw a connection to uncertainties from Bayesian methods [6, 3]. In this direction, methods that leverage metrics or statistics from the non-parametric embedding space density have been proposed recently [7, 8], in contrast to distance-based methods that often assume a parametric embedding density [9, 10, 11].

The present work combines the benefits of Bayesian methods for uncertainty estimation with latent representation density estimation in the OoD detection task. We propose to capture the entropy of intermediate (latent) representations and to estimate the entropy densities for In-Distribution (InD) and OoD samples (see Figure 1, bottom). Once the entropy densities are estimated, we use them to classify new input samples as InD or OoD, i.e., we build a data-driven monitoring function that utilizes the input sample entropy for the OoD detection task.

Figure 1: Semantic segmentation uncertainty estimation comparison for in-distribution and out-of-distribution data.

2. Semantic Segmentation with Probabilistic U-Net Architecture

Probabilistic U-Net [12] is a DNN architecture for semantic segmentation that combines the U-Net architecture [13] with the conditional variational autoencoder (CVAE) framework [14]. The goal of Probabilistic U-Net is to handle input image ambiguities by leveraging the stochastic nature of the CVAE latent space. Figure 2 shows the Probabilistic U-Net architecture.

During training, depicted in Figure 2a, Probabilistic U-Net finds a useful embedding of the segmentation variants in the latent space by introducing a Posterior Net. This network learns to recognize a segmentation variant and to map it to a noisy position in the latent space (μ_post, σ²_post). In addition, a KL divergence term penalizes differences between the distributions at the output of the prior and posterior nets. The idea is to bring both distributions as close as possible, so that the Prior Net distribution covers the space of all presented segmentation variants.

In general, the central component of this architecture is its latent space: each value in the latent space encodes a segmentation variant. During inference, the Prior Net encodes each input image xᵢ and estimates the probability of these segmentation variants (μ_prior, σ²_prior). To predict a set of segmentation outputs, a set of samples is drawn from the Prior Net probability distribution.

Interestingly, we can draw a connection from this approach to other related work that aims to model complex aleatoric uncertainty (ambiguity, multi-modality) by handling stochastic input variables [15, 16, 17].

3. Methods

3.1. Capturing Uncertainty from Intermediate Latent Representations

Despite the benefits introduced by injecting random samples from the latent space into U-Net, aleatoric uncertainty alone is not enough: for the Out-of-Distribution detection task, epistemic uncertainty is needed [18, 19]. Although the Prior Net encoder q_prior employs Bayesian inference to obtain latent vectors z, it does not capture epistemic uncertainty, since the encoder lacks a distribution over its parameters φ. To overcome this problem, we take inspiration from Daxberger and Hernández-Lobato [20] and Jesson et al. [21], and propose to capture uncertainty in the Probabilistic U-Net Prior Net encoder using M Monte Carlo Dropout (MCD) samples [22], i.e., q_prior(z | x, φₘ):

    q_Φ(z | x, D_p) = ∫_φ q(z | x, φ) p(φ | D_p) dφ    (1)

In Eq. 1, we adapt the Prior Net encoder to capture the posterior q(z | x, D_p) using a set Φ = {φₘ}ᴹₘ₌₁ of encoder parameter samples φₘ ∼ p(φ | D_p), obtained by applying MCD at test time. During execution, we forward-pass an input image xᵢ multiple times through the q_prior net. Each forward pass generates a new dropout mask and, in consequence, a new (μ_prior, σ²_prior) prediction. From each predicted (μ_prior, σ²_prior) for the same image, we sample a new latent vector z, as presented in Figure 3.

MCD has been applied extensively for simple epistemic uncertainty estimation. However, standard dropout was found to be ineffective on convolutional neural networks (CNNs): it fails to remove semantic information from CNN feature maps, because nearby activations contain closely related information. On the other hand, dropping contiguous regions of 2D feature maps can remove semantic information and force the remaining units to learn features for the assigned task [23]. This effect is also desired for capturing uncertainties; otherwise, we could get overconfident uncertainty estimates in the presence of samples that contain anomalies. To overcome the standard dropout limitation, we followed the approach of Deepshikha et al. [24] and used DropBlock2D to capture uncertainty from the Probabilistic U-Net. We applied MC DropBlock2D to the last feature map of the Prior Net, as shown in Figure 2 and Figure 3 (in red).

Figure 2: Probabilistic U-Net [12] with Bayesian Prior Net for semantic segmentation: a. during training; b. during inference, with the monitoring function ℳ_OOD at the output of the Prior Net.
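The test-time procedure above (M stochastic forward passes with DropBlock active, each yielding a fresh (μ_prior, σ²_prior) from which one latent vector z is drawn, approximating Eq. 1 by Monte Carlo) can be sketched in a few lines. This is only a toy numpy illustration: `prior_net_with_dropblock` is an invented stand-in that zeroes a random 2×2 block of a fake feature map, not the paper's actual network.

```python
import numpy as np

rng = np.random.default_rng(1)

def prior_net_with_dropblock(x):
    """Toy stand-in for the Prior Net with MC DropBlock2D active at
    test time: each call drops a fresh contiguous 2x2 block of the
    (here 4x4) feature map, so repeated calls on the same image
    yield different (mu, sigma2) predictions."""
    fmap = np.ones((4, 4)) * x.mean()
    r, c = rng.integers(0, 3, size=2)   # top-left corner of the block
    fmap[r:r + 2, c:c + 2] = 0.0        # drop a contiguous region
    mu = np.full(6, fmap.mean())        # 6-dimensional latent space
    sigma2 = np.full(6, 0.5)
    return mu, sigma2

def mc_latent_samples(x, m=32):
    """M stochastic forward passes; from each predicted Gaussian,
    draw one latent vector z (Monte Carlo approximation of Eq. 1)."""
    zs = []
    for _ in range(m):
        mu, sigma2 = prior_net_with_dropblock(x)
        zs.append(mu + np.sqrt(sigma2) * rng.standard_normal(6))
    return np.stack(zs)                 # shape (M, latent_dim)

z_mc = mc_latent_samples(np.ones((8, 8)))
print(z_mc.shape)                       # (32, 6)
```

The resulting (M, latent_dim) array of samples is what the entropy estimators of the next paragraphs operate on.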
Figure 3: Prior Net latent vector z predictions with Monte Carlo DropBlock2D. The latent space at the output of the Prior Net is shown in 2D for illustration purposes.

The average surprise, or uncertainty, of a random variable z is defined by its probability distribution p(z) and is called the entropy of z, H(z). For continuous random variables, we use the differential entropy, as presented in Eq. 2:

    H(z) = ∫_z p(z) log (1 / p(z)) dz    (2)

To quantify uncertainty from the Prior Net MCD samples, we used standard entropy estimators [25] on 32 Monte Carlo samples (32 forward passes of the image through the Prior Net with MC DropBlock2D turned on). In Eq. 3, the entropy Ĥ_Φ(z | x) measures the average surprise of observing latent vector z at the output of the Prior Net, given an input image x:

    H(z | x) = ∫_z p(z | x) log (1 / p(z | x)) dz    (3)

3.2. Bayesian Generative Classifier for OoD Detection

For OoD detection, we assume access to a dataset of normal (InD) and anomaly (OoD) samples, Y = {normal, anomaly}, with which we can train a Bayesian generative classifier (a "not so naive" Bayes classifier) using the empirical density of a metric or statistic T of the latent representations z, i.e., T(z). To this end, we follow the approach of Morningstar et al. [7] and use Kernel Density Estimation (KDE) to obtain the T(z) densities. Since we aim to leverage the uncertainty from intermediate latent representations, the statistic T is the entropy at the output of the Prior Net (described in the previous section), with which we build the monitoring function ℳ_OOD, as presented in Figure 2b.

For each class label, we fit a KDE to obtain a generative model of the data, i.e., we use KDE to compute the likelihood p(T(z) | y). Then, we compute the class label prior probability p(Y), i.e., the marginal categorical distribution obtained by counting frequencies (the number of samples of each class in the complete training set). For an unknown latent vector, we can compute the posterior probability of each class, p(y | T(z)), using Bayes' rule in Eq. 4; for the OoD task, we use Eq. 5:

    p(y | T(z)) = p(T(z) | y) p(y) / p(T(z))    (4)

    p(y | T(z)) = p(T(z) | y) p(y) / Σ_{y′∈Y} p(T(z) | y′) p(y′)    (5)

For a more detailed description of the Bayesian generative classification approach, we refer the reader to the works of VanderPlas [26] and Postels et al. [3].

4. Early Experiments and Results

Figure 4: Dataset for training the OoD monitoring function.

Dataset Building. For training the DNN model for semantic segmentation, we used the Valeo Woodscape dataset¹ [27] with its semantic segmentation labels. For training the monitoring function (i.e., the Bayesian generative classifier), our first choice was the Soiling Woodscape sub-dataset. However, after inspecting the dataset, we noticed that its samples were taken in small sequences. To improve dataset diversity and implement our approach, we decided to create a new, smaller sub-dataset by taking just one or two samples from the sampling sequences for each anomaly in Soiling Woodscape.

Figure 5: Illustration of empirical densities with KDE: Mahalanobis distance d_M (top-left), the multivariate Gaussian entropy Ĥ_φ(z | x) (top-right), and the entropy of each latent vector variable Ĥ_φ(zᵢ | x) (bottom).
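The monitoring pipeline of Sections 3.1–3.2 — compute an entropy statistic T(z) from the Monte Carlo latent samples, fit one KDE per class for p(T(z) | y), and classify with Bayes' rule (Eq. 5) — can be sketched as follows. All numbers here are synthetic stand-ins (the two class distributions are invented for illustration), and the entropy function assumes the multivariate-Gaussian case; nothing below reproduces the paper's actual experiments.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_entropy(z_mc):
    """T(z): differential entropy of MC latent samples under a
    multivariate Gaussian fit, H = 0.5 * log((2*pi*e)^d * det(Sigma))."""
    d = z_mc.shape[1]
    cov = np.cov(z_mc, rowvar=False) + 1e-6 * np.eye(d)  # regularized
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def kde(samples, bandwidth=0.25):
    """1-D Gaussian kernel density estimator fitted to `samples`."""
    def density(t):
        u = (t - samples) / bandwidth
        return np.mean(np.exp(-0.5 * u**2)) / (bandwidth * np.sqrt(2.0 * np.pi))
    return density

# Toy entropy statistics standing in for T(z) values computed on real
# InD ("normal") and OoD ("anomaly") images, 140 samples per class.
t_train = {"normal": rng.normal(2.0, 0.3, 140),
           "anomaly": rng.normal(3.5, 0.6, 140)}
lik = {y: kde(t) for y, t in t_train.items()}   # p(T(z) | y) via KDE
prior = {y: 0.5 for y in t_train}               # balanced training set

def posterior_anomaly(t):
    """p(anomaly | T(z)) by Bayes' rule (Eq. 5)."""
    joint = {y: lik[y](t) * prior[y] for y in lik}
    return joint["anomaly"] / sum(joint.values())

print(posterior_anomaly(3.6))  # high entropy -> flagged as anomaly
print(posterior_anomaly(2.0))  # low entropy -> classified as normal
```

In the paper's setting, `gaussian_entropy` (or its per-variable counterpart) would be applied to the MC latent samples of each image to produce the T(z) values that the KDEs are fitted on.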
We called this new dataset OoD Woodscape; it combines samples from the Woodscape training set (normal class) and samples from the Soiling Woodscape validation set (anomaly class). The OoD-Woodscape training set has 280 samples, 140 for each class; the validation set has 120 samples in total, 60 for each class. The dataset-building procedure is depicted in Figure 4.

Experiments. We quantify the entropy from intermediate latent vectors. Using the entropy values, we estimate the entropy density for each sub-dataset, i.e., for samples from the normal and anomaly sub-datasets. First, we quantify the entropy assuming a multivariate Gaussian distribution, Ĥ_φ(z | x), as presented in Figure 5, top-right. Next, we compute the entropy estimate for each variable in the latent vector, Ĥ_φ(zᵢ | x), as shown in Figure 5, bottom. Finally, for comparison, we also use the Mahalanobis distance, a multivariate measure of the distance between a point and a distribution. In this last case, we built the reference distribution from the intermediate representations zᵢ of each input image xᵢ in the Woodscape validation set (see Figure 5, top-left). Then, we measure the distance to this reference distribution using

    d_M = √((z* − μ_z_val)ᵀ Σ⁻¹_z_val (z* − μ_z_val))

for a new input image x* and its predicted latent vector z*.

For entropy, in both cases we observe that the densities for InD and OoD samples differ. In the first case, the estimated latent vector entropy density shows clear multimodality for OoD samples, with peaks in entropy intervals that denote under-confident (high uncertainty) and overconfident (very low uncertainty) predictions. In the latter case, the entropy of the individual latent vector variables, we observe that some variables exhibit multimodal densities for OoD samples, with density peaks in entropy value intervals different from those obtained with InD samples. Finally, the d_M density shows slight peaks or modes for OoD samples; however, the InD and OoD densities overlap to a high degree.

Metrics. To evaluate our monitoring function, we used the validation set of OoD-Woodscape (the dataset we designed and built). We report results using the following metrics, as suggested by Ferreira et al. [28] and Blum et al. [6]: the Matthews correlation coefficient (MCC), the F1-score, the area under the Receiver Operating Characteristic curve (AUROC), and the False-Positive Rate at 90% True-Positive Rate (FPR90). Table 1 summarizes the results for each statistic or feature employed in our classifier (monitoring function), and Figure 6 shows the ROC curve.

Results & Discussion. We present the results of our monitoring function (classifier) in Table 1 and in Figure 6. The latent vector entropy-based methods outperform the Mahalanobis distance-based d_M method in almost all performance metrics. We believe the reason behind the poor performance of the d_M method is the strong assumption that the embedding space is class-conditionally Gaussian when building the reference distributions to compute the distance. On the other hand, the per-variable latent vector entropy achieves the best results. The reason is that the classifier benefits from more expressive (entropy) information at the latent variable level.

Table 1: Evaluation of OoD detection methods using DNN latent representations

    Method           MCC    F1     AUROC  FPR90
    d_M              0.473  0.763  0.769  0.5
    Ĥ_φ(z | x)       0.572  0.797  0.855  0.4
    Ĥ_φ(zᵢ | x)      0.685  0.849  0.946  0.16

¹ https://woodscape.valeo.com/download
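The Mahalanobis baseline d_M follows directly from its formula: fit (μ_z_val, Σ_z_val) on InD latent vectors, then score new vectors by their distance to that reference. The sketch below uses random toy vectors as stand-ins for Prior Net outputs on the Woodscape validation set; it is an illustration of the formula, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(3)

# Reference distribution built from InD latent vectors (toy stand-ins
# for Prior Net outputs on the Woodscape validation set).
z_val = rng.normal(0.0, 1.0, size=(500, 6))
mu_val = z_val.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(z_val, rowvar=False))

def mahalanobis(z_star):
    """d_M between a new latent vector z* and the InD reference
    distribution (mu_val, Sigma_val)."""
    diff = z_star - mu_val
    return float(np.sqrt(diff @ cov_inv @ diff))

d_ind = mahalanobis(rng.normal(0.0, 1.0, size=6))  # typical InD vector
d_ood = mahalanobis(np.full(6, 5.0))               # far from reference
print(d_ood > d_ind)                               # True
```

As the results above indicate, such a score only separates InD from OoD well when the embedding really is close to Gaussian; the d_M densities in Figure 5 overlap precisely because that assumption is strong.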
Figure 6: OoD detector ROC curve analysis.

5. Conclusion

In this work, we presented a method that uses the uncertainty from intermediate latent representations for Out-of-Distribution detection in a semantic segmentation task. Our early results show that the entropy of latent features can be useful for building data-driven monitoring functions. In future work, we aim to explore the impact of the structure of the latent space by relaxing the Gaussian assumption [29] and its effect on the metrics and statistics used for the OoD detection task. Moreover, it is important to analyze the applicability of our approach to other semantic segmentation architectures that do not contain generative neural network blocks.

Acknowledgement

This work has been supported by the French government under the "France 2030" program as part of the SystemX Technological Research Institute within the Confiance.ai Program (www.confiance.ai).

References

[1] R. McAllister, G. Kahn, J. Clune, S. Levine, Robustness to out-of-distribution inputs via task-aware generative uncertainty, in: 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 2083–2089.
[2] F. Arnez, H. Espinoza, A. Radermacher, F. Terrier, A comparison of uncertainty estimation approaches in deep learning components for autonomous vehicle applications, Proceedings of the Workshop on Artificial Intelligence Safety 2020 (2020).
[3] J. Postels, H. Blum, Y. Strümpler, C. Cadena, R. Siegwart, L. Van Gool, F. Tombari, The hidden uncertainty in a neural network's activations, arXiv preprint arXiv:2012.03082 (2020).
[4] G. Di Biase, H. Blum, R. Siegwart, C. Cadena, Pixel-wise anomaly detection in complex driving scenes, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16918–16927.
[5] Y. Xia, Y. Zhang, F. Liu, W. Shen, A. L. Yuille, Synthesize then compare: Detecting failures and anomalies for semantic segmentation, in: European Conference on Computer Vision, Springer, 2020, pp. 145–161.
[6] H. Blum, P.-E. Sarlin, J. Nieto, R. Siegwart, C. Cadena, The fishyscapes benchmark: measuring blind spots in semantic segmentation, International Journal of Computer Vision 129 (2021) 3119–3135.
[7] W. Morningstar, C. Ham, A. Gallagher, B. Lakshminarayanan, A. Alemi, J. Dillon, Density of states estimation for out of distribution detection, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2021, pp. 3232–3240.
[8] Y. Sun, Y. Ming, X. Zhu, Y. Li, Out-of-distribution detection with deep nearest neighbors, arXiv preprint arXiv:2204.06507 (2022).
[9] K. Lee, K. Lee, H. Lee, J. Shin, A simple unified framework for detecting out-of-distribution samples and adversarial attacks, Advances in Neural Information Processing Systems 31 (2018).
[10] J. Nitsch, M. Itkina, R. Senanayake, J. Nieto, M. Schmidt, R. Siegwart, M. J. Kochenderfer, C. Cadena, Out-of-distribution detection for automotive perception, in: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), IEEE, 2021, pp. 2938–2943.
[11] C.-L. Li, K. Sohn, J. Yoon, T. Pfister, Cutpaste: Self-supervised learning for anomaly detection and localization, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9664–9674.
[12] S. A. Kohl, B. Romera-Paredes, C. Meyer, J. De Fauw, J. R. Ledsam, K. H. Maier-Hein, S. Eslami, D. J. Rezende, O. Ronneberger, A probabilistic u-net for segmentation of ambiguous images, arXiv preprint arXiv:1806.05034 (2018).
[13] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[14] K. Sohn, H. Lee, X. Yan, Learning structured output representation using deep conditional generative models, Advances in Neural Information Processing Systems 28 (2015) 3483–3491.
[15] S. Depeweg, J.-M. Hernandez-Lobato, F. Doshi-Velez, S. Udluft, Decomposition of uncertainty in bayesian deep learning for efficient and risk-sensitive learning, in: International Conference on Machine Learning, PMLR, 2018, pp. 1184–1193.
[16] M. Henaff, Y. LeCun, A. Canziani, Model-predictive policy learning with uncertainty regularization for driving in dense traffic, in: 7th International Conference on Learning Representations, ICLR 2019, 2019.
[17] F. Arnez, H. Espinoza, A. Radermacher, F. Terrier, Improving robustness of deep neural networks for aerial navigation by incorporating input uncertainty, in: Computer Safety, Reliability, and Security. SAFECOMP 2021 Workshops, Springer International Publishing, Cham, 2021, pp. 219–225.
[18] A. Kendall, Y. Gal, What uncertainties do we need in bayesian deep learning for computer vision?, in: Advances in Neural Information Processing Systems, 2017, pp. 5574–5584.
[19] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, J. Snoek, Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift, Advances in Neural Information Processing Systems 32 (2019) 13991–14002.
[20] E. Daxberger, J. M. Hernández-Lobato, Bayesian variational autoencoders for unsupervised out-of-distribution detection, arXiv preprint arXiv:1912.05651 (2019).
[21] A. Jesson, S. Mindermann, U. Shalit, Y. Gal, Identifying causal-effect inference failure with uncertainty-aware models, Advances in Neural Information Processing Systems 33 (2020) 11637–11649.
[22] Y. Gal, Z. Ghahramani, Dropout as a bayesian approximation: Representing model uncertainty in deep learning, in: International Conference on Machine Learning, 2016, pp. 1050–1059.
[23] G. Ghiasi, T.-Y. Lin, Q. V. Le, Dropblock: A regularization method for convolutional networks, Advances in Neural Information Processing Systems 31 (2018) 10727–10737.
[24] K. Deepshikha, S. H. Yelleni, P. Srijith, C. K. Mohan, Monte carlo dropblock for modelling uncertainty in object detection, arXiv preprint arXiv:2108.03614 (2021).
[25] L. Kozachenko, N. N. Leonenko, Sample estimate of the entropy of a random vector, Problemy Peredachi Informatsii 23 (1987) 9–16.
[26] J. VanderPlas, Python data science handbook: Essential tools for working with data, O'Reilly Media, Inc., 2016.
[27] S. Yogamani, C. Hughes, J. Horgan, G. Sistu, P. Varley, D. O'Dea, M. Uricar, S. Milz, M. Simon, K. Amende, C. Witt, H. Rashed, S. Chennupati, S. Nayak, S. Mansoor, X. Perrotton, P. Perez, Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[28] R. S. Ferreira, J. Arlat, J. Guiochet, H. Waeselynck, Benchmarking safety monitors for image classifiers with machine learning, in: 2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC), IEEE, 2021, pp. 7–16.
[29] P. Ghosh, M. S. Sajjadi, A. Vergari, M. Black, B. Scholkopf, From variational to deterministic autoencoders, in: International Conference on Learning Representations, 2019.