An Evaluation of "Crash Prediction Networks" (CPN) for Autonomous Driving Scenarios in CARLA Simulator

Saasha Nair,1,2 Sina Shafaei,3 Daniel Auge,3 Alois Knoll3
1 TeCIP Institute, Scuola Superiore Sant'Anna, Pisa, Italy
2 Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy
3 Technical University of Munich, Germany
saasha.nair@santannapisa.it, sina.shafaei@tum.de, daniel.auge@tum.de, knoll@in.tum.de

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Safeguarding the trajectory planning algorithms of autonomous vehicles is crucial for safe operation in mixed traffic scenarios. This paper proposes the use of an ensemble of neural networks that work together under the moniker of "Crash Prediction Networks". The system comprises multiple independent networks, each focusing on a different subset of sensory inputs. The aim is for the networks to work in unison to reach a consensus on whether the vehicle might enter a catastrophic state, and to trigger an appropriate intervention if so. The proposed approach would act as an additional layer of safety by supervising the decision making module of an autonomous vehicle.
Though the proposed approach encompasses all the sensors and allied paraphernalia, the scope of this paper is exclusively limited to exploring safety monitors for visual sensors. The approach can, however, be extrapolated to other sensors. The evaluation was conducted using the CARLA simulator on simple driving scenarios, studying the benefits of modeling temporal features to capture the motion in the environment. Additionally, the paper studies the importance of 'accounting for uncertainty' in models dealing with vehicle safety.

Introduction

Safety, as defined by Avizienis (Avizienis et al. 2004), is the "absence of catastrophic consequences on the user(s) and the environment". Therefore, safety engineering (Törngren et al. 2018) relates to "methods used to assess, eliminate and/or reduce risk to acceptable level". Safety in vehicles requires implementing vehicle-level behaviours (Koopman and Wagner 2017) that ensure safe and reliable performance across different contexts (McAllister et al. 2017). This could entail safety by construction, analysis, and verification/validation. Currently, however, testing is the main method used for ensuring safety in automated vehicles, and includes the following types (Huang et al. 2016):

1. Software testing: helps to ensure the correctness of the program at the source code level via unit tests and testing tools.
2. X-in-the-Loop (XiL) testing: as introduced in (Stellet et al. 2015; Riedmaier et al. 2018), combines real-world and simulated components for testing the functionality of parts of the system. The software, hardware and the vehicle itself can be tested while simulating the residual parts.
3. Testing in real traffic: allows studying how the autonomous vehicle reacts to real-world driving scenarios on public roads with other participating actors.
Safety concerns are still seen to be a major impediment to the wide-scale adoption of autonomous vehicles (Rudolph, Voget, and Mottok 2018; Kalra and Paddock 2016; McAllister et al. 2017; Koopman and Wagner 2016). Safety monitors can potentially help to alleviate the safety concerns surrounding autonomous vehicles by observing the inputs to and outputs from the various modules contained in the driving pipeline. These monitors usually differentiate between three main states that the system of interest, in this case the car, may end up in (Machin et al. 2016). The first is the catastrophic state, one where the damage has already been done and from which the system cannot be recovered. The remaining two states, namely the safe state and the warning state, are non-catastrophic. In the safe state the system behaves as expected, without any constraints. The warning state is where the system is close to being involved in a catastrophe but can still be salvaged. For a system to transition from a safe to a catastrophic state, it must always pass through the warning state; the warning state can thus be thought of as the margin where applying the correct intervention brings the system back to the safe state. Monitoring therefore requires defining a safety strategy that identifies potential warning states and specifies corresponding safety rules to guide the response of the autonomous system.

The problem with existing approaches to safety monitoring, such as the "Safety Monitoring Framework (SMOF)" (Machin et al. 2016), is that they require hand-picked and hard-coded limiters, defined at the design stage, describing what counts as safe or unsafe intervals of input and output (a toy example of such a limiter is sketched below). However, in neural networks, rather than the behaviour being hard-coded into the system, the models learn to detect patterns from the data they were trained on. This makes it difficult to combine neural networks with the existing safety monitoring techniques, and might in fact limit the expressive power and freedom of neural networks (Ashmore, Calinescu, and Paterson 2019). Thus this paper introduces "Crash Prediction Networks" (CPN) (Nair et al. 2019) as a 'safety by construction' solution for monitoring modules composed of neural networks. The aim of this paper is to explore the effectiveness of CPN in monitoring various aspects of vehicular safety. CPN has for this purpose been evaluated in a simulated environment and in the presence of dynamic obstacles. Additionally, the potential enhancement of prediction accuracy has also been investigated using temporal features in the architecture of the networks.
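To make the contrast with learned monitors concrete, the following is a minimal, purely illustrative sketch of the kind of hand-coded limiter that rule-based frameworks such as SMOF rely on. The signals (speed, gap to the nearest obstacle) and the interval thresholds are hypothetical and are not taken from SMOF or from this paper.

```python
from enum import Enum


class MonitorState(Enum):
    SAFE = 0
    WARNING = 1
    CATASTROPHIC = 2


def rule_based_monitor(speed_mps: float, gap_m: float) -> MonitorState:
    """Toy hard-coded limiter: classify the situation from hand-picked
    intervals over two signals (all thresholds are hypothetical)."""
    if gap_m <= 0.0:                                  # contact has already happened
        return MonitorState.CATASTROPHIC
    # time-to-collision style margin: gap is closing too fast for the speed
    if speed_mps > 0 and gap_m / speed_mps < 2.0:     # less than ~2 s to impact
        return MonitorState.WARNING
    return MonitorState.SAFE


if __name__ == "__main__":
    # In the warning state a predefined safety rule (e.g., braking) would fire.
    print(rule_based_monitor(speed_mps=10.0, gap_m=15.0))  # WARNING
    print(rule_based_monitor(speed_mps=5.0, gap_m=30.0))   # SAFE
```

A learned monitor such as CPN replaces these designer-chosen intervals with patterns extracted from data, which is precisely what makes it harder to fit into frameworks built around such rules.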
Overall Architecture

Figure 1: Training phase of the crash prediction network (Nair et al. 2019)

Figure 2: Operation phase of the crash prediction network (Nair et al. 2019)

Consider an end-to-end setup for the driving system, equipped with a range of sensors, that can be trained in simulation. The driving module uses information about the state of the environment to decide whether to continue straight, turn, or apply the brakes. The Crash Prediction Network can be thought of as an envelope around this driving module. The aim of CPN is to study the action decision obtained as output from the driving module, to determine whether it is likely to lead to a crash given the sensory information about the state. Safety monitors need to be extremely robust and reliable; to this effect CPN is suggested to be implemented as an ensemble (Zhao, Gao, and Yang 2005) of neural networks. Each network differs either in architecture or in the subset of sensory data it consumes as input. The networks then work in unison to reach a consensus on whether the vehicle should continue with the current action or trigger an intervention.

During the training phase for CPN (Figure 1), a dataset is created by allowing a Reinforcement Learning driving agent to interact with the environment, and storing the information of the states encountered and the actions taken along with the outcome. This allows the training of the CPN to be posed as a classification problem with two classes, 'safe' or 'unsafe', indicating whether the action decision with the given state information led to a collision or not. During the operational phase (Figure 2), the ensemble of networks that compose CPN observe the input from the various sensors and the decision proposed by the driving module to predict the "safeness" of the outcome. If the proposed action is likely to lead to a catastrophic state, a predefined intervention is triggered, thereby filtering out potentially dangerous action decisions; else, the proposed action is executed. The predefined intervention, also referred to as the "Fail Safe Mode" (as seen in Figure 2), can take the form of transferring control to the human driver, pulling over to the shoulder of the road, and so on. This paper does not delve into the details of the intervention, but rather focuses on developing the technique to identify when the intervention should be triggered.

A potential problem that might be encountered is that CPN might lose its relevance in the real world over time due to the dynamic and constantly evolving state of the operational environment. The setup of CPN makes it possible to train the network in an iterative manner, which can be used to combat this problem. Implementing iterative training with CPN would require the continuous collection of live driving data during the operational phase and training CPN on the newly collected data at regular intervals. Additionally, an advantage of this technique is that it is not tightly coupled to the nature of the driving agent: though the experiments use an RL-based driving agent, it could easily be swapped for any other driving agent.
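To illustrate the operation phase described above, the following is a minimal sketch of how a CPN ensemble could gate the driving module's proposed action. The consensus rule (averaging the member outputs) and the 0.6 'unsafe' threshold mirror choices reported later in the paper, but the environment interface, the `trigger_fail_safe` routine and the member models are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np


def cpn_consensus(members, state, action) -> float:
    """Average the per-network probabilities of the next state being unsafe.
    Each member is assumed to expose predict(state, action) -> float in [0, 1]."""
    scores = [m.predict(state, action) for m in members]
    return float(np.mean(scores))


def step(env, driving_agent, cpn_members, unsafe_threshold: float = 0.6):
    """One operation-phase step: propose, check with the CPN ensemble,
    then either execute the action or trigger the predefined intervention."""
    state = env.observe()                 # sensory snapshot (e.g., camera frames)
    action = driving_agent.act(state)     # decision of the driving module
    p_unsafe = cpn_consensus(cpn_members, state, action)
    if p_unsafe > unsafe_threshold:
        env.trigger_fail_safe()           # "Fail Safe Mode" intervention
    else:
        env.execute(action)               # proposed action deemed safe
    return p_unsafe
```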
Though the proposed architecture and techniques suggest the use of an ensemble of networks, each focusing on a subgroup of different sensors, the scope of this evaluation is limited only to visual data collected using RGB cameras mounted on the hood of the car in simulation. The paper studies the importance of accounting for temporal features in the data on the prediction accuracy of CPN by modeling two different network architectures, namely "Simple CPN" and "Spatio-Temporal (ST)-CPN", as depicted in Figure 3. Simple CPN uses a single frame, i.e., an image of the current state, to make a prediction about the level of safety of the next state based on the proposed action of the driving module. This is achieved by using a VGG network (Simonyan and Zisserman 2014) (trained from scratch with the dataset collected in CARLA as described in the following section) for feature extraction, which is then concatenated with the action decision to perform the classification. The ST-CPN architecture, on the other hand, takes as input an N-frame long history, i.e., the last N frames encountered before the current state, along with the proposed driving decision.

Figure 3: The Simple CPN (left), and ST-CPN (right) architectures used in conducting the experiments

Additionally, the importance of accounting for uncertainty is also studied in this evaluation. Standard deep learning uses point estimates for predictions (Goodfellow, Bengio, and Courville 2016). Thus, even when the model encounters inputs that are dissimilar to the ones that it was trained on, it might counterintuitively generate a high probability score, thereby making probability scores an unreliable estimate of the model's confidence. This can be combated with the use of Bayesian deep learning, which allows for a probabilistic approach to predictions by inferring distributions over the model parameters (Gal 2016; Kwon et al. 2020). Besides generating uncertainty estimates, Bayesian deep learning also helps reduce over-fitting. However, such models are difficult to train, and usually have intractable objective functions. Thus, in this work, we explore the need for accounting for uncertainty in safety monitors like CPN with the use of MC-Dropout (Gal and Ghahramani 2016) to approximate Bayesian inference.
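The paper does not publish training code or name a framework; the sketch below shows, in Keras-style TensorFlow (an assumption), roughly how the two heads in Figure 3 could be assembled: a VGG-style convolutional stack for Simple CPN and a ConvLSTM stack for ST-CPN, each concatenated with the 1x5 action vector and ending in a single sigmoid output unit (the activation is an assumption; the paper states only that the final dense layer has one neuron). Layer counts and widths are simplified relative to Figure 3.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

N_FRAMES, H, W, C, N_ACTIONS = 10, 84, 84, 3, 5


def simple_cpn() -> Model:
    """Single-frame CPN: VGG-style image features + action vector -> p(unsafe)."""
    img = layers.Input(shape=(H, W, C), name="image")
    act = layers.Input(shape=(N_ACTIONS,), name="action")
    x = img
    for filters in (64, 128, 256, 512):            # simplified VGG-like stack
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Concatenate()([x, act])
    for units in (500, 100, 10):
        x = layers.Dense(units, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid", name="unsafe")(x)
    return Model([img, act], out, name="simple_cpn")


def st_cpn() -> Model:
    """Spatio-temporal CPN: ConvLSTM over the last N frames + action vector."""
    clip = layers.Input(shape=(N_FRAMES, H, W, C), name="frames")
    act = layers.Input(shape=(N_ACTIONS,), name="action")
    x = clip
    for filters in (20, 40):
        x = layers.ConvLSTM2D(filters, 3, padding="same", return_sequences=True)(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    x = layers.ConvLSTM2D(40, 3, padding="same", return_sequences=False)(x)
    x = layers.Flatten()(x)
    x = layers.Concatenate()([x, act])
    for units in (300, 10):
        x = layers.Dense(units, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid", name="unsafe")(x)
    return Model([clip, act], out, name="st_cpn")
```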
Experiments and Evaluations

As stated in the previous section, the scope of the experiments was limited to inputs from RGB cameras placed on the hood of the car. Additionally, the focus of the networks was on preventing "locally avoidable catastrophes" (Saunders et al. 2018), i.e., ones that can be avoided by adjusting the course of action when danger is imminent. This simplifying assumption eliminates the need for long-term strategic planning and focuses only on the point of failure. The experiments were conducted using CARLA 0.9.6, "an open-source simulator for urban driving" (Dosovitskiy et al. 2017). The CARLA simulator provides a "Scenario Runner", which acts as an additional layer over the simulator, to support the testing of driving scenarios laid out by NHTSA as a list of pre-crash typologies (Najm et al. 2007). To study the importance of temporal features in safety prediction, ST-CPN is compared against the single-frame input of Simple CPN. The evaluation in this paper uses a history length of 10, meaning the last 10 image frames encountered by the ego vehicle are fed as input to the ST-CPN model. Both models are extended for further experiments with uncertainty by applying MC-Dropout (Gal and Ghahramani 2016). To this effect, a Dropout layer with a probability of 0.4 is applied during training and inference after each of the trainable layers (namely, conv, convlstm and dense) in the models.

Dataset

Creating a representative dataset is a vital part of the deep learning pipeline. Data for the experiments in this paper was collected by allowing the ego vehicle, backed by an RL-agent, to drive in and interact with the simulated environment in CARLA. The simulator provides pre-built environments, called "Towns". Towns 01, 03 and 04 were used for training and validation, while Towns 02 and 05 were used for testing. For the initial tests of the proposed approach presented here, the scale of the experiment was fairly limited, with 18000 images (12000 safe and 6000 unsafe) used for training and 9000 (6000 safe and 3000 unsafe) used for testing.

For the experiments discussed here, the ego vehicle used in the simulation was equipped with three RGB cameras, placed on the left, right and center of the far-front of the hood of the car. The cameras enabled the ego vehicle to better perceive its surroundings, allowing for a wider field of view. As mentioned before, the two network architectures required two different formats of data. Thus, for the Simple CPN model, single frames of images were stored. The RGB images from the three cameras were first converted to gray-scale to reduce the effect of color on the decision making of the neural network, since the network only needs to detect the presence of an obstacle, not the type of the obstacle. The single-channel gray-scale images from the three cameras were then combined depth-wise to create a single three-channel image of dimension 84x84x3 (as depicted in Figure 4), such that each channel represented one of the gray-scale images. To extend the data to the ST-CPN model, a similar procedure was followed to store a concatenation of the last 10 image frames per step, such that the 10 image frames were stacked vertically to generate an (84x10)x84x3 image. Before being fed to the model as input, the single long image was processed into a series of 10 images of dimension 84x84x3, akin to a short video. The two networks perform binary classification, such that the final dense layer contains a single neuron. A decision threshold value of 0.6 was used, such that if the output-layer neuron produces a value greater than 0.6, the state-action pair is classified as unsafe.

Figure 4: Format of the dataset used by Simple CPN
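As a concrete illustration of the data format just described, the following NumPy sketch converts the three camera images to gray-scale, stacks them depth-wise into one 84x84x3 array, and reshapes a vertically concatenated (84x10)x84x3 history image back into the 10-frame clip consumed by ST-CPN. The luminance weights are an assumption (the paper does not state how the conversion was done), and the inputs are assumed to already be resized to 84x84.

```python
import numpy as np

H, W, N_FRAMES = 84, 84, 10
LUMA = np.array([0.299, 0.587, 0.114])   # assumed RGB-to-gray weights


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """RGB image (H, W, 3) -> single-channel gray-scale image (H, W)."""
    return rgb @ LUMA


def simple_cpn_frame(left: np.ndarray, center: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Stack the three gray-scaled camera views depth-wise into one 84x84x3 array."""
    return np.stack([to_gray(left), to_gray(center), to_gray(right)], axis=-1)


def split_history(long_image: np.ndarray) -> np.ndarray:
    """Stored (84*10, 84, 3) history image -> (10, 84, 84, 3) clip for ST-CPN."""
    return long_image.reshape(N_FRAMES, H, W, 3)


if __name__ == "__main__":
    cams = [np.random.randint(0, 256, (H, W, 3)).astype(np.float32) for _ in range(3)]
    frame = simple_cpn_frame(*cams)                        # (84, 84, 3)
    history = np.concatenate([frame] * N_FRAMES, axis=0)   # (840, 84, 3) as stored
    clip = split_history(history)                          # (10, 84, 84, 3) fed to ST-CPN
    print(frame.shape, clip.shape)
```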
Evaluation Metrics

In real-world scenarios, unsafe crash states are comparatively rare, which often leads to imbalanced datasets. This is captured in the collected dataset by enforcing a mild imbalance, as can be noticed in the description of the dataset in the previous section. Thus, accuracy, the most commonly used metric in deep learning, does not suffice, as it could lead to a false sense of success. Since falsely classifying unsafe states as safe is much worse than vice versa, the main focus of the models should be on reducing false negatives. Therefore, recall is an important metric for the CPN models, which has been captured in this paper via precision-recall curves.

Simple CPN with Static Obstacles Only

Before moving on to complex scenarios, it was necessary to test if a deep-learning based model could help predict the possibility of a crash based on state and action information. As a sanity check, in this first experiment the Simple CPN model was tested with only static obstacles, which meant crashes into walls, fences, rocks, crates on the road and other static objects in an urban scenario. With a test accuracy of 0.7907, the model was able to predict crash situations. However, the accuracy was inadequate to be practically usable.

Simple CPN with Dynamic Obstacles

Given the results of the previous experiment, the simulation environment for the following experiments was extended to include dynamic obstacles in the form of 2- and 4-wheeler vehicles. The Simple CPN model was tasked with taking as input an image of the current state along with the proposed action decision to predict whether the next state would be 'unsafe'. The model was able to replicate the success of the previous experiment, achieving a test accuracy of 0.8018 and an AUC-PR score of 0.7624.

Figure 5: Simple CPN model on test set with dynamic obstacles

As per the confusion matrix depicted in Figure 5, the number of false negatives was quite high. However, for safety-critical applications such as autonomous vehicles, it is important to reduce false negatives to as low as possible. In order to improve the results of the classification, class weights were introduced in the Simple CPN architecture. The class weights were used during training in the ratio of 1:2, such that the penalty applied to the loss for wrongly classifying a crash case was double that for wrongly classifying a non-crash case. The results summarized in Table 1 show a slight improvement of the overall accuracy and an increased recall score.

Table 1: Comparison of classification metrics on the test set with clear weather.

Type                           ACCURACY  RECALL  PRECISION  AUC-PR  Note
Simple CPN                     0.8018    0.57    0.77       0.7624
Simple CPN with loss adaption  0.8131    0.68    0.74       0.7706
ST-CPN                         0.8773    0.74    0.87       0.8951
Bayesian Simple CPN            0.8015    0.57    0.77       0.7624  Uncertainty: 0.0162
Bayesian ST-CPN                0.8281    0.63    0.71       0.8274  Uncertainty: 0.0166
Bayesian Combined              0.8328    0.62    0.71       0.8382  Uncertainty: 0.0222
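The class-weighted retraining and the precision-recall evaluation described above can be reproduced with a few lines in Keras and scikit-learn. The sketch below assumes the `simple_cpn()` builder from the earlier architecture sketch and in-memory NumPy arrays, which is a simplification of the actual training setup; the optimizer, epoch count and batch size are placeholders not reported in the paper.

```python
from sklearn.metrics import auc, precision_recall_curve, recall_score


def train_and_evaluate(model, train_data, test_data, epochs: int = 20):
    """Class-weighted training plus the recall/AUC-PR evaluation used for Table 1.

    train_data and test_data are ([images, actions], labels) tuples, with
    label 1 meaning 'unsafe' (the proposed action led to a collision).
    """
    (x_train, y_train), (x_test, y_test) = train_data, test_data
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # 1:2 class weighting: misclassifying a crash (label 1) costs twice as much
    model.fit(x_train, y_train, class_weight={0: 1.0, 1: 2.0},
              epochs=epochs, batch_size=64, validation_split=0.1)

    # Precision-recall evaluation on the held-out test set
    p_unsafe = model.predict(x_test).ravel()
    precision, recall, _ = precision_recall_curve(y_test, p_unsafe)
    return {"auc_pr": auc(recall, precision),
            "recall": recall_score(y_test, p_unsafe > 0.6)}
```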
ST-CPN with Dynamic Obstacles

Despite the presence of dynamic objects in the environment, the Simple CPN made its decision based on only a single frame of information. This does not allow the network to model the motion of the ego vehicle or of the other obstacles in the environment. To better deal with moving objects, the ST-CPN model was introduced, which uses ConvLSTM layers to process a contiguous series of 10 image frames.

Figure 6: Performance of the Simple CPN model against the ST-CPN model on the test set with dynamic obstacles

Figure 7: ST-CPN model on test data with dynamic obstacles

Looking at Figure 7 and Figure 5, it is evident that the ST-CPN model was able to identify "unsafe" situations much better, thereby reducing the number of false negatives. This is further visible when comparing the results of the ST-CPN model against the Simple CPN model in Table 1 on the test set. The disadvantage, however, is that, due to the higher complexity of the model, the inference time of ST-CPN increases by a factor of 10 when compared to that of Simple CPN.

Following the performance of the ST-CPN model, a study of the reaction of the model to out-of-distribution data, i.e., data that is slightly different from the conditions that the model was trained on, was performed. For this, 3000 images with clear weather, similar to training conditions, and 3000 images with rainy weather were collected. These are referred to as "Test small" and "Test rainy", respectively. As can be seen in Table 2 and Figure 8, the rainy condition causes the model to misclassify comparatively more "unsafe" scenarios, thereby dropping the performance of the model.

Figure 8: Precision-recall curve comparing the performance of the ST-CPN model on test sets with clear (default) and rainy weather conditions

Table 2: Comparison of classification metrics on test sets with clear and rainy weather.

Type                 ACCURACY  RECALL  PRECISION  AUC-PR  Note
Bayesian Simple CPN  0.9443    0.90    0.93       0.9740  Uncertainty: 0.0139, training set
Bayesian Simple CPN  0.8030    0.57    0.78       0.7679  Uncertainty: 0.0163, test set
Bayesian Simple CPN  0.6173    0.69    0.45       0.5408  Uncertainty: 0.0212, rainy test set
ST-CPN               0.8816    0.75    0.88       0.8983  test set
ST-CPN               0.7770    0.45    0.79       0.7140  rainy test set

Simple CPN with Uncertainty in Dynamic Environment

Though accounting for temporal features in the prediction model helped improve the performance, it suffered from a drop in performance when encountering out-of-distribution data. Since the real world is constantly evolving and cannot be modeled completely in the training data, it is necessary to have in place techniques for the models to deal with such data. As pointed out in an earlier section, standard deep learning gives no information about the confidence of the model in its prediction. Thus, both models from the previous experiments were extended with "MC-Dropout" (Gal and Ghahramani 2016) by placing a dropout layer after every convolutional and dense layer, except the output layer. The dropout was then applied not just during training but also during testing, wherein each input was used to generate T predictions from which the mean and variance were calculated. The variance on each observation/data point is an indication of how certain the model is in its prediction, which in turn shows how similar the test data is to the training data. Since the performance improvement from using class weights was negligible, the Bayesian version of the Simple CPN model was trained without class weights.

The advantage of using uncertainty becomes clear when using a dataset that differs considerably from the training data. Thus, the performance of Bayesian Simple CPN is evaluated on "Test small" and "Test rainy" from the previous experiment. As can be seen from Table 2, the test set with clear weather has uncertainty estimates similar to those of the training set; however, on the test set with rainy weather, the uncertainty estimate increases. Thus, even a change as small as a variation in weather can increase uncertainty. The uncertainty estimates can therefore be useful in building trust in the predictions of the CPN models.

The benefit of estimating the confidence of the model in its decision, however, comes at the cost of a significantly longer inference time. During evaluations, the model took 10 times longer to compute the class labels and their corresponding confidence values.
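The MC-Dropout procedure described above amounts to keeping dropout active at test time and aggregating T stochastic forward passes. A minimal Keras-style sketch is given below; calling the model with `training=True` keeps the Dropout layers active. The value of T and the use of the mean predictive variance as the single "Uncertainty" number in the tables are assumptions, as the paper does not state either explicitly.

```python
import numpy as np
import tensorflow as tf


def mc_dropout_predict(model: tf.keras.Model, inputs, t: int = 30):
    """Run T stochastic forward passes with dropout enabled at inference time.

    Returns the per-sample mean probability of 'unsafe' and the per-sample
    variance across the T passes (higher variance = less confident model).
    """
    passes = np.stack(
        [model(inputs, training=True).numpy().ravel() for _ in range(t)], axis=0)
    return passes.mean(axis=0), passes.var(axis=0)


# Usage sketch: classify with the 0.6 threshold and report a dataset-level uncertainty
# mean_p, var_p = mc_dropout_predict(bayesian_simple_cpn, [images, actions])
# labels = mean_p > 0.6
# print("mean predictive variance:", var_p.mean())
```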
ST-CPN with Uncertainty in Dynamic Environment

To study the effect of uncertainty estimates combined with the benefits of modeling temporal features, the ST-CPN network, similar to the previous experiment with Simple CPN, was extended using MC-Dropout. To ensure that the model was comparable to the Bayesian version of the Simple CPN model of the previous experiments, both models were trained to a similar validation accuracy of about 0.80. Additionally, to capture the essence of the proposed CPN model with multiple independent neural networks, the outputs of the "Bayesian Simple CPN" model and the "Bayesian ST-CPN" model were combined as a weighted average, with the higher weight being assigned to the latter. This model is referred to as the "Combined" model.

Figure 9: Precision-recall curve comparing the performance of the Bayesian versions of the Simple CPN model, the ST-CPN model and a weighted combination of the two models

Evaluating the performance over a range of probability thresholds, the combined model performs slightly better than both individual models, as seen in Figure 9, thereby showing the benefit of modeling CPN as an ensemble of diverse networks working in unison to make a prediction about the future state.
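A minimal sketch of the "Combined" model follows: the MC-Dropout mean predictions of the two Bayesian networks are merged as a weighted average, with the higher weight on ST-CPN, and the merged score is swept over thresholds for the precision-recall comparison in Figure 9. The 0.7/0.3 weights are illustrative; the paper states only that ST-CPN received the higher weight.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve


def combined_score(p_simple: np.ndarray, p_st: np.ndarray, w_st: float = 0.7) -> np.ndarray:
    """Weighted average of the two Bayesian CPN mean predictions (ST-CPN weighted higher)."""
    return (1.0 - w_st) * p_simple + w_st * p_st


def auc_pr(y_true: np.ndarray, scores: np.ndarray) -> float:
    """Area under the precision-recall curve, as used to compare the three variants."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return auc(recall, precision)


# Usage sketch with the MC-Dropout means from the previous snippet:
# p_combined = combined_score(mean_p_simple, mean_p_st)
# print(auc_pr(y_test, p_combined))
```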
Conclusion

The paper evaluated the proposed system of "Crash Prediction Networks" and demonstrated its advantage in exploiting the expressiveness of neural networks, while also maintaining robustness due to the ensemble-like setup. CPN provides redundancy to the setup by adding a layer of "safety prediction". CPN has the power to re-check the decisions of the driving module, thereby sharing the load of safe decision making with the driving module. More importantly, it should be noted that CPN is not just a solution for futuristic autonomous vehicles, but can in fact be integrated with current levels of automation to monitor the driving decisions of human drivers to ensure safer and more conservative driving, thereby reducing human error on roads.

Based on the evaluations discussed in the paper, the importance of accounting for temporal features to model motion in the environment is evident. Thus, future work involves extending the CPN model to support other sensory data as input, along with examining longer history lengths. Furthermore, one could increase the level of granularity of the prediction label beyond just 'safe' and 'unsafe'. This would allow for a more sensitive setup, with the ability to trigger situation-specific interventions, which is not currently possible. Additionally, the networks that form the ensemble could be extended beyond merely sensory inputs to make them geographical-region and/or rule-specific with considerably little effort.

References

Ashmore, R.; Calinescu, R.; and Paterson, C. 2019. Assuring the machine learning lifecycle: Desiderata, methods, and challenges. arXiv preprint arXiv:1905.04223.
Avizienis, A.; Laprie, J.-C.; Randell, B.; and Landwehr, C. 2004. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing 1(1): 11–33.
Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; and Koltun, V. 2017. CARLA: An open urban driving simulator. arXiv preprint arXiv:1711.03938.
Gal, Y. 2016. Uncertainty in deep learning. University of Cambridge.
Gal, Y.; and Ghahramani, Z. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, 1050–1059.
Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press.
Huang, W.; Wang, K.; Lv, Y.; and Zhu, F. 2016. Autonomous vehicles testing methods review. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 163–168. IEEE.
Kalra, N.; and Paddock, S. M. 2016. Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice 94: 182–193.
Koopman, P.; and Wagner, M. 2016. Challenges in autonomous vehicle testing and validation. SAE International Journal of Transportation Safety 4(1): 15–24.
Koopman, P.; and Wagner, M. 2017. Autonomous vehicle safety: An interdisciplinary challenge. IEEE Intelligent Transportation Systems Magazine 9(1): 90–96.
Kwon, Y.; Won, J.-H.; Kim, B. J.; and Paik, M. C. 2020. Uncertainty quantification using Bayesian neural networks in classification: Application to biomedical image segmentation. Computational Statistics & Data Analysis 142: 106816.
Machin, M.; Guiochet, J.; Waeselynck, H.; Blanquart, J.-P.; Roy, M.; and Masson, L. 2016. SMOF: A safety monitoring framework for autonomous systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48(5): 702–715.
McAllister, R.; Gal, Y.; Kendall, A.; Van Der Wilk, M.; Shah, A.; Cipolla, R.; and Weller, A. 2017. Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning. International Joint Conferences on Artificial Intelligence, Inc.
Nair, S.; Shafaei, S.; Kugele, S.; Osman, M. H.; and Knoll, A. 2019. Monitoring safety of autonomous vehicles with crash prediction networks. In SafeAI@AAAI.
Najm, W. G.; Smith, J. D.; Yanagisawa, M.; et al. 2007. Pre-crash scenario typology for crash avoidance research. Technical report, United States National Highway Traffic Safety Administration.
Riedmaier, S.; Nesensohn, J.; Gutenkunst, C.; Düser, T.; Schick, B.; and Abdellatif, H. 2018. Validation of X-in-the-Loop Approaches for Virtual Homologation of Automated Driving Functions. In GSVF-Symposium, Graz.
Rudolph, A.; Voget, S.; and Mottok, J. 2018. A consistent safety case argumentation for artificial intelligence in safety related automotive systems.
Saunders, W.; Sastry, G.; Stuhlmueller, A.; and Evans, O. 2018. Trial without error: Towards safe reinforcement learning via human intervention. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2067–2069. International Foundation for Autonomous Agents and Multiagent Systems.
Simonyan, K.; and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Stellet, J. E.; Zofka, M. R.; Schumacher, J.; Schamm, T.; Niewels, F.; and Zöllner, J. M. 2015. Testing of advanced driver assistance towards automated driving: A survey and taxonomy on existing approaches and open questions. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, 1455–1462. IEEE.
Törngren, M.; Zhang, X.; Mohan, N.; Becker, M.; Svensson, L.; Tao, X.; Chen, D.-J.; and Westman, J. 2018. Architecting Safety Supervisors for High Levels of Automated Driving. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 1721–1728. IEEE.
Zhao, Y.; Gao, J.; and Yang, X. 2005. A survey of neural network ensembles. In 2005 International Conference on Neural Networks and Brain, volume 1, 438–442. doi:10.1109/ICNNB.2005.1614650.