An Evaluation of "Crash Prediction Networks" (CPN) for Autonomous Driving Scenarios in CARLA Simulator

Saasha Nair,1,2 Sina Shafaei,3 Daniel Auge,3 Alois Knoll3
1 TeCIP Institute, Scuola Superiore Sant'Anna, Pisa, Italy
2 Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy
3 Technical University of Munich, Germany
saasha.nair@santannapisa.it, sina.shafaei@tum.de, daniel.auge@tum.de, knoll@in.tum.de

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Safeguarding the trajectory planning algorithms of autonomous vehicles is crucial for safe operation in mixed traffic scenarios. This paper proposes the use of an ensemble of neural networks that work together under the moniker of "Crash Prediction Networks". The system comprises multiple independent networks, each focusing on a different subset of sensory inputs. The aim is for the networks to work in unison to reach a consensus on whether the vehicle might enter a catastrophic state, and to trigger an appropriate intervention if so. The proposed approach would act as an additional layer of safety by supervising the decision making module of an autonomous vehicle.
Though the proposed approach encompasses all the sensors and allied paraphernalia, the scope of this paper is exclusively limited to exploring safety monitors for visual sensors. The approach can, however, be extrapolated to other sensors. The evaluation was conducted using the CARLA simulator on simple driving scenarios, studying the benefits of modeling temporal features to capture the motion in the environment. Additionally, the paper studies the importance of 'accounting for uncertainty' in models dealing with vehicle safety.

Introduction

Safety, as defined by Avizienis (Avizienis et al. 2004), is the "absence of catastrophic consequences on the user(s) and the environment". Therefore, safety engineering (Törngren et al. 2018) relates to "methods used to assess, eliminate and/or reduce risk to acceptable level". Safety in vehicles requires implementing vehicle-level behaviours (Koopman and Wagner 2017) that ensure safe and reliable performance across different contexts (McAllister et al. 2017). This could entail safety by construction, analysis, and verification/validation. Currently, however, testing is the main method used for ensuring safety in automated vehicles, and includes the following types (Huang et al. 2016):

1. Software testing: helps to ensure the correctness of the program at the source code level via unit tests and testing tools.
2. X-in-the-Loop (XiL) testing: as introduced in (Stellet et al. 2015; Riedmaier et al. 2018), combines real-world and simulated components for testing the functionality of parts of the system. The software, hardware and the vehicle itself can be tested while simulating the residual parts.
3. Testing in real traffic: allows studying how the autonomous vehicle reacts to real-world driving scenarios on public roads with other participating actors.
Safety concerns are still seen to be a major impediment to the wide-scale adoption of autonomous vehicles (Rudolph, Voget, and Mottok 2018; Kalra and Paddock 2016; McAllister et al. 2017; Koopman and Wagner 2016). Safety monitors can potentially help to alleviate the safety concerns surrounding autonomous vehicles by observing the inputs to and outputs from the various modules contained in the driving pipeline. These monitors usually differentiate between three main states that the system of interest, in this case the car, may end up in (Machin et al. 2016). The first is the catastrophic state, one where the damage has already been done and from which the system cannot be recovered. The remaining two states, namely the safe state and the warning state, are non-catastrophic. In the safe state the system behaves as expected, without any constraints. The warning state is where the system is close to being involved in a catastrophe but can still be salvaged. For a system to transition from a safe to a catastrophic state, it must always pass through the warning state; the warning state can thus be thought of as the margin where applying the correct intervention brings the system back to the safe state. Monitoring therefore requires defining a safety strategy that identifies potential warning states and specifies corresponding safety rules to guide the response of the autonomous system.

The problem with existing approaches to safety monitoring, such as the "Safety Monitoring Framework (SMOF)" (Machin et al. 2016), is that they require hand-picked and hard-coded limiters, defined at the design stage, describing what counts as safe or unsafe intervals of input and output (a toy example of such a limiter is sketched below). However, in neural networks, rather than the behaviour being hard-coded into the system, the models learn to detect patterns from the data they were trained on. This makes it difficult to combine neural networks with the existing safety monitoring techniques, and might in fact limit the expressive power and freedom of neural networks (Ashmore, Calinescu, and Paterson 2019). Thus this paper introduces "Crash Prediction Networks" (CPN) (Nair et al. 2019) as a 'safety by construction' solution for monitoring modules composed of neural networks. The aim of this paper is to explore the effectiveness of CPN in monitoring various aspects of vehicular safety. CPN has for this purpose been evaluated in a simulated environment and in the presence of dynamic obstacles. Additionally, the potential enhancement of prediction accuracy has also been investigated using temporal features in the architecture of the networks.
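To make the contrast with learned monitors concrete, the following is a minimal, purely illustrative sketch of the kind of hand-coded limiter that rule-based frameworks such as SMOF rely on. The signals (speed, gap to the nearest obstacle) and the interval thresholds are hypothetical and are not taken from SMOF or from this paper.

```python
from enum import Enum


class MonitorState(Enum):
    SAFE = 0
    WARNING = 1
    CATASTROPHIC = 2


def rule_based_monitor(speed_mps: float, gap_m: float) -> MonitorState:
    """Toy hard-coded limiter: classify the situation from hand-picked
    intervals over two signals (all thresholds are hypothetical)."""
    if gap_m <= 0.0:                                  # contact has already happened
        return MonitorState.CATASTROPHIC
    # time-to-collision style margin: gap is closing too fast for the speed
    if speed_mps > 0 and gap_m / speed_mps < 2.0:     # less than ~2 s to impact
        return MonitorState.WARNING
    return MonitorState.SAFE


if __name__ == "__main__":
    # In the warning state a predefined safety rule (e.g., braking) would fire.
    print(rule_based_monitor(speed_mps=10.0, gap_m=15.0))  # WARNING
    print(rule_based_monitor(speed_mps=5.0, gap_m=30.0))   # SAFE
```

A learned monitor such as CPN replaces these designer-chosen intervals with patterns extracted from data, which is precisely what makes it harder to fit into frameworks built around such rules.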
Overall Architecture

Figure 1: Training phase of the crash prediction network (Nair et al. 2019)

Figure 2: Operation phase of the crash prediction network (Nair et al. 2019)

Consider an end-to-end setup for the driving system, equipped with a range of sensors, that can be trained in simulation. The driving module uses information about the state of the environment to decide whether to continue straight, turn, or apply the brakes. The Crash Prediction Network can be thought of as an envelope around this driving module. The aim of CPN is to study the action decision obtained as output from the driving module, to determine whether it is likely to lead to a crash given the sensory information about the state. Safety monitors need to be extremely robust and reliable; to this effect CPN is suggested to be implemented as an ensemble (Zhao, Gao, and Yang 2005) of neural networks. Each network differs either in architecture or in the subset of sensory data it consumes as input. The networks then work in unison to reach a consensus on whether the vehicle should continue with the current action or trigger an intervention.

During the training phase for CPN (Figure 1), a dataset is created by allowing a Reinforcement Learning driving agent to interact with the environment, and storing the information of the states encountered and the actions taken along with the outcome. This allows the training of the CPN to be posed as a classification problem with two classes, 'safe' or 'unsafe', indicating whether the action decision with the given state information led to a collision or not. During the operational phase (Figure 2), the ensemble of networks that compose CPN observe the input from the various sensors and the decision proposed by the driving module to predict the "safeness" of the outcome. If the proposed action is likely to lead to a catastrophic state, a predefined intervention is triggered, thereby filtering out potentially dangerous action decisions; else, the proposed action is executed. The predefined intervention, also referred to as the "Fail Safe Mode" (as seen in Figure 2), can take the form of transferring control to the human driver, pulling over to the shoulder of the road, and so on. This paper does not delve into the details of the intervention, but rather focuses on developing the technique to identify when the intervention should be triggered.

A potential problem that might be encountered is that CPN might lose its relevance in the real world over time due to the dynamic and constantly evolving state of the operational environment. The setup of CPN makes it possible to train the network in an iterative manner, which can be used to combat this problem. Implementing iterative training with CPN would require the continuous collection of live driving data during the operational phase and training CPN on the newly collected data at regular intervals. Additionally, an advantage of this technique is that it is not tightly coupled to the nature of the driving agent: though the experiments use an RL-based driving agent, it could easily be swapped for any other driving agent.
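To illustrate the operation phase described above, the following is a minimal sketch of how a CPN ensemble could gate the driving module's proposed action. The consensus rule (averaging the member outputs) and the 0.6 'unsafe' threshold mirror choices reported later in the paper, but the environment interface, the `trigger_fail_safe` routine and the member models are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np


def cpn_consensus(members, state, action) -> float:
    """Average the per-network probabilities of the next state being unsafe.
    Each member is assumed to expose predict(state, action) -> float in [0, 1]."""
    scores = [m.predict(state, action) for m in members]
    return float(np.mean(scores))


def step(env, driving_agent, cpn_members, unsafe_threshold: float = 0.6):
    """One operation-phase step: propose, check with the CPN ensemble,
    then either execute the action or trigger the predefined intervention."""
    state = env.observe()                 # sensory snapshot (e.g., camera frames)
    action = driving_agent.act(state)     # decision of the driving module
    p_unsafe = cpn_consensus(cpn_members, state, action)
    if p_unsafe > unsafe_threshold:
        env.trigger_fail_safe()           # "Fail Safe Mode" intervention
    else:
        env.execute(action)               # proposed action deemed safe
    return p_unsafe
```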
Though the proposed architecture and techniques suggest the use of an ensemble of networks, each focusing on a subgroup of different sensors, the scope of this evaluation is limited only to visual data collected using RGB cameras mounted on the hood of the car in simulation. The paper studies the importance of accounting for temporal features in the data on the prediction accuracy of CPN by modeling two different network architectures, namely "Simple CPN" and "Spatio-Temporal (ST)-CPN", as depicted in Figure 3. Simple CPN uses a single frame, i.e., an image of the current state, to make a prediction about the level of safety of the next state based on the proposed action of the driving module. This is achieved by using a VGG network (Simonyan and Zisserman 2014) (trained from scratch with the dataset collected in CARLA as described in the following section) for feature extraction, which is then concatenated with the action decision to perform the classification. The ST-CPN architecture, on the other hand, takes as input an N-frame long history, i.e., the last N frames encountered before the current state, along with the proposed driving decision.

Figure 3: The Simple CPN (left), and ST-CPN (right) architectures used in conducting the experiments

Additionally, the importance of accounting for uncertainty is also studied in this evaluation. Standard deep learning uses point estimates for predictions (Goodfellow, Bengio, and Courville 2016). Thus, even when the model encounters inputs that are dissimilar to the ones that it was trained on, it might counterintuitively generate a high probability score, thereby making probability scores an unreliable estimate of the model's confidence. This can be combated with the use of Bayesian deep learning, which allows for a probabilistic approach to predictions by inferring distributions over the model parameters (Gal 2016; Kwon et al. 2020). Besides generating uncertainty estimates, Bayesian deep learning also helps reduce over-fitting. However, such models are difficult to train, and usually have intractable objective functions. Thus, in this work, we explore the need for accounting for uncertainty in safety monitors like CPN with the use of MC-Dropout (Gal and Ghahramani 2016) to approximate Bayesian inference.
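The paper does not publish training code or name a framework; the sketch below shows, in Keras-style TensorFlow (an assumption), roughly how the two heads in Figure 3 could be assembled: a VGG-style convolutional stack for Simple CPN and a ConvLSTM stack for ST-CPN, each concatenated with the 1x5 action vector and ending in a single sigmoid output unit (the activation is an assumption; the paper states only that the final dense layer has one neuron). Layer counts and widths are simplified relative to Figure 3.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

N_FRAMES, H, W, C, N_ACTIONS = 10, 84, 84, 3, 5


def simple_cpn() -> Model:
    """Single-frame CPN: VGG-style image features + action vector -> p(unsafe)."""
    img = layers.Input(shape=(H, W, C), name="image")
    act = layers.Input(shape=(N_ACTIONS,), name="action")
    x = img
    for filters in (64, 128, 256, 512):            # simplified VGG-like stack
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Concatenate()([x, act])
    for units in (500, 100, 10):
        x = layers.Dense(units, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid", name="unsafe")(x)
    return Model([img, act], out, name="simple_cpn")


def st_cpn() -> Model:
    """Spatio-temporal CPN: ConvLSTM over the last N frames + action vector."""
    clip = layers.Input(shape=(N_FRAMES, H, W, C), name="frames")
    act = layers.Input(shape=(N_ACTIONS,), name="action")
    x = clip
    for filters in (20, 40):
        x = layers.ConvLSTM2D(filters, 3, padding="same", return_sequences=True)(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    x = layers.ConvLSTM2D(40, 3, padding="same", return_sequences=False)(x)
    x = layers.Flatten()(x)
    x = layers.Concatenate()([x, act])
    for units in (300, 10):
        x = layers.Dense(units, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid", name="unsafe")(x)
    return Model([clip, act], out, name="st_cpn")
```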
Experiments and Evaluations

As stated in the previous section, the scope of the experiments was limited to inputs from RGB cameras placed on the hood of the car. Additionally, the focus of the networks was on preventing "locally avoidable catastrophes" (Saunders et al. 2018), i.e., ones that can be avoided by adjusting the course of action when danger is imminent. This simplifying assumption eliminates the need for long-term strategic planning and focuses only on the point of failure. The experiments were conducted using CARLA 0.9.6, "an open-source simulator for urban driving" (Dosovitskiy et al. 2017). The CARLA simulator provides a "Scenario Runner", which acts as an additional layer over the simulator, to support the testing of driving scenarios laid out by NHTSA as a list of pre-crash typologies (Najm et al. 2007). To study the importance of temporal features in safety prediction, ST-CPN is compared against the single-frame input of Simple CPN. The evaluation in this paper uses a history length of 10, meaning the last 10 image frames encountered by the ego vehicle are fed as input to the ST-CPN model. Both models are extended for further experiments with uncertainty by applying MC-Dropout (Gal and Ghahramani 2016). To this effect, a Dropout layer with a probability of 0.4 is applied during training and inference after each of the trainable layers (namely, conv, convlstm and dense) in the models.

Dataset

Creating a representative dataset is a vital part of the deep learning pipeline. Data for the experiments in this paper was collected by allowing the ego vehicle, backed by an RL-agent, to drive in and interact with the simulated environment in CARLA. The simulator provides pre-built environments, called "Towns". Towns 01, 03 and 04 were used for training and validation, while Towns 02 and 05 were used for testing. For the initial tests of the proposed approach presented here, the scale of the experiment was fairly limited, with 18000 images (12000 safe and 6000 unsafe) used for training and 9000 (6000 safe and 3000 unsafe) used for testing.

For the experiments discussed here, the ego vehicle used in the simulation was equipped with three RGB cameras, placed on the left, right and center of the far-front of the hood of the car. The cameras enabled the ego vehicle to better perceive its surroundings, allowing for a wider field of view. As mentioned before, the two network architectures required two different formats of data. Thus, for the Simple CPN model, single frames of images were stored. The RGB images from the three cameras were first converted to gray-scale to reduce the effect of color on the decision making of the neural network, since the network only needs to detect the presence of an obstacle, not the type of the obstacle. The single-channel gray-scale images from the three cameras were then combined depth-wise to create a single three-channel image of dimension 84x84x3 (as depicted in Figure 4), such that each channel represented one of the gray-scale images. To extend the data to the ST-CPN model, a similar procedure was followed to store a concatenation of the last 10 image frames per step, such that the 10 image frames were stacked vertically to generate an (84x10)x84x3 image. Before being fed to the model as input, the single long image was processed into a series of 10 images of dimension 84x84x3, akin to a short video. The two networks perform binary classification, such that the final dense layer contains a single neuron. A decision threshold value of 0.6 was used, such that if the output-layer neuron produces a value greater than 0.6, the state-action pair is classified as unsafe.

Figure 4: Format of the dataset used by Simple CPN
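As a concrete illustration of the data format just described, the following NumPy sketch converts the three camera images to gray-scale, stacks them depth-wise into one 84x84x3 array, and reshapes a vertically concatenated (84x10)x84x3 history image back into the 10-frame clip consumed by ST-CPN. The luminance weights are an assumption (the paper does not state how the conversion was done), and the inputs are assumed to already be resized to 84x84.

```python
import numpy as np

H, W, N_FRAMES = 84, 84, 10
LUMA = np.array([0.299, 0.587, 0.114])   # assumed RGB-to-gray weights


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """RGB image (H, W, 3) -> single-channel gray-scale image (H, W)."""
    return rgb @ LUMA


def simple_cpn_frame(left: np.ndarray, center: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Stack the three gray-scaled camera views depth-wise into one 84x84x3 array."""
    return np.stack([to_gray(left), to_gray(center), to_gray(right)], axis=-1)


def split_history(long_image: np.ndarray) -> np.ndarray:
    """Stored (84*10, 84, 3) history image -> (10, 84, 84, 3) clip for ST-CPN."""
    return long_image.reshape(N_FRAMES, H, W, 3)


if __name__ == "__main__":
    cams = [np.random.randint(0, 256, (H, W, 3)).astype(np.float32) for _ in range(3)]
    frame = simple_cpn_frame(*cams)                        # (84, 84, 3)
    history = np.concatenate([frame] * N_FRAMES, axis=0)   # (840, 84, 3) as stored
    clip = split_history(history)                          # (10, 84, 84, 3) fed to ST-CPN
    print(frame.shape, clip.shape)
```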
Evaluation Metrics

In real-world scenarios, unsafe crash states are comparatively rare, which often leads to imbalanced datasets. This is captured in the collected dataset by enforcing a mild imbalance, as can be noticed in the description of the dataset in the previous section. Thus, accuracy, the most commonly used metric in deep learning, does not suffice, as it could lead to a false sense of success. Since falsely classifying unsafe states as safe is much worse than vice versa, the main focus of the models should be on reducing false negatives. Therefore, recall is an important metric for the CPN models, which has been captured in this paper via precision-recall curves.

Simple CPN with Static Obstacles Only

Before moving on to complex scenarios, it was necessary to test if a deep-learning based model could help predict the possibility of a crash based on state and action information. As a sanity check, in this first experiment the Simple CPN model was tested with only static obstacles, which meant crashes into walls, fences, rocks, crates on the road and other static objects in an urban scenario. With a test accuracy of 0.7907, the model was able to predict crash situations. However, the accuracy was inadequate to be practically usable.

Simple CPN with Dynamic Obstacles

Given the results of the previous experiment, the simulation environment for the following experiments was extended to include dynamic obstacles in the form of 2- and 4-wheeler vehicles. The Simple CPN model was tasked with taking as input an image of the current state along with the proposed action decision to predict whether the next state would be 'unsafe'. The model was able to replicate the success of the previous experiment, achieving a test accuracy of 0.8018 and an AUC-PR score of 0.7624.

Figure 5: Simple CPN model on test set with dynamic obstacles

As per the confusion matrix depicted in Figure 5, the number of false negatives was quite high. However, for safety-critical applications such as autonomous vehicles, it is important to reduce false negatives to as low as possible. In order to improve the results of the classification, class weights were introduced in the Simple CPN architecture. The class weights were used during training in the ratio of 1:2, such that the penalty applied to the loss for wrongly classifying a crash case was double that for wrongly classifying a non-crash case. The results summarized in Table 1 show a slight improvement of the overall accuracy and an increased recall score.

Table 1: Comparison of classification metrics on the test set with clear weather.

Type                           ACCURACY  RECALL  PRECISION  AUC-PR  Note
Simple CPN                     0.8018    0.57    0.77       0.7624
Simple CPN with loss adaption  0.8131    0.68    0.74       0.7706
ST-CPN                         0.8773    0.74    0.87       0.8951
Bayesian Simple CPN            0.8015    0.57    0.77       0.7624  Uncertainty: 0.0162
Bayesian ST-CPN                0.8281    0.63    0.71       0.8274  Uncertainty: 0.0166
Bayesian Combined              0.8328    0.62    0.71       0.8382  Uncertainty: 0.0222
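The class-weighted retraining and the precision-recall evaluation described above can be reproduced with a few lines in Keras and scikit-learn. The sketch below assumes the `simple_cpn()` builder from the earlier architecture sketch and in-memory NumPy arrays, which is a simplification of the actual training setup; the optimizer, epoch count and batch size are placeholders not reported in the paper.

```python
from sklearn.metrics import auc, precision_recall_curve, recall_score


def train_and_evaluate(model, train_data, test_data, epochs: int = 20):
    """Class-weighted training plus the recall/AUC-PR evaluation used for Table 1.

    train_data and test_data are ([images, actions], labels) tuples, with
    label 1 meaning 'unsafe' (the proposed action led to a collision).
    """
    (x_train, y_train), (x_test, y_test) = train_data, test_data
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # 1:2 class weighting: misclassifying a crash (label 1) costs twice as much
    model.fit(x_train, y_train, class_weight={0: 1.0, 1: 2.0},
              epochs=epochs, batch_size=64, validation_split=0.1)

    # Precision-recall evaluation on the held-out test set
    p_unsafe = model.predict(x_test).ravel()
    precision, recall, _ = precision_recall_curve(y_test, p_unsafe)
    return {"auc_pr": auc(recall, precision),
            "recall": recall_score(y_test, p_unsafe > 0.6)}
```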
ST-CPN with Dynamic Obstacles

Despite the presence of dynamic objects in the environment, the Simple CPN made its decision based on only a single frame of information. This does not allow the network to model the motion of the ego vehicle or of the other obstacles in the environment. To better deal with moving objects, the ST-CPN model was introduced, which uses ConvLSTM layers to process a contiguous series of 10 image frames.

Figure 6: Performance of the Simple CPN model against the ST-CPN model on the test set with dynamic obstacles

Figure 7: ST-CPN model on test data with dynamic obstacles

Looking at Figure 7 and Figure 5, it is evident that the ST-CPN model was able to identify "unsafe" situations much better, thereby reducing the number of false negatives. This is further visible when comparing the results of the ST-CPN model against the Simple CPN model in Table 1 on the test set. The disadvantage, however, is that, due to the higher complexity of the model, the inference time of ST-CPN increases by a factor of 10 when compared to that of Simple CPN.

Following the performance of the ST-CPN model, a study of the reaction of the model to out-of-distribution data, i.e., data that is slightly different from the conditions that the model was trained on, was performed. For this, 3000 images with clear weather, similar to training conditions, and 3000 images with rainy weather were collected. These are referred to as "Test small" and "Test rainy", respectively. As can be seen in Table 2 and Figure 8, the rainy condition causes the model to misclassify comparatively more "unsafe" scenarios, thereby dropping the performance of the model.

Figure 8: Precision-recall curve comparing the performance of the ST-CPN model on test sets with clear (default) and rainy weather conditions

Table 2: Comparison of classification metrics on test sets with clear and rainy weather.

Type                 ACCURACY  RECALL  PRECISION  AUC-PR  Note
Bayesian Simple CPN  0.9443    0.90    0.93       0.9740  Uncertainty: 0.0139, training set
Bayesian Simple CPN  0.8030    0.57    0.78       0.7679  Uncertainty: 0.0163, test set
Bayesian Simple CPN  0.6173    0.69    0.45       0.5408  Uncertainty: 0.0212, rainy test set
ST-CPN               0.8816    0.75    0.88       0.8983  test set
ST-CPN               0.7770    0.45    0.79       0.7140  rainy test set

Simple CPN with Uncertainty in Dynamic Environment

Though accounting for temporal features in the prediction model helped improve the performance, it suffered from a drop in performance when encountering out-of-distribution data. Since the real world is constantly evolving and cannot be modeled completely in the training data, it is necessary to have in place techniques for the models to deal with such data. As pointed out in an earlier section, standard deep learning gives no information about the confidence of the model in its prediction. Thus, both models from the previous experiments were extended with "MC-Dropout" (Gal and Ghahramani 2016) by placing a dropout layer after every convolutional and dense layer, except the output layer. The dropout was then applied not just during training but also during testing, wherein each input was used to generate T predictions from which the mean and variance were calculated. The variance on each observation/data point is an indication of how certain the model is in its prediction, which in turn shows how similar the test data is to the training data. Since the performance improvement from using class weights was negligible, the Bayesian version of the Simple CPN model was trained without class weights.

The advantage of using uncertainty becomes clear when using a dataset that differs considerably from the training data. Thus, the performance of Bayesian Simple CPN is evaluated on "Test small" and "Test rainy" from the previous experiment. As can be seen from Table 2, the test set with clear weather has uncertainty estimates similar to those of the training set; however, on the test set with rainy weather, the uncertainty estimate increases. Thus, even a change as small as a variation in weather can increase uncertainty. The uncertainty estimates can therefore be useful in building trust in the predictions of the CPN models.

The benefit of estimating the confidence of the model in its decision, however, comes at the cost of a significantly longer inference time. During evaluations, the model took 10 times longer to compute the class labels and their corresponding confidence values.
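The MC-Dropout procedure described above amounts to keeping dropout active at test time and aggregating T stochastic forward passes. A minimal Keras-style sketch is given below; calling the model with `training=True` keeps the Dropout layers active. The value of T and the use of the mean predictive variance as the single "Uncertainty" number in the tables are assumptions, as the paper does not state either explicitly.

```python
import numpy as np
import tensorflow as tf


def mc_dropout_predict(model: tf.keras.Model, inputs, t: int = 30):
    """Run T stochastic forward passes with dropout enabled at inference time.

    Returns the per-sample mean probability of 'unsafe' and the per-sample
    variance across the T passes (higher variance = less confident model).
    """
    passes = np.stack(
        [model(inputs, training=True).numpy().ravel() for _ in range(t)], axis=0)
    return passes.mean(axis=0), passes.var(axis=0)


# Usage sketch: classify with the 0.6 threshold and report a dataset-level uncertainty
# mean_p, var_p = mc_dropout_predict(bayesian_simple_cpn, [images, actions])
# labels = mean_p > 0.6
# print("mean predictive variance:", var_p.mean())
```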
ST-CPN with Uncertainty in Dynamic Environment

To study the effect of uncertainty estimates combined with the benefits of modeling temporal features, the ST-CPN network, similar to the previous experiment with Simple CPN, was extended using MC-Dropout. To ensure that the model was comparable to the Bayesian version of the Simple CPN model of the previous experiments, both models were trained to a similar validation accuracy of about 0.80. Additionally, to capture the essence of the proposed CPN model with multiple independent neural networks, the outputs of the "Bayesian Simple CPN" model and the "Bayesian ST-CPN" model were combined as a weighted average, with the higher weight being assigned to the latter. This model is referred to as the "Combined" model.

Figure 9: Precision-recall curve comparing the performance of the Bayesian versions of the Simple CPN model, the ST-CPN model and a weighted combination of the two models

Evaluating the performance over a range of probability thresholds, the combined model performs slightly better than both individual models, as seen in Figure 9, thereby showing the benefit of modeling CPN as an ensemble of diverse networks working in unison to make a prediction about the future state.
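A minimal sketch of the "Combined" model follows: the MC-Dropout mean predictions of the two Bayesian networks are merged as a weighted average, with the higher weight on ST-CPN, and the merged score is swept over thresholds for the precision-recall comparison in Figure 9. The 0.7/0.3 weights are illustrative; the paper states only that ST-CPN received the higher weight.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve


def combined_score(p_simple: np.ndarray, p_st: np.ndarray, w_st: float = 0.7) -> np.ndarray:
    """Weighted average of the two Bayesian CPN mean predictions (ST-CPN weighted higher)."""
    return (1.0 - w_st) * p_simple + w_st * p_st


def auc_pr(y_true: np.ndarray, scores: np.ndarray) -> float:
    """Area under the precision-recall curve, as used to compare the three variants."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return auc(recall, precision)


# Usage sketch with the MC-Dropout means from the previous snippet:
# p_combined = combined_score(mean_p_simple, mean_p_st)
# print(auc_pr(y_test, p_combined))
```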
Conclusion

The paper evaluated the proposed system of "Crash Prediction Networks" and demonstrated its advantage in exploiting the expressiveness of neural networks, while also maintaining robustness due to the ensemble-like setup. CPN provides redundancy to the setup by adding a layer of "safety prediction". CPN has the power to re-check the decisions of the driving module, thereby sharing the load of safe decision making with the driving module. More importantly, it should be noted that CPN is not just a solution for futuristic autonomous vehicles, but can in fact be integrated with current levels of automation to monitor the driving decisions of human drivers to ensure safer and more conservative driving, thereby reducing human error on roads.

Based on the evaluations discussed in the paper, the importance of accounting for temporal features to model motion in the environment is evident. Thus, future work involves extending the CPN model to support other sensory data as input, along with examining longer history lengths. Furthermore, one could increase the level of granularity of the prediction label beyond just 'safe' and 'unsafe'. This would allow for a more sensitive setup, with the ability to trigger situation-specific interventions, which is not currently possible. Additionally, the networks that form the ensemble could be extended beyond merely sensory inputs to make them geographical-region and/or rule-specific with considerably little effort.

References

Ashmore, R.; Calinescu, R.; and Paterson, C. 2019. Assuring the machine learning lifecycle: Desiderata, methods, and challenges. arXiv preprint arXiv:1905.04223.
Avizienis, A.; Laprie, J.-C.; Randell, B.; and Landwehr, C. 2004. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing 1(1): 11–33.
Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; and Koltun, V. 2017. CARLA: An open urban driving simulator. arXiv preprint arXiv:1711.03938.
Gal, Y. 2016. Uncertainty in deep learning. University of Cambridge.
Gal, Y.; and Ghahramani, Z. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, 1050–1059.
Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press.
Huang, W.; Wang, K.; Lv, Y.; and Zhu, F. 2016. Autonomous vehicles testing methods review. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 163–168. IEEE.
Kalra, N.; and Paddock, S. M. 2016. Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice 94: 182–193.
Koopman, P.; and Wagner, M. 2016. Challenges in autonomous vehicle testing and validation. SAE International Journal of Transportation Safety 4(1): 15–24.
Koopman, P.; and Wagner, M. 2017. Autonomous vehicle safety: An interdisciplinary challenge. IEEE Intelligent Transportation Systems Magazine 9(1): 90–96.
Kwon, Y.; Won, J.-H.; Kim, B. J.; and Paik, M. C. 2020. Uncertainty quantification using Bayesian neural networks in classification: Application to biomedical image segmentation. Computational Statistics & Data Analysis 142: 106816.
Machin, M.; Guiochet, J.; Waeselynck, H.; Blanquart, J.-P.; Roy, M.; and Masson, L. 2016. SMOF: A safety monitoring framework for autonomous systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48(5): 702–715.
McAllister, R.; Gal, Y.; Kendall, A.; Van Der Wilk, M.; Shah, A.; Cipolla, R.; and Weller, A. 2017. Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning. International Joint Conferences on Artificial Intelligence, Inc.
Nair, S.; Shafaei, S.; Kugele, S.; Osman, M. H.; and Knoll, A. 2019. Monitoring safety of autonomous vehicles with crash prediction networks. In SafeAI@AAAI.
Najm, W. G.; Smith, J. D.; Yanagisawa, M.; et al. 2007. Pre-crash scenario typology for crash avoidance research. Technical report, United States National Highway Traffic Safety Administration.
Riedmaier, S.; Nesensohn, J.; Gutenkunst, C.; Düser, T.; Schick, B.; and Abdellatif, H. 2018. Validation of X-in-the-Loop Approaches for Virtual Homologation of Automated Driving Functions. In GSVF-Symposium, Graz.
Rudolph, A.; Voget, S.; and Mottok, J. 2018. A consistent safety case argumentation for artificial intelligence in safety related automotive systems.
Saunders, W.; Sastry, G.; Stuhlmueller, A.; and Evans, O. 2018. Trial without error: Towards safe reinforcement learning via human intervention. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2067–2069. International Foundation for Autonomous Agents and Multiagent Systems.
Simonyan, K.; and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Stellet, J. E.; Zofka, M. R.; Schumacher, J.; Schamm, T.; Niewels, F.; and Zöllner, J. M. 2015. Testing of advanced driver assistance towards automated driving: A survey and taxonomy on existing approaches and open questions. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, 1455–1462. IEEE.
Törngren, M.; Zhang, X.; Mohan, N.; Becker, M.; Svensson, L.; Tao, X.; Chen, D.-J.; and Westman, J. 2018. Architecting Safety Supervisors for High Levels of Automated Driving. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 1721–1728. IEEE.
Zhao, Y.; Gao, J.; and Yang, X. 2005. A survey of neural network ensembles. In 2005 International Conference on Neural Networks and Brain, volume 1, 438–442. doi:10.1109/ICNNB.2005.1614650.