=Paper=
{{Paper
|id=Vol-2808/Paper_37
|storemode=property
|title=A Hybrid-AI Approach for Competence Assessment of Automated Driving functions
|pdfUrl=https://ceur-ws.org/Vol-2808/Paper_37.pdf
|volume=Vol-2808
|authors=Jan-Pieter Paardekooper,Mauro Comi,Corrado Grappiolo,Ron Snijders,Willeke van Vught,Rutger Beekelaar
|dblpUrl=https://dblp.org/rec/conf/aaai/PaardekooperCGS21
}}
==A Hybrid-AI Approach for Competence Assessment of Automated Driving Functions==
Jan-Pieter Paardekooper 1,2, Mauro Comi 1*, Corrado Grappiolo 3*, Ron Snijders 4*, Willeke van Vught 5, Rutger Beekelaar 1

1 TNO - Integrated Vehicle Safety, Helmond, The Netherlands
2 Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
3 TNO - Data Science, Den Haag, The Netherlands
4 TNO - Monitoring & Control Services, Groningen, The Netherlands
5 TNO - Perceptual and Cognitive Systems, Soesterberg, The Netherlands
jan-pieter.paardekooper@tno.nl

* Authors contributed equally. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===Abstract===
An increasing number of tasks is being taken over from the human driver as automated driving technology is developed. Accidents have been reported in situations where the automated driving technology was not able to function according to specifications. As data-driven Artificial Intelligence (AI) systems become more ubiquitous in automated vehicles, it is increasingly important to make these systems situationally aware. One aspect of this is determining whether these systems are competent in the current and immediate traffic situation, or whether they should hand over control to the driver or a safety system.

We aim to increase the safety of automated driving functions by combining data-driven AI systems with knowledge-based AI into a hybrid-AI system that can reason about competence in the traffic state now and in the next few seconds. We showcase our method using an intention prediction algorithm that is based on a deep neural network and trained with real-world data of traffic participants performing a cut-in maneuver in front of the vehicle. This is combined with a unified, quantitative representation of the situation on the road in the form of an ontology-based knowledge graph with first-order logic inference rules, which takes as input both the observations of the sensors of the automated vehicle and the output of the intention prediction algorithm. The knowledge graph utilises two features, importance (based on domain knowledge) and doubt (based on the observations and information about the dataset), to reason about the competence of the intention prediction algorithm.

We have applied the competence assessment of the intention prediction algorithm to two cut-in scenarios: a traffic situation that is well within the operational design domain described by the training data set, and a traffic situation that includes an unknown entity in the form of a motorcycle that was not part of the training set. In the latter case the knowledge graph correctly reasoned that the intention prediction algorithm was incapable of producing a reliable prediction. This shows that hybrid AI for situational awareness holds great promise to reduce the risk of automated driving functions in an open world containing unknowns.

===Introduction===
Automated driving is one of the most appealing applications of artificial intelligence in an open world. It holds the promise of reducing the number of casualties (1.35 million yearly (WHO 2018)), increasing the comfort of travel by taking over the driving task from humans, and bringing mobility to those unable to drive. While fleets of fully automated vehicles that can run unrestrained in an open world are still far away (Koopman and Wagner 2016), many vehicles are already equipped with Advanced Driver Assistance Systems (Okuda, Kajiwara, and Terashima 2014), like Lane Keep Assist and Adaptive Cruise Control. According to the Geneva Convention on road traffic of 1949 and the Vienna Convention on road traffic of 1968, on which many countries base their national traffic laws, a human driver has to be present in the vehicle (Vellinga 2019). Artificial Intelligence (AI) opens up the possibility of automation in increasingly complex situations, but also makes it increasingly difficult for human drivers to understand the limitations of the system (Thill, Hemeren, and Nilsson 2014).

The tremendous success of Deep Neural Networks (DNNs) in recent years (LeCun, Bengio, and Hinton 2015) has led to many applications in automated driving, ranging from perception (Cordts et al. 2016) and trajectory prediction (Deo and Trivedi 2018) to decision making (Bansal, Krizhevsky, and Ogale 2019). The strength of DNNs is their capability to deal with complex problems, but one important drawback for their application in safety-critical systems is how they deal with new situations (Hendrycks and Gimpel 2017; McAllister et al. 2017). DNNs learn a (possibly very complex) mapping from input data to output, but they lack an understanding of the deeper causes of this output. Hence, these algorithms cannot reason about whether they are competent to produce reliable output based on the input data. To safely apply DNNs (or any learning algorithm) in automated vehicles, we need to add situational awareness: the comprehension of whether the system understands the current environment and is capable of producing reliable output.

In this work we describe a hybrid-AI approach (van Harmelen and ten Teije 2019; Meyer-Vitali et al. 2019) to situational awareness. In this approach, a data-driven AI is coupled to a knowledge graph with reasoning capabilities. The current application is a DNN that predicts the intention of other road users to merge into the lane of the ego vehicle (cut-in maneuver).
This is combined with a knowledge graph of the traffic state that relates the current situation to what the predictor has learned from the training data. The knowledge graph reasoner returns an estimate of the reliability of the predictor, which it forecasts into the immediate future (2 seconds ahead) to be able to warn the driver or safety system in advance that a takeover of control is imminent.

===Related work===
In the automotive domain, situation awareness (Endsley 1995) is a term often used to describe the readiness of human drivers to make good decisions (Endsley 2020). It is based on perception of the environment, comprehension of the current situation, and projection into the future. Here we extend situation awareness to an AI system in the vehicle, where we add the assessment of competence in the current situation to the comprehension of the current situation.

Machine learning methods tend to underperform when the distributions of the test dataset and training dataset differ significantly. Throughout the paper, we refer to data samples drawn from the training set distribution as in-distribution (ID), and to samples drawn from a different distribution as out-of-distribution (OOD). DNNs often attribute high confidence to the classification or prediction of OOD samples (Hein, Andriushchenko, and Bitterwolf 2019); this behaviour, which especially holds for softmax classifiers (Hendrycks and Gimpel 2016), can have dramatic consequences in applications where model reliability and safety are priorities. Various papers attempt to increase models' robustness by calibrating the predicted probability estimates (Guo et al. 2017) or by injecting small perturbations into the input data (Liang, Li, and Srikant 2017). Density estimation methods are also leveraged to detect OOD observations: the likelihood over the in-distribution sample space can be approximated (Dinh, Sohl-Dickstein, and Bengio 2016; Ren et al. 2019) and used to compute the likelihood of new observations, thus detecting those samples that lie in low-density regions.
Another way of dealing with OOD observations is to have the DNN output more accurate certainty values. Several approaches have been described in the literature, ranging from Monte Carlo Dropout (Gal and Ghahramani 2016) to adding a Gaussian distribution over the weights in the last layer of a ReLU network (Kristiadi, Hein, and Hennig 2020). It has been shown that Bayesian deep learning is important for the safety of automated vehicles (McAllister et al. 2017).

===Method===
To assess the competence of data-driven-AI automated-driving capabilities we propose a pipelined framework, depicted in Figure 1. The framework receives as input the observations of the current road situation and, via a pipelined information flow, outputs the decision on whether the driving mode should remain autonomous or should be handed over to the human driver or a backup safety system. The framework's internal structure is divided into three modules: Intention Predictor, Reasoner and Competence Assessment.

Figure 1: The overall architecture of our situation-aware competence assessment framework. The dotted arrow from Real-life Datasets to Intention Predictor is not part of the online information flow.

Raw observations related to each target vehicle, such as their speed and acceleration, are fed to the Intention Predictor. This module processes the information via two sub-modules. The first one is a deep neural network trained to predict whether a given target vehicle will perform a cut-in maneuver (Cut-in Classifier). The second sub-module is a Feature Uncertainty Estimator: it holds univariate densities of the classifier's training set input features and provides information on the in-distribution likelihood of the network's input data.

The Intention Predictor's output, the observations related to road geometry (e.g. presence of entry lanes) and the lane visibility are fed to the framework's second module, the Reasoner. The Reasoner, characterised by an ontology and first-order logic rules, fuses the input observations with domain knowledge (encoded in the ontology and in the rules) into a knowledge graph. The graph realises the framework's situational awareness, as it holds a unified representation of the current situation and is aware of which entities are important and doubtful.

The graph is then fed to the last module of our framework, Competence Assessment. The module first organises its present and past situation-aware knowledge. Then it projects such knowledge into the future. Finally, it decides whether such a forecast is outside the autonomous system's competence level. In the next part, we will describe each module in more detail.

===Simulation Environment===
For simulation we use CARLA, an open-source simulator which aims to support the development, training, and validation of autonomous driving systems (Dosovitskiy et al. 2017). The scenarios are defined using OpenSCENARIO (https://www.asam.net/standards/detail/openscenario/), an open format used to describe synchronized maneuvers of vehicles.

For a given location of the Ego Vehicle (EV), we use the API of CARLA to extract a world model of the road situation. This world model includes the number of lanes, the presence of an entrance lane and all Target Vehicles (TVs). For each TV, the velocity, acceleration, angle and position relative to the EV are determined. The lane visibility v is calculated as v = d/s, where 0 ≤ v ≤ 1; here, d is the distance of the closest TV on that lane and s the scope of the EV in meters (s := 50 m by default).

===Intention predictor===
Predicting the intention of a vehicle to perform a cut-in can be framed as a binary classification task; the two labels to classify are "cut-in" and "not cut-in". A data point labeled as "cut-in" refers to the collected information at timestep t for a TV that performs a cut-in between t and t + 2 s. Since more than one vehicle can be present at the same time, multiple data points can be collected at t.

The dataset consists of 24305 data points, divided into 6348 instances labeled as "cut-in" and 17957 labeled as "not cut-in", drawn from the StreetWise database (Paardekooper et al. 2019). While completeness measures for this database have been developed (de Gelder et al. 2019), the dataset used for training the intention predictor does not cover the entire spectrum of cut-ins that are to be expected in real-life traffic. However, for the purpose of this work a complete dataset is not essential, as we are interested in the situations that the intention predictor has not been trained for.

During training, the cross-entropy logarithmic loss is weighted for the two different classes to take into account their imbalance in the dataset. The threshold λ, the learning rate and the number of neurons per layer are fine-tuned using a Bayesian approach for global optimization (Brochu, Cora, and De Freitas 2010). To reduce overfitting, dropout and early stopping are used.
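The effect of weighting the loss for class imbalance can be illustrated with the dataset's own class counts. The inverse-frequency weighting scheme below is an assumption for the sketch; the paper only states that the cross-entropy loss is weighted per class:

```python
import math

# Class counts from the StreetWise training set described above.
counts = {"cut-in": 6348, "not cut-in": 17957}
total = sum(counts.values())

# Inverse-frequency weights (our assumed scheme): total / (2 * class count).
weights = {label: total / (2 * n) for label, n in counts.items()}

def weighted_bce(y_true, p, w=weights):
    """Class-weighted binary cross-entropy for one sample.
    y_true: 1 for cut-in, 0 for not cut-in; p: predicted cut-in probability."""
    if y_true == 1:
        return -w["cut-in"] * math.log(p)
    return -w["not cut-in"] * math.log(1.0 - p)

# The minority (cut-in) class gets a weight above 1, the majority class
# below 1, so misclassified cut-ins contribute more to the training loss.
print(round(weights["cut-in"], 2), round(weights["not cut-in"], 2))  # -> 1.91 0.68
```

With these weights, an equally confident mistake on a cut-in sample costs roughly 2.8 times more than one on a not-cut-in sample, which counters the 1:2.8 class imbalance.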
To date, a variety of physics-based and data-driven approaches have been developed to detect spatio-temporal patterns in road users' behaviour. Specifically, DNNs have been commonly adopted for classification purposes as they can often outperform other methods for high-dimensional data (Sakr et al. 2017).

The DNN we developed for this study is a two-layer fully connected network trained with gradient-based backpropagation. The input x ∈ R^m is mapped to an output y ∈ R^n, where m = 30 and n = 1 are respectively the input and output dimensionality. The two hidden layers contain 512 neurons each and are activated by a ReLU function (Nair and Hinton 2010). The 30 input features represent continuous values related to the dynamics of a TV present at a given time t. Some of these variables, such as the TV's speed, acceleration, and the relative lateral and longitudinal distance to the EV, have been directly collected in real-life driving scenarios; other variables are the result of feature engineering techniques to develop expressive variables. The output, produced by a single non-linear sigmoid layer defined over the domain o ∈ [0, 1], represents the predicted confidence in the TV performing a cut-in within the next 3 seconds. The result of this two-class logistic regression is converted into a binary output by defining a maximum threshold λ on the output.

'''Feature uncertainty estimator''' To assess the robustness of the trained DNN predictor on unseen test scenarios, we first analyzed the univariate distribution of each feature x_i in the training set X = {x_1, ..., x_30}. Among these features, the most expressive ones for situational awareness were extracted for further analysis. The following features were chosen: the absolute EV and TV velocities and accelerations, their relative velocity and acceleration, the relative longitudinal and lateral distance between the vehicles, their relative heading, and the distance between the EV and the closest lane marker. A desired characteristic of these features concerns their distribution. We observed that the distribution of these data samples can be approximated by multimodal skewed distributions when the dynamic properties of the vehicles change incrementally over time. Such distributions can be approximated by traditional non-parametric density estimation methods, such as Kernel Density Estimation (KDE) (Parzen 1962).

KDE is a technique used to reconstruct the probability density function of given data samples, and it can be applied to a single feature (univariate KDE) or to multiple features (multivariate KDE). In the univariate version, this technique consists of fitting a kernel function, such as a Gaussian, over each of the k samples in the chosen feature vector. The resulting k densities are then summed and normalized to return the final density estimate of the feature. The main hyperparameter of KDE, the bandwidth h, controls the variance of the kernel function; its value determines how smooth the final density estimate is. This parameter, which must be tuned to guarantee that the kernel function fits the data samples correctly, was optimized using the Maximum Likelihood Cross-Validation (MLCV) approach (Habbema et al. 1974):

MLCV = (1/k) Σ_{i=1..k} [ log Σ_{j≠i} K((x_j − x_i)/h) − log((k − 1)h) ]   (1)

where k is the number of data samples to fit, K(·) is a Gaussian kernel, x_j is a data point over the chosen domain, and x_i is the i-th sample in the feature vector.

Once the final density estimate is computed, it is possible to evaluate the likelihood of new samples for each feature; this computation can be performed synchronously with the observation of new data in unseen scenarios, as required in our study. For practical purposes, the log-likelihood of the samples is used instead of the likelihood.

A main assumption in our investigation is that samples with low likelihood lead to higher uncertainty about the DNN's competence. To quantify this intuition, we define the ratio r_i as:

r_i = L(x_i | M_i) / L_max,i   (2)

where x_i is an observation that belongs to the i-th feature. The value r_i represents the ratio between the estimated log-likelihood of the new sample x_i given the fitted model M_i and the maximum log-likelihood L_max,i observed for the i-th feature. The maximum log-likelihood was pre-computed and stored during the kernel fitting phase. Finally, we define the feature uncertainty φ as:

φ = 1 − (1/m) Σ_{i=1..m} r_i   (3)

where m is the dimensionality of the feature space x.
servation in the training set and reflects our previously men- tioned assumption on the DNN’s competence. The subtrac- tion guarantees that the feature uncertainty tends to 1 when An excerpt of the ontology’s entity organisation is de- all the features are out-of-distribution, thus maximizing the picted in Fig.2, whilst a schematic representation of the re- uncertainty in the predictor’s output, and to 0 when the fea- lations is shown in Fig. 3. The entities are organised hi- tures are in-distribution, thus following the same trend as the erarchically and along three main branches: one represent- competence. ing the possible vehicles, one the driving infrastructure, and one the computational models external to the reasoner. The Reasoner The second module of our framework, the Rea- non-ego vehicles are divided into two key-categories: known soner, is in charge of aggregating all observations — namely and unknown. The categorisation is done based on the types target vehicle data, road geometry, lane visibility and the of vehicles present in the dataset the cut-in classifier was output of the Intention Predictor — to have a unified and trained on. For instance, if the dataset contained only pas- quantitative representation of the situation on the road. This senger cars, such entity would be placed under the known view is represented by means of a knowledge graph based branch, whilst other vehicles such as lorries and motorcy- on an underlying ontology and a set of first-order logic infer- cles would be inferred as children of the unknown entity. The ence rules. We will hereafter refer to the ontology-rule pair known/unknown information associated to observed TVs is as the schema. The Reasoner is implemented in Grakn 2 . The used by the rules to assign doubt values to the classifier’s ontology specifies (part of) the automotive domain via enti- output and importance values to the graph nodes. The driv- ties, attributes and relations. 
Example of entities are vehicles ing infrastructure describes all non-vehicle entities present on the road, such as lanes, ramps and signs, in accordance 2 https://grakn.ai/. Last accessed 18 December 2020. with (Zhao et al. 2015; Czarnecki 2018a,b). Lanes have a Competence Assessment The last module in our framework — Competence Assess- ment — leverages previous and current knowledge graphs to determine whether the EV should maintain an autonomous driving modality or leave the control to the human driver or backup safety system. Competence Assessment follows a remember-forecast-decide processing flow. Remember A time-indexed memory of η graph embed- dings e1 , . . . e η is kept. The embedding of a particular time corresponds to a single value encoding the graph related to that particular time’s road observations. Currently, the embedding proce- Figure 3: The observation-relation implemented in our dure corresponds to a weighted average of all doubts, where schema. Orange items are attributes, blue items are entities, the weights are associated to the relative importance values: has entity-attribute relations are straightforward, whilst the the higher the importance, the higher the weight. entity-entity relation is depicted with a rhombus. Forecast The remembered embeddings represent, albeit in a compact way, reasoned (importance/doubt-aware) situa- tions. Intuitively, the lower an embedding, the more com- petent the autonomous vehicle was in that situation, as low importance and doubt attributes would predominantly exist in the corresponding graph. We therefore define the Compe- tence related to a graph embedding as: ci = 1 − ei , ∀ i ∈ [1, . . . η] (4) Intuitively, cη corresponds to the latest (current) compe- tence value. The ci values are then fed to a regressor to esti- mate ρ future competence values Figure 4: The instantiation of the schema of Figure 2 given fictional observations. A truck drives on a one-way lane. ĉ1 , . . . 
ĉρ The intention predictor, due to the truck’s high feature un- Currently, the framework implements a linear regressor, certainty, assigns a cut-in probability of 0.6. This rather un- based on the assumption that a short-term linear dependency certain value leads to a high doubt value. Nonetheless, the across observations holds. truck is rather far from EV, as it can be hinted by the high visitibility value of the lane3 . Hence, the importance value Decide The decision whether the driving should remain for the lane is low. The truck has a higher importance value autonomous or handed over to a human is made based on due to its non-likely behaviour. Finally, the doubt value as- the lowest future competence value sociated to the truck-lane relation is low mainly because of the lane, though not extremly low due to the feature uncer- ĉmin = min ĉi , ∀ i ∈ [1, . . . ρ] (5) tainty related to the vehicle and the fact that it is not entirely and by comparing it to an assessing threshold τc sure whether it will perform a cut-in. takeover if ∃ ci < τc decision = (6) AD mode otherwise fundamental attribute: visibility. The rules implement a neg- where AD stands for Autonomous Driving. ative correlation: the lower the visibility, the higher the doubt associated to that lane. In this way, the framework aims to Results and Discussion speculate about the possible existence of hidden entities in We have trained the DNN for intention prediction on 24305 adjacent lanes. Computation entities represent framework instances, divided into 6348 cut-ins and 17957 non cut-ins. models which process raw observations to generate new in- Since the dataset was unbalanced, we weighted the loss formation, in our case the cut-in classifier. In case the models function to compensate for the difference in the observa- are machine learned, the Reasoner infers, via positive cor- tions per class. 
===Results and Discussion===
We have trained the DNN for intention prediction on 24305 instances, divided into 6348 cut-ins and 17957 non cut-ins. Since the dataset was unbalanced, we weighted the loss function to compensate for the difference in observations per class. The algorithm was tested on 7200 instances, resulting in an F-score of 0.98 (accuracy 0.99).

We have assessed the competence of the intention predictor in two cut-in scenarios. The first scenario describes a cut-in by a passenger car on an otherwise empty road (Fig. 5a). The velocity, distance and driving profile of the TV were designed not to pose any risk to the EV. In addition, every vehicle present in the scenario was known to the knowledge graph.

The second scenario (Fig. 5b) describes multiple vehicles (two trucks and a motorcycle) on the right-most lane and a truck approaching from the entrance lane. The EV is in the left lane and cannot see the approaching truck, as it is occluded by the vehicles on its right. The features in this scenario are out-of-distribution, as only two features lie within the training set domain (Fig. 6). Moreover, the scenario includes an unknown entity in the form of a motorcycle that was not part of the training set. The rationale for this is that a type of vehicle not present in the training set might display a driving profile that the intention prediction does not expect. In other words, the output of the predictor might be incorrect since it relies on the detection of spatio-temporal patterns in the vehicle's driving behaviour. The visibility on the road was reduced by the traffic on the first lane; this lane was considered of high importance due to the road entrance. This scenario was designed to pose potential risk to the autonomous system, due to the out-of-distribution features and unknown entities.

Figure 5: Snapshots from the two different scenarios as shown in the CARLA simulator. (a) The cut-in scenario within the operational design domain; corresponds to Cases 1 and 2 in Table 1. (b) The lane entrance scenario outside the operational design domain; corresponds to Cases 3 and 4 in Table 1.

Figure 6: Likelihood ratio r of the features used to compute the feature uncertainty in the cut-in manoeuvre performed by the motorcycle (second scenario).

{| class="wikitable"
|+ Table 1: Results on the four different cases tested with and without the Reasoner.
! Case !! Potential Risk !! Reasoner !! Current Competence c_η !! Minimum Future Competence ĉ_min !! 1 − φ !! Decision w/o Reasoner (τ_φ = 0.7) !! Decision w/ Reasoner (τ_c = 0.7)
|-
| 1 || Low || not present || – || – || 0.57 || takeover || –
|-
| 2 || Low || present || 0.71 || 0.84 || 0.57 || – || AD mode
|-
| 3 || High || not present || – || – || 0.31 || takeover || –
|-
| 4 || High || present || 0.15 || 0.14 || 0.31 || – || takeover
|}

The two scenarios were evaluated at the moment that one of the TVs performs a cut-in. The two settings were first tested without the contribution of the symbolic reasoning inference, shown as Case 1 and Case 3 in Table 1. Since the Reasoner was not in place, the feature uncertainty φ was used as a proxy for the Intention Predictor's ability to perform correctly in the given situation. For clarity, the quantity 1 − φ is reported; hence, a score equal to 1 represents full confidence in the Intention Predictor output and can be directly compared with the competence score. The threshold τ_φ = 0.7 was defined to establish whether it was necessary for the human driver to take over (1 − φ < τ_φ) or the vehicle could maintain AD mode (1 − φ ≥ τ_φ). In both cases, the system decides not to maintain AD mode, due to the high feature uncertainty, even when the scenario was safe. The absence of the Reasoner causes a lack of situational awareness: the speed of the TV was lower than the average velocities collected in the training set, thus making the velocity an OOD feature, but the large distance between the EV and TV is not used by the Intention Predictor to reduce the importance attributed to this quantity.
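The decision logic behind Table 1 can be replayed directly with the reported values. The numbers come from the table; the two helper functions are our own framing of the thresholding rules:

```python
def decide_without_reasoner(one_minus_phi, tau_phi=0.7):
    """Proxy decision: takeover whenever 1 - phi falls below tau_phi."""
    return "AD mode" if one_minus_phi >= tau_phi else "takeover"

def decide_with_reasoner(c_min_future, tau_c=0.7):
    """Framework decision: takeover whenever c_min falls below tau_c (Eq. 6)."""
    return "AD mode" if c_min_future >= tau_c else "takeover"

# Cases 1 and 3 (no Reasoner): 1 - phi = 0.57 and 0.31 both fall below
# tau_phi, so the proxy demands a takeover even in the low-risk scenario.
print(decide_without_reasoner(0.57), decide_without_reasoner(0.31))
# -> takeover takeover

# Cases 2 and 4 (with Reasoner): c_min = 0.84 keeps AD mode; 0.14 hands over.
print(decide_with_reasoner(0.84), decide_with_reasoner(0.14))
# -> AD mode takeover
```

The replay makes the paper's point visible: the feature-uncertainty proxy alone cannot separate the safe scenario from the risky one, while the reasoned competence can.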
The Current Conclusions and future work We have presented a hybrid-AI framework for the safe appli- cation of AI functions in automated driving. The framework aggregates road observations and the results of data-driven AI computations — such as a DNN for intention prediction in our case study — into a knowledge graph. The graph is built by means of an ontology, which specifies the entities that can exist on the road, and a set of first-order logic in- ference rules, the latter aiming to estimate the severity level of the road situations. The knowledge graph is then com- pressed into a single value (embedding), stored in a work- ing memory, and used to forecast imminent severity levels. A final decision maker modules establishes whether the ve- hicle should continue driving autonomously or whether the Figure 7: Future estimation of the competence correspond- steering wheel should be handed over to a human driver or ing to Case 4. backup safety system. The knowledge graphs encode the situational awareness capabilities of the vehicle, whilst the forecasting and decision making processes realise the vehi- cle’s competence assessment capability. We have shown that the reasoner correctly assigns high Competence column refers to the competence cη as inferred competence to the Intention Predictor in a situation in which by the first-order rules of the knowledge graph at the cur- some features of the DNN are uncertain, but the TV poses no rent moment. In the event that more than one vehicle was safety threat to the EV due to the large distance and high lane predicted to perform a cut-in, the reported value is the low- visibility. The added value of the Reasoner is also shown in est cη estimated among all the vehicles. The Minimum Fu- a situation that contains a vehicle (in this case a motorcy- ture Competence ĉmin was computed converting the future cle) that has never been seen before by the predictor, in an doubt-embedding extrapolated by the forecaster (Fig. 
7) as environment with important entities that require attention (in described in Eq. 4 and Eq. 5 (ρ = 2). Thus, the future compe- this case an entrance lane). The predictor output is unreliable tence was calculated for a prediction horizon of 2 seconds. in this case, potentially leading to erratic and dangerous be- The decision whether the vehicle should remain autonomous haviour of the EV if taken at face value. Here, the Reasoner was performed by a thresholding function (τc = 0.7) on the correctly assigns a low competence to the predictor based on future competence, as detailed in Eq. 6. the presence of the motorcycle (high doubt) and the presence of the entrance lane (high importance). We found that ĉmin evaluated for Case 4 was six times These results provide a solid starting point for future in- lower than in Case 2. In Case 2 the threshold for takeover vestigations on situational awareness. In future work, we was never reached and the system did not hand over the au- will extend situational awareness to the entire automated tonomous control. Due to the large distance of the TV and vehicle instead of a single component. In addition, the rea- the high visibility of the lanes, the Reasoner determined that soner will aggregate more types of observations, for exam- the vehicle could stay in AD mode despite the low likeli- ple those regarding road works or weather conditions, and hood of the input data expressed by the average feature un- its first-order logic inference rules could be parameterised certainty. In contrast, the system decided that a takeover of via data-driven approaches instead of solely relying on do- the AD mode was necessary in Case 4, because the numer- main knowledge. Combining the DNN with the knowledge ous sources of risk in this setting caused a low future compe- graph into a graph neural network will result in a better esti- tence. This is expressed by a competence value that is sub- mation of competence, especially further into future. 
Graph stantially lower than solely based on the feature uncertainty. neural networks might also aid in enhanced explainability Using the likelihood of the input data expressed by the fea- on why takeover is needed. ture uncertainty alone is not sufficient to correctly assess the While limited to a single function in a simulation envi- confidence in the Intention Predictor output. This is evident ronment, our work shows that a hybrid-AI approach to situ- by the results of Case 1 and Case 3 (Table 1), where the ab- ational awareness is essential for the safe application of AI sence of the Reasoner fails to correctly assess the situation. systems in automated driving. In addition, the competence returned by the Reasoner shows a larger contrast between these two extreme cases than the method based on the feature uncertainty alone. References Bansal, M.; Krizhevsky, A.; and Ogale, A. 2019. Chauffeur- We found that the linear regression used to assess the fu- Net: Learning to Drive by Imitating the Best and Synthesiz- ture competence was strongly affected by small variations ing the Worst. In Robotics: Science and Systems. in the history of doubt-embeddings ρ. Thus, we do not con- sider that a prediction horizon higher than 2 seconds would Brochu, E.; Cora, V. M.; and De Freitas, N. 2010. A tuto- be reliable enough to support the decision making process. rial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical re- Hendrycks, D.; and Gimpel, K. 2016. A baseline for detect- inforcement learning. arXiv preprint arXiv:1012.2599 . ing misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 . Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; and Schiele, B. Hendrycks, D.; and Gimpel, K. 2017. A Baseline for De- 2016. 
The Cityscapes Dataset for Semantic Urban Scene tecting Misclassified and Out-of-Distribution Examples in Understanding. In Proc. of the IEEE Conference on Com- Neural Networks. Proceedings of International Conference puter Vision and Pattern Recognition (CVPR). on Learning Representations . Czarnecki, K. 2018a. Operational world model ontology for Horn, A. 1951. On sentences which are true of direct unions automated driving systems–part 1: Road structure. Waterloo of algebras. The Journal of Symbolic Logic 16(1): 14–21. Intelligent Systems Engineering Lab (WISE) Report, Univer- Koopman, P.; and Wagner, M. 2016. Challenges in Au- sity of Waterloo . tonomous Vehicle Testing and Validation. SAE International Czarnecki, K. 2018b. Operational world model ontology Journal of Transportation Safety 4(1): 15–24. for automated driving systems–part 2: Road users, animals, Kristiadi, A.; Hein, M.; and Hennig, P. 2020. Being other obstacles, and environmental conditions,”. Waterloo Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Intelligent Systems Engineering Lab (WISE) Report, Univer- Networks. In Daumé III, H.; and Singh, A., eds., Pro- sity of Waterloo . ceedings of the 37th International Conference on Machine de Gelder, E.; Paardekooper, J.-P.; den Camp Olaf, O.; and Learning, volume 119 of Proceedings of Machine Learning De Schutter, B. 2019. Safety assessment of automated ve- Research, 5436–5446. Virtual: PMLR. hicles: how to determine whether we have collected enough LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. field data? Traffic Injury Prevention 20(S1): S162–S170. Nature 521(7553): 436–444. Deo, N.; and Trivedi, M. M. 2018. Multi-Modal Trajectory Liang, S.; Li, Y.; and Srikant, R. 2017. Enhancing the reli- Prediction of Surrounding Vehicles with Maneuver based ability of out-of-distribution image detection in neural net- LSTMs. In IEEE Intelligent Vehicles Symposium, Proceed- works. arXiv preprint arXiv:1706.02690 . ings, 1179–1184. 
University of California, San Diego, San McAllister, R.; Gal, Y.; Kendall, A.; van der Wilk, M.; Shah, Diego, United States, IEEE. A.; Cipolla, R.; and Weller, A. 2017. Concrete problems for Dinh, L.; Sohl-Dickstein, J.; and Bengio, S. 2016. Density autonomous vehicle safety: Advantages of Bayesian deep estimation using real nvp. arXiv preprint arXiv:1605.08803 learning. In IJCAI International Joint Conference on Ar- . tificial Intelligence, 4745–4753. University of Cambridge, Cambridge, United Kingdom. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; and Koltun, V. 2017. CARLA: An open urban driving simula- Meyer-Vitali, A.; Bakker, R.; van Bekkum, M.; Boer, M. d.; tor. arXiv preprint arXiv:1711.03938 . Burghouts, G.; Diggelen, J. v.; Dijk, J.; Grappiolo, C.; Gre- eff, J. d.; Huizing, A.; et al. 2019. Hybrid ai: white paper. Endsley, M. R. 1995. Toward a theory of situation awareness Technical report, TNO. in dynamic systems. Human Factors 37(1): 32–64. Nair, V.; and Hinton, G. E. 2010. Rectified linear units im- Endsley, M. R. 2020. Situation Awareness in Driving. In prove restricted boltzmann machines. In ICML. Fisher, D.; Horrey, W.; Lee, J.; and Regan, M., eds., Hand- Okuda, R.; Kajiwara, Y.; and Terashima, K. 2014. A survey book of Human Factors for Automated, Connected, and In- of technical trend of ADAS and autonomous driving. In Pro- telligent Vehicles, chapter 7. CRC Press. ceedings of Technical Program - 2014 International Sympo- Gal, Y.; and Ghahramani, Z. 2016. Dropout as a Bayesian sium on VLSI Technology, Systems and Application, VLSI- approximation: Representing model uncertainty in deep TSA 2014. Renesas Electronics Corporation, Tokyo, Japan. learning. In 33rd International Conference on Machine Paardekooper, J.-P.; van Montfort, S.; Manders, J.; Goos, J.; Learning, ICML 2016, 1651–1660. University of Cam- de Gelder, E.; Op den Camp, O.; Bracquemond, A.; and Thi- bridge, Cambridge, United Kingdom. olon, G. 2019. 
Automatic Detection of Critical Scenarios in Guo, C.; Pleiss, G.; Sun, Y.; and Weinberger, K. Q. 2017. a Public Dataset of 6000 km of Public-Road Driving. In On calibration of modern neural networks. arXiv preprint Enhanced Safety of Vehicles, 1–8. arXiv:1706.04599 . Parzen, E. 1962. On estimation of a probability density func- Habbema, J.; JDF, H.; Van den Broek, K.; et al. 1974. A tion and mode. The annals of mathematical statistics 33(3): stepwise discriminant analysis program using density esti- 1065–1076. mation. . Ren, J.; Liu, P. J.; Fertig, E.; Snoek, J.; Poplin, R.; Depristo, Hein, M.; Andriushchenko, M.; and Bitterwolf, J. 2019. M.; Dillon, J.; and Lakshminarayanan, B. 2019. Likelihood Why ReLU networks yield high-confidence predictions far ratios for out-of-distribution detection. In Advances in Neu- away from the training data and how to mitigate the prob- ral Information Processing Systems, 14707–14718. lem. In Proceedings of the IEEE Conference on Computer Sakr, S.; Elshawi, R.; Ahmed, A. M.; Qureshi, W. T.; Vision and Pattern Recognition, 41–50. Brawner, C. A.; Keteyian, S. J.; Blaha, M. J.; and Al-Mallah, M. H. 2017. Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project. BMC medical informatics and decision making 17(1): 174. Thill, S.; Hemeren, P. E.; and Nilsson, M. 2014. The appar- ent intelligence of a system as a factor in situation aware- ness. In 2014 IEEE International Inter-Disciplinary Con- ference on Cognitive Methods in Situation Awareness and Decision Support, CogSIMA 2014, 52–58. RISE Viktoria, Gothenburg, Sweden, IEEE. van Harmelen, F.; and ten Teije, A. 2019. A Boxology of De- sign Patterns for Hybrid Learning and Reasoning Systems. arXiv.org . Vellinga, N. E. 2019. Automated driving and its challenges to international traffic law: which way to go? Law, Innova- tion and Technology 11(2): 257–278. WHO. 2018. Global status report on road safety 2018. 
Zhao, L.; Ichise, R.; Yoshikawa, T.; Naito, T.; Kakinami, T.; and Sasaki, Y. 2015. Ontology-based decision making on uncontrolled intersections and narrow roads. In 2015 IEEE intelligent vehicles symposium (IV), 83–88. IEEE.
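The forecast-and-threshold takeover decision discussed in the results can be sketched as follows. This is a minimal illustration, not the paper's implementation: the least-squares forecaster, the sampling interval, and the doubt-to-competence mapping (competence = 1 − doubt) are assumptions, as are all function names; only the threshold τc = 0.7 and the 2-second horizon come from the text, and Eqs. 4–6 are not reproduced here.

```python
# Illustrative sketch of the forecast-and-threshold takeover decision.
# All names and the doubt-to-competence mapping are assumptions; only
# tau_c = 0.7 and the 2 s prediction horizon are taken from the paper.

TAU_C = 0.7      # takeover threshold on the future competence (tau_c)
HORIZON_S = 2.0  # prediction horizon in seconds
DT = 0.1         # assumed sampling interval of the working memory

def _linear_fit(ts, ys):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return slope, my - slope * mt

def forecast_doubt(history):
    """Extrapolate the doubt-embedding over the horizon, clipped to [0, 1]."""
    ts = [i * DT for i in range(len(history))]
    slope, intercept = _linear_fit(ts, history)
    steps = int(HORIZON_S / DT)
    return [min(1.0, max(0.0, slope * (ts[-1] + k * DT) + intercept))
            for k in range(1, steps + 1)]

def minimum_future_competence(history):
    """Assumed mapping: competence = 1 - doubt; minimum over the horizon."""
    return min(1.0 - d for d in forecast_doubt(history))

def should_hand_over(history):
    """Hand control to the driver/backup system when competence < tau_c."""
    return minimum_future_competence(history) < TAU_C

# A rising doubt history triggers a takeover; a flat low-doubt one does not.
print(should_hand_over([0.1, 0.15, 0.25, 0.35, 0.5]))  # True
print(should_hand_over([0.1, 0.1, 0.1, 0.1, 0.1]))     # False
```

Taking the minimum competence over the whole horizon, rather than the value at the horizon's end, makes the decision conservative: a single forecast dip below τc within the next 2 seconds is enough to trigger a handover.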