=Paper= {{Paper |id=Vol-2808/Paper_37 |storemode=property |title=A Hybrid-AI Approach for Competence Assessment of Automated Driving functions |pdfUrl=https://ceur-ws.org/Vol-2808/Paper_37.pdf |volume=Vol-2808 |authors=Jan-Pieter Paardekooper,Mauro Comi,Corrado Grappiolo,Ron Snijders,Willeke van Vught,Rutger Beekelaar |dblpUrl=https://dblp.org/rec/conf/aaai/PaardekooperCGS21 }} ==A Hybrid-AI Approach for Competence Assessment of Automated Driving functions== https://ceur-ws.org/Vol-2808/Paper_37.pdf
A Hybrid-AI Approach for Competence Assessment of Automated Driving Functions

Jan-Pieter Paardekooper 1,2, Mauro Comi 1 ∗, Corrado Grappiolo 3 ∗, Ron Snijders 4 ∗, Willeke van Vught 5, Rutger Beekelaar 1

1 TNO - Integrated Vehicle Safety, Helmond, The Netherlands
2 Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
3 TNO - Data Science, Den Haag, The Netherlands
4 TNO - Monitoring & Control Services, Groningen, The Netherlands
5 TNO - Perceptual and Cognitive Systems, Soesterberg, The Netherlands

jan-pieter.paardekooper@tno.nl
Abstract

An increasing number of tasks is being taken over from the human driver as automated driving technology is developed. Accidents have been reported in situations where the automated driving technology was not able to function according to its specifications. As data-driven Artificial Intelligence (AI) systems become more ubiquitous in automated vehicles, it is increasingly important to make these AI systems situationally aware. One aspect of this is determining whether the systems are competent in the current and immediate traffic situation, or whether they should hand over control to the driver or a safety system.
We aim to increase the safety of automated driving functions by combining data-driven AI systems with knowledge-based AI into a hybrid-AI system that can reason about competence in the traffic state now and in the next few seconds.
We showcase our method using an intention prediction algorithm that is based on a deep neural network and trained with real-world data of traffic participants performing a cut-in maneuver in front of the vehicle. This is combined with a unified, quantitative representation of the situation on the road, represented by an ontology-based knowledge graph and first-order logic inference rules, which takes as input both the observations of the sensors of the automated vehicle and the output of the intention prediction algorithm. The knowledge graph utilises two features, importance (based on domain knowledge) and doubt (based on the observations and information about the dataset), to reason about the competence of the intention prediction algorithm.
We have applied the competence assessment of the intention prediction algorithm to two cut-in scenarios: a traffic situation that is well within the operational design domain described by the training data set, and a traffic situation that includes an unknown entity in the form of a motorcycle that was not part of the training set. In the latter case the knowledge graph correctly reasoned that the intention prediction algorithm was incapable of producing a reliable prediction.
This shows that hybrid AI for situational awareness holds great promise to reduce the risk of automated driving functions in an open world containing unknowns.

∗ Authors contributed equally.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Automated driving is one of the most appealing applications of artificial intelligence in an open world. It holds the promise of reducing the number of casualties (1.35 million yearly (WHO 2018)), increasing the comfort of travel by taking over the driving task from humans, and bringing mobility to those unable to drive. While fleets of fully automated vehicles that can run unrestrained in an open world are still far away (Koopman and Wagner 2016), many vehicles are already equipped with Advanced Driver Assistance Systems (Okuda, Kajiwara, and Terashima 2014), like Lane Keep Assist and Adaptive Cruise Control. According to the Geneva Convention on Road Traffic of 1949 and the Vienna Convention on Road Traffic of 1968, on which many countries base their national traffic laws, a human driver has to be present in the vehicle (Vellinga 2019). Artificial Intelligence (AI) opens up the possibility of automation in increasingly complex situations, but also makes it increasingly difficult for human drivers to understand the limitations of the system (Thill, Hemeren, and Nilsson 2014).

The tremendous success of Deep Neural Networks (DNNs) in recent years (LeCun, Bengio, and Hinton 2015) has led to many applications in automated driving, ranging from perception (Cordts et al. 2016) and trajectory prediction (Deo and Trivedi 2018) to decision making (Bansal, Krizhevsky, and Ogale 2019). The strength of DNNs is their capability to deal with complex problems, but one important drawback for their application in safety-critical systems is how they deal with new situations (Hendrycks and Gimpel 2017; McAllister et al. 2017). DNNs learn a (possibly very complex) mapping from input data to output, but they lack an understanding of the deeper causes of this output. Hence, these algorithms cannot reason about whether they are competent to produce reliable output based on the input data. To safely apply DNNs (or any learning algorithm) in automated vehicles, we need to add situational awareness: the comprehension of whether the system understands the current environment and is capable of producing reliable output.

In this work we describe a hybrid-AI approach (van Harmelen and ten Teije 2019; Meyer-Vitali et al. 2019) to situational awareness. In this approach, a data-driven AI is coupled to a knowledge graph with reasoning capabilities.
The current application is a DNN that predicts the intention of other road users to merge into the lane of the ego vehicle (a cut-in maneuver). This is combined with a knowledge graph of the traffic state that relates the current situation to what the predictor has learned from the training data. The knowledge graph reasoner returns an estimate of the reliability of the predictor, which it forecasts into the immediate future (2 seconds ahead) so that it can warn the driver or safety system in advance that a takeover of control is imminent.

Related work

In the automotive domain, situation awareness (Endsley 1995) is a term often used to describe the readiness of human drivers to make good decisions (Endsley 2020). It is based on perception of the environment, comprehension of the current situation, and projection into the future. Here we extend situation awareness to an AI system in the vehicle, where we add the assessment of competence in the current situation to the comprehension of the current situation.

Machine learning methods tend to underperform when the distributions of the test dataset and training dataset differ significantly. Throughout the paper, we refer to data samples drawn from the training set distribution as in-distribution (ID), and to samples drawn from a different distribution as out-of-distribution (OOD). DNNs often attribute high confidence to the classification or prediction of OOD samples (Hein, Andriushchenko, and Bitterwolf 2019); this behaviour, which is especially pronounced for softmax classifiers (Hendrycks and Gimpel 2016), can have dramatic consequences in applications where model reliability and safety are priorities. Various papers attempt to increase models' robustness by calibrating the predicted probability estimates (Guo et al. 2017) or by injecting small perturbations into the input data (Liang, Li, and Srikant 2017). Density estimation methods are also leveraged to detect OOD observations: the likelihood over the in-distribution sample space can be approximated (Dinh, Sohl-Dickstein, and Bengio 2016; Ren et al. 2019) and used to compute the likelihood of new observations, thus detecting those samples that lie in low-density regions.

Another way of dealing with OOD observations is to have the DNN output more accurate certainty values. Several approaches have been described in the literature, ranging from Monte Carlo Dropout (Gal and Ghahramani 2016) to adding a Gaussian distribution over the weights in the last layer of a ReLU network (Kristiadi, Hein, and Hennig 2020). It has been shown that Bayesian deep learning is important for the safety of automated vehicles (McAllister et al. 2017).

Method

To assess the competence of data-driven-AI automated-driving capabilities we propose a pipelined framework, depicted in Figure 1. The framework receives as input the observations of the current road situation and, via a pipelined information flow, outputs the decision on whether the driving mode should remain autonomous or should be handed over to the human driver or backup safety system. The framework's internal structure is divided into three modules: Intention Predictor, Reasoner and Competence Assessment.

Raw observations related to each target vehicle, such as its speed and acceleration, are fed to the Intention Predictor. This module processes the information via two sub-modules. The first one is a deep neural network trained to predict whether a given target vehicle will perform a cut-in maneuver (Cut-in Classifier). The second sub-module is a Feature Uncertainty Estimator. It holds univariate densities of the classifier's training set input features and provides information on the in-distribution likelihood of the network's input data.

The Intention Predictor's output, the observations related to road geometry (e.g. presence of entry lanes) and the lane visibility are fed to the framework's second module, the Reasoner. The Reasoner, characterised by an ontology and first-order logic rules, fuses the input observations with domain knowledge (encoded in the ontology and in the rules) into a knowledge graph. The graph realises the framework's situational awareness, as it holds a unified representation of the current situation and is aware of which entities are important and doubtful.

The graph is then fed to the last module of our framework, Competence Assessment. This module first organises its present and past situation-aware knowledge. Then it projects such knowledge into the future. Finally, it decides whether this forecast is outside the autonomous system's competence level.

In the following sections, we describe each module in more detail.

Simulation Environment

For simulation we use CARLA, an open-source simulator which aims to support the development, training, and validation of autonomous driving systems (Dosovitskiy et al. 2017). The scenarios are defined using OpenSCENARIO1, an open format used to describe synchronized maneuvers of vehicles.

For a given location of the Ego Vehicle (EV), we use the API of CARLA to extract a world model of the road situation. This world model includes the number of lanes, the presence of an entrance lane and all Target Vehicles (TVs). For each TV, the velocity, acceleration, angle and position relative to the EV are determined. The lane visibility v is calculated as v = d/s, where 0 ≤ v ≤ 1. Here, d is the distance of the closest TV on that lane and s the scope of the EV in meters (s := 50 m by default).

Intention predictor

Predicting the intention of a vehicle to perform a cut-in can be framed as a binary classification task; the two labels to classify are "cut-in" and "not cut-in". A data point labeled as "cut-in" refers to the information collected at timestep t for a TV that performs a cut-in between t and t + 2 s. Since more than one vehicle can be present at the same time, multiple data points can be collected at t.

1 https://www.asam.net/standards/detail/openscenario/
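As an illustration, the lane-visibility computation (v = d/s, with s the 50 m scope of the EV) can be sketched as follows; the clamp to [0, 1] and the function name are our assumptions, not stated in the text:

```python
def lane_visibility(d: float, s: float = 50.0) -> float:
    """Lane visibility v = d/s, where d is the distance in meters to the
    closest target vehicle on the lane and s is the scope of the ego
    vehicle (50 m by default). Clamping to [0, 1] is our assumption,
    added to honour the stated bound 0 <= v <= 1."""
    return max(0.0, min(d / s, 1.0))
```

For example, a TV at 25 m yields v = 0.5, while a lane whose closest TV lies beyond the scope is treated as fully visible (v = 1).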
Figure 1: The overall architecture of our situation-aware competence assessment framework. The dotted arrow from Real-life
Datasets to Intention Predictor is not part of the online information flow.
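For concreteness, the Cut-in Classifier described under Intention Predictor (30 input features, two fully connected hidden layers of 512 ReLU units, a single sigmoid output thresholded at λ) can be sketched as a forward pass. This is a shape-level illustration only: the weights below are random placeholders, not the trained parameters.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CutInClassifier:
    """Forward pass of a 30 -> 512 -> 512 -> 1 fully connected network."""

    def __init__(self, m=30, hidden=512, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialisation; purely illustrative, not the trained weights
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / m), (m, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, hidden))
        self.b2 = np.zeros(hidden)
        self.W3 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, 1))
        self.b3 = np.zeros(1)

    def predict_proba(self, x):
        h1 = relu(x @ self.W1 + self.b1)
        h2 = relu(h1 @ self.W2 + self.b2)
        return sigmoid(h2 @ self.W3 + self.b3)  # confidence o in [0, 1]

    def predict(self, x, lam=0.5):
        # threshold lambda converts the confidence into the binary cut-in label
        return int(self.predict_proba(x)[0] >= lam)
```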


The dataset consists of 24305 data points, divided into 6348 instances labeled as "cut-in" and 17957 labeled as "not cut-in", drawn from the StreetWise database (Paardekooper et al. 2019). While completeness measures for this database have been developed (de Gelder et al. 2019), the dataset used for training the intention predictor does not cover the entire spectrum of cut-ins that are to be expected in real-life traffic. However, for the purpose of this work a complete dataset is not essential, as we are interested precisely in those situations for which the intention predictor has not been trained.

To date, a variety of physics-based and data-driven approaches have been developed to detect spatio-temporal patterns in road users' behaviour. Specifically, DNNs have been commonly adopted for classification purposes as they can often outperform other methods on high-dimensional data (Sakr et al. 2017).

The DNN we developed for this study is a two-layer fully connected network trained with gradient-based backpropagation. The input x ∈ Rm is mapped to an output y ∈ Rn, where m = 30 and n = 1 are respectively the input and output dimensionality. The two hidden layers contain 512 neurons each and are activated by a ReLU function (Nair and Hinton 2010). The 30 features used as input represent continuous values related to the dynamics of a TV present at a given time t. Some of these variables, such as the TV's speed, acceleration, and the relative lateral and longitudinal distance to the EV, have been directly collected in real-life driving scenarios; other variables are the result of feature engineering techniques to develop expressive variables. The output, produced by a single sigmoid unit with domain o ∈ [0, 1], represents the predicted confidence in the TV performing a cut-in within the next 3 seconds. The result of this two-class logistic regression is converted into a binary output by defining a threshold λ on the output.

During training, the cross-entropy logarithmic loss is weighted for the two different classes to take into account their imbalance in the dataset. The threshold λ, the learning rate and the number of neurons per layer are fine-tuned using a Bayesian approach for global optimization (Brochu, Cora, and De Freitas 2010). To reduce overfitting, dropout and early stopping are used.

Feature uncertainty estimator  To assess the robustness of the trained DNN predictor on unseen test scenarios, we first analyzed the univariate distribution of each feature xi in the training set X = {x1, ..., x30}. Among these features, the most expressive ones for situational awareness were extracted for further analysis. The following features were chosen: the absolute velocity and acceleration of the EV and TVs, their relative velocity and acceleration, the relative longitudinal and lateral distance between the vehicles, their relative heading, and the distance between the EV and the closest lane marker. A desired characteristic of these features concerns their distribution. We observed that the distribution of these data samples can be approximated by multimodal skewed distributions when the dynamic properties of the vehicles change incrementally over time. Such distributions can be approximated by traditional non-parametric density estimation methods, such as Kernel Density Estimation (KDE) (Parzen 1962).

KDE is a technique used to reconstruct the probability density function of given data samples, and it can be applied to a single feature (univariate KDE) or to multiple features (multivariate KDE). In the univariate version, the technique consists of fitting a kernel function, such as a Gaussian, over each of the k samples in the chosen feature vector. The resulting k densities are then summed and normalized to return the final density estimate of the feature. The main hyperparameter of KDE, the bandwidth h, controls the variance of the kernel function; its value determines how smooth the final density estimate is. The optimization of this parameter, which is necessary to guarantee that the kernel function fits the data samples correctly, was performed using the Maximum Likelihood Cross-Validation (MLCV) approach (Habbema et al. 1974):

    MLCV = (1/k) Σ_{i=1..k} log[ Σ_{j≠i} K((xj − xi)/h) ] − log[(k − 1)h]    (1)

where k is the number of data samples to fit, K(·) is a Gaussian kernel, xj is a data point over the chosen domain, and xi is the i-th sample in the feature vector. Once the final density estimate is computed, it is possible to evaluate the likelihood of new samples for each feature; this computation can be performed synchronously with the observation of new data in unseen scenarios, as required in our study. For practical purposes, the log-likelihood of the samples is used instead of the likelihood.

A main assumption in our investigation is that samples with low likelihood lead to higher uncertainty about the DNN's competence. To quantify this intuition, we define the ratio ri as:

    ri = L(xi | Mi) / Lmax,i    (2)

where xi is an observation that belongs to the i-th feature. The value ri represents the ratio between the estimated log-likelihood of the new sample xi given the fitted model Mi and the maximum log-likelihood Lmax,i observed for the i-th feature. The maximum log-likelihood was pre-computed and stored during the kernel fitting phase. Finally, we define the feature uncertainty φ as:

    φ = 1 − (1/m) Σ_{i=1..m} ri    (3)

where m is the dimensionality of the feature space x. This quantity is intrinsically related to the frequency of the observation in the training set and reflects our previously mentioned assumption about the DNN's competence. The subtraction guarantees that the feature uncertainty tends to 1 when all the features are out-of-distribution, thus maximizing the uncertainty in the predictor's output, and to 0 when the features are in-distribution, thus following the same trend as the competence.

Reasoner  The second module of our framework, the Reasoner, is in charge of aggregating all observations, namely target vehicle data, road geometry, lane visibility and the output of the Intention Predictor, into a unified and quantitative representation of the situation on the road. This view is represented by means of a knowledge graph based on an underlying ontology and a set of first-order logic inference rules. We will hereafter refer to the ontology-rule pair as the schema. The Reasoner is implemented in Grakn2. The ontology specifies (part of) the automotive domain via entities, attributes and relations. Examples of entities are vehicles and road lanes. An example of a relation is "drive-on", linking vehicles and lanes. An example of an attribute is "distance-from-ego", which both vehicles and lanes have.

2 https://grakn.ai/. Last accessed 18 December 2020.

Given a set of observations as input, the Reasoner first initialises the related knowledge graph by creating nodes, corresponding to entities and attributes, and edges, corresponding to relations. Subsequently, the rules, defined as Horn clauses (Horn 1951), augment the graph by creating the two attributes "importance" and "doubt", linked to entities and relations. The importance aims to encode expert knowledge of the automotive domain. Its purpose is to categorise and rank nodes and edges. The doubt, on the other hand, can be interpreted as a measure of uncertainty associated with the nodes and edges. Its purpose is to assign a single type of weight across all graph elements. The two features are orthogonal to each other: the schema could specify that a fully visible entry lane is important independently of its doubt value. Conversely, the cut-in classifier's prediction for a target vehicle that drives far away from the EV, yet in an erratic way (high feature uncertainty), could have a high doubt value associated with it and, concurrently, a low importance value because of its position. We consider three possible importance values, namely low, medium and high, and 11 doubt values, bounded in the [0, 1] interval and equidistant from each other (0, 0.1, 0.2, . . ., 1).

Figure 2: The main schematics of the entities of our ontology and their hierarchical organisation.

An excerpt of the ontology's entity organisation is depicted in Fig. 2, whilst a schematic representation of the relations is shown in Fig. 3. The entities are organised hierarchically along three main branches: one representing the possible vehicles, one the driving infrastructure, and one the computational models external to the Reasoner. The non-ego vehicles are divided into two key categories: known and unknown. The categorisation is based on the types of vehicles present in the dataset the cut-in classifier was trained on. For instance, if the dataset contained only passenger cars, that entity would be placed under the known branch, whilst other vehicles such as lorries and motorcycles would be inferred as children of the unknown entity. The known/unknown information associated with observed TVs is used by the rules to assign doubt values to the classifier's output and importance values to the graph nodes. The driving infrastructure describes all non-vehicle entities present on the road, such as lanes, ramps and signs, in accordance with (Zhao et al. 2015; Czarnecki 2018a,b).
                                                                     Competence Assessment
                                                                     The last module in our framework — Competence Assess-
                                                                     ment — leverages previous and current knowledge graphs to
                                                                     determine whether the EV should maintain an autonomous
                                                                     driving modality or leave the control to the human driver
                                                                     or backup safety system. Competence Assessment follows a
                                                                     remember-forecast-decide processing flow.
                                                                     Remember A time-indexed memory of η graph embed-
                                                                     dings

                                                                                               e1 , . . . e η
                                                                     is kept. The embedding of a particular time corresponds to
                                                                     a single value encoding the graph related to that particular
                                                                     time’s road observations. Currently, the embedding proce-
Figure 3: The observation-relation implemented in our                dure corresponds to a weighted average of all doubts, where
schema. Orange items are attributes, blue items are entities,        the weights are associated to the relative importance values:
has entity-attribute relations are straightforward, whilst the       the higher the importance, the higher the weight.
entity-entity relation is depicted with a rhombus.                   Forecast The remembered embeddings represent, albeit in
                                                                     a compact way, reasoned (importance/doubt-aware) situa-
                                                                     tions. Intuitively, the lower an embedding, the more com-
                                                                     petent the autonomous vehicle was in that situation, as low
                                                                     importance and doubt attributes would predominantly exist
                                                                     in the corresponding graph. We therefore define the Compe-
                                                                     tence related to a graph embedding as:

                                                                                     ci = 1 − ei , ∀ i ∈ [1, . . . η]           (4)
                                                                        Intuitively, cη corresponds to the latest (current) compe-
                                                                     tence value. The ci values are then fed to a regressor to esti-
                                                                     mate ρ future competence values
Figure 4: The instantiation of the schema of Figure 2 given
fictional observations. A truck drives on a one-way lane.                                      ĉ1 , . . . ĉρ
The intention predictor, due to the truck’s high feature un-           Currently, the framework implements a linear regressor,
certainty, assigns a cut-in probability of 0.6. This rather un-      based on the assumption that a short-term linear dependency
certain value leads to a high doubt value. Nonetheless, the
truck is rather far from the EV, as hinted by the high
visibility value of the lane³. Hence, the importance value
for the lane is low. The truck has a higher importance value
due to its unlikely behaviour. Finally, the doubt value
associated with the truck-lane relation is low, mainly because
of the lane, though not extremely low, due to the feature
uncertainty related to the vehicle and the fact that it is not
entirely certain whether it will perform a cut-in.

fundamental attribute: visibility. The rules implement a neg-
ative correlation: the lower the visibility, the higher the doubt
associated with that lane. In this way, the framework aims to
speculate about the possible existence of hidden entities in
adjacent lanes. Computation entities represent framework
models which process raw observations to generate new in-
formation, in our case the cut-in classifier. In case the models
are machine learned, the Reasoner infers doubt values
associated with the model outputs from the related
in-distribution likelihood values: the lower the likelihood, the
higher the doubt. An example of an observation-reasoned
knowledge graph is shown in Fig. 4.

across observations holds.

Decide  The decision whether the driving should remain
autonomous or be handed over to a human is made based on
the lowest future competence value

    ĉmin = min ĉi ,  ∀ i ∈ [1, . . . , ρ]              (5)

and by comparing it to an assessment threshold τc:

    decision = takeover   if ĉmin < τc
               AD mode    otherwise                    (6)

where AD stands for Autonomous Driving.

                 Results and Discussion
We have trained the DNN for intention prediction on 24305
instances, divided into 6348 cut-ins and 17957 non-cut-ins.
Since the dataset was unbalanced, we weighted the loss
function to compensate for the difference in the number of
observations per class. The algorithm was tested on 7200
instances, resulting in an F-score of 0.98 (accuracy = 0.99).
   We have assessed the competence of the intention predic-
tor in two cut-in scenarios. The first scenario describes a cut-
in by a passenger car on an otherwise empty road (Fig. 5a).
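The paper does not state the exact loss weighting used to counter the class imbalance; a common inverse-frequency scheme (our assumption, with the class counts from the training set described above) can be sketched as:

```python
import math

# Class counts from the training set described in the text
n_cutin, n_noncutin = 6348, 17957
n_total = n_cutin + n_noncutin  # 24305

# Inverse-frequency weights, normalised so that both classes
# contribute equally to the expected loss (an assumption; the
# paper does not specify its exact weighting formula).
w = {1: n_total / (2 * n_cutin),      # cut-in     -> approx. 1.91
     0: n_total / (2 * n_noncutin)}   # non-cut-in -> approx. 0.68

def weighted_bce(p, y):
    """Class-weighted binary cross-entropy for one sample:
    p = predicted cut-in probability, y = ground-truth label (0/1)."""
    return -w[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
```

With these weights the rare cut-in class is penalised roughly three times as strongly per sample, so that in expectation both classes contribute equally to the training signal.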
 Case  Potential Risk  Reasoner     Current          Minimum Future     1−φ   Decision w/o Reasoner  Decision w/ Reasoner
                                    Competence cη    Competence ĉmin          (τφ = 0.7)             (τc = 0.7)
  1    Low             not present  -                -                  0.57  takeover               -
  2    Low             present      0.71             0.84               0.57  -                      AD mode
  3    High            not present  -                -                  0.31  takeover               -
  4    High            present      0.15             0.14               0.31  -                      takeover

              Table 1: Results on the four different cases tested with and without the Reasoner.
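The two decision paths compared in Table 1 reduce to simple threshold tests; a minimal sketch of Eqs. 5 and 6 and of the Reasoner-free baseline (function names are ours), checked against the Table 1 values:

```python
def decide(future_competences, tau_c=0.7):
    """Eqs. 5-6: hand over control when the minimum forecast
    competence over the prediction horizon drops below tau_c."""
    c_min = min(future_competences)                     # Eq. 5
    return "takeover" if c_min < tau_c else "AD mode"   # Eq. 6

def baseline(one_minus_phi, tau_phi=0.7):
    """Without the Reasoner: threshold the feature-uncertainty
    complement 1 - phi directly."""
    return "takeover" if one_minus_phi < tau_phi else "AD mode"

# Table 1 values: Cases 1/2 share 1 - phi = 0.57, Cases 3/4 share 0.31.
# Single-element forecast lists stand in for the rho = 2 horizon.
assert baseline(0.57) == "takeover"   # Case 1: takeover despite low risk
assert decide([0.84]) == "AD mode"    # Case 2: Reasoner keeps AD mode
assert baseline(0.31) == "takeover"   # Case 3
assert decide([0.14]) == "takeover"   # Case 4
```

The sketch makes the contrast explicit: the baseline triggers a takeover in both low-risk and high-risk scenes, whereas the Reasoner-based rule separates them.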


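The Minimum Future Competence values in Table 1 come from linearly extrapolating the recent history of doubt-embeddings over a horizon of ρ = 2 steps; a minimal sketch of such a forecaster (an ordinary least-squares fit is our assumption; the text only states that linear regression is used):

```python
def forecast(history, rho=2):
    """Fit a least-squares line to the doubt-embedding history
    (needs at least two points) and extrapolate rho future steps,
    clamping the result to the assumed [0, 1] embedding range."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [max(0.0, min(1.0, intercept + slope * (n - 1 + k)))
            for k in range(1, rho + 1)]

# A steadily rising doubt history extrapolates to higher future doubt.
print(forecast([0.1, 0.2, 0.3, 0.4]))  # -> values close to [0.5, 0.6]
```

Because the fit is refreshed at every time step, small fluctuations in the history shift the extrapolated line noticeably, which is consistent with the observation later in this section that horizons longer than 2 seconds are unreliable.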
The velocity, distance and driving profile of the TV were de-
signed not to pose any risk to the EV. In addition, every ve-
hicle present in the scenario was known to the knowledge
graph.

      (a) The cut-in scenario, which is within the operational design
          domain. Corresponds to Cases 1 and 2 in Table 1.

      (b) The lane entrance scenario, which is outside the
       operational design domain. Corresponds to Cases 3 and 4
                            in Table 1.

Figure 5: Snapshots from the two different scenarios as
shown in the CARLA simulator.

Figure 6: Likelihood ratio r of the features used to compute
the feature uncertainty in the cut-in manoeuvre performed
by the motorcycle (second scenario).

   The second scenario (Fig. 5b) describes multiple vehicles
(two trucks and a motorcycle) on the first, right-most lane
and a truck approaching from the entrance lane. The EV is
in the left lane and cannot see the approaching truck, as it is
occluded by the vehicles on its right. The features in this sce-
nario are out-of-distribution, as only two features lie within
the training set domain (Fig. 6). Moreover, the scenario in-
cludes an unknown entity in the form of a motorcycle that
was not part of the training set. The rationale for this is that
a type of vehicle not present in the training set might dis-
play a driving profile that the intention prediction does not
expect. In other words, the output of the predictor might be
incorrect, since it relies on the detection of spatio-temporal
patterns in the vehicle's driving behaviour. The visibility on
the road was reduced by the traffic on the first lane; this lane
was considered of high importance due to the road entrance.
This scenario was designed to pose a potential risk to the au-
tonomous system, due to the out-of-distribution features and
unknown entities. The two scenarios were evaluated at the
moment that one of the TVs performs a cut-in.
   The two settings were first tested without the contribution
of the symbolic reasoning inference, shown as Case 1 and
Case 3 in Table 1. Since the Reasoner was not in place, the
feature uncertainty φ was used as a proxy for the ability of
the Intention Predictor to perform correctly in the given
situation. For clarity, the quantity 1 − φ is reported; hence,
a score equal to 1 represents full confidence in the Inten-
tion Predictor output and can be directly compared with the
Competence score. The threshold τφ = 0.7 was defined to
establish whether it was necessary for the human driver to
take over (1 − φ < τφ) or the vehicle could maintain AD
mode (1 − φ ≥ τφ). In both cases, the system decides not to
maintain the AD mode, due to the high feature uncertainty,
even when the scenario was safe. The absence of the Reasoner
causes a lack of situational awareness: the speed of the TV
was lower than the average velocities collected in the train-
ing set (thus making the velocity an OOD feature), but the
large distance between the EV and TV is not used by the
Intention Predictor to reduce the importance attributed to
this quantity.
   Results of the competence assessment with the Reasoner
are shown as Case 2 and Case 4 in Table 1. The Current
Competence column refers to the competence cη as inferred
by the first-order rules of the knowledge graph at the cur-
rent moment. In the event that more than one vehicle was
predicted to perform a cut-in, the reported value is the low-
est cη estimated among all the vehicles. The Minimum Fu-
ture Competence ĉmin was computed by converting the future
doubt-embedding extrapolated by the forecaster (Fig. 7), as
described in Eq. 4 and Eq. 5 (ρ = 2). Thus, the future compe-
tence was calculated for a prediction horizon of 2 seconds.
The decision whether the vehicle should remain autonomous
was made by a thresholding function (τc = 0.7) on the
future competence, as detailed in Eq. 6.

Figure 7: Future estimation of the competence correspond-
ing to Case 4.

   We found that ĉmin evaluated for Case 4 was six times
lower than in Case 2. In Case 2 the threshold for takeover
was never reached and the system did not hand over the au-
tonomous control. Due to the large distance of the TV and
the high visibility of the lanes, the Reasoner determined that
the vehicle could stay in AD mode despite the low likeli-
hood of the input data expressed by the average feature un-
certainty. In contrast, the system decided that a takeover of
the AD mode was necessary in Case 4, because the numer-
ous sources of risk in this setting caused a low future compe-
tence. This is expressed by a competence value that is sub-
stantially lower than one based solely on the feature uncertainty.
Using the likelihood of the input data expressed by the fea-
ture uncertainty alone is not sufficient to correctly assess the
confidence in the Intention Predictor output. This is evident
from the results of Case 1 and Case 3 (Table 1), where the ab-
sence of the Reasoner leads to an incorrect assessment of the
situation. In addition, the competence returned by the Reasoner
shows a larger contrast between these two extreme cases than
the method based on the feature uncertainty alone.
   We found that the linear regression used to assess the fu-
ture competence was strongly affected by small variations
in the history of doubt-embeddings ρ. Thus, we do not con-
sider that a prediction horizon longer than 2 seconds would
be reliable enough to support the decision-making process.

                Conclusions and future work
We have presented a hybrid-AI framework for the safe appli-
cation of AI functions in automated driving. The framework
aggregates road observations and the results of data-driven
AI computations, such as a DNN for intention prediction
in our case study, into a knowledge graph. The graph is
built by means of an ontology, which specifies the entities
that can exist on the road, and a set of first-order logic in-
ference rules, the latter aiming to estimate the severity level
of the road situations. The knowledge graph is then com-
pressed into a single value (embedding), stored in a work-
ing memory, and used to forecast imminent severity levels.
A final decision-maker module establishes whether the ve-
hicle should continue driving autonomously or whether the
steering wheel should be handed over to a human driver or
a backup safety system. The knowledge graphs encode the
situational awareness capabilities of the vehicle, whilst the
forecasting and decision-making processes realise the vehi-
cle's competence assessment capability.
   We have shown that the Reasoner correctly assigns high
competence to the Intention Predictor in a situation in which
some features of the DNN are uncertain, but the TV poses no
safety threat to the EV due to the large distance and high lane
visibility. The added value of the Reasoner is also shown in
a situation that contains a vehicle (in this case a motorcy-
cle) that has never been seen before by the predictor, in an
environment with important entities that require attention (in
this case an entrance lane). The predictor output is unreliable
in this case, potentially leading to erratic and dangerous be-
haviour of the EV if taken at face value. Here, the Reasoner
correctly assigns a low competence to the predictor based on
the presence of the motorcycle (high doubt) and the presence
of the entrance lane (high importance).
   These results provide a solid starting point for future in-
vestigations on situational awareness. In future work, we
will extend situational awareness to the entire automated
vehicle instead of a single component. In addition, the Rea-
soner will aggregate more types of observations, for exam-
ple those regarding road works or weather conditions, and
its first-order logic inference rules could be parameterised
via data-driven approaches instead of relying solely on do-
main knowledge. Combining the DNN with the knowledge
graph into a graph neural network will result in a better esti-
mation of competence, especially further into the future. Graph
neural networks might also aid in enhanced explainability
of why a takeover is needed.
   While limited to a single function in a simulation envi-
ronment, our work shows that a hybrid-AI approach to situ-
ational awareness is essential for the safe application of AI
systems in automated driving.

                        References
Bansal, M.; Krizhevsky, A.; and Ogale, A. 2019. Chauffeur-
Net: Learning to Drive by Imitating the Best and Synthesiz-
ing the Worst. In Robotics: Science and Systems.
Brochu, E.; Cora, V. M.; and De Freitas, N. 2010. A tuto-
rial on Bayesian optimization of expensive cost functions,
with application to active user modeling and hierarchical re-
inforcement learning. arXiv preprint arXiv:1012.2599.
Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler,
M.; Benenson, R.; Franke, U.; Roth, S.; and Schiele, B.
2016. The Cityscapes Dataset for Semantic Urban Scene
Understanding. In Proc. of the IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR).
Czarnecki, K. 2018a. Operational world model ontology for
automated driving systems–part 1: Road structure. Waterloo
Intelligent Systems Engineering Lab (WISE) Report, Univer-
sity of Waterloo.
Czarnecki, K. 2018b. Operational world model ontology
for automated driving systems–part 2: Road users, animals,
other obstacles, and environmental conditions. Waterloo
Intelligent Systems Engineering Lab (WISE) Report, Univer-
sity of Waterloo.
de Gelder, E.; Paardekooper, J.-P.; Op den Camp, O.; and
De Schutter, B. 2019. Safety assessment of automated ve-
hicles: how to determine whether we have collected enough
field data? Traffic Injury Prevention 20(S1): S162–S170.
Deo, N.; and Trivedi, M. M. 2018. Multi-Modal Trajectory
Prediction of Surrounding Vehicles with Maneuver based
LSTMs. In IEEE Intelligent Vehicles Symposium, Proceed-
ings, 1179–1184. IEEE.
Dinh, L.; Sohl-Dickstein, J.; and Bengio, S. 2016. Density
estimation using Real NVP. arXiv preprint arXiv:1605.08803.
Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; and
Koltun, V. 2017. CARLA: An open urban driving simula-
tor. arXiv preprint arXiv:1711.03938.
Endsley, M. R. 1995. Toward a theory of situation awareness
in dynamic systems. Human Factors 37(1): 32–64.
Endsley, M. R. 2020. Situation Awareness in Driving. In
Fisher, D.; Horrey, W.; Lee, J.; and Regan, M., eds., Hand-
book of Human Factors for Automated, Connected, and In-
telligent Vehicles, chapter 7. CRC Press.
Gal, Y.; and Ghahramani, Z. 2016. Dropout as a Bayesian
approximation: Representing model uncertainty in deep
learning. In 33rd International Conference on Machine
Learning, ICML 2016, 1651–1660.
Guo, C.; Pleiss, G.; Sun, Y.; and Weinberger, K. Q. 2017.
On calibration of modern neural networks. arXiv preprint
arXiv:1706.04599.
Habbema, J. D. F.; Hermans, J.; and Van den Broek, K. 1974.
A stepwise discriminant analysis program using density esti-
mation.
Hein, M.; Andriushchenko, M.; and Bitterwolf, J. 2019.
Why ReLU networks yield high-confidence predictions far
away from the training data and how to mitigate the prob-
lem. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 41–50.
Hendrycks, D.; and Gimpel, K. 2016. A baseline for detect-
ing misclassified and out-of-distribution examples in neural
networks. arXiv preprint arXiv:1610.02136.
Hendrycks, D.; and Gimpel, K. 2017. A Baseline for De-
tecting Misclassified and Out-of-Distribution Examples in
Neural Networks. In Proceedings of the International Con-
ference on Learning Representations.
Horn, A. 1951. On sentences which are true of direct unions
of algebras. The Journal of Symbolic Logic 16(1): 14–21.
Koopman, P.; and Wagner, M. 2016. Challenges in Au-
tonomous Vehicle Testing and Validation. SAE International
Journal of Transportation Safety 4(1): 15–24.
Kristiadi, A.; Hein, M.; and Hennig, P. 2020. Being
Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU
Networks. In Daumé III, H.; and Singh, A., eds., Pro-
ceedings of the 37th International Conference on Machine
Learning, volume 119 of Proceedings of Machine Learning
Research, 5436–5446. PMLR.
LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning.
Nature 521(7553): 436–444.
Liang, S.; Li, Y.; and Srikant, R. 2017. Enhancing the reli-
ability of out-of-distribution image detection in neural net-
works. arXiv preprint arXiv:1706.02690.
McAllister, R.; Gal, Y.; Kendall, A.; van der Wilk, M.; Shah,
A.; Cipolla, R.; and Weller, A. 2017. Concrete problems for
autonomous vehicle safety: Advantages of Bayesian deep
learning. In IJCAI International Joint Conference on Ar-
tificial Intelligence, 4745–4753.
Meyer-Vitali, A.; Bakker, R.; van Bekkum, M.; Boer, M. d.;
Burghouts, G.; Diggelen, J. v.; Dijk, J.; Grappiolo, C.; Gre-
eff, J. d.; Huizing, A.; et al. 2019. Hybrid AI: White Paper.
Technical report, TNO.
Nair, V.; and Hinton, G. E. 2010. Rectified linear units im-
prove restricted Boltzmann machines. In ICML.
Okuda, R.; Kajiwara, Y.; and Terashima, K. 2014. A survey
of technical trend of ADAS and autonomous driving. In Pro-
ceedings of Technical Program - 2014 International Sympo-
sium on VLSI Technology, Systems and Application, VLSI-
TSA 2014.
Paardekooper, J.-P.; van Montfort, S.; Manders, J.; Goos, J.;
de Gelder, E.; Op den Camp, O.; Bracquemond, A.; and Thi-
olon, G. 2019. Automatic Detection of Critical Scenarios in
a Public Dataset of 6000 km of Public-Road Driving. In
Enhanced Safety of Vehicles, 1–8.
Parzen, E. 1962. On estimation of a probability density func-
tion and mode. The Annals of Mathematical Statistics 33(3):
1065–1076.
Ren, J.; Liu, P. J.; Fertig, E.; Snoek, J.; Poplin, R.; DePristo,
M.; Dillon, J.; and Lakshminarayanan, B. 2019. Likelihood
ratios for out-of-distribution detection. In Advances in Neu-
ral Information Processing Systems, 14707–14718.
Sakr, S.; Elshawi, R.; Ahmed, A. M.; Qureshi, W. T.;
Brawner, C. A.; Keteyian, S. J.; Blaha, M. J.; and Al-Mallah,
M. H. 2017. Comparison of machine learning techniques to
predict all-cause mortality using fitness data: the Henry Ford
exercIse testing (FIT) project. BMC Medical Informatics and
Decision Making 17(1): 174.
Thill, S.; Hemeren, P. E.; and Nilsson, M. 2014. The appar-
ent intelligence of a system as a factor in situation aware-
ness. In 2014 IEEE International Inter-Disciplinary Con-
ference on Cognitive Methods in Situation Awareness and
Decision Support, CogSIMA 2014, 52–58. IEEE.
van Harmelen, F.; and ten Teije, A. 2019. A Boxology of De-
sign Patterns for Hybrid Learning and Reasoning Systems.
arXiv.org.
Vellinga, N. E. 2019. Automated driving and its challenges
to international traffic law: which way to go? Law, Innova-
tion and Technology 11(2): 257–278.
WHO. 2018. Global status report on road safety 2018.
Zhao, L.; Ichise, R.; Yoshikawa, T.; Naito, T.; Kakinami, T.;
and Sasaki, Y. 2015. Ontology-based decision making on
uncontrolled intersections and narrow roads. In 2015 IEEE
intelligent vehicles symposium (IV), 83–88. IEEE.