<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Safety-aware Active Learning with Perceptual Ambiguity and Severity Assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Prajit T Rajendran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Ollier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huascar Espinoza</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Morayo Adedjouma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agnes Delaborde</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chokri Mraidha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CEA, List</institution>
          ,
          <addr-line>F-91120, Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KDT JU</institution>
          ,
          <addr-line>Avenue de la Toison d'Or 56-60, 1060 Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Laboratoire National de Metrologie et d'Essais</institution>
          ,
          <addr-line>Trappes</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Deep Neural Networks (DNNs) used in self-driving cars need large data coverage and labelling effort to manage all potential hazards in safety-critical scenarios. Active learning approaches make use of automated data selection and labelling to build diverse datasets with lower human cost and higher accuracy. Traditional active learning methods consider the uncertainty of the model predictions and the diversity of the data points for query selection. However, they are not optimal in capturing many critical data points which are potentially risky with respect to safety considerations. In this position paper, we propose a novel approach that uses human feedback related to perceptual data ambiguity and a criticality score linked to system-level safety assessment. This approach includes a continual learning model that learns to identify corner cases and blindspots with a high impact on potential risk, and combines them with uncertainty-sampling and diversity-sampling models to create a safety-aware acquisition function for active learning.</p>
      </abstract>
      <kwd-group>
<kwd>Safety</kwd>
        <kwd>Active learning</kwd>
        <kwd>Autonomous driving</kwd>
        <kwd>Human-in-the-loop learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Self-driving cars are increasingly employing various deep learning-based components in their technology stack. These components require tremendous amounts of data to reach a significant level of performance [1]. Deep Neural Networks (DNN) generally perform poorly when they come across previously unseen data. A DNN model trained on only a homogeneous set of images from a particular scenario would perform well only in that scenario and under-perform in most other situations. This is a major concern for the safety assessment of self-driving vehicle systems [2]. In a traffic light classification task, for instance, the more diverse the scenarios the DNN module encounters in training, the wider its safe operation region [3].</p>
      <p>Typically, the labels to train such modules are provided by humans [4]. Curating a large dataset with millions of human labels is painfully time-consuming and expensive. Active learning is a powerful technique that attempts to maximize a model's performance gain while annotating the fewest samples possible. This process usually considers factors such as uncertainty and diversity to generate a query list for the human [5]. Active learning has shown impressive performance gains over random selection in many self-driving perception tasks.</p>
      <p>While there have been emerging efforts to improve active learning for complex scenarios, little attention has been given to active learning for safety-critical features. One example of these features is the detection of ambiguous data points when the self-driving car is in a safety-critical situation. An example of ambiguity could be an image used to train a traffic light detection system wherein there is a red light for traffic intending to turn right and a green light for the straight-moving traffic. This image could be delegated to the human to annotate if it is deemed to have a high impact on potential risk.</p>
      <p>This position paper proposes a novel approach that uses human feedback related to perceptual data ambiguity and a criticality score. This criticality score, which is linked to the exposure and severity factors of a typical safety assessment, helps to characterize the criticality context of corner cases and blindspots with a high impact on potential risk. In a limited query budget scenario, the perceptual ambiguity level and criticality level obtained during the annotation process, along with uncertainty and diversity measurements, help in selecting the images with the highest impact on potential risk. This position paper is a preliminary step towards deeper research into how human-in-the-loop feedback can help in a safety-aware active learning approach. In this work, we focus on pool-based active learning, where we have a small set of labelled data available and a large set of unlabelled data which needs to be labelled within a certain query budget.</p>
      <p>The IJCAI-ECAI-22 Workshop on Artificial Intelligence Safety (AISafety 2022), July 24-25, 2022, Vienna, Austria.</p>
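<p>The pool-based setting described above can be sketched as a simple loop (a minimal illustration: the 'train', 'acquire' and 'oracle' functions are hypothetical placeholders, with random sampling standing in for a real acquisition function):</p>

```python
# Sketch of a pool-based active learning loop (illustrative placeholders only).
import random

def train(model, labelled):
    """Placeholder: fit the model on the labelled set."""
    return model

def acquire(model, pool, budget):
    """Placeholder acquisition: random sampling as a baseline."""
    return random.sample(sorted(pool), min(budget, len(pool)))

def pool_based_active_learning(model, labelled, pool, oracle, rounds, budget):
    for _ in range(rounds):
        model = train(model, labelled)
        queries = acquire(model, pool, budget)  # points to show the human
        for x in queries:
            labelled[x] = oracle(x)             # human provides the label
            pool.discard(x)
        if not pool:
            break
    return train(model, labelled), labelled

# Toy usage: 'oracle' stands in for the human annotator.
final_model, labels = pool_based_active_learning(
    model=None, labelled={0: "red"}, pool={1, 2, 3, 4, 5},
    oracle=lambda x: "green" if x % 2 else "red", rounds=2, budget=2)
print(len(labels))  # 1 initial label + 2 rounds * 2 queries = 5
```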
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Works</title>
      <sec id="sec-2-1">
        <title>2.1. Motivation</title>
<p>A modular driving system typically consists of several components with specific functions collaborating to achieve the intended driving behaviour. There are also end-to-end driving systems, but these are usually made up entirely of opaque blackbox models, and it is thereby not feasible to certify their functional safety. Learning-enabled components making use of black-box machine learning models are notorious in this respect due to their lack of transparency. Failures or unsafe behaviour at the component level can potentially compromise the safety of the entire system unless there are exhaustive system-level measures to tackle them, and thus it is important to ensure that the component is trained in a manner that minimizes vulnerability to unknown situations. The presence of a human in the loop could help in mitigating some of these vulnerabilities by identifying certain blindspots undetected by the trained models and by assessing the severity level of the consequences of a misprediction by the trained models. In situations of limited query budget and training time, the paradigm of active learning could assist in selecting the most safety-relevant data points by analyzing the blindspot vulnerabilities of the component.</p>
        <p>In this work, we focus on improving the data selection and training of a traffic light classification component in a modular driving system.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Active Learning</title>
        <p>Active learning is a process of eliciting training data from annotators: determining the right data to put in front of people when you do not have the budget or time for human feedback on all your data. This is especially true for autonomous driving datasets, which could have millions of hours of data available for training. More than the raw quantity of the data used, the quality, diversity and usability of the data are the important parameters to assure optimum performance and safety of the deployed models. The deep neural networks responsible for self-driving functions require exhaustive training, and the data needs to cover new and uncertain situations in order to tackle the problem of unknown unknowns. Unknown unknowns are data points for which the AI model provides a wrong prediction with a high degree of confidence. Such points are dangerous because they are immune to detection by uncertainty measures, which are often used as a proxy metric to test models' weaknesses. The combination of data annotation and curation poses a major challenge to deploying deep learning models in autonomous systems, and active learning helps by automatically finding the relevant data points to query the human, to build better datasets in a fraction of the time, with less cost and more accuracy [6].</p>
        <p>Some of the common sampling strategies in active learning are as follows [7]:
• Random sampling is a strategy where we pick random samples from the unlabeled pool of data as query points for the human to label. This is usually used just as a baseline, as it does not involve an intelligent strategy to select the query points.
• Uncertainty sampling is the set of strategies for identifying unlabeled items that are near a decision boundary in the trained model. This approach picks out the data points with a higher predictive uncertainty, and is thereby reflective of the blindspots of the trained model.
• Diversity sampling is the set of strategies for identifying unlabeled items that are under-represented or unknown to the machine learning model (for instance, features that are not common in the training data, or are under-represented in real-world demographics).</p>
        <p>The simplest approach in the literature, as illustrated in [8], is to select examples based on distances in the feature space. In [9], diversity is measured using a similarity matrix built from the Gaussian kernel of the distance between two points. [10] makes use of entropy as a metric of uncertainty. [11] makes use of the information density of the candidate instance, obtained from the input space for the remaining unlabeled instances. [12] and [13] use ensemble and Bayesian methods, respectively, to approximate uncertainty. [14] proposes heuristic methods to balance the uncertainty and the representativeness of the selected sample, considering the redundancy between selected samples. [15] argues that the initial model does not have a good performance, so the queries generated by it are also likely to be inefficient. In [16], it is proposed to include knowledge from unlabeled images by adding unsupervised and semi-supervised methods to enhance the performance. The authors in [17] propose to use a binary classifier to predict whether an image is from the labeled or unlabeled pool, using the concept of adversarial learning. In [18], a semi-supervised active learning approach is proposed wherein contention points are determined by making use of both the informativeness and the adaptive probabilistic labels of the unlabelled points, based on the hypothesis of the current model.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Blindspots and Corner Cases</title>
        <p>Some blindspots require conceptual knowledge about the traffic scene, which a blackbox model may not necessarily possess; such blindspots can be identified with the help of a human-in-the-loop. A further category is safety blindspots: data points whose misclassification by the specific trained model at the component level could compromise the safety of the system of which the component is a part.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Method</title>
      <sec id="sec-3-1">
        <title>3.1. Perceptual Ambiguity</title>
        <p>Blindspots are the deficiencies present in a model which may be detrimental to its performance and adaptability to unknown and uncertain situations [19]. In active learning, data points falling under these blindspots can be specifically picked to query to a human oracle. There can be various categories of blindspots:
• Model Blindspots: The set of data points, and the feature regions they enclose, in which the model is highly uncertain about or unsure of its predicted label constitute the model blindspots. It is possible to identify model blindspots using the prediction uncertainty of data points. Data points for which the model has a prediction with a high entropy fall under this category.
• Data Blindspots: The areas of the feature space that are not covered in the training set constitute the data blindspots. Diversity is one of the aspects that help in uncovering these blindspots. An example could be a dataset with images only recorded in daytime. An image taken at night time would be very distant from the images that the model has seen before, and even if the model's output prediction has a low entropy, it can not be fully trusted.
• Human-identified Blindspots: The model blindspots reveal the underconfidence and knowledge gaps of the trained model, and the data blindspots explore the diversity of the data. However, there may be more conceptual aspects in the dataset which are not covered under both the above categories of blindspots. For example, consider an image in the training set of a traffic light classification system wherein there are two visible traffic lights: one for left-moving traffic, and the other for straight-moving traffic. If the ego vehicle is in the rightmost lane, a human looking at the image can see that the vehicle could not possibly turn left, so only the signal light for straight-moving traffic is relevant for the scene. This, however, is an ambiguous situation that could be potentially difficult to classify.</p>
        <p>In contexts which are subjective in nature, or when human contextual knowledge plays a major role, current active learning methods based purely on model knowledge do not tend to perform well [2]. Safety in particular is a complex concept involving other environmental and situational factors. Since the onus in active learning is on a particular component, one can not discuss safety as such, as it is a system-level concept. However, it is possible to think about the safety implications of a mislabelled or ambiguous data point. A human-in-the-loop can help in identifying certain conceptual blindspots which are not covered under the model and data blindspots discussed above. Although human-in-the-loop feedback involves effort in terms of labelling, active learning acquisition functions ensure that only a fraction of the data points, those most critical according to the chosen criterion, have to be labelled by the humans, thereby solving the scalability issue. Human bias is always a factor in labelling, but classic methods in active learning such as inter-annotator agreement can be used to mitigate this problem.</p>
        <p>Data points which the annotator perceives to be potentially ambiguous could be rejected and removed from the training set. However, a black-and-white approach of reject and accept is not suitable in many cases, such as traffic-related tasks. Many data points could be slightly ambiguous yet interesting to include in the dataset for diversity and task relevance. Conservatively rejecting all data points the annotator perceives to be slightly ambiguous leads to less diversity in the training set. These constitute human-identified blindspots and provide additional information for data selection. Thus, it would be useful to quantify the level of ambiguity and underconfidence that the annotator feels for each data point as very low, low, medium, high or very high. A secondary model can be trained to predict the level of perceptual ambiguity with the help of human feedback, and this could assist in better data selection for active learning querying under a limited budget. We propose Table 1 as a reference for the annotators.</p>
      </sec>
      <sec id="sec-3-2">
<title>3.2. Criticality Assessment</title>
<p>Table 1: Perceptual ambiguity levels proposed as a reference for the annotators.
Very low: Unambiguous image, label easy to identify.
Low: Distracting features but easy to classify.
Medium: Some ambiguities in identifying the label.
High: Occlusions and ambiguities, hard to classify.
Very high: Corner case with safety implications.</p>
        <p>We consider the safety awareness of the data labeling process through the concept of criticality assessment, and thereby aim to tackle the safety blindspots discussed above. The idea is to estimate the importance of a specific image for a task according to the global risk it could represent for a system facing that task. The global risk here is the combination of two factors: the severity, i.e., the estimated safety consequences if the system fails the task, and the exposure, i.e., the estimated probability of this failure. In the context of traffic light classification, the severity concerns the expected consequences if a traffic light is misclassified; it depends on which class is misclassified (i.e., a green light misclassified as a red/orange light, or a red/orange light misclassified as a green light) and on the visible environmental parameters which can contribute to possible accidents (e.g., pedestrian crossing, road intersection). The exposure is estimated by detecting the visible factors that could cause the misclassification (e.g., camera obstruction or corruption, weather conditions). We focus here on the risk assessment at the component level, without considering the whole system's capabilities and interactions with the other components and subsystems.</p>
        <p>Figure 2: Ambiguous class labels and distracting features</p>
<p>To include this active learning approach in a complete safety engineering process, the requirements identified in the preliminary analysis shall be considered to adapt this score. A first question to estimate the severity level is presented to the human annotator. We formulate the question as "How do you estimate the consequences on accident risk if the automated driving system misclassifies this traffic light?" (as shown in Figure 5), with the possible answers "Negligible", "Light", "Severe", and "Fatal". We associate each of these answers with a value (zero for "Negligible"), and if the human rater does not select the answer "Negligible", i.e., if the severity score is higher than zero, we ask another question for the exposure estimation: "Can you see any factor that might bother this traffic light identification?" with the answers "Yes" and "No". If the rater answers "No", the exposure value is zero. Otherwise, we ask additional questions to identify these factors. Each factor is associated with an exposure value defined beforehand by the expert and not visible to the human rater. We can then compute the criticality score with the formula (∑_{i=1}^{n} b_i * e_i) * s, where n is the number of identified factors, b is a boolean vector representing the presence/absence of each factor, e is a vector of the exposure values for each factor, and s is the severity score.</p>
        <p>Figure 3: Ambiguous class labels and conceptual understanding</p>
        <p>Consider Figure 2 from the traffic light detection dataset presented in [20]. There are two traffic lights visible in the image, which is a source of ambiguity. Additionally, at night the tail lights of traffic ahead may constitute distracting features which may affect the label prediction. In Figure 3, also from the same dataset, one can see that once again there is an ambiguity in the class label at first sight. However, considering that the ego vehicle is in the middle lane, with proper conceptual knowledge it can be presumed that the traffic light for straight-moving traffic is the relevant one.</p>
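<p>The criticality computation above can be sketched as follows (a minimal illustration; the factor names and exposure values are invented for the example, not taken from the paper):</p>

```python
# Sketch of the criticality score from Section 3.2:
# criticality = (sum_i b_i * e_i) * s, where b marks which exposure factors the
# rater identified, e holds expert-defined exposure values, s is the severity.

def criticality_score(present, exposure_values, severity):
    """present: booleans b_i; exposure_values: e_i; severity: s (0 = negligible)."""
    if severity == 0:  # "Negligible" answer: no exposure question is asked
        return 0.0
    exposure = sum(b * e for b, e in zip(present, exposure_values))
    return exposure * severity

# Example: two of three hypothetical factors visible (say, occlusion and rain),
# severity answer "Severe" mapped to 3. All numbers are illustrative assumptions.
score = criticality_score(present=[1, 0, 1],
                          exposure_values=[0.4, 0.2, 0.3],
                          severity=3)
print(round(score, 3))  # 2.1
```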
        <sec id="sec-3-2-2">
          <title>3.3. Continual Learning Model for Perceptual Ambiguity and Criticality</title>
<p>A continual learning approach would be suitable in a human-in-the-loop environment: the human can initially provide labels, and eventually a simple model (different from the main component that is being trained) would be able to replace the human when it reaches a sufficient level of performance. Note that, before that point, the continual learning model's misclassifications would only affect the data selection, and not the predictions of the main component directly. Along with providing the class labels, the human annotator can be asked to provide the perceptual ambiguity and severity level associated with the data point. Thus there can be two separate continual learning models attached to the main component model: one to predict perceptual ambiguity and one to predict the severity level of the data point. The model used here could be a shallow neural network operating on the intermediate features from the main component model.</p>
<p>An issue with the continual learning approach is catastrophic forgetting, where the model updates itself constantly and forgets what it learnt before. To avoid this, it is necessary to maintain the best representation set of what the model already knows, so that when the model is re-trained it can also include this representation set. In this work, we make use of a buffer called the familiarity buffer for this purpose. The familiarity buffer holds a representation of the data points where the model predicts the perceptual ambiguity or the criticality of the data point accurately. When the model encounters data points where there is a mismatch between the model prediction and the human feedback, that data point is added to the unfamiliarity buffer. When the unfamiliarity buffer is full, the continual learning model is re-trained with the contents of both the familiarity and unfamiliarity buffers. After the re-training, the familiarity buffer of size 'n' is updated: from the contents of both buffers, the most diverse 'n' data points are chosen to repopulate the familiarity buffer. Finally, the unfamiliarity buffer is emptied.</p>
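<p>The buffer mechanism described above could look roughly like this (a sketch: the re-training step and the diversity-based repopulation are placeholders, with random selection standing in for a real diversity criterion):</p>

```python
# Sketch of the familiarity/unfamiliarity buffer logic from Section 3.3.
import random

class BufferedContinualLearner:
    def __init__(self, model, familiarity_size, unfamiliarity_size):
        self.model = model
        self.n = familiarity_size          # 'n' in the text
        self.capacity = unfamiliarity_size
        self.familiar = []                 # points the model already predicts well
        self.unfamiliar = []               # points where model and human disagree

    def observe(self, point, model_prediction, human_feedback):
        if model_prediction == human_feedback:
            self.familiar.append(point)
        else:
            self.unfamiliar.append(point)
            if len(self.unfamiliar) >= self.capacity:
                self._retrain()

    def _retrain(self):
        data = self.familiar + self.unfamiliar
        # Placeholder for re-training on both buffers, then repopulating the
        # familiarity buffer with the 'n' most diverse points; random choice
        # stands in for a real diversity criterion here.
        self.familiar = random.sample(data, min(self.n, len(data)))
        self.unfamiliar = []               # empty the unfamiliarity buffer

# Toy usage: one match, then two mismatches trigger a re-train and a flush.
learner = BufferedContinualLearner(model=None, familiarity_size=3, unfamiliarity_size=2)
for p, pred, fb in [(1, "low", "low"), (2, "low", "high"), (3, "high", "low")]:
    learner.observe(p, pred, fb)
print(len(learner.unfamiliar))  # 0: the buffer filled and was emptied after re-training
```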
        </sec>
        <sec id="sec-3-2-3">
          <title>3.4. Uncertainty and Diversity</title>
<p>The model blindspots and data blindspots can be captured by uncertainty and diversity respectively. They can be calculated as follows:
• Uncertainty-based querying: In uncertainty-based querying, the model's uncertainty about its predictions is used as a metric for selecting query points [10]. The model predictions typically contain probability scores associated with each class label. In the ideal scenario, the model should allocate a probability of one to the correct label and zero to all the incorrect labels. Thus, entropy can be used as a measure of the self-evaluated confidence of the model in its own predictions. Zero entropy means that the model is perfectly confident in its prediction, while an entropy of one is the level of maximum doubt. The entropy of a model with 'c' classes, with each class 'i' having a probability p_i, is defined as follows:
E = − ∑_{i=1}^{c} p_i log(p_i)   (1)
Thus, data points with a higher entropy are those with a higher level of uncertainty attached. The queries can be generated such that the most uncertain data points are shown to the human for review. In this work, we use an ensemble of models as in [21] to generate the average predictive entropy.
• Diversity-based querying: The diversity-based querying approach aims to include the data points most different from what the model has previously seen [9]. For this, one should store a representation of the training data that the model has been trained on. An ideal candidate for this is the distribution of the features at an intermediate layer of the prediction model. The distribution of features of a fully connected (FC) layer in the later layers of a convolutional neural network for the training data points could be computed and then compared with each new data point to obtain a distance score. In this work, we consider an FC layer with 'N' neurons and compute the means and variances of the output values from that layer for all training data points, as a new variant of the existing distance-based acquisition functions for diversity such as in [8]. Then, for each new data point, we calculate the Z-score for each of the 'N' features f_1 to f_N and consider their average:
Z-score = (1/N) ∑_{i=1}^{N} (x_i − μ_i) / σ_i   (2)</p>
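<p>The two measures can be sketched in a few lines (a pure-Python illustration; absolute deviations are used in the Z-score so that opposite-signed deviations do not cancel, which is our reading of Equation 2):</p>

```python
# Sketch of the two acquisition measures in Section 3.4: predictive entropy
# (Eq. 1) and the averaged per-feature Z-score (Eq. 2).
import math

def entropy(probs):
    """E = -sum_i p_i * log(p_i) over the class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def avg_z_score(features, means, stds):
    """Average of |x_i - mu_i| / sigma_i over the 'N' intermediate features."""
    n = len(features)
    return sum(abs(x - m) / s for x, m, s in zip(features, means, stds)) / n

# A uniform prediction carries more entropy than a confident one.
probs_confident = [0.98, 0.01, 0.01]
probs_uniform = [1 / 3, 1 / 3, 1 / 3]
print(entropy(probs_uniform) > entropy(probs_confident))  # True
```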
        </sec>
      </sec>
      <sec id="sec-3-3">
        <p>The higher the Z-score, the more distant the new data point is from the known distribution. In this approach, the queries would be generated such that the data points with a higher average Z-score are shown to the human for labelling.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Evaluation Framework</title>
      <sec id="sec-5-1">
        <title>4.1. Planned Experiment</title>
        <sec id="sec-5-1-1">
          <p>The first step in the active learning process is training the initial model using the available pool of labelled data. This model serves as a starting point to generate queries from the unlabelled set. The large pool of unlabelled data is divided randomly into various chunks, and each of these chunks is labelled in a particular round of active learning [22]. In the first round, the pre-trained model is used to generate a query list of the data points to be reviewed and labelled by the human. The selection criterion for the query points is the major challenge in active learning, and it depends on the mode of active learning selected, as explained above. After all the data points in the first round are labelled successfully, the model is re-trained with the updated set of labelled data, and the next chunk of unlabelled data is selected for the second round. This process continues till all data points are labelled.</p>
<p>During the labelling process, the annotators are tasked with providing the class label, perceptual ambiguity level and severity level of each data point on a graphical user interface, as shown in figure 5. If a data point has a high severity and ambiguity level, additional questions can be asked of the annotators to determine the associated criticality score, as mentioned above.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Active Learning Acquisition Functions</title>
<p>In order to demonstrate the effectiveness of the proposed approach, we propose to perform the experiment with the following combinations of acquisition functions:
• Random: In this mode, N% of images are randomly selected from the subset of unlabelled data in a particular round, and are assigned to the human to label.
• Uncertainty: In this mode, the top N% of images with the highest average entropy are assigned to the human to label.
• Diversity: In this mode, the top N% of images with the highest average Z-score from the current distribution are assigned to the human to label.
• Perceptual Ambiguity: In this mode, the top N% of images with the highest perceptual ambiguity scores are assigned to the human to label.
• Criticality: In this mode, the top N% of images with the highest criticality scores are assigned to the human to label.
• Combined: In this mode, the top N% of images with the highest combined average of entropy, Z-score, criticality and perceptual ambiguity score are assigned to the human to label.</p>
      </sec>
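<p>The 'Combined' mode above can be sketched as follows (illustrative: the score tuples are invented, and in practice the four scores would need to be normalized to comparable scales before averaging):</p>

```python
# Sketch of the 'Combined' acquisition mode: rank the unlabelled pool by the
# average of entropy, Z-score, criticality and perceptual ambiguity, take top N%.

def combined_query(scores, top_percent):
    """scores: {image_id: (entropy, z, criticality, ambiguity)} -> query list."""
    ranked = sorted(scores,
                    key=lambda k: sum(scores[k]) / len(scores[k]),
                    reverse=True)
    n_query = max(1, int(len(ranked) * top_percent / 100))
    return ranked[:n_query]

# Hypothetical pool: in practice the first two scores come from the main model,
# the last two from the continual learning models.
pool = {"img_a": (0.9, 2.0, 2.1, 4.0),   # ambiguous, high-risk image
        "img_b": (0.1, 0.2, 0.0, 1.0),
        "img_c": (0.5, 1.0, 0.7, 2.0),
        "img_d": (0.2, 0.1, 0.0, 0.0)}
print(combined_query(pool, top_percent=50))  # ['img_a', 'img_c']
```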
      <sec id="sec-5-4">
        <title>4.3. Evaluation Metrics</title>
        <sec id="sec-5-4-1">
          <title>4.3.1. F1-score</title>
          <p>We propose to use the following evaluation metrics to compare the safety and performance of the proposed approach with that of the pre-existing ones. When there is an imbalance in the number of data points in different classes, accuracy might not be a good metric for prediction performance. In this case, the F1-score, which accounts for both type-I and type-II errors, would be a better metric.</p>
        </sec>
        <sec id="sec-5-4-2">
          <title>4.3.4. Safety-weighted Accuracy</title>
          <p>To consider the importance of each input data point for the safety relevance of a machine learning model's training, we can reuse the accuracy metric used to evaluate the performance of classification models and adapt it to criticality aspects. Given the safety requirements identified through the Hazard Analysis and Risk Assessment (HARA) methods and all the relevant Operating Conditions (OCs) visible in the input data, a safety expert identifies the possible hazardous scenarios that could be caused by misclassification of this input data (with a minimum probability of occurrence), and weights the score associated to this input by the visible risk. The OCs are any relevant parameters describing the system's usage scenarios, including environmental conditions, dynamic elements, and scenery. As in the criticality assessment presented in section 3.2, the risk evaluation is decomposed into severity and exposure factors. We estimate for each input the Safety Integrity Level (SIL), presented in the IEC 61508 [23] standard, from the severity and exposure scores and the risk matrix. We then give each input an integer score between one and four, and we compute the model safety-weighted accuracy as follows:
safety-weighted accuracy = (∑_{i=1}^{M} w_i * c_i) / (∑_{i=1}^{M} w_i)
With M the number of predictions, w the vector with the SIL scores for all inputs, and c a vector with the values of the classification correctness (one if the classification is correct and zero otherwise).</p>
        </sec>
      </sec>
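<p>The safety-weighted accuracy can be sketched as follows (toy SIL scores; the weighting makes a single misclassified SIL-4 input cost as much as four SIL-1 inputs):</p>

```python
# Sketch of the safety-weighted accuracy: each prediction's correctness c_i
# (1 correct, 0 wrong) is weighted by its SIL score w_i in 1..4.

def safety_weighted_accuracy(sil_scores, correct):
    """(sum_i w_i * c_i) / (sum_i w_i): inputs with a higher SIL count more."""
    total = sum(sil_scores)
    return sum(w * c for w, c in zip(sil_scores, correct)) / total

# Toy example: the single SIL-4 input is misclassified, dragging the metric
# well below the plain accuracy of 3/4.
print(safety_weighted_accuracy(sil_scores=[1, 1, 2, 4], correct=[1, 1, 1, 0]))  # 0.5
```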
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
<p>F1-score = (2 * Precision * Recall) / (Precision + Recall)   (3)</p>
      <p>4.3.2. Uncertainty Reduction</p>
      <p>The goal of training a model is to generalize its knowledge over the assigned task and therefore perform well on the unseen test set. As mentioned above, entropy is a good measure of the prediction power of a model when the label probabilities are available. Therefore, we can use entropy over the test set as one of the measures of how much the model uncertainty is reduced. Note that while the reduction of uncertainty is good, it has to be viewed in tandem with other metrics such as accuracy, precision, recall or F1-score.</p>
<p>In this paper, we introduced the concepts of
perceptual ambiguity and criticality, and proposed a model
which learns to predict these through continuous
feedback from a human in the loop. The proposed approach
is aimed at tackling blind spots not covered by current
approaches dealing with uncertainty and diversity
sampling methods. An experiment was designed with
the goal of testing such a model trained to perform traffic
light detection. The work is still at an early stage, and the
next steps include performing an active learning experiment
on a large scale with several volunteers, linking the
definition of criticality to concrete safety metrics in
industry, developing other evaluation metrics, and
testing alternate designs of the continual learning model.</p>
      <sec id="sec-6-1">
        <title>4.3.3. Query Relevance</title>
        <p>In each round of active learning, N% of the data is selected
as query points to be shown to the human. It is necessary
to measure whether the selected points are indeed the best ones.
One way to do this is to measure the difference
in the relevance scores (an average of the uncertainty,
diversity, criticality and perceptual ambiguity scores for
each point) between the predictions for the human-labelled points
and those for the auto-labelled points. The larger the
difference between these sets, the more relevant the
selected query points are. For the random mode, the query
relevance is expected to be the lowest because the points are
selected randomly without considering their relevance
in active learning.</p>
      </sec>
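<p>The query-relevance check above can be sketched as follows (a hypothetical helper; each point's relevance score is assumed to be the pre-computed average of its four component scores):</p>

```python
def query_relevance(queried_scores, auto_scores):
    # Each score: the average of the uncertainty, diversity, criticality
    # and perceptual-ambiguity scores for one point.
    # A larger gap between the two means = more relevant query points.
    mean = lambda xs: sum(xs) / len(xs)
    return mean(queried_scores) - mean(auto_scores)

# Active-learning selection should query the highest-scoring points, so
# this gap should exceed the one obtained with random selection.
print(query_relevance([0.9, 0.8], [0.2, 0.3, 0.1]))  # ~0.65
```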
      <sec id="sec-6-2">
        <title>Acknowledgments</title>
        <p>This work is partially funded by TAILOR, an ICT-48 Network of AI Research Excellence Centers funded by the EU Horizon 2020 research and innovation programme under grant agreement No 952215.</p>
        <p>4.3.4. Safety-weighted Accuracy</p>
      </sec>
      <sec id="sec-6-3">
        <p>To consider the importance of each input data point for the safety relevance of machine learning model training,</p>
        <p>References</p>
        <p>[1] H. M. Eraqi, M. N. Moustafa, J. Honer, End-to-end deep learning for steering autonomous vehicles considering temporal dependencies, arXiv preprint arXiv:1710.03804 (2017).</p>
        <p>[2] S. Mohseni, M. Pitale, V. Singh, Z. Wang, Practical solutions for machine learning safety in autonomous vehicles, CoRR abs/1912.09630 (2019). URL: http://arxiv.org/abs/1912.09630. arXiv:1912.09630.</p>
        <p>[3] D. Wang, X. Ma, X. Yang, TL-GAN: Improving traffic light recognition via data synthesis for autonomous driving, arXiv preprint arXiv:2203.15006 (2022).</p>
        <p>[4] J. Geary, H. Gouk, S. Ramamoorthy, Active altruism learning and information sufficiency for autonomous driving, arXiv preprint arXiv:2110.04580 (2021).</p>
        <p>[5] E. Haussmann, M. Fenzi, K. Chitta, J. Ivanecky, H. Xu, D. Roy, A. Mittel, N. Koumchatzky, C. Farabet, J. M. Alvarez, Scalable active learning for object detection, in: 2020 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2020, pp. 1430–1435.</p>
        <p>[6] C.-C. Kao, T.-Y. Lee, P. Sen, M.-Y. Liu, Localization-aware active learning for object detection, in: Asian Conference on Computer Vision, Springer, 2018, pp. 506–522.</p>
        <p>[7] R. Monarch, Human-in-the-Loop Machine Learning, Manning Publications Co., 2021.</p>
        <p>[8] Y. Geifman, R. El-Yaniv, Deep active learning over the long tail, CoRR abs/1711.00941 (2017). URL: http://arxiv.org/abs/1711.00941. arXiv:1711.00941.</p>
        <p>[9] G. Wang, J.-N. Hwang, C. Rose, F. Wallace, Uncertainty sampling based active learning with diversity constraint by sparse selection, in: 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), IEEE, 2017, pp. 1–6.</p>
        <p>[10] A. J. Joshi, F. Porikli, N. Papanikolopoulos, Multi-class active learning for image classification, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 2372–2379.</p>
        <p>[11] X. Li, Y. Guo, Adaptive active learning for image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 859–866.</p>
        <p>[12] W. H. Beluch, T. Genewein, A. Nürnberger, J. M. Köhler, The power of ensembles for active learning in image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9368–9377.</p>
        <p>[13] Y. Gal, R. Islam, Z. Ghahramani, Deep Bayesian active learning with image data, in: International Conference on Machine Learning, PMLR, 2017, pp. 1183–1192.</p>
        <p>[14] T. He, S. Zhang, J. Xin, P. Zhao, J. Wu, X. Xian, C. Li, Z. Cui, An active learning approach with uncertainty, representativeness, and diversity, The Scientific World Journal 2014 (2014).</p>
        <p>[15] O. Siméoni, Robust image representation for classification, retrieval and object discovery, Ph.D. thesis, Université Rennes 1, 2020.</p>
        <p>[16] O. Siméoni, M. Budnik, Y. Avrithis, G. Gravier, Rethinking deep active learning: Using unlabeled data at model training, in: 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, 2021, pp. 1220–1227.</p>
        <p>[17] D. Gissin, S. Shalev-Shwartz, Discriminative active learning, arXiv preprint arXiv:1907.06347 (2019).</p>
        <p>[18] I. Muslea, S. Minton, C. A. Knoblock, Active + semi-supervised learning = robust multi-view learning, in: ICML, volume 2, Citeseer, 2002, pp. 435–442.</p>
        <p>[19] R. Ramakrishnan, E. Kamar, B. Nushi, D. Dey, J. Shah, E. Horvitz, Overcoming blind spots in the real world: Leveraging complementary abilities for joint execution, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 6137–6145.</p>
        <p>[20] X. Yang, J. Yan, X. Yang, J. Tang, W. Liao, T. He, SCRDet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing, arXiv preprint arXiv:2004.13316 (2020).</p>
        <p>[21] R. Rahaman, et al., Uncertainty quantification and deep ensembles, Advances in Neural Information Processing Systems 34 (2021).</p>
        <p>[22] R. Ganti, A. Gray, UPAL: Unbiased pool based active learning, in: Artificial Intelligence and Statistics, PMLR, 2012, pp. 422–431.</p>
        <p>[23] International Electrotechnical Commission, Functional safety of electrical/electronic/programmable electronic safety-related systems, 2010.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>