<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Safety-aware Active Learning with Perceptual Ambiguity and Severity Assessment</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Prajit</forename><forename type="middle">T</forename><surname>Rajendran</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">CEA</orgName>
								<address>
									<addrLine>List</addrLine>
									<postCode>F-91120</postCode>
									<settlement>Palaiseau</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Guillaume</forename><surname>Ollier</surname></persName>
							<email>guillaume.ollier@cea.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">CEA</orgName>
								<address>
									<addrLine>List</addrLine>
									<postCode>F-91120</postCode>
									<settlement>Palaiseau</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Huascar</forename><surname>Espinoza</surname></persName>
							<email>huascar.espinoza@kdt-ju.europa.eu</email>
							<affiliation key="aff1">
								<orgName type="institution">KDT JU</orgName>
								<address>
									<addrLine>Avenue de la Toison d&apos;Or 56-60</addrLine>
									<postCode>1060</postCode>
									<settlement>Brussels</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Morayo</forename><surname>Adedjouma</surname></persName>
							<email>morayo.adedjouma@cea.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">CEA</orgName>
								<address>
									<addrLine>List</addrLine>
									<postCode>F-91120</postCode>
									<settlement>Palaiseau</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Agnes</forename><surname>Delaborde</surname></persName>
							<email>agnes.delaborde@lne.fr</email>
							<affiliation key="aff2">
								<orgName type="institution">Laboratoire National de Metrologie et d&apos;Essais</orgName>
								<address>
									<settlement>Trappes</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chokri</forename><surname>Mraidha</surname></persName>
							<email>chokri.mraidha@cea.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">CEA</orgName>
								<address>
									<addrLine>List</addrLine>
									<postCode>F-91120</postCode>
									<settlement>Palaiseau</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Safety-aware Active Learning with Perceptual Ambiguity and Severity Assessment</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1271D05E9E6922F6C20C4AC506074F4C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:22+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Safety</term>
					<term>Active learning</term>
					<term>Autonomous driving</term>
					<term>Human-in-the-loop learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Deep Neural Networks (DNN) used in self-driving cars need broad data coverage and labelling to manage all potential hazards in safety-critical scenarios. Active learning approaches make use of automated data selection and labelling to build diverse datasets at lower human cost and with greater accuracy. Traditional active learning methods consider the uncertainty of the model predictions and the diversity of the data points for query selection. However, they are not optimal at capturing many critical data points that are potentially risky with respect to safety considerations. In this position paper, we propose a novel approach that uses human feedback related to perceptual data ambiguity and a criticality score linked to system-level safety assessment. This approach includes a continual learning model that learns to identify corner cases and blindspots with a high impact on potential risk, and combines them with uncertainty-sampling and diversity-sampling models to create a safety-aware acquisition function for active learning.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Self-driving cars are increasingly employing various deep learning-based components in their technology stack. These components require tremendous amounts of data to reach a significant level of performance <ref type="bibr" target="#b0">[1]</ref>. Deep Neural Networks (DNN) generally perform poorly when they come across previously unseen data. A DNN model trained on only a homogeneous set of images from a particular scenario would perform well only in that scenario and under-perform in most other situations. This is a major concern for the safety assessment of self-driving vehicle systems <ref type="bibr" target="#b1">[2]</ref>. In a traffic light classification task, for instance, the more diverse the scenarios the DNN module encounters in training, the wider its safe operation region <ref type="bibr" target="#b2">[3]</ref>.</p><p>Typically, the labels to train such modules are provided by humans <ref type="bibr" target="#b3">[4]</ref>. Curating a large dataset with millions of human labels is painfully time-consuming and expensive. Active learning is a powerful technique that attempts to maximize a model's performance gain while annotating the fewest samples possible. This process usually considers factors such as uncertainty and diversity to generate a query list for the human <ref type="bibr" target="#b4">[5]</ref>. Active learning has shown impressive performance gains over random selection in many self-driving perception tasks.</p><p>While there have been emerging efforts to improve active learning for complex scenarios, little attention has been given to active learning for safety-critical features. One example of these features is the detection of ambiguous data points when the self-driving car is in a safety-critical situation. 
An example of ambiguity could be an image used to train a traffic light detection system in which there is a red light for traffic intending to turn right and a green light for the straight-moving traffic. This image could be delegated to the human to annotate if it is deemed to have a high impact on potential risk.</p><p>This position paper proposes a novel approach that uses human feedback related to perceptual data ambiguity and a criticality score. This criticality score, which is linked to the exposure and severity factors of a typical safety assessment, helps to characterize the criticality context of corner cases and blindspots with a high impact on potential risk. In a limited query budget scenario, the perceptual ambiguity level and criticality level obtained during the annotation process, along with uncertainty and diversity measurements, help in selecting the images with the highest impact on potential risk. This position paper is a preliminary step towards deeper research into how human-in-the-loop feedback can help in a safety-aware active learning approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and Related Works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Motivation</head><p>A modular driving system typically consists of several components with specific functions collaborating to achieve the intended driving behaviour. There are also end-to-end driving systems, but these are usually made up entirely of opaque blackbox models, so it is not feasible to certify their functional safety. Learning-enabled components making use of black box machine learning models are notorious in this respect due to their lack of transparency. Failures or unsafe behavior at the component level can potentially compromise the safety of the entire system unless there are exhaustive system-level measures to tackle them; it is thus important to ensure that the component is trained so as to minimize its vulnerability to unknown situations. The presence of a human in the loop could help in mitigating some of these vulnerabilities by identifying certain blindspots undetected by the trained models and by assessing the severity of the consequences of misprediction by the trained models. In situations of limited query budget and training time, the paradigm of active learning could assist in selecting the most safety-relevant data points by analyzing the blindspot vulnerabilities of the component. In this work, we focus on improving the data selection and training of a traffic light classification component in a modular driving system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Active Learning</head><p>Active learning is the process of eliciting training labels from annotators by determining the right data to put in front of them when there is no budget or time for human feedback on all the data. This is especially true of datasets for autonomous driving, which could have millions of hours of data available for training. More than the raw quantity of the data used, the quality, diversity and usability of the data are the important parameters to assure optimum performance and safety of the deployed models. The deep neural networks responsible for self-driving functions require exhaustive training, and the data needs to cover new and uncertain situations in order to tackle the problem of unknown unknowns. Unknown unknowns are data points for which the AI model provides a wrong prediction with a high degree of confidence. Such points are dangerous because they are immune to detection by uncertainty measures, which are often used as a proxy metric to test models' weaknesses. The combination of data annotation and curation poses a major challenge to deploying deep learning models in autonomous systems, and active learning helps by automatically finding the relevant data points to query the human, building better datasets in a fraction of the time, at lower cost and with more accuracy <ref type="bibr" target="#b5">[6]</ref>. In this work, we focus on pool-based active learning, where we have a small set of labelled data available and a large set of unlabelled data which needs to be labelled within a certain query budget. • Random sampling is a strategy where we pick random samples from the unlabeled pool of data as query points for the human to label. This is usually used just as a baseline, as it does not have an intelligent strategy to select the query points. • Uncertainty sampling is the set of strategies for identifying unlabeled items that are near a decision boundary in the trained model. 
This approach picks out the data points with a higher predictive uncertainty, and is thereby reflective of the blindspots of the trained model. • Diversity sampling is the set of strategies for identifying unlabeled items that are underrepresented or unknown to the machine learning model (for instance, features that are not common in the training data, or are under-represented in real-world demographics).</p><p>The simplest approach in the literature, as illustrated in <ref type="bibr" target="#b7">[8]</ref>, is to select examples based on distances in the feature space. In <ref type="bibr" target="#b8">[9]</ref>, diversity is measured using a similarity matrix built with the Gaussian kernel of the distance between two points. <ref type="bibr" target="#b9">[10]</ref> makes use of entropy as a metric of uncertainty. <ref type="bibr" target="#b10">[11]</ref> makes use of the information density of the candidate instance obtained from the input space for the remaining unlabeled instances. <ref type="bibr" target="#b11">[12]</ref> and <ref type="bibr" target="#b12">[13]</ref> use ensemble and Bayesian methods, respectively, to approximate uncertainty. <ref type="bibr" target="#b13">[14]</ref> proposes heuristic methods to balance between the uncertainty and the representativeness of the selected sample, considering the redundancy between selected samples. <ref type="bibr" target="#b14">[15]</ref> argues that the initial model does not perform well, so the queries it generates are also likely to be inefficient. In <ref type="bibr" target="#b15">[16]</ref>, it is proposed to include knowledge from unlabeled images by adding unsupervised and semi-supervised methods to enhance the performance. The authors in <ref type="bibr" target="#b16">[17]</ref> proposed to use a binary classifier to predict whether an image is from the labeled or unlabeled pool using the concept of adversarial learning. 
In <ref type="bibr" target="#b17">[18]</ref>, a semi-supervised active learning approach is proposed wherein contention points are determined by making use of both the informativeness and adaptive probabilistic label of the unlabelled points based on the hypothesis of the current model.</p></div>
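The pool-based, uncertainty-driven query selection surveyed above can be sketched as follows. This is a minimal illustration, not the paper's implementation; `predictive_entropy` and `uncertainty_queries` are hypothetical names, and the model is assumed to expose per-class probabilities for each point in the unlabelled pool.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of each row of class probabilities (shape: n_samples x n_classes)."""
    eps = 1e-12  # guard against log(0) for confident predictions
    return -np.sum(probs * np.log(probs + eps), axis=1)

def uncertainty_queries(probs, budget):
    """Indices of the `budget` most uncertain pool points (highest entropy first)."""
    return np.argsort(-predictive_entropy(probs))[:budget]
```

The points returned by `uncertainty_queries` would be routed to the human annotator; the remainder of the pool stays unlabelled for later rounds.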
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Blindspots and Corner Cases</head><p>Blindspots are deficiencies present in a model that may be detrimental to its performance and adaptability in unknown and uncertain situations <ref type="bibr" target="#b18">[19]</ref>. In active learning, data points falling under these blindspots can be specifically picked to query a human oracle. There can be various categories of blindspots:</p><p>• Model Blindspots: The set of data points, and the feature regions they enclose, about whose predicted labels the model is highly uncertain constitute the model blindspots.</p><p>It is possible to identify model blindspots using the prediction uncertainty of data points. Data points for which the model's prediction has a high entropy fall under this category. • Data Blindspots: The areas of the feature space that are not covered in the training set constitute the data blindspots. Diversity is one of the aspects that help in uncovering these blindspots. An example could be a dataset with images recorded only in daytime. An image taken at night would be very distant from the images that the model has seen before, and even if the model's output prediction has a low entropy, it cannot be fully trusted. • Human-identified Blindspots: The model blindspots reveal the underconfidence and knowledge gaps of the trained model, and the data blindspots explore the diversity of the data. However, there may be more conceptual aspects of the dataset which are not covered by either of the above categories of blindspots. For example, consider an image in the training set of a traffic light classification system in which there are two visible traffic lights: one for left-moving traffic, and the other for straight-moving traffic. 
If the ego vehicle is in the rightmost lane, a human looking at the image can see that the vehicle could not possibly turn left, so only the signal light for straight-moving traffic is relevant to the scene. This, however, is an ambiguous situation that could be difficult to classify without conceptual knowledge about the traffic, which a blackbox model may not necessarily possess. Such blindspots can be identified with the help of a human-in-the-loop. • Safety Blindspots: Data points whose misclassification by the specific trained model at the component level could compromise the safety of the system of which the component is a part constitute safety blindspots.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed Method</head><p>In contexts that are subjective in nature, or when human contextual knowledge plays a major role, current active learning methods based purely on model knowledge do not tend to perform well <ref type="bibr" target="#b1">[2]</ref>. Safety in particular is a complex concept involving other environmental and situational factors. Since the onus in active learning is on a particular component, one cannot discuss safety directly, as it is a system-level concept. However, it is possible to think about the safety implications of a mislabelled or ambiguous data point. A human-in-the-loop can help in identifying certain conceptual blindspots which are not covered by the model and data blindspots discussed in the section above. Although a human-in-the-loop involves labelling effort, active learning acquisition functions ensure that only the fraction of the data points which are most critical according to the chosen criterion has to be labelled by the humans, thereby addressing the scalability issue. Human bias is always a factor in labelling, but classic methods in active learning such as inter-annotator agreement can be used to mitigate this problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Perceptual Ambiguity</head><p>Data points which the annotator perceives to be potentially ambiguous could be rejected and removed from the training set. However, a black-and-white approach of reject and accept is not suitable in many cases, such as traffic-related tasks. Many data points could be slightly ambiguous yet interesting to include in the dataset for diversity and task relevance. Conservatively rejecting all data points the annotator perceives to be slightly ambiguous leads to less diversity in the training set. These constitute human-identified blindspots and provide additional information for data selection. Thus, it would be useful to quantify the level of ambiguity and underconfidence that the annotator feels for each data point as very low, low, medium, high or very high. A secondary model can be trained to predict the level of perceptual ambiguity with the help of human feedback, and this could assist in better data selection for active learning querying under a limited budget. We propose table 1 as a reference for the annotators. Consider figure <ref type="figure">2</ref> from the traffic light detection dataset presented in <ref type="bibr" target="#b19">[20]</ref>. There are two traffic lights visible in the image, which is a source of ambiguity. Additionally, at night the tail lights of traffic ahead may constitute distracting features which may affect the label prediction. In figure <ref type="figure">3</ref>, also from the same dataset, one can see that once again there is an ambiguity in the class label at first sight. However, considering that the ego vehicle is in the middle lane, with proper conceptual knowledge it can be presumed that the traffic light for straight-moving traffic is the relevant one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Criticality Assessment</head><p>We consider the safety awareness of the data labeling process through the concept of criticality assessment, and thereby aim to tackle the safety blindspots discussed above. The idea behind it is to estimate the importance of a specific image for a task according to the global risk it could represent for a system facing that task. The global risk is here the combination of two factors: the severity, i.e., the estimated safety consequences if the system fails the task, and the exposure, i.e., the estimated probability of this failure. In the context of traffic light classification, the severity concerns the expected consequences if the traffic light is misclassified, and it will depend on which class is misclassified (i.e., a green light misclassified as a red/orange light, or a red/orange light misclassified as a green light) and on the different visible environmental parameters which can contribute to possible accidents (e.g., pedestrian crossing, road intersection). The exposure is estimated by detecting the different visible factors that could cause the misclassification (e.g., camera obstruction or corruption, weather conditions). We focus here on the risk assessment at the component level without considering the whole system's capabilities and interactions with the other components and subsystems.</p><p>To include this active learning approach in a complete safety engineering process, the requirements identified in the preliminary analysis shall be considered to adapt this score accordingly. A first question to estimate the severity level is presented to the human annotator. We formulate the question as "How do you estimate the consequences on accident risk if the automated driving system misclassifies this traffic light?" (as shown in Figure <ref type="figure" target="#fig_3">5</ref>), with the possible answers "Negligible", "Light", "Severe", and "Fatal". 
We associate each of these answers with a value (zero for "Negligible"). If the human rater does not select the answer "Negligible", i.e., if the severity score is higher than zero, we ask another question for the exposure estimation: "Can you see any factor that might hinder the identification of this traffic light?", with the answers "Yes" and "No". If the rater answers "No", the exposure value is zero. Otherwise, we ask additional questions to identify these factors. Each factor is associated with an exposure value defined in advance by the expert and not visible to the human rater. We can then compute the criticality score with the formula (∑_{k=1}^{n} f_k · e_k) · s, where n is the number of identified factors, f is a boolean vector which represents the presence/absence of each factor, e is a vector that represents the exposure value for each factor, and s is the severity score.</p></div>
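The criticality computation above can be sketched in a few lines. The severity values below are illustrative: the text only fixes "Negligible" at zero, so the mapping for the other answers is an assumption, as is the skipping of the exposure questions when severity is zero.

```python
# Hypothetical severity values; only "Negligible" = 0 is fixed by the text.
SEVERITY = {"Negligible": 0, "Light": 1, "Severe": 2, "Fatal": 3}

def criticality_score(factors_present, exposure_values, severity_answer):
    """Criticality = (sum_k f_k * e_k) * s, following the formula in the text.

    factors_present: booleans f_k marking which risk factors the rater identified.
    exposure_values: expert-defined exposure values e_k, hidden from the rater.
    """
    s = SEVERITY[severity_answer]
    if s == 0:
        return 0.0  # exposure questions are skipped for "Negligible"
    exposure = sum(e for f, e in zip(factors_present, exposure_values) if f)
    return exposure * s
```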
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Continual Learning Model for Perceptual Ambiguity and Criticality</head><p>A continual learning approach would be suitable in a human-in-the-loop environment: the human can initially provide labels, and eventually a simple model (different from the main component being trained) would be able to replace the human once it reaches a sufficient level of performance. Note that the continual learning model's misclassifications would only affect the data selection, not the predictions of the main component directly. Along with providing the class labels, the human annotator can be asked to provide the perceptual ambiguity and severity level associated with the data point. Thus, there can be two separate continual learning models attached to the main component model: one to predict perceptual ambiguity and one to predict the severity level of the data point. The model used here could be a shallow neural network fed with the intermediate features from the main component model. An issue with the continual learning approach is catastrophic forgetting, where the model updates itself constantly and forgets what it learnt before. To avoid this, it is necessary to maintain the best representation set of what the model already knows, so that when the model is re-trained it can also include this representation set. In this work, we make use of a buffer called the familiarity buffer for this purpose. The familiarity buffer holds a representation of the data points for which the model predicts the perceptual ambiguity or the criticality accurately. When the model encounters data points where there is a mismatch between the model prediction and the human feedback, those data points populate the unfamiliarity buffer. When the unfamiliarity buffer is full, the continual learning model is retrained with the contents of both the familiarity and unfamiliarity buffers. 
After the re-training, the familiarity buffer of size 'n' is updated. From the contents of both the buffers, the most diverse 'n' data points are chosen to repopulate the familiarity buffer. Finally, the unfamiliarity buffer is emptied.</p></div>
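The buffer management described above can be sketched as follows. This is an illustrative skeleton, not the paper's implementation: `retrain` and `most_diverse` are stand-ins for the re-training routine and the diversity-based selection of the 'n' points that repopulate the familiarity buffer.

```python
class ContinualBuffers:
    """Sketch of the familiarity/unfamiliarity buffer logic described above."""

    def __init__(self, familiar_size, unfamiliar_size, retrain, most_diverse):
        self.n = familiar_size
        self.capacity = unfamiliar_size
        self.retrain = retrain            # re-trains the continual learning model
        self.most_diverse = most_diverse  # picks the n most diverse points
        self.familiar = []    # points the secondary model predicted correctly
        self.unfamiliar = []  # mismatches between model prediction and human feedback

    def observe(self, point, model_pred, human_label):
        if model_pred == human_label:
            self.familiar.append(point)
            self.familiar = self.familiar[-self.n:]  # keep at most n points
            return
        self.unfamiliar.append(point)
        if len(self.unfamiliar) >= self.capacity:
            combined = self.familiar + self.unfamiliar
            self.retrain(combined)                        # re-train on both buffers
            self.familiar = self.most_diverse(combined, self.n)
            self.unfamiliar = []                          # emptied after re-training
```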
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Uncertainty and Diversity</head><p>The model blindspots and data blindspots can be captured by uncertainty and diversity respectively. They can be calculated as follows:</p><p>• Uncertainty-based querying: In uncertainty-based querying, the model's uncertainty about its predictions is used as a metric for selecting query points <ref type="bibr" target="#b9">[10]</ref>. The model predictions typically contain probability scores associated with each class label. In the ideal scenario, the model should allocate a probability of one to the correct label and zero to all the incorrect labels. Thus, entropy can be used as a measure of the self-evaluated confidence of the model in its own predictions.</p><p>Zero entropy means that the model is perfectly confident in its prediction, while maximal entropy is the level of maximum doubt. The entropy of a model with 'c' classes, with each class 'i' having a probability p_i, is defined as follows:</p><formula xml:id="formula_0">Entropy = −∑_{i=1}^{c} p_i log(p_i)<label>(1)</label></formula><p>Thus, data points with a higher entropy are those with a higher level of uncertainty attached. The queries can be generated such that the most uncertain data points are shown to the human for review. In this work, we use an ensemble of models as in <ref type="bibr" target="#b20">[21]</ref> to generate the average predictive entropy. • Diversity-based querying: The diversity-based querying approach aims to include the data points most different from what the model has previously seen <ref type="bibr" target="#b8">[9]</ref>. For this, one should store a representation of the training data that the model has been trained on. An ideal candidate for this is the distribution of the features at an intermediate layer of the prediction model. 
The distribution of features of a fully connected (FC) layer in the later layers of a convolutional neural network for the training data points could be computed and then compared with each new data point to obtain a distance score. In this work, we consider an FC layer with 'N' neurons and compute the means and variances of the output values from that layer for all training data points, as a new variant of existing distance-based acquisition functions for diversity such as in <ref type="bibr" target="#b7">[8]</ref>. Then, for each new data point, we calculate the Z-score for each of the 'N' features f_1 to f_N and consider their average.</p><formula xml:id="formula_1">Z-score = (1/N) ∑_{i=1}^{N} (f_i − μ_i) / σ_i<label>(2)</label></formula><p>The higher the Z-score, the more distant the new data point is from the known distribution. In this approach, the queries would be generated such that the data points with a higher average Z-score are shown to the human for labelling.</p></div>
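The diversity metric of eq. (2) can be sketched as below, assuming the per-neuron statistics are estimated from the FC-layer activations of the training set. Function names are illustrative, not from the paper.

```python
import numpy as np

def fit_feature_stats(train_features):
    """Per-neuron mean and std of the FC-layer activations over the training set.

    train_features has shape (n_train_samples, N).
    """
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0) + 1e-12  # avoid division by zero
    return mu, sigma

def avg_z_score(features, mu, sigma):
    """Average Z-score of eq. (2): mean over the N neurons of (f_i - mu_i) / sigma_i."""
    return float(np.mean((features - mu) / sigma))
```

A point whose activations sit far from the training distribution receives a high average Z-score and is therefore prioritised for human labelling.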
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Proposed Evaluation Framework</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Planned Experiment</head><p>The first step in the active learning process is training the initial model using the available pool of labelled data. This model would serve as a starting point to generate queries from the unlabelled set. The large pool of unlabelled data is divided randomly into various chunks. Each of these chunks shall be labelled in a particular round of active learning <ref type="bibr" target="#b21">[22]</ref>. In the first round of active learning, the pre-trained model is used to generate a query list of the data points to be reviewed and labelled by the human. The selection criterion for the query points is the major challenge in active learning, and it depends on the mode of active learning selected, as explained above. After all the data points in the first round of active learning are labelled successfully, the model is re-trained with the updated set of labelled data, and the next chunk of unlabelled data is selected for the second round of active learning. This process continues until all data points are labelled.</p><p>During the labelling process, the annotators are tasked with providing the class label, perceptual ambiguity level and severity level of each data point on a graphical user interface, as shown in figure <ref type="figure" target="#fig_3">5</ref>. If the data point has a high severity and ambiguity level, additional questions can be asked of the annotators to determine the associated criticality score, as mentioned above.</p></div>
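The round-based loop described above can be summarised in a short sketch. All names are illustrative stand-ins: `select` is the chosen acquisition function, `oracle` is the human annotator, and `train` re-trains the model on the growing labelled set; points not queried in a round are auto-labelled by the current model.

```python
def run_rounds(initial_model, labelled, chunks, select, budget, train, oracle):
    """One active learning round per chunk: query, label, re-train."""
    model = initial_model
    for chunk in chunks:
        queries = select(model, chunk, budget)  # acquisition function picks queries
        for x in chunk:
            label = oracle(x) if x in queries else model(x)  # human vs auto label
            labelled.append((x, label))
        model = train(labelled)  # re-train before the next round
    return model
```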
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Active Learning Acquisition Functions</head><p>In order to demonstrate the effectiveness of the proposed approach, we propose to perform the experiment with the following combinations of acquisition functions:</p><p>• Random: In this mode, N% of images are randomly selected from the subset of unlabelled data in a particular round, and are assigned to the human to label </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Evaluation Metrics</head><p>We propose to use the following evaluation metrics to compare the safety and performance of the proposed approach with that of the pre-existing ones:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1.">F1-score</head><p>When there is an imbalance in the number of data points in different classes, accuracy might not be a good metric of prediction performance. In this case, the F1-score, which accounts for both type-I and type-II errors, would be a better metric:</p><formula xml:id="formula_2">F1-score = (2 · Precision · Recall) / (Precision + Recall)<label>(3)</label></formula></div>
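Eq. (3) expands directly from the confusion-matrix counts; a minimal sketch (helper name is illustrative):

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives (type-I) and false negatives (type-II)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```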
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">Uncertainty Reduction</head><p>The goal of training a model is to generalize its knowledge over the assigned task and therefore perform well on the unseen test set. As mentioned above, entropy is a good measure of prediction power of a model when the label probabilities are available. Therefore, we can use entropy over the test set as one of the measures of how the model uncertainty is reduced. Note that while the reduction of uncertainty is good, it has to be viewed in tandem with other metrics such as accuracy, precision, recall or F1-score.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.3.">Query Relevance</head><p>In each round of active learning, N% of the data is selected as query points to be shown to the human. It is necessary to measure whether the selected points are indeed the best ones. One way to do this is to measure the difference in the relevance scores (an average of the uncertainty, diversity, criticality and perceptual ambiguity scores for each point) between the human-labelled points and the auto-labelled points. The larger the difference between these sets, the more relevant the selected query points. For the random mode, the query relevance is expected to be the lowest because the points are selected randomly without considering their relevance to active learning.</p></div>
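The query relevance metric reduces to a difference of means over the per-point relevance scores; a minimal sketch (function name is illustrative):

```python
def query_relevance(human_scores, auto_scores):
    """Difference in mean relevance score (average of the uncertainty, diversity,
    criticality and perceptual-ambiguity scores per point) between the
    human-labelled queries and the auto-labelled remainder."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(human_scores) - mean(auto_scores)
```

A large positive value indicates the acquisition function routed the most relevant points to the human; values near zero are expected for random selection.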
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.4.">Safety-weighted Accuracy</head><p>To consider the importance of each input data point for safety relevance when training a machine learning model, we can reuse the accuracy metric used to evaluate the performance of classification models and adapt it to criticality aspects. Given the safety requirements identified through Hazard Analysis and Risk Assessment (HARA) methods and all the relevant Operating Conditions (OCs) visible in the input data, a safety expert identifies the possible hazardous scenarios that could be caused by misclassification of this input data (with a minimum probability of occurrence), and weights the score associated with this input by the visible risk. The OCs are any relevant parameters describing the system's usage scenarios, including environmental conditions, dynamic elements, and scenery. As in the criticality assessment presented in section 3.2, the risk evaluation is decomposed into severity and exposure factors. We estimate for each input the Safety Integrity Level (SIL), presented in the IEC 61508 <ref type="bibr" target="#b22">[23]</ref> standard, using the severity and exposure scores and the risk matrix. We then give each input an integer score from one to four, and we compute the model's safety-weighted accuracy as follows:</p><formula xml:id="formula_3">(∑_{k=1}^{n} sil_k · c_k) / (∑_{k=1}^{n} sil_k)</formula><p>with n the number of predictions, sil the vector of SIL scores for all inputs, and c a vector of classification correctness values (one if the classification is correct and zero otherwise).</p></div>
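The safety-weighted accuracy formula above can be computed directly from the per-input SIL scores and correctness indicators; a minimal sketch (function name is illustrative):

```python
def safety_weighted_accuracy(sil, correct):
    """Safety-weighted accuracy: sum_k sil_k * c_k / sum_k sil_k, where sil_k is
    the per-input SIL score (1..4) and c_k is 1 for a correct classification."""
    return sum(s * c for s, c in zip(sil, correct)) / sum(sil)
```

Errors on high-SIL inputs thus pull the score down more than errors on low-SIL inputs.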
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this paper, we introduced the concepts of perceptual ambiguity and criticality, and proposed a model which learns to predict them through continuous feedback from a human in the loop. The proposed approach aims to tackle blind spots not covered by current approaches based on uncertainty and diversity sampling. An experiment was designed to test such a model trained to perform traffic light detection. The work is still at an early stage; the next steps include performing a large-scale active learning experiment with several volunteers, linking the definition of criticality to concrete safety metrics used in industry, developing further evaluation metrics, and testing alternative designs of the continual learning model.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Block diagram of the active learning process</figDesc><graphic coords="2,303.50,150.82,201.60,113.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Ambiguous class labels and distracting features</figDesc><graphic coords="4,94.97,222.26,192.00,108.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Detailed block diagram of the proposed approach</figDesc><graphic coords="6,139.24,84.19,316.80,178.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Graphical user interface</figDesc><graphic coords="6,310.81,304.64,186.98,214.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Perceptual ambiguity levels</figDesc><table><row><cell>Level</cell><cell>Explanation</cell></row><row><cell>Very low</cell><cell>Unambiguous image, label easy to identify</cell></row><row><cell>Low</cell><cell>Distracting features but easy to classify</cell></row><row><cell>Medium</cell><cell>Some ambiguities in identifying the label</cell></row><row><cell>High</cell><cell>Occlusions and ambiguities, hard to classify</cell></row><row><cell>Very high</cell><cell>Corner case with safety implications</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work is partially funded by TAILOR, an ICT-48 Network of AI Research Excellence Centers funded by EU Horizon 2020 research and innovation programme under grant agreement No 952215.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Eraqi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Moustafa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Honer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.03804</idno>
		<title level="m">End-to-end deep learning for steering autonomous vehicles considering temporal dependencies</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Mohseni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pitale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<idno>CoRR abs/1912.09630</idno>
		<ptr target="http://arxiv.org/abs/1912.09630" />
		<title level="m">Practical solutions for machine learning safety in autonomous vehicles</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.15006</idno>
		<title level="m">Tl-gan: Improving traffic light recognition via data synthesis for autonomous driving</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Geary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gouk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ramamoorthy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2110.04580</idno>
		<title level="m">Active altruism learning and information sufficiency for autonomous driving</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Scalable active learning for object detection</title>
		<author>
			<persName><forename type="first">E</forename><surname>Haussmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fenzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chitta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ivanecky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mittel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Koumchatzky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Farabet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Alvarez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Intelligent Vehicles Symposium (IV)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020. 2020</date>
			<biblScope unit="page" from="1430" to="1435" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Localization-aware active learning for object detection</title>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Kao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-Y</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Asian Conference on Computer Vision</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="506" to="522" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Monarch</surname></persName>
		</author>
		<title level="m">Human-in-the-Loop Machine Learning</title>
				<imprint>
			<publisher>Manning Publications Co</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Geifman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>El-Yaniv</surname></persName>
		</author>
		<idno>CoRR abs/1711.00941</idno>
		<ptr target="http://arxiv.org/abs/1711.00941" />
		<title level="m">Deep active learning over the long tail</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Uncertainty sampling based active learning with diversity constraint by sparse selection</title>
		<author>
			<persName><forename type="first">G</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-N</forename><surname>Hwang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wallace</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), IEEE</title>
				<imprint>
			<date type="published" when="2017">2017. 2017</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multiclass active learning for image classification</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Porikli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Papanikolopoulos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2009 IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="2372" to="2379" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Adaptive active learning for image classification</title>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="859" to="866" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The power of ensembles for active learning in image classification</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>Beluch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Genewein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nürnberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Köhler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="9368" to="9377" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Deep bayesian active learning with image data</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Gal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ghahramani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1183" to="1192" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">An active learning approach with uncertainty, representativeness, and diversity</title>
		<author>
			<persName><forename type="first">T</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Cui</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Scientific World Journal</title>
		<imprint>
			<biblScope unit="page">2014</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><surname>Siméoni</surname></persName>
		</author>
		<title level="m">Robust image representation for classification, retrieval and object discovery</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
		<respStmt>
			<orgName>Université Rennes 1</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Rethinking deep active learning: Using unlabeled data at model training</title>
		<author>
			<persName><forename type="first">O</forename><surname>Siméoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Budnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Avrithis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gravier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">25th International Conference on Pattern Recognition (ICPR)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020. 2021</date>
			<biblScope unit="page" from="1220" to="1227" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Gissin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shalev-Shwartz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.06347</idno>
		<title level="m">Discriminative active learning</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Active + semi-supervised learning = robust multi-view learning</title>
		<author>
			<persName><forename type="first">I</forename><surname>Muslea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Minton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Knoblock</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICML</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="435" to="442" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Overcoming blind spots in the real world: Leveraging complementary abilities for joint execution</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kamar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Nushi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Horvitz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="6137" to="6145" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>He</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.13316</idno>
		<title level="m">Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Uncertainty quantification and deep ensembles</title>
		<author>
			<persName><forename type="first">R</forename><surname>Rahaman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Upal: Unbiased pool based active learning</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ganti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gray</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Artificial Intelligence and Statistics</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="422" to="431" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m">Functional safety of electrical/electronic/programmable electronic safety-related systems</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
		<respStmt>
			<orgName>International Electrotechnical Commission</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
