<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Safety Assurance of Uncertainty-Aware Reinforcement Learning Agents</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Felippe</forename><surname>Schmoeller</surname></persName>
							<email>felippe.schmoeller.da.roza@iks.fraunhofer.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Fraunhofer IKS</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simon</forename><surname>Hadwiger</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Siemens AG</orgName>
								<address>
									<settlement>Nuremberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">University of Wuppertal</orgName>
								<address>
									<settlement>Wuppertal</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ingo</forename><surname>Thorn</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Siemens AG</orgName>
								<address>
									<settlement>Nuremberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Karsten</forename><surname>Roscher</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Fraunhofer IKS</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<address>
									<addrLine>Feb 13-14, |</addrLine>
									<postCode>2023</postCode>
									<settlement>Washington</settlement>
									<region>D.C</region>
									<country>US</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Safety Assurance of Uncertainty-Aware Reinforcement Learning Agents</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7F1785A550F7CE51EDFE3EF6DBD6AE10</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-04-29T06:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Uncertainty estimation</term>
					<term>Distributional shifts</term>
					<term>Reinforcement Learning</term>
					<term>Functional Safety</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The necessity of demonstrating that Machine Learning (ML) systems can be safe escalates with the ever-increasing expectation of deploying such systems to solve real-world tasks. While recent advancements in Deep Learning reignited the conviction that ML can perform at the human level of reasoning, the dimensionality and complexity added by Deep Neural Networks pose a challenge to using classical safety verification methods. While some progress has been made towards making verification and validation possible in the supervised learning landscape, works focusing on sequential decision-making tasks are still sparse. A particularly popular approach consists of building uncertainty-aware models, able to identify situations where their predictions might be unreliable. In this paper, we provide evidence obtained in simulation to support that uncertainty estimation can also help to identify scenarios where Reinforcement Learning (RL) agents can cause accidents when facing obstacles semantically different from the ones experienced while learning, focusing on industrial-grade applications. We also discuss the aspects we consider necessary for building a safety assurance case for uncertainty-aware RL models.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>This position paper is presented to serve as motivation for the long-term objective of using the uncertainty estimation capabilities of a Reinforcement Learning (RL) agent to improve its functional safety and enable RL as a viable framework to be deployed in industrial-grade applications. Although not a new concept, recent accomplishments have reignited the interest in using RL as a viable method to obtain agents able to interact with a wide range of environments (see <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3]</ref>). These results were only possible due to the integration of Deep Neural Networks (DNNs) as function approximators for RL agents.</p><p>According to some authors (e.g., <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>), the industry is eager to apply Machine Learning (ML) and DNNs more broadly in their processes, with the possibility to increase the safety level by aiding humans in processes that are potentially harmful or even automate complex tasks beyond human capabilities. According to <ref type="bibr" target="#b6">[7]</ref>, possible applications include aircraft control, power systems, medical systems, and the automotive domain. However, despite the expected gains, industrial players are historically very conservative and, most of the time, only adopt new technologies when there is enough evidence supporting their reliability and cost-effectiveness, which is still not possible for some ML paradigms.</p><p>DNNs excel at learning complex representations from a bulk of data, allowing to reach state-of-the-art performance in tasks such as computer vision, natural language processing, and control of autonomous systems. However, DNNs are too complex and have too many parameters to be verified using standard verification and validation methods. 
On top of that, DNN models are often overconfident and incapable of recognizing that their predictions might be wrong <ref type="bibr" target="#b7">[8]</ref>. The combination of these factors has put DNNs at the center of safe AI research in the past few years. The main goal is to guarantee that DNNs can be safe, reliable, secure, robust, explainable, and fair <ref type="bibr" target="#b6">[7]</ref>.</p><p>Another difficulty with DNNs, which also extends to Deep RL, is formalizing how well they generalize to novel instances. Despite the excellent results obtained on known benchmarks, several findings show that DNNs are susceptible to distributional shifts (e.g., <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>). That means the model output is not reliable when fed with data drawn from a distribution that differs from its training data distribution, i.e., out-of-distribution (OOD) instances. When considering autonomous systems controlled by RL agents, there is a risk of accidents when facing OOD scenarios. In principle, this issue could be avoided by ensuring the model is trained with data that covers every aspect it might encounter after deployment, but this is intractable for complex open-world tasks. Alternatively, some methods have been suggested to make DNNs robust to distributional shifts, such as in <ref type="bibr" target="#b10">[11]</ref>. However, making DNNs able to handle distributional shifts is challenging and the existing methods are limited. We follow a different direction, which consists of using a monitor to identify OOD instances. Once OOD is detected, the system can switch to a safe control policy (which could be as simple as "stop and wait for help") to avoid accidents caused by the agent's inabilities. 
We follow the hypothesis that uncertainty should grow when the model faces the unknown (the same hypothesis as in <ref type="bibr" target="#b11">[12]</ref>) and use uncertainty estimation as a proxy metric to classify OOD inputs.</p></div>
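The monitor-plus-fallback idea described above can be sketched in a few lines of Python. Everything here is an illustrative stand-in, not the components used in our experiments: the dummy policy, the dummy uncertainty estimate, and the threshold value are all assumptions for the sake of the example.

```python
import numpy as np

STOP = np.zeros(2)  # "stop and wait for help": zero linear/angular velocity

def monitored_policy(state, rl_policy, uncertainty, threshold=0.8):
    """Return the RL action unless the OOD monitor flags the state.

    `rl_policy` and `uncertainty` are placeholders for a trained agent and
    an uncertainty estimator; the threshold would have to be calibrated on
    in-distribution data.
    """
    if uncertainty(state) > threshold:
        return STOP          # fall back to the safe policy
    return rl_policy(state)  # nominal operation

# Toy components, purely to exercise the switching logic:
rl_policy = lambda s: np.array([1.0, 0.1])       # drive forward, turn slightly
uncertainty = lambda s: float(np.abs(s).mean())  # placeholder estimate

a_id = monitored_policy(np.array([0.1, 0.2]), rl_policy, uncertainty)
a_ood = monitored_policy(np.array([5.0, -4.0]), rl_policy, uncertainty)
```

The design choice to keep the monitor outside the policy means the RL agent itself needs no modification; only the wrapper decides when its output is trusted.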
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Scope and structure of the paper</head><p>This paper aims at showing how uncertainty-based OOD detection can help in the long-term goal of building a solid safety case for RL agents, which must be backed by convincing safety arguments. That is not the only factor necessary to make certification of RL models possible, but one of the most important aspects. The paper will focus on industrial applications of automated guided vehicles (AGVs). Industrial environments are mostly guided by specific regulations that are helpful when outlining the system requirements and specifications in terms of safety. We believe this can also be used as a starting point when expanding the framework to a more general case, covering a larger range of open-world applications.</p><p>To validate the potential of this approach to help with deriving strong safety arguments, experiments with an environment that simulates the application of transporting goods with a vision-based AGV in warehouses were conducted. The obtained results indicate that uncertainty estimation and OOD detection can help to identify unknown situations which, in some cases, lead to accidents. At the end of the document,</p><p>The document is structured as follows: section 2 shows publications available in the literature to serve as background and motivation for this paper. In section 3 the uncertainty-aware RL algorithm is shown. Section 4 contains the experiments and preliminary results, and section 5 presents a short discussion and the future steps we believe are necessary for building the safety assurance case for uncertainty-aware RL systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Publications investigating safety assurance cases for RL systems are limited. Therefore, we will start with relevant works that cover the application of general AI methods in safety-critical applications. That will be followed by works that deal with uncertainty estimation and OOD detection for ML systems, mainly focusing on computer vision problems, and finally, publications that combine uncertainty and RL will be shown. Our work is an intersection of those three topics, with the proposed method being inspired by existing uncertainty quantification approaches and the future outline borrowing ideas from authors that intend to conform AI systems to safety certification processes that are, to the best of our knowledge, very limited when it comes to RL. AI for safety-critical applications: Different authors defend that to enable ML models to solve safety-critical tasks, the models must be assured by evidence that the ML components will behave in accordance with existing safety specifications. <ref type="bibr" target="#b12">[13]</ref> argue that the evidence must cover all aspects necessary to show why these components can be trusted. The authors also present a survey with different methods that help in collecting the evidence for the whole ML lifecycle. In <ref type="bibr" target="#b6">[7]</ref>, an extensive study in neural networks applied to high assurance systems is presented. In <ref type="bibr" target="#b13">[14]</ref>, the authors identify problems that arise when using ML following ISO 26262, a standard that regulates the functional safety of road vehicles. They claim that the use of ML can result in hazards not experienced with conventional software. 
<ref type="bibr" target="#b14">[15]</ref> also discuss the shortcomings of fitting ML systems to ISO 26262 and how the Safety of the Intended Functionality (SOTIF), published in the ISO PAS 21448, offers a better alternative for safety assurance. The authors also present an extensive list of safety concerns related to DNN models, including the risk of the data distribution not being a good approximation of the real world and the possibility of distributional shifts to happen over time. <ref type="bibr" target="#b15">[16]</ref> also argue that the analysis of ML systems is fundamentally incompatible with traditional safety verification since safety engineering approaches focus on faults at the component level and their interactions with other system components while systemic failures experienced in complex systems are not necessarily consequence of faults from individual parts of the system. Therefore, the safety arguments should also reflect the inherent complexity and unpredictability of ever-changing environments where ML systems are designed to operate.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Machine Learning and Uncertainty:</head><p>The impact of uncertainty in Machine Learning is a recurrent topic of research, with a plentiful of publications discussing how ML systems should manage uncertainty and presenting methods to quantify uncertainty. In <ref type="bibr" target="#b16">[17]</ref>, the authors present a more general discussion on the properties of Bayesian Deep Learning models used for computer vision tasks that are affected by aleatoric and epistemic uncertainties (the first is inherent to the system stochastic properties while the former is related to a lack of knowledge). In <ref type="bibr" target="#b17">[18]</ref>, an introduction to the topic of uncertainty in ML models is provided as well as an overview of the main methods for capturing and handling uncertainty. In <ref type="bibr" target="#b18">[19]</ref>, the authors show how autonomous systems are affected by uncertainty and how correctly assessing uncertainty can help towards improving the supervision of inherently unsafe AI systems. Furthermore, a conceptual framework for dynamic dependability management based on uncertainty quantification is presented. In <ref type="bibr" target="#b19">[20]</ref>, uncertainty quantification as a proxy for the detection of OOD samples is discussed, with different methods compared in image classification datasets, namely CIFAR-10, GTSRB, and NWPU-RESISC45. Some popular uncertainty quantification methods for DNN models worth of mentioning are Monte Carlo Dropout <ref type="bibr" target="#b20">[21]</ref>, Deep Ensembles <ref type="bibr" target="#b21">[22]</ref>, and Evidential Deep Learning <ref type="bibr" target="#b22">[23]</ref>.</p><p>Reinforcement Learning and Uncertainty: Most of the work combining uncertainty quantification and ML cover Supervised Learning, with a strong focus on computer vision tasks. However, some literature also shows how uncertainty-aware RL agents can be obtained. 
A popular application is to use uncertainty to improve exploration. This class of algorithms is motivated by the principle of Optimism in the Face of Uncertainty (OFU) and describes the tradeoff between exploiting high-confidence decisions, which come from already established knowledge, and the agent's need to explore state-action pairs with high epistemic uncertainty <ref type="bibr" target="#b23">[24]</ref>.</p><p>This paper, however, focuses on uncertainty as a proxy for detecting domain shifts in decision-making agents. In <ref type="bibr" target="#b24">[25]</ref>, the authors propose defining the data distributions in terms of the elements that compose a Markov Decision Process (MDP), where minor disturbances should fall under the generalization umbrella and large deviations represent OOD samples. However, determining which semantic properties represent such changes and how to measure them is left as an open question. In <ref type="bibr" target="#b25">[26]</ref>, the authors present an uncertainty-aware model-based learning algorithm that adds statistical uncertainty estimates, combining bootstrapped neural networks and Monte Carlo Dropout, to its collision predictor. Mobile robot environments are used to show that the agent acts more cautiously when facing unfamiliar scenarios and increases the robot's velocity when it has high confidence. In <ref type="bibr" target="#b26">[27]</ref>, this method is extended to environments with moving obstacles. The authors also combine Monte Carlo Dropout and deep ensembles with LSTM models to obtain uncertainty estimates. A Model Predictive Controller (MPC) is responsible for finding the optimal action that minimizes the mean and variance of the collision predictions.</p></div>
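The kind of risk-averse action selection described in these works can be illustrated with a rough sketch. This is not the exact MPC objective from the cited papers; the candidate set, the toy ensemble, and the weighting factor are assumptions made here to show the mean-plus-spread scoring idea.

```python
import numpy as np

def cautious_action(candidates, ensemble, lam=1.0):
    """Pick the candidate action minimizing mean + lam * std of the
    ensemble's collision predictions: actions the members disagree on
    (high epistemic uncertainty) are penalized."""
    scores = []
    for a in candidates:
        preds = np.array([m(a) for m in ensemble])  # one prediction per member
        scores.append(preds.mean() + lam * preds.std())
    return candidates[int(np.argmin(scores))]

# Toy ensemble of "collision predictors" that disagree strongly about
# the extreme actions but roughly agree about the middle one:
ensemble = [lambda a, b=b: abs(a - b) for b in (0.0, 0.2, 1.0)]
best = cautious_action([0.0, 0.5, 1.0], ensemble)
```

Here the middle action wins: its mean predicted risk is similar to the others, but the members agree about it, so its variance penalty is far smaller.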
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Background</head><p>In this section, we present the background for each component of the proposed uncertainty-aware RL algorithm. Different uncertainty quantification methods could be used, but Variational Auto Encoders (VAEs) are an interesting choice for vision-based systems. They are considered robust models, are trained in an unsupervised manner (i.e., labeling samples is not necessary), are fast to train, and their generalization capabilities can be visually inspected by comparing the input and reconstructed images. However, the safety argumentation would benefit from a comparison between different alternatives, with the strengths and deficiencies of each approach addressed, which will remain as a future work suggestion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Reinforcement Learning</head><p>In RL, we consider an agent that sequentially interacts with an environment modeled as an MDP. An MDP is a tuple ℳ := (𝑆, 𝐴, 𝑅, 𝑃, 𝜇0), where 𝑆 is the set of states, 𝐴 is the set of actions, 𝑅 : 𝑆 × 𝐴 × 𝑆 ↦ → R is the reward function, 𝑃 : 𝑆 × 𝐴 × 𝑆 ↦ → [0, 1] is the transition probability function which describes the system dynamics, where 𝑃 (𝑠𝑡+1|𝑠𝑡, 𝑎𝑡) is the probability of transitioning to state 𝑠𝑡+1, given that the previous state was 𝑠𝑡 and the agent took action 𝑎𝑡, and 𝜇0 : 𝑆 ↦ → [0, 1] is the starting state distribution. At each time step, the agent observes the current state 𝑠𝑡 ∈ 𝑆, takes an action 𝑎𝑡 ∈ 𝐴, transitions to the next state 𝑠𝑡+1 drawn from the distribution 𝑃 (𝑠𝑡, 𝑎𝑡), and receives a reward 𝑅(𝑠𝑡, 𝑎𝑡, 𝑠𝑡+1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Variational Auto Encoders</head><p>VAEs are a popular class of deep probabilistic generative models <ref type="bibr" target="#b27">[28]</ref>. Autoencoders follow a simple encoderdecoder structure, where the model parameters are optimized to minimize the difference between the input sample and the decoded data, as shown in Figure <ref type="figure" target="#fig_0">1</ref>. The trained model is able to compress the inputs into a latent representation with a smaller dimension. VAEs extend regular autoencoders by substituting the exact inference of the likelihood by the lower bound of the log-likelihood, given by the evidence lower bound (ELBO):</p><formula xml:id="formula_0">log 𝑝 𝜃 (x) ≥ ℰ 𝑞 𝜑 (𝑧|𝑥) [log 𝑝 𝜃 (𝑥|𝑧)]− 𝐷𝐾𝐿[𝑞 𝜑 (𝑧|𝑥)||𝑝(𝑧)] ≜ ℒ(𝑥; 𝜃, 𝜑),<label>(1)</label></formula><p>where 𝑥 is the observed variable, 𝑧 is the latent variable with prior 𝑝(𝑧) and a conditional distribution 𝑝 𝜃 (𝑥|𝑧), 𝑞 𝜑 (𝑧|𝑥) is an approximation to the true posterior distribution 𝑝 𝜃 (𝑧|𝑥). 𝑞 𝜑 (𝑧|𝑥) and 𝑝 𝜃 (𝑥|𝑧) are neural networks parametrized by 𝜑 and 𝜃 (encoder and decoder, respectively). 𝐷𝐾𝐿 is the Kullback-Leibler divergence. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Uncertainty estimation based on Variational Auto Encoders</head><p>OOD detection using VAEs assumes that the model assigns higher likelihoods to the samples drawn from the in-distribution (ID) pool than the OOD samples, which is valid for different benchmarks as shown in <ref type="bibr" target="#b11">[12]</ref>. Metrics derived from the model likelihood are then used as uncertainty estimates. We follow the Evidence Lower Bound (ELBO) Ratio method proposed in the same paper, which represents the ratio of lower bounds of the log-likelihood of a given sample and the maximum ELBO obtained with the ID samples <ref type="bibr" target="#b11">[12]</ref>. For notation simplification, considering a fixed VAE model parametrized by 𝜑 and 𝜃, the ELBO value ℒ(𝑥; 𝜃, 𝜑) will be represented as 𝐸𝐿𝐵𝑂(𝑥), with 𝐸𝐿𝐵𝑂𝐼 (𝑥) representing the ELBO for a VAE model only trained with ID samples. Following this notation, the ELBO Ratio uncertainty 𝒰(𝑥0) for an arbitrary input 𝑥0 is shown in equation 2.</p><formula xml:id="formula_1">𝒰(𝑥0) = 𝐸𝐿𝐵𝑂(𝑥0) 𝐸𝐿𝐵𝑂𝐼 (𝑥𝑚𝑎𝑥) ,<label>(2)</label></formula><p>where 𝐸𝐿𝐵𝑂𝐼 (𝑥𝑚𝑎𝑥) is the maximum 𝐸𝐿𝐵𝑂 value calculated for all ID samples (a sort of calibration based on the training data).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments and Preliminary Results</head><p>Environment: To better support the proposed idea, experiments were conducted, and the preliminary results will be presented as further evidence. For the experiments, a custom environment was created using PyBullet <ref type="bibr" target="#b28">[29]</ref>. It was designed to represent a warehouse with a configurable layout limited by walls, goods to be transported by an automated guided vehicle (AGV), and a set of obstacles that might be in the way. The goal is to reach a certain location that contains a good to be transported, represented by a wooden pallet, while avoiding obstacles or hitting the walls. An RGB camera is attached to the AGV and its control decisions are made based on the state 𝑠𝑡 encoded by the input images and the coordinates of the AGV and the goal. The image resolution can be configured, but for the results shown below, RGB images with 84 x 84 pixels were used. The observation encoding also includes the positions of the AGV and the goal. The AGV action is a 2-dimensional vector, 𝑢𝑡, representing the linear and angular velocities. A reward of 100 is given if the agent reaches the goal position, -100 if it hits an obstacle, and -10 if it times out (i.e., it reaches the maximum number of steps).</p><p>To attest to the capacity of the uncertainty estimator to spot critical failures that might be related to OOD instances, an ID and an OOD environment were designed. The differences consist of the type of static obstacles present in each environment, with obstacles that differ in color and shape, as shown in figure <ref type="figure" target="#fig_1">2</ref>.</p><p>AGV controller framework: The controller used to solve the motion planning described above is shown in figure <ref type="figure" target="#fig_2">3</ref>. The first module is a path planner, responsible to determine the optimal path to reach the goal position based on the agent's location. 
The planner takes the AGV kinematic model and solves the planning problem as a 𝐺¹ Hermite interpolation with clothoids. Interpolating a sequence of waypoints using clothoid splines results in a smooth trajectory, suitable for the motion planning of mobile robots, as shown in <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>. The planner takes a simplified observation 𝑠 ˜𝑡, consisting of the AGV and goal coordinates, as input. Its output is a position in the polar coordinate system 𝑝𝑡 = (𝜌𝑡, 𝜃𝑡), where 𝜌𝑡 and 𝜃𝑡 are the radial and angular coordinates at time 𝑡, respectively. Note that the planner does not account for obstacles, since it is assumed that obstacles are not known a priori and the RL agent should be responsible for reacting and adjusting if an unexpected obstacle is in the way. The second module is a non-linear controller used to calculate the control action 𝑢𝑡 necessary to reach the coordinate 𝑝𝑡. The last module is the RL agent. Its goal is to follow the proposed trajectory, i.e., keeping 𝑢𝑡 ≈ 𝑢 * 𝑡 as much as possible, proposing a different control action 𝑢 * 𝑡 ≠ 𝑢𝑡 only to avoid a collision. To fulfil this task, an intrinsic reward 𝑟 𝑖 𝑡 was added, with 𝑟 𝑖 𝑡 = 0.0 if 𝑢 * 𝑡 ≈ 𝑢𝑡 (a small difference is tolerated) and 𝑟 𝑖 𝑡 = −0.1 otherwise. The optimal policy thus becomes a tradeoff between avoiding the risk of collision (with the substantial -100 reward as punishment) and following the path planner to avoid the small punishments. The RL agent was trained in the ID environment using the Soft Actor-Critic algorithm <ref type="bibr" target="#b31">[32]</ref>.</p><p>Uncertainty estimator: The VAE uncertainty estimation model was trained in an unsupervised manner to fit instances randomly sampled from the ID environment. To that end, 20,000 images were collected from the ID environment and 2,000 from the OOD environment, the latter used for validation purposes during model training. 
The model was trained for 10 epochs.</p><p>After training the RL agent and the VAE uncertainty estimator, rollouts are performed in the OOD environment with this agent, and (state, action, reward) tuples are saved for post-analysis. The episode termination states are then passed through the uncertainty estimator to verify whether crashes present a significant correlation with high uncertainty levels. The hypothesis is that if a crash happens due to the agent not being able to avoid an obstacle semantically different from the ones experienced during training, the OOD detector could flag this instance before the crash occurs. ID inputs, on the other hand, should signal low uncertainty, indicating that the RL agent is able to handle such situations. It is worth mentioning that these experiments only consider a very limited number of distinguishing features for the OOD obstacles. Since in reality the number of unknown obstacles can be extremely high, these experiments should be extended to a set of obstacles that is statistically representative of the problem dimension. Figure <ref type="figure" target="#fig_4">4</ref> shows how the VAE learns to reconstruct the images observed in the environment populated with ID obstacles, with the input and reconstructed images shown side by side. After 10 epochs of training, the obstacles are recovered with good definition. However, the model is not able to reconstruct the floor textures completely, which is of minor relevance in this scenario but should be investigated in case such features represent safety-critical aspects (e.g., oil on the floor, large cracks or holes).</p><p>Figure <ref type="figure" target="#fig_5">5</ref>, on the other hand, shows the same model, trained in the ID environment, trying to reconstruct images containing OOD obstacles. It is visible that, even after 10 epochs of training, the model is not able to recover the obstacle color or shape correctly, rendering blurred obstacles in the output. 
That inability to correctly compress and decompress the images with OOD obstacles is responsible for increasing the calculated uncertainty.</p><p>Figure <ref type="figure" target="#fig_6">6</ref> shows the results obtained for the RL agent running in the OOD environment. The agent ran for 10,000 steps, which corresponded to around 70 episodes. The y-axis represents the ELBO Ratio, normalized to the interval [0,1]. Episodes that ended with a crash are represented by the red bars, while the blue bars depict the remaining episodes. The results show that some crash episodes presented high uncertainty, while very few non-crash episodes presented significant uncertainty levels. On the other hand, some failures did not trigger a high uncertainty level. These cases could stem from residual insufficiencies of the trained RL agent (e.g., caused by a lack of training), from the OOD detector being inaccurate for these inputs, or from collisions not caused by an OOD element (e.g., the AGV crashed into a wall). To attest to the calibration of the uncertainty quantification, the same experiment was repeated in the ID environment, with the results shown in Figure <ref type="figure" target="#fig_7">7</ref>.</p><p>The ELBO Ratio values are much lower and more consistent across the entirety of the episodes. That is expected, since in this case all states should be considered ID, showing that the VAE does not output false positives for these data samples.</p></div>
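The per-episode post-analysis behind Figures 6 and 7 can be sketched as follows. The threshold and the toy uncertainty values are illustrative, not taken from the experiments; the point is the normalization of the ELBO Ratios and the split of crash episodes into those the monitor would flag and those it would miss:

```python
import numpy as np

def normalize(values):
    """Min-max normalize ELBO Ratio values to [0, 1], as in Figure 6."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def flag_episodes(termination_uncertainty, crashed, threshold=0.5):
    """Split crash episodes into (flagged, missed): crashes whose
    termination-state uncertainty exceeds the threshold vs. the rest."""
    u = normalize(termination_uncertainty)
    flagged = [i for i, (ui, c) in enumerate(zip(u, crashed)) if c and ui > threshold]
    missed = [i for i, (ui, c) in enumerate(zip(u, crashed)) if c and ui <= threshold]
    return flagged, missed

# Toy data: termination-state ELBO Ratios and crash outcomes per episode.
u = [0.2, 0.9, 0.1, 0.8, 0.3]
crashed = [False, True, False, True, True]
flagged, missed = flag_episodes(u, crashed)
```

A missed crash (the last episode here) corresponds to the cases discussed above: a failure whose cause the OOD monitor is not designed, or not accurate enough, to detect.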
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and Future Perspective</head><p>This paper focuses on motivating the promising perspective of using uncertainty quantification for improving the safety case of RL systems deployed in industrial applications, concentrating on camera-based systems. For that end, an environment modeling a typical warehouse was created. The preliminary results obtained with a VAE-based uncertainty estimator suggest this monitor can distinguish some of the states that result in accidents related to environmental distributional shifts. However, it is important to notice that not all accidents are caused by OOD obstacles, but can rather be influenced by the reward function definition, observation encoding, model generalization capabilities, among other aspects. Identifying and separating accidents caused by the inability of the agent to handle novel obstacles from accidents caused by other unrelated limitations is necessary before assessing the effectiveness of the OOD detection monitor. Many published works already discuss the importance of uncertainty estimation and OOD detection in the whole Safe AI spectrum, but we believe a more structured way to integrate these systems and empirical results to create a compelling safety assurance case is needed, especially for RL systems. To reach this long-term goal, we suggest the following future steps:</p><p>• Operational Design Domain (ODD) <ref type="bibr" target="#b32">[33]</ref>: In real-world applications, the number of contextual combination possibilities makes any attempt for extensive testing intractable. Therefore, precise system specification is paramount before starting to build the assurance case. The ODD should include all contextual information that covers the intended operation of the system. • Extensive experimentation: Once an appropriate ODD is derived, the experiments described in this document can be extended to a much broader scope. 
Varying parameters, changing scenario configurations, considering more obstacles, and adding sensor noise are just a few aspects that should be extensively considered.</p><p>Strong safety arguments will depend on the experiments achieving a high statistical confidence level for the contexts described in the ODD. This should also include multiple uncertainty estimation methods not covered in this paper. • Qualitative analysis: Understanding the system at a higher level of abstraction is also important to build a strong safety case. To that end, it is important to visualize the scenarios that lead to high or low uncertainty and to look for patterns that lead to wrong predictions, outliers, false positives and negatives, etc. • Residual error: The uncertainty monitor is not intended to cover every safety aspect, but rather the failures caused by the inability of the system to handle domain shifts. Therefore, risks associated with other aspects will still be present and should be addressed by other methods. • Integration of uncertainty monitor and RL agent: This paper focuses on how OOD scenarios might lead to system failures and how OOD detection can help in detecting such states before the failure happens. However, an important question is not addressed here and should be a high-priority next step: what to do when an OOD input is detected? In other words, how to integrate OOD detection and a safe fallback policy into the decision-making system. • Failure rate calibration: The uncertainty values are not sufficient to estimate a failure probability, because an OOD instance does not necessarily imply that a failure will happen. However, upper-bound probabilities could be derived from the uncertainty estimates, i.e., if the model predicts a 30% probability that 𝑠𝑡 is OOD, the risk of failures caused by distributional shifts should be below 30%. • SOTIF: As shown in Section 2, traditional functional safety standards fail to properly address ML systems. 
In contrast, SOTIF is a much more appropriate framework for building a safety argumentation in such cases. However, to the best of our knowledge, an assurance case based on an uncertainty-aware RL agent has not yet been built. SOTIF requires attesting to the absence of unreasonable risk due to hazards resulting from functional insufficiencies of the intended functionality, which is challenging given the nature of model-free RL and of sequential decision-making systems in general.</p><p>Not all of these items were addressed in this paper; rather, the list serves as a roadmap to guide our research efforts in the near future, as we believe that covering these points in greater detail will yield incremental progress towards a sound argumentation that enables uncertainty-aware RL agents to be deployed in safety-critical applications.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of an autoencoder network.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Examples of ID and OOD obstacles (top images and bottom images respectively). In the ID scenario, the obstacles are blue and dark red, while the OOD obstacles are green.</figDesc><graphic coords="4,302.62,84.19,203.36,136.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: RL-based controller framework.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) ID input images. (b) ID reconstructed images.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: VAE model compression-decompression capabilities with ID images after 10 epochs of training.</figDesc><graphic coords="5,303.63,212.67,99.65,71.77" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: VAE model compression-decompression capabilities with OOD images after 10 epochs of training.</figDesc><graphic coords="5,405.33,213.53,99.65,70.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Uncertainty estimates on terminating states of episodes for the OOD environment.</figDesc><graphic coords="6,101.49,84.19,178.96,135.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Uncertainty estimates on terminating states of episodes for the ID environment.</figDesc><graphic coords="6,314.82,84.19,178.96,134.63" type="bitmap" /></figure>
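The monitor-and-threshold scheme discussed above — flagging a state as OOD when the VAE's reconstruction error exceeds a bound calibrated on in-distribution (ID) data — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names, the 0.99 quantile, and the toy clipping "reconstructor" standing in for a trained VAE are all assumptions.

```python
import numpy as np

def reconstruction_error(x, reconstruct):
    """Per-sample mean squared reconstruction error for a batch x."""
    x_hat = reconstruct(x)
    return np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))

def calibrate_threshold(id_states, reconstruct, quantile=0.99):
    """Set the OOD threshold to a high quantile of ID reconstruction errors,
    so that roughly (1 - quantile) of ID states are falsely flagged."""
    errors = reconstruction_error(id_states, reconstruct)
    return np.quantile(errors, quantile)

def is_ood(state, reconstruct, threshold):
    """Flag a single state as OOD if its error exceeds the ID threshold."""
    return reconstruction_error(state[None], reconstruct)[0] > threshold

# Toy demo: a clipping function mimics a model that reconstructs ID inputs
# well but fails on inputs outside the training range (a hypothetical VAE).
rng = np.random.default_rng(0)
reconstruct = lambda x: np.clip(x, -3.0, 3.0)
id_states = rng.standard_normal((100, 8))
tau = calibrate_threshold(id_states, reconstruct)

shifted_state = id_states[0] + 5.0  # simulated domain shift
print(is_ood(shifted_state, reconstruct, tau))  # strongly shifted state is flagged
```

Under this scheme, the monitor's flag rate on ID data is controlled directly by the calibration quantile, which connects to the failure-rate-calibration point above: the threshold, not the raw uncertainty value, determines the achievable bound on shift-induced failures.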
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was funded by the Bavarian Ministry for Economic Affairs, Regional Development and Energy as part of a project to support the thematic development of the Institute for Cognitive Systems.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Mnih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Antonoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wierstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.5602</idno>
		<title level="m">Playing atari with deep reinforcement learning</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Mastering the game of go without human knowledge</title>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schrittwieser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Antonoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Baker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bolton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">550</biblScope>
			<biblScope unit="page" from="354" to="359" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Brockman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Cheung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dębiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dennison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Farhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hashme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1912.06680</idno>
		<title level="m">Dota 2 with large scale deep reinforcement learning</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Securing collaborative deep learning in industrial applications within adversarial scenarios</title>
		<author>
			<persName><forename type="first">C</forename><surname>Esposito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Aljawarneh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Choi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Industrial Informatics</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="4972" to="4981" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Deep learning in the industrial internet of things: Potentials, challenges, and emerging applications</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Khalil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Saeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Masood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">M</forename><surname>Fard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-S</forename><surname>Alouini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Y</forename><surname>Al-Naffouri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Internet of Things Journal</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="11016" to="11040" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Exploring the role of deep learning in industrial applications: a case study on coastal crane casting recognition</title>
		<author>
			<persName><forename type="first">M</forename><surname>Maqsood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Mehmood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kharel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Muhammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Alnumay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Hum. Cent. Comput. Inf. Sci</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1" to="14" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M P</forename><surname>Schumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<title level="m">Applications of neural networks in high assurance systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">268</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Schwaiger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Henne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Küppers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">S</forename><surname>Roza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Roscher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Haselhoff</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.02971</idno>
		<title level="m">From black-box to white-box: Examining confidence calibration under different conditions</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Can autonomous vehicles identify, recover from, and adapt to distribution shifts?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Filos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tigkas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mcallister</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rhinehart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3145" to="3153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Test-time training with self-supervision for generalization under distribution shifts</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Efros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9229" to="9248" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An effective baseline for robustness to distributional shift</title>
		<author>
			<persName><forename type="first">S</forename><surname>Thulasidasan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thapa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dhaubhadel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chennupati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bhattacharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bilmes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">20th IEEE International Conference on Machine Learning and Applications (ICMLA)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="278" to="285" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Ran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<title level="m">Detecting out-of-distribution samples via variational autoencoder with reliable uncertainty estimation</title>
				<imprint>
			<publisher>Neural Networks</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Ashmore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Calinescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Paterson</surname></persName>
		</author>
		<title level="m">Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Salay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Queiroz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Czarnecki</surname></persName>
		</author>
		<title level="m">An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks</title>
		<author>
			<persName><forename type="first">O</forename><surname>Willers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sudholt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Raafatnia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abrecht</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Safety, Complexity, and Automated Driving: Holistic Perspectives on Safety Assurance</title>
		<author>
			<persName><forename type="first">S</forename><surname>Burton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Mcdermid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Garnett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weaver</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">What uncertainties do we need in bayesian deep learning for computer vision?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kendall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Hüllermeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Waegeman</surname></persName>
		</author>
		<title level="m">Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods</title>
				<imprint>
			<publisher>Machine Learning</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Managing uncertainty of ai-based perception for autonomous systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Henne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schwaiger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AISafety@ IJCAI</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="11" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Is uncertainty quantification in deep learning sufficient for out-of-distribution detection?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Schwaiger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sinhamahapatra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gansloser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Roscher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AISafety@ IJCAI</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Dropout as a bayesian approximation: Representing model uncertainty in deep learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Gal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ghahramani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1050" to="1059" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Simple and scalable predictive uncertainty estimation using deep ensembles</title>
		<author>
			<persName><forename type="first">B</forename><surname>Lakshminarayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pritzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Blundell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Evidential deep learning to quantify classification uncertainty</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sensoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kandemir</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2109.06668</idno>
		<title level="m">Exploration in deep reinforcement learning: a comprehensive survey</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Domain shifts in reinforcement learning: Identifying disturbances in environments</title>
		<author>
			<persName><forename type="first">T</forename><surname>Haider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">S</forename><surname>Roza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Eilers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Roscher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Günnemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AISafety@ IJCAI</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Kahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Villaflor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Abbeel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1702.01182</idno>
		<title level="m">Uncertainty-aware reinforcement learning for collision avoidance</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Safe reinforcement learning with model uncertainty estimates</title>
		<author>
			<persName><forename type="first">B</forename><surname>Lütjens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Everett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>How</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2019 International Conference on Robotics and Automation (ICRA), IEEE</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="8662" to="8668" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6114</idno>
		<title level="m">Auto-encoding variational bayes</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Coumans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bai</surname></persName>
		</author>
		<title level="m">PyBullet, a Python module for physics simulation for games, robotics and machine learning</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">G1 fitting with clothoids</title>
		<author>
			<persName><forename type="first">E</forename><surname>Bertolazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Frego</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mathematical Methods in the Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="881" to="897" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Path planning maximising human comfort for assistive robots</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bevilacqua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Frego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bertolazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fontanelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Palopoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Biral</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2016 IEEE Conference on Control Applications (CCA), IEEE</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1421" to="1427" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor</title>
		<author>
			<persName><forename type="first">T</forename><surname>Haarnoja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Abbeel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1861" to="1870" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Operational design domain for automated driving systems</title>
		<author>
			<persName><forename type="first">K</forename><surname>Czarnecki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Waterloo Intelligent Systems Engineering</title>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>WISE</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
