<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Safe CAV lane changes using MARL and control barrier functions</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Bharathkumar</forename><surname>Hegde</surname></persName>
							<email>hegdeb@tcd.ie</email>
							<idno type="ORCID">0000-0002-2085-7867</idno>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science and Statistics</orgName>
								<orgName type="institution">Trinity College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Melanie</forename><surname>Bouroche</surname></persName>
							<email>melanie.bouroche@tcd.ie</email>
							<idno type="ORCID">0000-0002-5039-0815</idno>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science and Statistics</orgName>
								<orgName type="institution">Trinity College Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Safe CAV lane changes using MARL and control barrier functions</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">807EDA5E636E193A54A30753ADD68040</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Connected and Autonomous Vehicle (CAV)</term>
					<term>Lane change</term>
					<term>Control Barrier Functions (CBF)</term>
					<term>Artificial Intelligence (AI)</term>
					<term>Multi-Agent Reinforcement Learning (MARL)</term>
					<term>Multi-Agent Systems (MAS)</term>
					<term>Deep learning (DL)</term>
					<term>Intelligent Transportation System (ITS)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Connected and Autonomous Vehicles (CAVs) are expected to improve road safety and traffic efficiency in the near future. Recently, Multi-Agent Reinforcement Learning (MARL) algorithms have been applied to optimise lane change control decisions to improve the average speed of CAVs. MARL algorithms, however, are limited by a lack of safety guarantees. Control Barrier Functions (CBFs) have been used to ensure the safety of a Reinforcement Learning (RL) agent performing safety-critical control tasks such as robotic navigation and autonomous driving. In this work, CBFs are defined for a Multi-Agent System (MAS) of CAVs to ensure the safety of a MARL lane change controller, with three major contributions. First, an architecture is proposed to integrate the high-level behavioural layer with a safe controller at the low-level motion planning layer: the high-level control layer implements a state-of-the-art MARL lane change controller, while the safe low-level motion planning layer constrains the vehicle to safe states using CBFs. Second, multi-agent actor dependencies are defined to ensure that control decisions are made by CAVs in a specific order. Finally, decentralised CBF constraint formulations are defined to comply with the safety specifications. The proposed design, CBF-CAV, can guarantee safe manoeuvres while executing a behavioural control decision made by the MARL controller.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>As private vehicle ownership continues to rise, the world's motor fleet currently exceeds a billion vehicles and is expected to keep growing in the near future <ref type="bibr" target="#b0">[1]</ref>. This trend is likely to cause increased congestion and road accidents. According to a report from the World Economic Forum, road congestion cost the US economy 87 billion dollars in 2018 through lost productivity <ref type="bibr" target="#b1">[2]</ref>. Furthermore, a European Union (EU) report states that around 78% of road crashes are considered to be a result of human error <ref type="bibr" target="#b2">[3]</ref>. To minimise congestion and improve traffic safety, Autonomous Vehicles (AVs) are considered one of the main interventions in Intelligent Transportation Systems (ITS) <ref type="bibr" target="#b3">[4]</ref>.</p><p>AV technologies are evolving with developments in communication technologies and Artificial Intelligence (AI). Connected Autonomous Vehicles (CAVs) leverage recent advancements in vehicular communication (V2X) technologies to perform collaborative manoeuvres that improve traffic safety and efficiency. AI has been a popular option to solve some of the complex problems in AV technologies, such as localisation, mapping, perception, route planning, and motion control <ref type="bibr" target="#b4">[5]</ref>. For CAV motion controllers specifically, our previous work shows that Multi-Agent Reinforcement Learning (MARL) is a popular choice <ref type="bibr" target="#b5">[6]</ref>. Lane changing is one of the complex problems in motion control, as an improper lane change may cause a collision that damages costly AV components or even causes loss of life. 
Many forms of MARL using Deep Q-Networks (DQNs) <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref> and Actor-Critic Networks (ACN) <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref> have been applied to design lane change controllers. Among them, MARL-CAV <ref type="bibr" target="#b11">[12]</ref> is an open-source state-of-the-art MARL lane change controller designed for CAVs <ref type="bibr" target="#b12">[13]</ref>. MARL-CAV significantly improves traffic efficiency and safety. This approach, however, uses a prediction-based priority assignment to avoid collisions and encourage safe behaviour; safety is therefore not guaranteed, which limits its applicability.</p><p>Our previous work identifies that Control Barrier Functions (CBFs) are suitable for ensuring the safety of CAV lane change controllers <ref type="bibr" target="#b12">[13]</ref>. CBFs have recently been applied to ensure the safe operation of Reinforcement Learning (RL) based single-agent AV controllers <ref type="bibr" target="#b13">[14]</ref>. This CBF implementation demonstrates a longitudinal safety constraint in a simple scenario. The CBF can also be formulated by considering dynamic safety constraints relative to the surrounding vehicles <ref type="bibr" target="#b14">[15]</ref>. This single-agent CBF, however, assumes that other agents make worst-case decisions. Such a safety constraint results in conservative behaviour, negatively affecting traffic efficiency.</p><p>Overall, CAV lane change controllers can be designed using MARL to improve traffic efficiency, but they do not ensure safety. This work therefore integrates CBF safety constraints with the MARL-based lane change controller <ref type="bibr" target="#b11">[12]</ref>, designing the safety constraints from multi-agent vehicular dynamics to ensure safety. 
The main contributions of this work are:</p><p>• An architecture to integrate CBF constraints with the high-level MARL-based lane change controller (Section 3).</p><p>• A structure for defining the dynamics of multi-agent interaction between CAVs (Section 4.2).</p><p>• Specifications and formulations of the CBF constraints to ensure the safety of CAVs (Sections 4.1 and 4.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>The background details related to AV control hierarchy, vehicle dynamics, RL, and CBFs are provided in this section. First, the scope of this research is explained based on control hierarchies. Next, the kinematic bicycle model is explained, and the assumptions related to vehicle dynamics are outlined. Then, notations used for RL formulations are discussed. Finally, a general form of a CBF is defined along with an optimisation problem for evaluating the safe control inputs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Hierarchy of control layers</head><p>The control decisions of AVs can be separated into four hierarchical levels: route planning, the behavioural layer, motion planning, and local feedback control <ref type="bibr" target="#b15">[16]</ref>. The route planning layer first identifies a feasible route to the user-provided destination using road network information. The route generated from this layer consists of a sequence of waypoints. While moving along these waypoints, the behavioural layer makes high-level driving decisions such as following a lane, performing a lane change, negotiating an intersection, or moving in an unstructured environment. The motion planning layer generates reference control actions, such as acceleration and steering, to execute a specific manoeuvre from the high-level decision. In the last layer, a local feedback controller performs the necessary actuation, such as steering, throttling, and braking, to follow the control references. A lane change controller can be developed by engineering the high-level behavioural and low-level motion planning layers. Specifically, the behavioural layer can be designed to make discrete decisions to change lanes or follow the current lane. Based on this decision, the motion planning layer can identify the control references to execute the desired driving manoeuvre. Therefore, this article mainly focuses on designing these two layers of the AV control hierarchy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Vehicle dynamics</head><p>In this article, the kinematic bicycle model is considered to define vehicle dynamics. This model lumps the two front wheels into a single wheel, and likewise the two rear wheels, as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>. The distance between the front and rear wheels is denoted as 𝑉 𝑙 . The vehicle's position is defined using the longitudinal and lateral coordinates 𝑥 and 𝑦 along the road. The vehicle's velocity (𝑣) is controlled by adjusting the acceleration input (𝑢 1 ), and the steering angle (𝛿) is controlled by adjusting the steering velocity (𝑢 2 ). The steering velocity represents the rate of change of the steering angle with time <ref type="bibr" target="#b16">[17]</ref>. The steering angle is the angle of the front wheel with respect to the current heading of the vehicle (𝜓). The equations of the kinematic bicycle model, which assumes the centre of gravity (𝐺) to lie midway between the front and rear wheels, can be written as <ref type="bibr" target="#b17">[18]</ref>,</p><formula xml:id="formula_0">ẋ = 𝑣 𝑥 , ẏ = 𝑣 𝑦 , v̇ 𝑥 = 𝑢 1 cos (𝜓 + 𝛽), v̇ 𝑦 = 𝑢 1 sin (𝜓 + 𝛽), ψ̇ = (𝑣/𝑉 𝑙 ) sin 𝛽, δ̇ = 𝑢 2<label>(1)</label></formula><p>where 𝛽 is the slip angle at the centre of gravity:</p><formula xml:id="formula_1">𝛽 = tan −1 ( 1 2 tan 𝛿)<label>(2)</label></formula></div>
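To make the dynamics concrete, the kinematic bicycle model of equations (1) and (2) can be stepped forward with a simple forward-Euler integration. This is a minimal sketch: the wheelbase `V_l` and the step size `dt` are assumed example values, not parameters taken from this article.

```python
import math

def bicycle_step(s, u1, u2, V_l=4.0, dt=0.05):
    """One Euler step of the kinematic bicycle model, Eqs. (1)-(2).

    s  = (x, y, vx, vy, psi, delta); u1 = acceleration, u2 = steering velocity.
    V_l (wheelbase) and dt are illustrative values.
    """
    x, y, vx, vy, psi, delta = s
    beta = math.atan(0.5 * math.tan(delta))  # slip angle at G, Eq. (2)
    v = math.hypot(vx, vy)                   # speed magnitude
    return (
        x + vx * dt,                            # x_dot  = v_x
        y + vy * dt,                            # y_dot  = v_y
        vx + u1 * math.cos(psi + beta) * dt,    # vx_dot = u1 cos(psi + beta)
        vy + u1 * math.sin(psi + beta) * dt,    # vy_dot = u1 sin(psi + beta)
        psi + (v / V_l) * math.sin(beta) * dt,  # psi_dot = (v / V_l) sin(beta)
        delta + u2 * dt,                        # delta_dot = u2
    )
```

With zero inputs and zero steering angle the vehicle continues in a straight line, which is a quick sanity check of the model.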
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Reinforcement learning</head><p>RL is a computational approach for learning a sequence of actions to achieve a specific goal. The RL problem is formulated using a Markov Decision Process (MDP) defined by the tuple (𝒮 , 𝒜 , 𝒫 , ℛ, 𝛾 ). 𝒮 is the state space, the set of state variables that an agent can observe. An agent observes a state 𝑠 𝑡 ∈ 𝒮 at a time step 𝑡. 𝒜 is the action space consisting of the set of actions that an agent can perform. At a time step 𝑡, an agent performs an action 𝑎 𝑡 ∈ 𝒜. 𝒫 ∶ 𝒮 × 𝒜 × 𝒮 → [0, 1] is the state transition function that defines the likelihood of changes in the state observed from the environment based on an action 𝑎 𝑡 ∈ 𝒜. ℛ ∶ 𝒮 × 𝒜 × 𝒮 → ℝ is a reward function that defines the agent's goal. At a time step 𝑡, the agent receives a reward 𝑅 𝑡 , a real number calculated for a transition from the previous state to the current state through an action. The reward function formulation plays a vital role in defining agents' behaviour in the system. 𝛾 ∈ (0, 1] is a discount factor used to define the discounted return 𝐺 𝑡 ,</p><formula xml:id="formula_2">𝐺 𝑡 = ∞ ∑ 𝑘=𝑡+1 𝛾 𝑘−𝑡−1 𝑅 𝑘</formula><p>The discounted return provides a measure to choose an action that is expected to yield higher rewards in the future. Using the discounted return, a state-action value function, also known as the Q-value, can be derived for a policy. A policy 𝜋 is a mapping from states to the probability of selecting possible actions. The Q-function under policy 𝜋 provides the expected future reward of choosing action 𝑎 𝑡 from state 𝑠 𝑡 . 
It can be defined as,</p><formula xml:id="formula_3">𝑄 𝜋 (𝑠 𝑡 , 𝑎 𝑡 ) = 𝔼[𝐺 𝑡 |𝑠 𝑡 , 𝑎 𝑡 ]</formula><p>For simple problems with a small number of possible states and actions, optimal Q-values can be calculated from the transition probability 𝒫 using the Bellman optimality equation:</p><formula xml:id="formula_4">𝑄 * (𝑠 𝑡 , 𝑎 𝑡 ) = ∑ 𝑠 𝑡+1 𝒫 (𝑠 𝑡 , 𝑎 𝑡 , 𝑠 𝑡+1 )[𝑅 𝑡 + 𝛾 max 𝑎 𝑡+1 𝑄 * (𝑠 𝑡+1 , 𝑎 𝑡+1 )]</formula><p>For complex tasks with a large state and action space, such as autonomous driving, it is often very difficult or impossible to model the transition probability 𝒫. Therefore, approximation methods are usually used to find a policy that achieves higher rewards in such tasks. These approximations can be implemented using deep neural networks <ref type="bibr" target="#b18">[19]</ref>. Deep RL (DRL) approximation algorithms have achieved impressive results in playing Atari games <ref type="bibr" target="#b19">[20]</ref>. Recent RL approximation algorithms include Deep Q-Networks (DQN) and policy gradient methods such as Actor-Critic Networks (ACNs). Open-source ACN algorithms such as PPO <ref type="bibr" target="#b20">[21]</ref> and ACKTR <ref type="bibr" target="#b21">[22]</ref> have been developed and published in repositories like Stable-Baselines3 <ref type="bibr" target="#b22">[23]</ref> and OpenAI Baselines <ref type="bibr" target="#b23">[24]</ref>. These algorithms are applied to solve optimisation problems in various research areas, including manufacturing, robotics, large language models, and autonomous vehicles.</p><p>The RL algorithms extended to MAS, considering various forms of learning and control components, are known as MARL <ref type="bibr" target="#b24">[25]</ref>. The learning components learn an approximate optimal policy, and the control components execute that policy. These components are integrated into a single agent in single-agent tasks, such as a robot cleaning a house. 
Many real-world tasks, however, can be considered MAS, as multiple agents may need to work together in the same environment. For example, systems such as multiplayer online games, cooperative robots in factories, traffic control systems, and CAVs can be considered MAS <ref type="bibr" target="#b25">[26]</ref>. However, MARL applications are limited to non-safety-critical tasks, as they cannot ensure safety because of their black-box property <ref type="bibr" target="#b26">[27]</ref>. To overcome this limitation, CBF safety constraints can be used.</p></div>
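As a small illustration of the discounted return defined above, a finite reward sequence can be accumulated backwards; the rewards and discount factor below are arbitrary example values, not quantities from this work.

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...

    Backward accumulation over a finite reward sequence; gamma is the
    discount factor from the MDP tuple.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For rewards [1, 1, 1] and gamma = 0.5 this yields 1 + 0.5 + 0.25 = 1.75.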
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Control barrier functions</head><p>Consider a discrete time nonlinear control system defined by the following transition dynamics</p><formula>ṡ = 𝑓 (𝑠 𝑡 ) + 𝑔(𝑠 𝑡 )𝑢 𝑡<label>(3)</label></formula><p>where the change in the state variables per unit time, ṡ, is defined using the unactuated dynamics 𝑓 ∶ 𝑆 → 𝑆 and the actuated dynamics 𝑔 ∶ 𝑆 → ℝ 𝑛,𝑚 ; 𝑛 and 𝑚 are the numbers of variables in the state space 𝑆 and the action space 𝑈 respectively, 𝑠 𝑡 ∈ 𝑆 is the system state, and 𝑢 𝑡 ∈ 𝑈 is the control action at time step 𝑡. Both 𝑓 and 𝑔 are defined based on known system dynamics, and they are locally Lipschitz continuous, in other words, continuous functions limited by a maximum rate of change. For example, the kinematic bicycle model defined in equation ( <ref type="formula" target="#formula_0">1</ref>) can be written in the form of the control system (3) as follows,</p><formula xml:id="formula_5">ṡ = [𝑣 𝑥 , 𝑣 𝑦 , 0, 0, (𝑣/𝑉 𝑙 ) sin 𝛽, 0] T + [0, 0; 0, 0; cos (𝜓 + 𝛽), 0; sin (𝜓 + 𝛽), 0; 0, 0; 0, 1] [𝑢 1 , 𝑢 2 ] T<label>(4)</label></formula><p>Consider a safe set 𝐶 defined as the super-level set of the continuously differentiable function ℎ ∶ 𝑆 → ℝ,</p><formula xml:id="formula_6">𝐶 = {𝑠 𝑡 ∈ 𝑆 ∶ ℎ(𝑠 𝑡 ) ≥ 0}<label>(5)</label></formula><p>To ensure the safety of the control system (3), the safe set 𝐶 must be forward invariant. When 𝐶 is forward invariant, safe actions can be defined for each state 𝑠 𝑡 ∈ 𝐶 such that the system continues to stay in 𝐶. The safe set 𝐶 is forward invariant if the function ℎ is a Control Barrier Function (CBF) such that there exists 𝜂 ∈ [0, 1] for all 𝑠 𝑡 ∈ 𝐶 satisfying the following equation ( <ref type="formula">6</ref>)</p><formula xml:id="formula_7">sup 𝑢 𝑡 ∈𝑈 [ℎ (𝑓 (𝑠 𝑡 ) + 𝑔(𝑠 𝑡 )𝑢 𝑡 ) + (𝜂 − 1)ℎ(𝑠 𝑡 )] ≥ 0<label>(6)</label></formula><p>where 𝜂 defines the magnitude at which the system is pushed within the safe set 𝐶 <ref type="bibr" target="#b13">[14]</ref>. 
Using smaller values of 𝜂 enforces the constraint strictly, whereas higher values relax it. Therefore, 𝜂 represents how strongly the barrier function pushes the states inwards within 𝐶. The existence of a CBF implies that for all 𝑠 𝑡 ∈ 𝐶, there exists 𝑢 𝑡 such that 𝐶 is forward invariant <ref type="bibr" target="#b27">[28]</ref>. Therefore, the goal is to find a minimal safe action 𝑢 cbf 𝑡 that satisfies (6) to ensure the safety of the control system (3). Consider an affine barrier function of the form</p><formula>ℎ(𝑠 𝑡 ) = 𝑝 T 𝑠 𝑡 + 𝑞<label>(7)</label></formula><p>where 𝑝 ∈ ℝ 𝑛 and 𝑞 ∈ ℝ are the parameters used to define a safety constraint ℎ on the state 𝑠 𝑡 . Combining the affine barrier function with the condition ( <ref type="formula">6</ref>), the following constraint can be defined for the control action 𝑢 𝑡 ,</p><formula xml:id="formula_8">−𝑝 T 𝑔(𝑠 𝑡 )𝑢 𝑡 ≤ 𝑝 T 𝑓 (𝑠 𝑡 ) + 𝑝 T (𝜂 − 1)𝑠 𝑡 + 𝜂𝑞<label>(8)</label></formula><p>To consider multiple safety constraints defined using CBFs, 𝐶 can be considered as the intersection of the half-spaces defined by 𝑘 affine barrier functions <ref type="bibr" target="#b14">[15]</ref>. The affine constraint on 𝑢 𝑡 can then be defined by stacking all the constraints.</p><formula xml:id="formula_9">𝐴𝑢 𝑡 ≤ 𝑏, where, 𝐴 = [𝑎 1 , 𝑎 2 , ..., 𝑎 𝑘 ], with 𝑎 𝑖 = −𝑝 T 𝑖 𝑔(𝑠 𝑡 ) 𝑏 = [𝑏 1 , 𝑏 2 , ..., 𝑏 𝑘 ], with 𝑏 𝑖 = 𝑝 T 𝑖 𝑓 (𝑠 𝑡 ) + 𝑝 T 𝑖 (𝜂 − 1)𝑠 𝑡 + 𝜂𝑞 𝑖<label>(9)</label></formula><p>This constraint can be used to reformulate the CBF condition ( <ref type="formula">6</ref>) into the following optimisation problem</p><formula xml:id="formula_10">𝑢 𝑡 = arg min 𝑢 𝑡 ||𝑢 𝑡 || 2 s.t 𝐴𝑢 𝑡 ≤ 𝑏,<label>(10)</label></formula><p>which can be solved efficiently at each time step as a quadratic program <ref type="bibr" target="#b28">[29]</ref>.</p></div>
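The quadratic program (10) has a closed form in the single-constraint case (k = 1): the minimal-norm action is zero if the constraint already holds, and otherwise the projection of the origin onto the half-space boundary. The sketch below covers only this special case; for several stacked constraints a general QP solver (e.g. OSQP or quadprog) would be used instead.

```python
import numpy as np

def min_norm_safe_action(a, b):
    """Solve min ||u||^2 s.t. a^T u <= b for a single CBF constraint
    (Eq. (10) with k = 1). If u = 0 is feasible (b >= 0) it is optimal;
    otherwise the optimum is the projection of the origin onto the
    hyperplane a^T u = b.
    """
    a = np.asarray(a, dtype=float).ravel()
    if b >= 0.0:
        return np.zeros_like(a)
    return a * (b / np.dot(a, a))  # projection onto a^T u = b
```

For a = [1, 0] and b = -2 the result is u = [-2, 0], the smallest action satisfying a^T u <= -2.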
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Integrating CBF with MARL for highway merging</head><p>In the AV control hierarchy (Section 2.1), the safety constraints can be integrated with the motion planning layer to ensure safe lane change manoeuvres <ref type="bibr" target="#b29">[30]</ref>. The safety constraints act as a shield to override the control decisions from the motion planning layer to ensure that a vehicle stays in a safe state <ref type="bibr" target="#b30">[31]</ref>. The architecture to integrate the MARL behavioural layer, motion planning layer, and safety constraints is presented in this section to develop a safe MARL lane change controller, illustrated in Figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">MARL behavioural layer</head><p>A vehicle, referred to as the ego vehicle, makes behavioural decisions based on its state information, measured by onboard sensors such as LiDAR, radar, camera, GPS, and IMU, as well as information about the states of the surrounding 𝒩 vehicles. The ego vehicle can decide whether to change lanes, follow the lane, speed up, or slow down <ref type="bibr" target="#b11">[12]</ref>. Since the ego vehicle can only observe vehicles within the range of vehicular communication (V2X), the previously defined MDP (Section 2.3) is extended as a Partially Observable MDP (POMDP) for this MARL application. Moreover, V2X is assumed to be a perfect communication interface without any delays or packet drops. The MARL formulation defined in this section is similar to the MARL-CAV formulation <ref type="bibr" target="#b11">[12]</ref>, with minor changes in the state space and reward function.</p><p>The state space 𝒮 𝒾 of a vehicle 𝑖 consists of the following state variables:</p><p>• 𝑥 : The longitudinal position of the vehicle.</p><p>• 𝑦 : The lateral position of the vehicle.</p><p>• 𝑣 𝑥 : The longitudinal velocity of the vehicle.</p><p>• 𝑣 𝑦 : The lateral velocity of the vehicle.</p><p>• 𝜓 : The vehicle heading with respect to the road.</p><p>The ego vehicle's variables are expressed in a global coordinate system, while the observed vehicles' state variables are relative to the ego vehicle. As the ego vehicle observes states from 𝒩 surrounding vehicles, the overall multi-agent state space is defined as a Cartesian product of the individual states, 𝒮 = 𝒮 0 × 𝒮 1 × 𝒮 2 × ... × 𝒮 𝒩 . Observing 𝒩 = 5 surrounding vehicles achieves the best performance <ref type="bibr" target="#b11">[12]</ref>. 
We have added the heading 𝜓 to the state variables considered by MARL-CAV to capture the lane change intentions of the CAVs, which supports the safety constraints.</p><p>The action space 𝒜 is the same as defined in MARL-CAV: it consists of five discrete actions, each representing a specific behaviour, namely right lane change, left lane change, follow lane, speed up, and slow down. The behavioural layer chooses one of these high-level actions, and the low-level controller explained in Section 3.2 executes the decision.</p><p>The reward function comprises rewards for avoiding collisions 𝑟 𝑐 , maintaining a desirable speed 𝑟 𝑠 , maintaining a desirable headway 𝑟 ℎ , and feedback from the CBF evaluation 𝑟 𝑓 , along with an associated weight 𝑤 * for each reward component. These weights can be tuned to prioritise the CAV objectives. Therefore, the reward for a CAV 𝑖 at time 𝑡 is defined as</p><formula xml:id="formula_11">𝑟 𝑖,𝑡 = 𝑤 𝑐 𝑟 𝑐 + 𝑤 𝑠 𝑟 𝑠 + 𝑤 ℎ 𝑟 ℎ + 𝑤 𝑓 𝑟 𝑓</formula><p>The feedback from the CBF evaluation, 𝑟 𝑓 , is an additional component relative to the reward formulation used in MARL-CAV. It rewards the agent for staying in safe states, which minimises the control overrides required from the CBF layer. Further, this reward encourages the agent to explore within the safe states <ref type="bibr" target="#b13">[14]</ref>. Note that 𝑟 𝑖,𝑡 is the reward associated with an individual agent. To achieve collaborative goals, MARL-CAV combines rewards from the surrounding agents to define a local reward as</p><formula xml:id="formula_12">𝑅 𝑖,𝑡 = 1 𝒩 𝒩 ∑ 𝑗=0 𝑟 𝑗,𝑡</formula><p>MARL-CAV is demonstrated with multiple RL algorithms, namely Multi-Agent extensions (MA*) of PPO <ref type="bibr" target="#b20">[21]</ref>, ACKTR <ref type="bibr" target="#b21">[22]</ref>, and DQN <ref type="bibr" target="#b19">[20]</ref>: MAPPO, MAACKTR, and MADQN. 
The multi-agent extension of these algorithms is inspired by the parameter sharing approach proposed in the Multi-Agent Actor-Critic (MA2C) RL algorithm <ref type="bibr" target="#b31">[32]</ref>. Among them, MAPPO performed best in the MARL benchmark analysis of Chen et al. <ref type="bibr" target="#b11">[12]</ref>. Therefore, this work uses the MAPPO algorithm to train the high-level behavioural layer.</p></div>
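The reward structure above can be sketched in a few lines; the weights below are hypothetical placeholders, not the tuned values used by MARL-CAV.

```python
def cav_reward(r_c, r_s, r_h, r_f, w_c=1.0, w_s=0.5, w_h=0.3, w_f=0.2):
    """r_{i,t} = w_c r_c + w_s r_s + w_h r_h + w_f r_f.

    r_c: collision avoidance, r_s: desirable speed, r_h: desirable headway,
    r_f: CBF feedback. The weights are illustrative placeholders.
    """
    return w_c * r_c + w_s * r_s + w_h * r_h + w_f * r_f

def local_reward(rewards):
    """R_{i,t}: average of the individual rewards of the ego vehicle and
    its surrounding agents, encouraging collaborative behaviour."""
    return sum(rewards) / len(rewards)
```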
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Motion planning layer</head><p>For the low-level motion planning layer, a Proportional-Integral-Derivative (PID) controller can be used to generate control actions, such as acceleration and steering velocity, to execute a behavioural command (defined in Section 3.1). Because of its simplicity, the PID controller can generate control actions in real time. Moreover, it does not require any pre-defined model. While it is possible to integrate the behavioural and motion planning layer using learning-based methods to design end-to-end controllers, they have been criticised for the difficulty in training policies to perform complex tasks <ref type="bibr" target="#b32">[33]</ref>. Especially for autonomous driving tasks with dynamic surroundings, end-to-end controllers suffer from poor sample efficiency, resulting in high resource requirements <ref type="bibr" target="#b26">[27]</ref>. Another option to integrate the high-level and low-level control layers with learning-based methods is to use hierarchical RL <ref type="bibr" target="#b33">[34]</ref>. However, this approach is difficult to reproduce because of the complex training process. Other model-based approaches, such as Model Predictive Controller (MPC), require a model for generating low-level control actions <ref type="bibr" target="#b34">[35]</ref>. Estimating such a model for generating CAV control actions in a complex scenario can be difficult. Therefore, the PID controller is a viable option for the low-level control layer along with the high-level MARL controller.</p><p>Since the high-level MARL controller is not guaranteed to make safe control decisions, the low-level controller can generate unsafe control actions. The low-level control action 𝑢 ll 𝑡 at time 𝑡 generated from the PID controller aims to execute the high-level behavioural decision. Therefore, the control action 𝑢 ll 𝑡 must be constrained to ensure safety.</p></div>
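A textbook discrete PID loop of the kind described above might look as follows; the gains and time step are illustrative values, not a tuned controller from this work.

```python
class PID:
    """Discrete PID controller tracking a reference signal, e.g. a target
    speed or lateral position produced by the behavioural layer."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0   # accumulated error for the I term
        self.prev_err = None  # previous error for the D term

    def step(self, reference, measurement):
        err = reference - measurement
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

A proportional-only controller with kp = 2 returns 4.0 for a tracking error of 2, i.e. the control action scales linearly with the error.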
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Safety with CBF shield</head><p>The safety of a control system can be ensured by overriding the possibly unsafe low-level control action 𝑢 ll 𝑡 with a correction 𝑢 cbf 𝑡 to comply with safety constraints defined using CBFs <ref type="bibr" target="#b13">[14]</ref>. Therefore, the final control action 𝑢 𝑡 can be defined as</p><formula xml:id="formula_13">𝑢 𝑡 = 𝑢 ll 𝑡 + 𝑢 cbf 𝑡<label>(11)</label></formula><p>With the updated definition of the action 𝑢 𝑡 in (11), the constraints defined in ( <ref type="formula" target="#formula_9">9</ref>) can be updated to modify the optimisation problem (10) as follows</p><formula xml:id="formula_15">𝑢 cbf 𝑡 = arg min 𝑢 cbf 𝑡 ||𝑢 cbf 𝑡 || 2 s.t 𝐴𝑢 cbf 𝑡 ≤ 𝑏 ll , where 𝑏 ll = [𝑏 ll 1 , 𝑏 ll 2 , ..., 𝑏 ll 𝑘 ], with 𝑏 ll 𝑖 = 𝑝 T 𝑖 𝑓 (𝑠 𝑡 ) + 𝑝 T 𝑖 (𝜂 − 1)𝑠 𝑡 + 𝜂𝑞 𝑖 + 𝑝 T 𝑖 𝑔(𝑠 𝑡 )𝑢 ll 𝑡<label>(12)</label></formula><p>In the above optimisation problem (12), 𝑢 cbf 𝑡 is optimised to obtain the minimal correction required to ensure the safety of the system.</p></div>
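The shield of equations (11) and (12) can be sketched for the single-constraint case: keep the PID action when it already satisfies the constraint, otherwise add the minimal correction. The closed-form projection below covers one constraint only; equation (12) is a general quadratic program.

```python
import numpy as np

def shielded_action(u_ll, a, b_ll):
    """u_t = u_ll + u_cbf (Eq. (11)), where u_cbf is the minimal correction
    satisfying a^T u_cbf <= b_ll (Eq. (12) with a single constraint; b_ll
    already accounts for the low-level action u_ll)."""
    a = np.asarray(a, dtype=float)
    if b_ll >= 0.0:                        # u_ll is already safe: no override
        u_cbf = np.zeros_like(a)
    else:                                  # minimal-norm correction
        u_cbf = a * (b_ll / np.dot(a, a))
    return np.asarray(u_ll, dtype=float) + u_cbf
```

When the constraint is satisfied the PID action passes through unchanged, so the shield only intervenes near the boundary of the safe set.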
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Decentralised CBF for CAVs</head><p>In the previous sections, the CBF ℎ(𝑠 𝑡 ) has been defined based only on the ego vehicle's state. The safety constraints defined in this section consider MAS dynamics, because CAVs depend on the control decisions of other vehicles to ensure their own safety. These safety constraints are defined for pure CAV traffic; their extension to mixed traffic, consisting of vehicles with varying levels of autonomy and connectivity, is left for future work. In this section, specifications for decentralised CBFs are defined first (Section 4.1). Then, the multi-agent actor dependencies are defined for CAVs (Section 4.2). Finally, CBF formulations are defined based on the actor dependencies to comply with the specifications (Section 4.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Specifications</head><p>The following specifications are considered to formulate decentralised CBFs for CAVs:</p><p>1. Ensure the safety of all CAVs in a MAS. 2. Safe acceleration control to avoid collision with the preceding vehicle. 3. Safe steering control to avoid collisions during lane change manoeuvres. 4. Respect the CAV controller's physical limits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Multi-agent actor dependencies</head><p>As CAVs can share their states with the surrounding vehicles, the state 𝑠 𝑡 of the ego vehicle comprises its own ego states, 𝑠 e 𝑡 , and the observed states, 𝑠 o 𝑡 , of observed vehicles. With this consideration, a dynamic CBF ℎ for two CAVs can be defined as</p><formula xml:id="formula_16">ℎ(𝑠 𝑡 ) = ℎ(𝑠 e 𝑡 ) + ℎ(𝑠 o 𝑡 )<label>(13)</label></formula><p>Notice that each term in the previously defined optimisation constraint ( <ref type="formula" target="#formula_15">12</ref>) is defined based on the state and action variables of a single agent. For MAS, each term can be separated into variables associated with the ego vehicle * e and the observed vehicle * o as follows</p><formula xml:id="formula_17">𝐴 e 𝑢 e 𝑡 + 𝐴 o 𝑢 o 𝑡 ≤ 𝑏 e + 𝑏 o<label>(14)</label></formula><p>where 𝐴 e and 𝑏 e are equivalent to 𝐴 and 𝑏 ll from ( <ref type="formula" target="#formula_15">12</ref>), but evaluated using the state 𝑠 𝑡 and the action 𝑢 ll 𝑡 associated with the ego vehicle. Similarly, 𝐴 o and 𝑏 o are derived from the state and action variables associated with the observed vehicle. The safe action for an ego vehicle can be obtained by optimising the minimum control correction 𝑢 e 𝑡 , assuming that the observed vehicle shares its state variables and safe control decisions. Therefore, the multi-agent constraint in ( <ref type="formula" target="#formula_17">14</ref>) is modified to update the quadratic program (12) as</p><formula xml:id="formula_18">𝑢 e 𝑡 = arg min 𝑢 e 𝑡 ||𝑢 e 𝑡 || 2 s.t 𝐴 e 𝑢 e 𝑡 ≤ 𝑏 ma , where 𝑏 ma = 𝑏 e + 𝑏 o − 𝐴 o 𝑢 o 𝑡<label>(15)</label></formula><p>The actor dependency exists between the ego vehicle and the observed vehicles, as the term 𝑏 ma in the above equation (15) requires the observed vehicles to make their control decision, 𝑢 o 𝑡 , before the ego vehicle. 
The observed vehicles to be considered in the CBF constraints are identified based on the ego vehicle's high-level behaviour.</p><p>If the ego vehicle is following a lane, its control action depends on the immediate leading vehicle within the communication range, as illustrated in Figure <ref type="figure" target="#fig_3">3a</ref>. In this case, the longitudinal distance between the ego and the leading vehicle Δ𝑥 l must be constrained to ensure safety. If a vehicle in the adjacent lane intends to change lanes into the ego vehicle's current lane behind the immediate leading vehicle, the ego vehicle's decision also depends on this adjacent vehicle, as shown in Figure <ref type="figure" target="#fig_3">3b</ref>. In this case, the longitudinal and lateral distances, Δ𝑥 a and Δ𝑦 a , from the adjacent vehicle are constrained.</p><p>While changing lanes, the ego vehicle depends on the actions of the immediate leading vehicles in the current lane and the adjacent target lane. Similar to the previous case, the longitudinal distance from the leading vehicle Δ𝑥 l is constrained. Moreover, the longitudinal and lateral distances from the adjacent vehicle, Δ𝑥 a and Δ𝑦 a , are constrained. This dependency is illustrated in Figure <ref type="figure" target="#fig_3">3c</ref>.</p><p>Collectively, the safety of all the CAVs can be ensured by following the actor dependency, as each individual CAV ensures safety with respect to its preceding vehicles. To honour this dependency, CAVs in a MAS are assumed to make control decisions sequentially in decreasing order of their longitudinal position. Moreover, this actor dependency is applicable only to roads with two lanes. These limitations will be addressed in future work.</p></div>
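The sequential decision order assumed above reduces to a sort by longitudinal position; the dictionary format mapping a vehicle id to its coordinate 𝑥 is a hypothetical illustration, not a data structure from this work.

```python
def decision_order(positions):
    """Order CAVs for sequential decision making: decreasing longitudinal
    position, so every vehicle decides after all of its preceding vehicles.

    positions: mapping of vehicle id -> longitudinal coordinate x
    (a hypothetical data format for illustration).
    """
    return sorted(positions, key=positions.get, reverse=True)
```

The front-most vehicle decides first, so each follower can use the already-published decisions of its preceding vehicles in its constraint term.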
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">CBF formulation</head><p>CBF constraints for CAVs, CBF-CAV, are defined to ensure safe longitudinal and lateral motion without violating the physical control limits of the vehicles. This section defines CBFs for each type of safety constraint, along with the conditions for their applicability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1.">Longitudinal motion</head><p>The safety constraint for longitudinal motion allows the ego vehicle to maintain a safe headway to a preceding vehicle in the current lane. This constraint can be defined as</p><formula xml:id="formula_18">ℎ lon = Δ𝑥 l − 𝑥 safe<label>(16)</label></formula><p>where Δ𝑥 l is the longitudinal distance between the rear end of the preceding vehicle and the front of the ego vehicle, and 𝑥 safe is the safe distance that must be maintained between the two vehicles to ensure that the following vehicle has enough time to slow down if the leading vehicle starts slowing down abruptly. The safe distance threshold is evaluated from the ego vehicle velocity 𝑣 e and the time headway 𝜏 as shown below,</p><formula xml:id="formula_19">𝑥 safe = 𝜏 * 𝑣 e<label>(17)</label></formula><p>Both lane following and lane changing vehicles use this constraint, as the ego vehicle is expected to maintain a safe distance from the leading vehicle in all driving scenarios. Moreover, the leading vehicle must be within the ego vehicle's communication range to enforce this constraint.</p></div>
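The longitudinal CBF of equations (16) and (17) can be sketched as a short function; the 1.5 s default headway is an illustrative value, not taken from the paper:

```python
def h_lon(dx_l: float, v_e: float, tau: float = 1.5) -> float:
    """Longitudinal CBF: h_lon = dx_l - x_safe, with the
    velocity-dependent safe distance x_safe = tau * v_e.
    The 1.5 s headway default is illustrative."""
    x_safe = tau * v_e   # eq. (17)
    return dx_l - x_safe # eq. (16): safe iff non-negative
```

A positive value means the gap to the leader exceeds the velocity-dependent safe headway; a negative value flags an unsafe state the CBF shield must correct.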
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">Lateral motion</head><p>This constraint ensures safety when a CAV moves laterally to change lanes. During the lane change, a vehicle must maintain a safe distance 𝑥 safe from the leading vehicle in the same lane, which can be enforced with the previously defined CBF ( <ref type="formula" target="#formula_19">17</ref>). As the vehicle moves laterally, either a safe lateral distance 𝑦 safe or a safe longitudinal distance 𝑥 safe must be maintained from the leading vehicle in the adjacent lane. This constraint is defined as ℎ lat</p><formula xml:id="formula_20">ℎ lat = Δ𝑥 a /𝑥 safe + Δ𝑦 a /𝑦 safe − 1<label>(18)</label></formula><p>where 𝑥 safe is the same variable defined in equation ( <ref type="formula" target="#formula_19">17</ref>), and 𝑦 safe is a constant defined based on the lane width 𝐿 𝑤 and the vehicle width 𝑉 𝑤 to ensure a comfortable lateral distance when the adjacent leading vehicle is moving in parallel.</p><formula xml:id="formula_21">𝑦 safe = 𝐿 𝑤 − 𝑉 𝑤<label>(19)</label></formula><p>This constraint maintains a safe distance from a vehicle in the adjacent lane. Before changing lanes, it ensures that the ego vehicle keeps a safe lateral distance from the adjacent vehicle. During the execution of the lane change manoeuvre, it allows partial violations of the lateral and longitudinal constraints while maintaining sufficient distance to avoid a collision: the vehicle can gradually reduce the lateral distance Δ𝑦 a while gradually increasing the longitudinal distance Δ𝑥 a from the adjacent vehicle. The gradual increase in longitudinal distance ensures that a safe distance is maintained after completing the lane change manoeuvre. For example, if a vehicle in the adjacent lane is parallel to the ego vehicle, then the safe lateral distance must be maintained, Δ𝑦 a ≥ 𝑦 safe . 
In another case, if the adjacent vehicle is about to enter the ego vehicle's lane, the ego vehicle must gradually increase the longitudinal distance, such that Δ𝑥 a ≥ 𝑥 safe by the time the adjacent vehicle enters the current lane. This constraint is applied to lane changing vehicles, and to lane following vehicles when they are obstructed by an adjacent leading vehicle changing lanes.</p></div>
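The lateral CBF of equations (18) and (19) can be sketched as follows; the headway, lane width, and vehicle width defaults are illustrative values, not from the paper:

```python
def h_lat(dx_a: float, dy_a: float, v_e: float,
          tau: float = 1.5, lane_w: float = 3.5, veh_w: float = 1.8) -> float:
    """Lateral CBF: h_lat = dx_a/x_safe + dy_a/y_safe - 1, where
    x_safe = tau * v_e (eq. 17) and y_safe = lane_w - veh_w (eq. 19).
    Safe iff non-negative; default parameter values are illustrative."""
    x_safe = tau * v_e
    y_safe = lane_w - veh_w
    return dx_a / x_safe + dy_a / y_safe - 1.0  # eq. (18)
```

The sum-of-ratios form is what permits the partial violations described above: as Δ𝑦 a shrinks below 𝑦 safe during the manoeuvre, ℎ lat stays non-negative provided Δ𝑥 a grows to compensate.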
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.3.">Control limits</head><p>Given that vehicle control inputs are subject to physical limitations, they must be constrained. A physical constraint is defined on the steering angle 𝛿, which is restricted to the range [−𝛿 max , 𝛿 max ]. This constraint is defined using two CBFs, ℎ max 𝛿 and ℎ min 𝛿 ; note that the steering angle is one of the state variables, hence these constraints are expressed as CBFs. The acceleration range of the vehicle is defined as [−𝑢 max 1 , 𝑢 max 1 ]. This physical constraint can be enforced by including it directly in the constraints of the optimisation problem defined in ( <ref type="formula" target="#formula_15">12</ref>).</p><formula xml:id="formula_22">−𝑢 max 1 ≤ 𝑢 1 ≤ 𝑢 max 1<label>(21)</label></formula><p>These constraints are applied to all CAVs, as they must be honoured in all scenarios.</p></div>
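The acceleration box constraint (21) can be folded into the quadratic program by appending two linear rows to the inequality system; the helper below is a sketch, and the assumption that acceleration is the first control component is made purely for illustration:

```python
import numpy as np

def with_accel_limits(A, b, u1_max):
    """Append the acceleration box -u1_max <= u1 <= u1_max (eq. 21)
    as two extra rows of the QP constraint A u <= b. Assumes the
    first control component is the acceleration u1 (illustrative)."""
    n = A.shape[1]
    row = np.zeros(n)
    row[0] = 1.0
    A_ext = np.vstack([A, row, -row])          #  u1 <= u1_max
    b_ext = np.concatenate([b, [u1_max, u1_max]])  # -u1 <= u1_max
    return A_ext, b_ext
```

Because the limits are linear in the control input, no additional CBF is needed for acceleration, unlike the steering angle, which is a state variable.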
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>The proposed CBF formulations, CBF-CAV, integrated with the behavioural layer of the MARL lane change controller, have minimal impact on efficiency, because the CBF constraints override the controller's actions only when a vehicle is about to enter an unsafe state. Moreover, by restricting agents to safe states, the MARL controller can be trained efficiently, as it explores only the safe states. Furthermore, actor dependencies are used to account for the MAS dynamics of CAV traffic. Together, these constraints ensure safe lateral and longitudinal motion for all CAVs in a traffic scenario.</p><p>The provided formulations are suitable for pure CAV traffic, where a lower safe distance can be used. In the future, this work can be extended to mixed traffic with dynamic CBFs that maintain larger safe distances from human-driven vehicles, assuming they take worst-case control decisions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Kinematic bicycle model</figDesc><graphic coords="3,128.41,65.60,338.45,184.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Architecture for integrating MARL with safety constraints</figDesc><graphic coords="6,94.57,65.60,406.13,237.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>(a) Follow lane (b) Follow lane with a lane changing vehicle (c) Lane changing vehicle</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Actor dependency for CAVs with decentralised CBF shield</figDesc><graphic coords="9,400.42,241.96,67.68,82.43" type="bitmap" /></figure>
<formula xmlns="http://www.tei-c.org/ns/1.0">𝑢 e 𝑡 = arg min 𝑢 e 𝑡 ||𝑢 e 𝑡 || 2 , s.t. 𝐴 e 𝑢 e 𝑡 ≤ 𝑏 ma , where 𝑏 ma = 𝑏 e + 𝑏 o − 𝐴 o 𝑢 o 𝑡<label>(15)</label></formula>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The authors wish to thank the editors and anonymous reviewers for their valuable comments and helpful suggestions which greatly improved the paper's quality. This work was supported by the SFI Centre for Research Training in Advanced Networks for Sustainable Societies (ADVANCE CRT), Ireland under the Grant number 18/CRT/6222.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://www.who.int/publications-detail-redirect/9789240086517" />
		<title level="m">Global status report on road safety 2023</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>WHO</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://www.weforum.org/agenda/2019/03/traffic-congestion-cost-the-us-economy-nearly-87-billion-in-2018/" />
		<title level="m">Traffic congestion cost the US economy nearly $87 billion in 2018</title>
				<imprint>
			<publisher>WEF</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>World Economic Forum</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Commission</surname></persName>
		</author>
		<ptr target="https://ec.europa.eu/commission/presscorner/detail/en/ip_23_953" />
		<title level="m">Road safety in the EU: fatalities below pre-pandemic levels but progress remains too slow</title>
				<imprint>
			<publisher>European Commission -European Commission</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Commission</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D.-G</forename><surname>For</surname></persName>
		</author>
		<idno type="DOI">10.2832/391271</idno>
		<ptr target="https://data.europa.eu/doi/10.2832/391271" />
		<title level="m">Mobility and Transport, Next steps towards &apos;Vision Zero&apos; -EU road safety policy framework 2021-2030</title>
				<imprint>
			<publisher>Publications Office</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Artificial intelligence applications in the development of autonomous vehicles: a survey</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.1109/JAS.2020.1003021</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE/CAA Journal of Automatica Sinica</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="315" to="329" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Design of AI-based lane changing modules in connected and autonomous vehicles: a survey</title>
		<author>
			<persName><forename type="first">B</forename><surname>Hegde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bouroche</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-3173/7.pdf" />
	</analytic>
	<monogr>
		<title level="m">Twelfth International Workshop on Agents in Traffic and Transportation</title>
				<meeting><address><addrLine>Vienna</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page">16</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Distributed Multiagent Coordinated Learning for Autonomous Driving in Highways Based on Dynamic Coordination Graphs</title>
		<author>
			<persName><forename type="first">C</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tan</surname></persName>
		</author>
		<idno type="DOI">10.1109/TITS.2019.2893683</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="735" to="748" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Space-weighted information fusion using deep reinforcement learning: The context of tactical control of lane-changing autonomous vehicles and connectivity range assessment</title>
		<author>
			<persName><forename type="first">J</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Steinfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Labi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.trc.2021.103192</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S0968090X21002084" />
	</analytic>
	<monogr>
		<title level="j">Transportation Research Part C: Emerging Technologies</title>
		<imprint>
			<biblScope unit="volume">128</biblScope>
			<biblScope unit="page">103192</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Graph neural network and reinforcement learning for multi-agent cooperative control of connected autonomous vehicles</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">Y J</forename><surname>Ha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Labi</surname></persName>
		</author>
		<idno type="DOI">10.1111/mice.12702</idno>
		<ptr target="https://onlinelibrary.wi-ley.com/doi/pdf/10.1111/mice.12702" />
	</analytic>
	<monogr>
		<title level="j">Computer-Aided Civil and Infrastructure Engineering</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="838" to="857" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multi-agent reinforcement learning for cooperative lane changing of connected and autonomous vehicles in mixed traffic</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ge</surname></persName>
		</author>
		<idno type="DOI">10.1007/s43684-022-00023-5</idno>
		<ptr target="https://doi.org/10.1007/s43684-022-00023-5" />
	</analytic>
	<monogr>
		<title level="j">Autonomous Intelligent Systems</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page">5</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Decentralized Cooperative Lane Changing at Freeway Weaving Areas Using Multi-Agent Deep Reinforcement Learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Graf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2110.08124</idno>
		<ptr target="http://arxiv.org/abs/2110.08124" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Deep Multi-Agent Reinforcement Learning for Highway On-Ramp Merging in Mixed Traffic</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Hajidavalloo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/TITS.2023.3285442</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1" to="16" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Multi-agent reinforcement learning for safe lane changes by connected and autonomous vehicles: A survey</title>
		<author>
			<persName><forename type="first">B</forename><surname>Hegde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bouroche</surname></persName>
		</author>
		<idno type="DOI">10.3233/AIC-220316</idno>
		<ptr target="https://content.iospress.com/articles/ai-communications/aic220316" />
	</analytic>
	<monogr>
		<title level="j">AI Communications</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="203" to="222" />
			<date type="published" when="2024">2024</date>
			<publisher>IOS Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Orosz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Murray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Burdick</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v33i01.33013387</idno>
		<ptr target="https://ojs.aaai.org/index.php/AAAI/article/view/4213" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Ensuring Safety of Learning-Based Motion Planners Using Control Barrier Functions</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1109/LRA.2022.3152313</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Robotics and Automation Letters</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="4773" to="4780" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A Survey of Motion Planning and Control Techniques for Self-Driving Urban Vehicles</title>
		<author>
			<persName><forename type="first">B</forename><surname>Paden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Čáp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Z</forename><surname>Yong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yershov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frazzoli</surname></persName>
		</author>
		<idno type="DOI">10.1109/TIV.2016.2578706</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Vehicles</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="33" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Feedback control of a nonholonomic car-like robot</title>
		<author>
			<persName><forename type="first">A</forename><surname>De Luca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Oriolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Samson</surname></persName>
		</author>
		<idno type="DOI">10.1007/BFb0036073</idno>
		<ptr target="https://www.di.ens.fr/jean-paul.laumond/promotion/chap4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Robot Motion Planning and Control</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Thoma</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Laumond</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">229</biblScope>
			<biblScope unit="page" from="171" to="253" />
		</imprint>
		<respStmt>
			<orgName>and Information Sciences</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">CommonRoad: Composable benchmarks for motion planning on roads</title>
		<author>
			<persName><forename type="first">M</forename><surname>Althoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Koschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Manzinger</surname></persName>
		</author>
		<idno type="DOI">10.1109/IVS.2017.7995802</idno>
		<ptr target="https://ieeexplore.ieee.org/document/7995802" />
	</analytic>
	<monogr>
		<title level="m">IEEE Intelligent Vehicles Symposium (IV)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="719" to="726" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Reinforcement learning: An introduction</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Barto</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
	<note>Second edition</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Human-level control through deep reinforcement learning</title>
		<author>
			<persName><forename type="first">V</forename><surname>Mnih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Rusu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Veness</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Bellemare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Fidjeland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ostrovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Petersen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Beattie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sadik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Antonoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kumaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wierstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Legg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hassabis</surname></persName>
		</author>
		<idno type="DOI">10.1038/nature14236</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">518</biblScope>
			<biblScope unit="page" from="529" to="533" />
			<date type="published" when="2015">2015</date>
			<publisher>Nature Publishing Group</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Proximal Policy Optimization Algorithms</title>
		<author>
			<persName><forename type="first">J</forename><surname>Schulman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wolski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Klimov</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1707.06347</idno>
		<idno type="arXiv">arXiv:1707.06347</idno>
		<ptr target="http://arxiv.org/abs/1707.06347" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mansimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Grosse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">30</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Stable-Baselines3: Reliable Reinforcement Learning Implementations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Raffin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gleave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kanervisto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernestus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dormann</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v22/20-1364.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Klimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nichol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Plappert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schulman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sidor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhokhov</surname></persName>
		</author>
		<ptr target="https://github.com/openai/baselines" />
		<title level="m">OpenAI Baselines</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Parameter Sharing For Heterogeneous Agents in Multi-Agent Reinforcement Learning</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Terry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Grammel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Son</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Black</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2005.13625</idno>
		<idno type="arXiv">arXiv:2005.13625</idno>
		<ptr target="http://arxiv.org/abs/2005.13625" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nahavandi</surname></persName>
		</author>
		<idno type="DOI">10.1109/TCYB.2020.2977374</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Cybernetics</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="3826" to="3839" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives</title>
		<author>
			<persName><forename type="first">S</forename><surname>Teng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xuanyuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.1109/TIV.2023.3274536</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Vehicles</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="3692" to="3711" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Ames</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Coogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Egerstedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Notomista</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sreenath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tabuada</surname></persName>
		</author>
		<idno type="DOI">10.23919/ECC.2019.8796030</idno>
		<ptr target="https://ieeexplore.ieee.org/abstract/document/8796030" />
		<title level="m">Control Barrier Functions: Theory and Applications</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3420" to="3431" />
		</imprint>
	</monogr>
	<note>2019 18th European Control Conference (ECC)</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">P</forename><surname>Boyd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vandenberghe</surname></persName>
		</author>
		<title level="m">Convex optimization</title>
				<meeting><address><addrLine>Cambridge New York Melbourne New Delhi Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">A Safe Hierarchical Planning Framework for Complex Driving Scenarios based on Reinforcement Learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tomizuka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhan</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICRA48506.2021.9561195</idno>
		<idno type="ISSN">2577-087X</idno>
	</analytic>
	<monogr>
		<title level="m">2021 IEEE International Conference on Robotics and Automation (ICRA)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2660" to="2666" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Safe Multi-Agent Reinforcement Learning via Shielding</title>
		<author>
			<persName><forename type="first">I</forename><surname>Elsayed-Aly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bharadwaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ehlers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Topcu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Feng</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2101.11196</idno>
		<idno type="arXiv">arXiv:2101.11196</idno>
		<ptr target="http://arxiv.org/abs/2101.11196" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning</title>
		<author>
			<persName><forename type="first">K</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.1145/3219819.3219993</idno>
		<ptr target="http://doi.org/10.1145/3219819.3219993" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD &apos;18</title>
				<meeting>the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD &apos;18<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1774" to="1783" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">A Survey of End-to-End Driving: Architectures and Training Methods</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tampuu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Matiisen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Semikin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fishman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Muhammad</surname></persName>
		</author>
		<idno type="DOI">10.1109/TNNLS.2020.3043505</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Neural Networks and Learning Systems</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1364" to="1384" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Eben</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Cheng</surname></persName>
		</author>
		<idno type="DOI">10.1049/iet-its.2019.0317</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/abs/10.1049/iet-its.2019.0317" />
	</analytic>
	<monogr>
		<title level="j">IET Intelligent Transport Systems</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="297" to="305" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">The State-of-the-Art of Coordinated Ramp Control with Mixed Traffic Conditions</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Barth</surname></persName>
		</author>
		<idno type="DOI">10.1109/ITSC.2019.8917067</idno>
	</analytic>
	<monogr>
		<title level="m">2019 IEEE Intelligent Transportation Systems Conference (ITSC)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1741" to="1748" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
