<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Matteo</forename><surname>Cederle</surname></persName>
							<email>matteo.cederle@phd.unipd.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Information Engineering</orgName>
								<orgName type="institution">University of Padova</orgName>
								<address>
									<addrLine>via Gradenigo 6</addrLine>
									<postCode>35131</postCode>
									<settlement>Padua</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marco</forename><surname>Fabris</surname></persName>
							<email>marco.fabris.1@unipd.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Information Engineering</orgName>
								<orgName type="institution">University of Padova</orgName>
								<address>
									<addrLine>via Gradenigo 6</addrLine>
									<postCode>35131</postCode>
									<settlement>Padua</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gian</forename><forename type="middle">Antonio</forename><surname>Susto</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Information Engineering</orgName>
								<orgName type="institution">University of Padova</orgName>
								<address>
									<addrLine>via Gradenigo 6</addrLine>
									<postCode>35131</postCode>
									<settlement>Padua</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">F234EB2470B49351398BC10D69FE087D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Autonomous Intersection Management</term>
					<term>Connected Autonomous Vehicles</term>
					<term>DQN</term>
					<term>Multi-Agent Reinforcement Learning</term>
					<term>Reinforcement Learning</term>
					<term>Smart Mobility</term>
					<term>Traffic Scenarios</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Autonomous intersection management (AIM) poses significant challenges due to the intricate nature of real-world traffic scenarios and the need for a highly expensive centralised server in charge of simultaneously controlling all the vehicles. This study addresses such issues by proposing a novel distributed approach to AIM utilizing multi-agent reinforcement learning (MARL). We show that by leveraging the 3D surround view technology for advanced assistance systems, autonomous vehicles can accurately navigate intersection scenarios without needing any centralised controller. The contributions of this paper thus include a MARL-based algorithm for the autonomous management of a 4-way intersection and also the introduction of a new strategy called prioritised scenario replay for improved training efficacy. We validate our approach as an innovative alternative to conventional centralised AIM techniques, ensuring the full reproducibility of our results. Specifically, experiments conducted in virtual environments using the SMARTS platform highlight its superiority over benchmarks across various metrics.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Connected autonomous vehicles (CAVs) represent a groundbreaking advancement in transportation, poised to revolutionize mobility by redefining commuting, parking, travel, and urban interaction <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Equipped with advanced sensors and AI systems, CAVs navigate roads with precision, reducing accidents caused by human error <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. This enhanced safety not only saves lives but also makes mobility more accessible and inclusive for individuals with disabilities or vulnerabilities <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>. On a societal level, CAVs optimize traffic flow and minimize congestion, reducing travel times and improving overall efficiency and stability <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. Additionally, CAVs promise environmental sustainability by integrating electric and hybrid propulsion systems, significantly reducing greenhouse gas emissions <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>Lately, significant strides have been made in the development of CAVs, largely attributable to the use of multi-agent reinforcement learning (MARL) <ref type="bibr" target="#b10">[11]</ref> within the framework of smart mobility, which shows promise in addressing autonomous intersection management (AIM) <ref type="bibr" target="#b11">[12]</ref>. As the resolution of AIM is widely regarded as a pivotal obstacle to overcome in order to advance the adoption of CAVs, this control problem constitutes the primary focus of our study.</p><p>A vast literature exists on AIM. 
The research in this field spans multiple fronts, each leveraging distinct methodologies to address the challenges of optimizing traffic flow and ensuring safety in dynamic urban environments. By employing reinforcement learning (RL), AIM systems can effectively learn and adapt intersection control strategies in response to changing traffic conditions <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>. These systems typically comprise priority assignment models, intersection control model learning, and safe brake control mechanisms. Experimental simulations demonstrate the superiority of RL-inspired AIM approaches over traditional methods, showcasing enhanced efficiency and safety. Graph neural networks (GNNs) have also garnered attention for their potential in AIM <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>. Leveraging RL algorithms, GNNs optimize traffic flow at intersections by jointly planning for multiple vehicles. These models encode scene representations efficiently, providing individual outputs for all involved vehicles. Game theory then serves as a foundational framework for MARL approaches in AIM: game-theoretic models facilitate safe and adaptive decision-making for CAVs at intersections <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref>. By considering the diverse behaviors of interacting vehicles, these algorithms ensure flexibility and adaptability, thus enhancing autonomous vehicle performance in challenging scenarios. Finally, recurrent neural networks (RNNs) integrated into the MARL framework represent an interesting approach in AIM research for learning complex traffic dynamics and optimizing vehicle speed control <ref type="bibr" target="#b18">[19]</ref>.</p><p>Despite the advancements in AIM techniques, their implementation still faces important challenges. 
One of the main obstacles is the need for an expensive centralised server, positioned in the proximity of the intersection, to simultaneously control all the vehicles. Moreover, the vehicles must continuously send their local information to this centralised controller, which gathers and processes the data coming from all the road users before sending back to each vehicle a velocity or acceleration command. Given the complexity and high demands of this technological framework, the integration of AIM devices into existing transportation infrastructures still requires many years of extensive research and testing. In this direction, we devise an alternative distributed approach based on the 3D surround view technology for advanced assistance systems <ref type="bibr" target="#b19">[20]</ref>. As shown in the sequel, such a method allows the reconstruction of a 360 ∘ scene centered around each CAV, from which the information required by each agent involved in the proposed MARL-based technique can be recovered. This, in turn, makes it possible to carry out AIM effectively in a decentralised fashion, exploiting sensors that are currently available on the market, without the need for the centralised infrastructure described above. More precisely, the contributions of this paper are multiple.</p><p>• As mentioned above, we offer a new distributed strategy that represents a competitive and realistic alternative to the classical centralised AIM techniques. • Relying on self-play <ref type="bibr" target="#b20">[21]</ref> and drawing inspiration from prioritised experience replay <ref type="bibr" target="#b21">[22]</ref> to improve training efficacy, we develop a MARL-based algorithm capable of tackling and solving a 4-way intersection by means of the SMARTS platform <ref type="bibr" target="#b22">[23]</ref>. 
• Our strategy outperforms a number of well-established benchmarks, which typically leverage traffic light regulation, in terms of travel time, waiting time and average speed. • Last but not least, we guarantee full reproducibility <ref type="foot" target="#foot_0">1</ref> of the code used to generate the virtual experiments shown in this manuscript.</p><p>The remainder of this paper unfolds as follows. The preliminaries for this study are given in Section 2, whereas Section 3 provides the core of our contribution, namely the multi-agent decentralised dueling double deep Q-networks algorithm with prioritised scenario replay (MAD4QN-PS). This innovative method is then tested and validated through several virtual experiments, as illustrated in Section 4. Finally, Section 5 draws the conclusions of the present investigation, proposing future developments.</p><p>Notation: The sets of natural numbers and nonnegative real numbers are denoted by ℕ and ℝ + 0 , respectively. Given a random variable (r.v.) 𝑌, its probability mass function is denoted by 𝑃[𝑌 = 𝑦], whereas 𝑃[𝑌 = 𝑦 | 𝑍 = 𝑧] indicates the probability mass function of 𝑌 conditioned on the observation of a r.v. 𝑍. Moreover, the expected value of a r.v. 𝑌 is denoted by 𝔼[𝑌 ].</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Theoretical background</head><head n="2.1.">Basic notions of reinforcement learning</head><p>RL is a machine learning paradigm in which an agent learns to solve a task by iteratively interacting with its environment. Solving the task means maximising the cumulative reward obtained over time. A generic RL problem is formalised by the concept of a Markov decision process (MDP) <ref type="bibr" target="#b23">[24]</ref>, which is a tuple composed of five elements: ⟨𝒮 , 𝒜 , 𝒫 , ℛ, 𝛾⟩. 𝒮 and 𝒜 are two generic sets, representing the state and action space respectively. 𝒫 (𝑠, 𝑎, 𝑠 ′ ) = 𝑃 [𝑆 𝑡+1 = 𝑠 ′ | 𝑆 𝑡 = 𝑠, 𝐴 𝑡 = 𝑎] is the state transition probability function, updating the environment to a new state 𝑠 ′ ∈ 𝒮 at each step, based on the previous state 𝑠 ∈ 𝒮 and the action 𝑎 ∈ 𝒜 performed by the agent. Moreover, the reward function ℛ(𝑠, 𝑎, 𝑠 ′ ) ∶ 𝒮 × 𝒜 × 𝒮 → ℝ measures the quality of each transition, while 𝛾 ∈ [0, 1) denotes a discount factor, used to compute the cumulative reward at time 𝑡, i.e. the return 𝐺 𝑡 = ∑ ∞ 𝑘=0 𝛾 𝑘 𝑟 𝑡+𝑘+1 . The agent decides which action to take at each iteration by exploiting its policy, a function that maps any state to the probability of selecting each possible action:</p><formula xml:id="formula_0">𝜋(𝑎|𝑠) = 𝑃[𝐴 𝑡 = 𝑎 | 𝑆 𝑡 = 𝑠], ∀𝑎 ∈ 𝒜 .<label>(1)</label></formula><p>Solving an RL problem means finding an optimal policy 𝜋 * . One criterion usually adopted to find 𝜋 * consists in the maximization of the state-action value function 𝑄 𝜋 (𝑠, 𝑎), i.e. the expected return starting from state 𝑠 ∈ 𝒮, taking action 𝑎 ∈ 𝒜, and thereafter following policy 𝜋:</p><formula xml:id="formula_1">𝑄 𝜋 (𝑠, 𝑎) = 𝔼 𝜋 [𝐺 𝑡 | 𝑆 𝑡 = 𝑠, 𝐴 𝑡 = 𝑎] .<label>(2)</label></formula><p>Consequently, given the state-action value function, the optimal policy is defined as 𝜋 * = arg max 𝜋 𝑄 𝜋 (𝑠, 𝑎). There is therefore an inherent relation between 𝜋 * and the optimal state-action value function. 
Finally, two other important quantities which will be used in the remainder of this article are the state value function and the advantage function. The former is defined as the expected return starting from state 𝑠 ∈ 𝒮 and then following policy 𝜋:</p><formula xml:id="formula_2">𝑉 𝜋 (𝑠) = 𝔼 𝜋 [𝐺 𝑡 | 𝑆 𝑡 = 𝑠] .<label>(3)</label></formula><p>The latter instead gives a relative measure of importance to each action in a particular state, and is defined in terms of 𝑄 𝜋 (𝑠, 𝑎) and 𝑉 𝜋 (𝑠):</p><formula xml:id="formula_3">𝐴 𝜋 (𝑠, 𝑎) = 𝑄 𝜋 (𝑠, 𝑎) − 𝑉 𝜋 (𝑠).<label>(4)</label></formula></div>
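To make the quantities above concrete, the following is a minimal Python sketch (illustrative only, not taken from the paper's code) of the return 𝐺ₜ and the advantage function (4), where 𝑉(s) is obtained as the policy-weighted average of 𝑄(s, ·); all function names and interfaces are our own assumptions:

```python
# Illustrative sketch of the return G_t and the advantage function (4).
# Function names and interfaces are assumptions for exposition only.

def discounted_return(rewards, gamma=0.9):
    """G_t = sum_k gamma^k * r_{t+k+1}, for a finite reward sequence."""
    g = 0.0
    for r in reversed(rewards):  # backward accumulation: G_t = r + gamma * G_{t+1}
        g = r + gamma * g
    return g

def advantage(q_values, policy_probs):
    """A(s, a) = Q(s, a) - V(s), with V(s) = sum_a pi(a|s) * Q(s, a)."""
    v = sum(p * q for p, q in zip(policy_probs, q_values))
    return [q - v for q in q_values]
```

Note how the advantage values are centred around zero: actions better than the policy average get a positive advantage, worse ones a negative advantage.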
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Q-Learning and Deep Q-Networks</head><p>To compute the optimal state-action value function we could theoretically exploit the recursive Bellman optimality equation <ref type="bibr" target="#b24">[25]</ref>:</p><formula xml:id="formula_4">𝑄 * (𝑠, 𝑎) = 𝔼 [𝑟 𝑡+1 + 𝛾 max 𝑎 ′ 𝑄 * (𝑆 𝑡+1 , 𝑎 ′ ) | 𝑆 𝑡 = 𝑠, 𝐴 𝑡 = 𝑎] ,<label>(5)</label></formula><p>however, due to the curse of dimensionality and the need for perfect statistical information to compute the closed-form solution, it is necessary to resort to iterative learning strategies even for simple RL problems. The most common algorithm in the literature is Q-Learning <ref type="bibr" target="#b25">[26]</ref>,</p><p>where the state-action value function is represented by a table, iteratively updated at each step through an approximation of ( <ref type="formula" target="#formula_4">5</ref>):</p><formula xml:id="formula_5">𝑄 𝑡+1 (𝑠 𝑡 , 𝑎 𝑡 ) ← 𝑄 𝑡 (𝑠 𝑡 , 𝑎 𝑡 ) + 𝛼(𝑟 𝑡+1 + 𝛾 max 𝑎 ′ 𝑄 𝑡 (𝑠 𝑡+1 , 𝑎 ′ ) − 𝑄 𝑡 (𝑠 𝑡 , 𝑎 𝑡 )),<label>(6)</label></formula><p>where 𝛼 &gt; 0 is called the step-size parameter. The policy derived from the state-action value function is usually the 𝜀-greedy policy, suitable to balance the trade-off between exploration and exploitation <ref type="bibr" target="#b23">[24]</ref>:</p><formula xml:id="formula_6">𝜋(𝑎|𝑠) = { arg max 𝑎 𝑄(𝑠, 𝑎) with probability 1 − 𝜀; random action 𝑎 ∈ 𝒜 with probability 𝜀.<label>(7)</label></formula><p>Tabular Q-Learning works well for simple tasks, but the problem rapidly becomes intractable when the state space is very large or even continuous. For this reason, state-of-the-art RL algorithms employ function approximators, such as neural networks (NNs), to solve realistic and complex problems. One of the earliest and most widely used deep RL algorithms is Deep Q-Networks <ref type="bibr" target="#b26">[27]</ref>, which approximates the state-action value function through a NN, 𝑄(𝑠, 𝑎; 𝜃). 
A replay memory is used to store the transition tuples (𝑠, 𝑎, 𝑟, 𝑠 ′ ). Finally, the parameters 𝜃 of the Q-network are optimised by sampling batches ℬ of transitions from the replay memory and minimising a mean squared error loss derived from (6):</p><formula xml:id="formula_8">ℒ (𝜃) = 1 |ℬ| ∑ 𝑖∈ℬ [(𝑟 𝑖 + 𝛾 max 𝑎 ′ 𝑄(𝑠 ′ 𝑖 , 𝑎 ′ ; 𝜃̄ ) − 𝑄(𝑠 𝑖 , 𝑎 𝑖 ; 𝜃)) 2 ],<label>(8)</label></formula><p>where 𝜃̄ represents the parameters of a target network, which are periodically copied from 𝜃 and kept fixed for a predefined number of iterations.</p></div>
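Before moving to the deep variants, the tabular update (6) and the 𝜀-greedy policy (7) can be sketched in a few lines of Python (a didactic stand-in with assumed names, not the paper's implementation):

```python
import random

# Sketch of tabular Q-Learning: update rule (6) and epsilon-greedy policy (7).
# Q is a dict mapping state -> {action: value}; all names are illustrative.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # TD target: r + gamma * max_a' Q(s', a'), as in (6)
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])

def epsilon_greedy(Q, s, actions, eps=0.1):
    # Explore with probability eps, otherwise act greedily, as in (7)
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[s][a])
```

DQN replaces the table `Q` with a neural network and the single-sample update with a minibatch gradient step on the loss (8), evaluated against the frozen target parameters.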
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Multi-agent reinforcement learning</head><p>MARL expands upon traditional RL by incorporating multiple agents, each making decisions in an environment where their actions influence both the immediate rewards and the observations of the other agents. In its most general form, a MARL problem is formalised as a partially observable stochastic game (POSG), in which each agent has its own action space and reward function. Moreover, the partial observability derives from the fact that the agents do not perceive the global state, but only local observations, which carry incomplete information about the environment <ref type="bibr" target="#b27">[28]</ref>.</p><p>MARL algorithms can be categorised depending on the type of information available to the agents during training and execution: in centralised training and centralised execution (CTCE), the learning of the policies, as well as the policies themselves, uses some structure that is centrally shared between the agents. On the other hand, in decentralised training and decentralised execution (DTDE), the agents are fully independent and do not rely on centrally shared mechanisms. Finally, the centralised training and decentralised execution (CTDE) paradigm lies in between the first two, exploiting centralised training to learn the policies, while the execution of the policies themselves is designed to be decentralised <ref type="bibr" target="#b28">[29]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Multi-Agent Decentralised Dueling Double Deep Q-Networks with Prioritised Scenario Replay</head><p>In this section we present our novel MARL-based method, called Multi-Agent Decentralised Dueling Double Deep Q-Networks with Prioritised Scenario Replay (MAD4QN-PS). We begin by detailing how the system is modelled, and then describe the original learning procedure implemented to train the agents through self-play <ref type="bibr" target="#b20">[21]</ref>. Finally, we introduce the prioritised scenario replay pipeline implemented to speed up training.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">System modelling and design</head><p>The environment in which the agents live consists of a 4-way 1-lane intersection, with three different turning intentions available to each vehicle. Recalling Section 2.3, we formalise the problem as a POSG, which can be seen as a multi-agent extension of MDPs. For this reason, we define for each agent the observation space, the action space and the reward function.</p><p>The information retrieved by each vehicle at every time step consists of a local RGB bird's-eye view image with the vehicle at the center. As already discussed in Section 1, this type of data is already recoverable from sensors with modern technology, thus making such a configuration extremely interesting from an application point of view. Moreover, the final observation passed to the agent is a stack of 𝑛 ∈ ℕ consecutive frames, allowing the algorithm to capture temporal dependencies and understand how the environment is changing over time.</p><p>The action space of each agent is instead discrete, containing 𝑚 ∈ ℕ velocity commands. This choice has been made because the purpose of our algorithm is not to learn the basic skills required for driving, such as keeping the lane and following a trajectory, but rather to choose how to behave in traffic conditions and when to interact with the other vehicles present in the environment. A similar high-level perspective has also been adopted in other works related to the centralised AIM paradigm <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b18">19]</ref>.</p><p>Finally, as for the reward function, we need to account for the fact that each agent is trying to solve a multi-objective problem. Indeed, the main goal of each vehicle is to cross the intersection and reach the end of the scenario. 
At the same time, a vehicle is also required not to collide with the others, while travelling as smoothly as possible. In order to fulfill all these objectives we design a reward signal composed of different terms:</p><formula xml:id="formula_9">𝑟 = ⎧ ⎪ ⎨ ⎪ ⎩ +𝑥 if 𝑥 &gt; 0, −𝑘 if vehicle not moving, −10 ⋅ 𝑘 if a collision occurs, +10 ⋅ 𝑘 if scenario completed,<label>(9)</label></formula><p>where 𝑥 ∈ ℝ + 0 is the distance travelled in meters since the previous time step and 𝑘 ∈ ℕ is a hyperparameter used to weight the importance of the last three components of the reward function with respect to the first one.</p></div>
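The reward signal (9) can be sketched as follows; the function interface and the precedence among the cases are our assumptions, since the paper only gives the piecewise definition:

```python
# Sketch of the reward signal (9). k weights the last three terms against the
# distance term; the ordering of the checks below is an assumption.

def reward(x, moving, collided, completed, k=1):
    """x: distance travelled [m] since the previous time step."""
    if collided:
        return -10 * k      # collision: large penalty
    if completed:
        return 10 * k       # scenario completed: large bonus
    if not moving:
        return -k           # standing still is discouraged
    return x if x > 0 else 0.0  # otherwise, reward progress
```

With 𝑘 = 1 (the value used in the experiments), a collision costs −10 and completing the scenario yields +10, so terminal events dominate the per-step progress term.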
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Learning Strategy</head><p>The starting point for our learning strategy is the Deep Q-Networks algorithm, already presented in Section 2.2. This algorithm is slightly modified by considering the Double DQN scheme and the Dueling architecture, which are briefly introduced in the sequel.</p><p>The idea of Double DQN <ref type="bibr" target="#b29">[30]</ref> originates from the fact that Q-Learning, and consequently also DQN, are known to overestimate state-action values under certain conditions. This is due to the max operation (see ( <ref type="formula" target="#formula_5">6</ref>) and ( <ref type="formula" target="#formula_8">8</ref>)) performed to compute the temporal difference target. To mitigate this effect, the idea is to decouple the action selection and evaluation steps by using two different networks. We thus exploit the online network in the action selection step, while keeping the target network for evaluation. This leads to the following modification of the loss function:</p><formula xml:id="formula_10">ℒ (𝜃) = 1 |ℬ| ∑ 𝑖∈ℬ [(𝑟 𝑖 + 𝛾 𝑄(𝑠 ′ 𝑖 , arg max 𝑎 ′ 𝑄(𝑠 ′ 𝑖 , 𝑎 ′ ; 𝜃); 𝜃̄ ) − 𝑄(𝑠 𝑖 , 𝑎 𝑖 ; 𝜃)) 2 ].<label>(10)</label></formula><p>Dueling DQN <ref type="bibr" target="#b30">[31]</ref> instead introduces a modification of the NN architecture. Instead of having a single final layer that outputs the Q-value for each possible action, we split it into two, with the first in charge of estimating the state value function (3) and the second used for evaluating the advantage function <ref type="formula" target="#formula_3">(4)</ref>. 
These two quantities are then combined in the following way to produce an estimate of the state-action value function:</p><formula xml:id="formula_11">𝑄(𝑠, 𝑎; 𝜃, 𝛼, 𝛽) = 𝑉 (𝑠; 𝜃, 𝛼) + (𝐴(𝑠, 𝑎; 𝜃, 𝛽) − 1 |𝒜 | ∑ 𝑎 ′ 𝐴(𝑠, 𝑎 ′ ; 𝜃, 𝛽)),<label>(11)</label></formula><p>where 𝛼 and 𝛽 are the parameters of the final layers specific to the state value function and the advantage function, respectively. Subtracting the term 1 |𝒜 | ∑ 𝑎 ′ 𝐴(𝑠, 𝑎 ′ ; 𝜃, 𝛽) is needed for stability reasons.</p><p>The final algorithm used for training is therefore a multi-agent version of Dueling Double DQN, known as D3QN, with a linearly-annealed 𝜀-greedy policy for all the agents. In order to allow for decentralised execution while at the same time devising an effective training strategy, we consider an intermediate approach between the DTDE and CTDE paradigms. In particular, we initialise and train three different D3QN agents, one for each turning intention, i.e. left, straight and right. In this way each vehicle can select which model to use at the beginning of its path, according only to its own turning intention.</p><p>This approach is extremely sample-efficient, because the number of network parameters stays constant regardless of the number of vehicles considered. Moreover, these shared parameters are optimised through the experiences generated by all the vehicles, leading to a more diverse set of trajectories for training. Indeed, each of the three models has its own replay buffer, which contains transitions shared by all the vehicles with the corresponding turning intention. The crucial insight that makes our strategy effective is that the observations gathered from each vehicle are invariant with respect to the road on which the vehicle itself is positioned. 
This parameter and experience sharing approach renders the training procedure partially centralised, because trajectories coming from different vehicles are used to train the three D3QN agents. However, we remark that, once the models have been trained, the execution phase is completely decentralised, since each vehicle locally stores the three different models. Then, at the beginning of the scenario, each CAV selects the model to use based only on its own turning intention.</p></div>
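The dueling aggregation (11) can be illustrated with a plain-Python stand-in for the two network heads (a sketch with assumed names, not the paper's network code):

```python
# Sketch of the dueling aggregation (11):
#   Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')).
# v is the scalar output of the state-value head; advantages is the
# per-action output of the advantage head.

def dueling_q(v, advantages):
    mean_adv = sum(advantages) / len(advantages)
    return [v + a - mean_adv for a in advantages]
```

Subtracting the mean advantage pins down the otherwise unidentifiable split between V and A (adding a constant to all advantages and subtracting it from V would leave Q unchanged), which is the stability motivation mentioned above.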
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">The prioritised scenario replay strategy</head><p>The agents are trained for a fixed number of iterations 𝑁 ∈ ℕ, keeping the intersection busy in order to obtain meaningful transitions to learn from. In particular, at each episode we consider the most complicated situation in which there are four vehicles simultaneously crossing the intersection, one for each road and with random turning intention.</p><p>Every 𝐸 ∈ ℕ time steps we pause training and run an evaluation phase. During this period, the agents use a greedy policy to face all the possible scenarios described above. When the evaluation is completed, we use the inverse of the returns from all the scenarios to build a probability distribution, and in the following training window we sample the different scenarios according to such a distribution. In this way we allow the agents to learn more from the most complicated situations. We name this original training strategy prioritised scenario replay because of its conceptual similarity with the prioritised experience replay scheme <ref type="bibr" target="#b21">[22]</ref>, common in many RL algorithms. Algorithm 1 illustrates the proposed learning strategy in detail.</p></div>
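The scenario-sampling step of prioritised scenario replay can be sketched as follows. This is one plausible reading of "the inverse of the returns": since returns may be non-positive, we shift them before inverting, and this normalisation is our assumption rather than the paper's stated formula:

```python
# Hedged sketch of prioritised scenario replay: map evaluation returns to
# sampling probabilities so that low-return (hard) scenarios are sampled
# more often in the next training window.

def scenario_probabilities(returns, shift=1.0):
    # Shift so every value is strictly positive, then invert and normalise.
    lo = min(returns)
    inv = [1.0 / (r - lo + shift) for r in returns]
    z = sum(inv)
    return [p / z for p in inv]
```

A scenario whose greedy-evaluation return was low thus receives a higher probability of being drawn during the following training window, mirroring how prioritised experience replay over-samples high-error transitions.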
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments on virtual environments</head><p>In order to train and evaluate our algorithm we need a suitable simulation environment. For this project we have chosen the SMARTS platform <ref type="bibr" target="#b22">[23]</ref>, explicitly designed for MARL experiments in autonomous driving. SMARTS relies on the external provider SUMO (Simulation of Urban MObility) <ref type="bibr" target="#b31">[32]</ref>, a widely used microscopic traffic simulator available under an open-source license. For our setup, we have used SMARTS as a bridge between SUMO and the MARL framework, since it follows the standard Gymnasium APIs <ref type="bibr" target="#b32">[33]</ref>, widely used in the RL community.</p><p>Our code was developed in Python 3.8 along with version 1.4 of the deep learning library PyTorch <ref type="bibr" target="#b33">[34]</ref>. Moreover, an NVIDIA TITAN Xp GPU was used to run our experiments. As already mentioned in Section 3.1, we have built a 4-way 1-lane intersection scenario, with three different turning intentions available to the vehicles coming from each of the four ways.</p><p>(Algorithm 1, in outline: each vehicle is assigned to one of the three agents based on its turning intention; each transition is stored in the corresponding replay buffer 𝒟 𝑖 ; for all agents 𝑖, random batches of transitions ℬ 𝑖 are sampled from the replay buffer 𝒟 𝑖 for training.)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The simulation step has been fixed to 100 ms. Regarding the observation of each vehicle, we have stacked 𝑛 = 3 consecutive frames, each consisting of an RGB image of dimensions 48 × 48 pixels. The action space contains 𝑚 = 2 possible velocity commands<ref type="foot" target="#foot_1">2</ref>, namely 0 and 15 m/s. The chosen velocity references are then fed at each iteration to a speed controller, in charge of driving the vehicle until the subsequent time step. As for the reward function ( <ref type="formula" target="#formula_9">9</ref>), we have fixed its hyperparameter to 𝑘 = 1. Regarding the architecture of the NN used to approximate the state-action value function, we have considered a convolutional neural network (CNN), whose structural details are summarised in Table <ref type="table" target="#tab_2">1</ref>. Finally, the training hyperparameters of Algorithm 1 are reported in Table <ref type="table">2</ref>.</p><p>Within Algorithm 1, the parameters 𝜃 𝑖 of each agent are updated by minimising the per-agent Double DQN loss, analogous to (10) but defined over local observations:</p><formula xml:id="formula_14">ℒ (𝜃 𝑖 ) = 1 |ℬ 𝑖 | ∑ 𝑏∈ℬ 𝑖 [(𝑟 𝑏 + 𝛾 𝑄 𝑖 (𝑜 ′ 𝑏 , arg max 𝑎 ′ 𝑄 𝑖 (𝑜 ′ 𝑏 , 𝑎 ′ ; 𝜃 𝑖 ); 𝜃̄ 𝑖 ) − 𝑄 𝑖 (𝑜 𝑏 , 𝑎 𝑏 ; 𝜃 𝑖 )) 2 ]</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Baselines</head><p>In order to assess the quality of our algorithm, we benchmark it against the following baselines<ref type="foot" target="#foot_2">3</ref> :</p><p>• Random policy (RP) for all the vehicles, which helps confirm that our algorithm is effectively learning meaningful patterns, by demonstrating its ability to outperform random actions, which lack any deliberate learning process. • Three symmetric (N/S &amp; W/E) fixed-time traffic lights (FTTL), considering two cycle lengths already analysed in <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b18">19]</ref>, as well as the optimal cycle length as a function of the traffic flow, computed according to Webster's formula <ref type="bibr" target="#b36">[37]</ref>. The final flow rate of vehicles for evaluation has been set to 600 veh/hour, as discussed in Section 4.2.</p><p>• Two symmetric (N/S &amp; W/E) actuated traffic lights (ATL) <ref type="bibr" target="#b31">[32]</ref>, with different cycle lengths, which operate by transitioning to the next phase once they identify a pre-specified time gap between consecutive vehicles. In this way the allocation of green time across phases is optimised and the cycle duration is adjusted in accordance with changing traffic dynamics.</p><p>The parameters of the five traffic light configurations are reported in Table <ref type="table" target="#tab_3">3</ref>. As a final note, we emphasise that we have not considered any RL-based centralised AIM approach as a baseline, because the purpose of our method is to propose a more realistic and feasible alternative to them, which is nevertheless able to outperform classical intersection control methods, such as traffic lights.</p></div>
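For reference, the optimal cycle length used to tune the FTTL baseline follows Webster's classic formula, 𝐶₀ = (1.5𝐿 + 5)/(1 − 𝑌), with 𝐿 the total lost time per cycle in seconds and 𝑌 the sum of the critical flow ratios. The sketch below is a generic textbook implementation; the example inputs are illustrative and are not the paper's configuration values:

```python
# Sketch of Webster's optimal cycle length for a fixed-time signal.
# lost_time_s: total lost time L per cycle [s]; critical_flow_ratios: the
# flow ratio of the critical movement in each phase (their sum Y must be < 1).

def webster_cycle_length(lost_time_s, critical_flow_ratios):
    Y = sum(critical_flow_ratios)
    if Y >= 1.0:
        raise ValueError("intersection is over-saturated (Y >= 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - Y)
```

As the demand 𝑌 approaches saturation, the denominator shrinks and the recommended cycle length grows rapidly, which is why fixed-time plans must be re-tuned to the expected flow rate (here, 600 veh/hour).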
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results</head><p>To assess the quality of MAD4QN-PS we consider four metrics, namely the travel time, the waiting time, the average speed and the collision rate. We remark that we have adopted vehicle-centered metrics because of the decentralised nature of our algorithm. However, by optimising the performance of each individual road user we also implicitly improve the quality of the whole intersection. The robustness of our method is ensured by performing training ten times with different seeds, and then considering all the trained models while evaluating our strategy. In particular, each model has been tested by running a cycle of the evaluation phase presented in Section 3.3, considering a flow rate of 600 veh/hour of vehicles coming through the intersection. The results obtained from the different models have then been averaged. Moreover, to ensure a fair comparison and analysis of the results, the same evaluation setup has been adopted for all the baselines introduced above. Finally, it is worth noting that the inference time of the networks at evaluation phase is at most 1 ms, thus allowing for real-time control, given that the simulation step has been fixed to 100 ms, as discussed at the beginning of this section.</p><p>Starting from Figure <ref type="figure" target="#fig_2">1a</ref>, we can observe the average travel time and waiting time of a generic vehicle for all the methods. The former is defined as the overall time that the vehicle spends inside the environment, while the latter is defined as the fraction of the travel time in which the vehicle moves with velocity less than or equal to 0.1 m/s, i.e. when it is stopped or almost stopped. We clearly see that our method strongly outperforms all the traffic light configurations. 
This is mainly because, when using traffic lights, a fraction of the vehicles is forced to stop as soon as the corresponding light turns red. Conversely, the trained MAD4QN-PS agents smoothly handle the interaction among multiple vehicles, allowing them to avoid stopping unless strictly necessary. Figure <ref type="figure" target="#fig_2">1b</ref> instead displays the average speed of each vehicle. The results shown in this histogram are clearly related to those in Figure <ref type="figure" target="#fig_2">1a</ref>; also in this case our method outperforms the traffic light control schemes, since the vehicles almost never stop and thus keep a smoother velocity profile throughout the duration of the simulation.</p><p>Lastly, we analyse the random policy baseline, for which all three plots are needed to fully understand its behaviour. Looking only at Figures <ref type="figure" target="#fig_2">1a and 1b</ref>, one could argue that the random policy performs similarly to MAD4QN-PS. This hypothesis is however disproved by Figure <ref type="figure" target="#fig_2">1c</ref>, which reports the average collision rate per vehicle. The extremely high collision percentage obtained by the random policy explains why each vehicle, on average, spends only a short time at high velocity in the environment: the simulation is terminated as soon as a vehicle crashes. MAD4QN-PS, instead, achieves an extremely low collision rate. The fact that this rate is non-zero is expected and also observed in other works exploiting RL-based techniques <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>, since our algorithm has to learn collision avoidance implicitly through the reward signal. 
In practice, the remaining failures are not problematic, since rule-based sanity checks can be integrated into the pipeline to guarantee 100% collision-free operation. Additionally, we note that two of the ten models trained with different seeds achieve exactly 0% collision rate, meaning that selecting one of those models for deployment yields collision-free performance. This is relevant from an applicability perspective, since only the best trained model would be used in practice. As a final note, we have not plotted the collision rate of the traffic light methods, for better visualization, since it is trivially zero for all the configurations.</p><p>A short video showing the performance of MAD4QN-PS can be found at this link.</p></div>
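The rule-based sanity checks mentioned above could, for instance, take the form of a simple predicted-gap guard that overrides the learned command with a stop whenever two vehicles are forecast to come too close. A minimal one-dimensional sketch, with function name, horizon and threshold chosen by us rather than taken from the paper:

```python
def safety_override(ego_pos, ego_vel, other_pos, other_vel,
                    horizon_s=2.0, dt=0.1, min_gap_m=2.0):
    """Return True if the learned action should be replaced by a stop.

    Forward-simulates both vehicles at constant velocity along a shared
    1-D conflict path and checks whether their gap ever falls below
    min_gap_m within the prediction horizon.
    """
    t = 0.0
    while t <= horizon_s:
        gap = abs((other_pos + other_vel * t) - (ego_pos + ego_vel * t))
        if gap < min_gap_m:
            return True  # predicted conflict: command a stop instead
        t += dt
    return False
```

The guard runs downstream of the learned policy, so it never interferes with training and only intervenes in the rare residual failure cases.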
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and future directions</head><p>In this study, we propose a distributed approach to the AIM paradigm. In particular, we introduce a novel algorithm which exploits MARL through self-play and an original learning strategy, named prioritised scenario replay, to train three different intersection-crossing agents. The derived models are stored inside CAVs, which are then able to complete their paths by choosing the model corresponding to their own turning intention while relying only on local observations. Our algorithm represents a feasible and realistic alternative to the centralised AIM concept, which is still expected to require years of technological advancement before becoming implementable in a real-world scenario. In addition, simulation experiments demonstrate the superior performance of our method w.r.t. classic intersection control schemes, such as static and actuated traffic lights, in terms of travel time, waiting time and average speed for each vehicle.</p><p>In future works, we aim to explore several directions for advancement. One of the main objectives is to include human-driven vehicles in the environment and extend our approach to this setting (see, e.g., the initial effort made in <ref type="bibr" target="#b37">[38]</ref>). In this case, the most challenging issue is the synchronization of traffic lights in the presence of human-driven vehicles. Moreover, given the decentralised nature of the proposed method, we expect to make our algorithm more robust without dramatically changing it, whereas a centralised AIM approach would require significant redesign. Furthermore, we envisage testing more complex scenarios, both in terms of size and layout, again to improve the robustness of our algorithm. 
Finally, we intend to implement our algorithm in a scaled real-world scenario with miniature vehicles <ref type="bibr" target="#b38">[39]</ref>, to practically demonstrate the applicability of our method.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Algorithm 1</head><label>1</label><figDesc>MAD4QN-PS 1: Initialize three state-action value networks 𝑄 𝑖 with random parameters 𝜃 𝑖 , 𝑖 = 1, 2, 3 2: Initialize three target state-action value networks Q 𝑖 with parameters θ𝑖 = 𝜃 𝑖 , 𝑖 = 1, 2, 3 3: Initialize three replay buffers 𝒟 𝑖 , 𝑖 = 1, 2, 3 4: Setup initial 𝜀, decay factor 𝜀 𝑑 , evaluation period 𝐸, target update period 𝛿, discount factor 𝛾 5: Uniformly initialize the scenarios probability distribution 6: max_episode_steps ← 𝑀 7: 𝑛 ← 0 8: while 𝑛 &lt; 𝑁 do ... 12: while not episode_terminated do 13: 𝑉 ← number of vehicles currently present in the simulation 14:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>(a) Average travel and waiting time per vehicle; legend: FTTL1, FTTL2, FTTLOPT, ATL1, ATL2, RP, MAD4QN-PS</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Comparison between the performance metrics of our method (MAD4QN-PS) and the baselines introduced in Section 4.1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Algorithm 1 (continued)</figDesc><table><row><cell>15:</cell><cell>Collect observations for each vehicle 𝑜 𝑛 1 , ..., 𝑜 𝑛 𝑉</cell></row><row><cell>16:</cell><cell>for all vehicles 𝑣 in 1, ..., 𝑉 do</cell></row><row><cell>17:</cell><cell>With probability 𝜀 select a random action 𝑎 𝑛 𝑣</cell></row><row><cell>18:</cell><cell>Otherwise 𝑎 𝑛 𝑣 ← arg max 𝑎 𝑄 𝑖 (𝑜 𝑛 𝑣 , 𝑎; 𝜃 𝑖 ), where 𝑖 depends on the turning intention of 𝑣</cell></row><row><cell>19:</cell><cell>end for</cell></row><row><cell>20:</cell><cell>Apply actions 𝑎 𝑛 𝑣 and collect observations 𝑜 𝑛+1 𝑣 and rewards 𝑟 𝑛 𝑣 for 𝑣 in 1, ..., 𝑉</cell></row><row><cell>21:</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>𝔼 𝑏∼ℬ [(𝑟 𝑏 + 𝛾 Q 𝑖 (𝑜 ′ 𝑏 , arg max 𝑎 ′ 𝑄 𝑖 (𝑜 ′ 𝑏 , 𝑎 ′ ; 𝜃 𝑖 ); θ𝑖 ) − 𝑄 𝑖 (𝑜 𝑏 , 𝑎 𝑏 ; 𝜃 𝑖 )) 2 ]</figDesc><table><row><cell>25:</cell><cell>end for</cell></row><row><cell>26:</cell><cell>𝜀 ← 𝜀 − 𝜀 𝑑</cell></row><row><cell>27:</cell><cell>episode_steps ← episode_steps + 1</cell></row><row><cell>28:</cell><cell>𝑛 ← 𝑛 + 1</cell></row><row><cell>29:</cell><cell>if 𝑛 % 𝛿 == 0 then</cell></row><row><cell>30:</cell><cell>Update target network parameters θ𝑖 = 𝜃 𝑖 for each agent 𝑖 = 1, 2, 3</cell></row><row><cell>31:</cell><cell>end if</cell></row><row><cell>32:</cell><cell>if 𝑛 % 𝐸 == 0 then</cell></row><row><cell>33:</cell><cell>Run evaluation phase and update the scenarios probability distribution as described</cell></row><row><cell></cell><cell>in Section 3.3</cell></row><row><cell>34:</cell><cell>end if</cell></row><row><cell>35:</cell><cell>if a collision occurred or 𝑉 == 0 or episode_steps == max_episode_steps then</cell></row><row><cell>36:</cell><cell>episode_terminated ← True</cell></row><row><cell>37:</cell><cell>end if</cell></row><row><cell>38:</cell><cell></cell></row></table></figure>
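The loss minimised in Algorithm 1 is the double-DQN objective: the online network 𝑄 𝑖 selects the greedy next action and the target network Q 𝑖 evaluates it. A minimal numpy sketch of the corresponding target computation (array names and shapes are our assumptions, not the paper's code):

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute y_b = r_b + gamma * Q_target(o'_b, argmax_a' Q_online(o'_b, a'))
    for non-terminal transitions, and y_b = r_b for terminal ones.

    rewards, dones: shape (batch,); q_next_*: shape (batch, n_actions).
    """
    greedy = np.argmax(q_next_online, axis=1)                 # action chosen by online net
    q_eval = q_next_target[np.arange(len(rewards)), greedy]   # evaluated by target net
    return rewards + gamma * q_eval * (1.0 - dones)
```

Decoupling action selection from action evaluation in this way is the standard remedy for the overestimation bias of vanilla Q-learning targets.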
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1</head><label>1</label><figDesc>CNN architecture</figDesc><table><row><cell>Layer</cell><cell>N. of Filters</cell><cell>Kernel Size</cell><cell>Stride</cell><cell>Activation Function</cell><cell>N. of Neurons</cell></row><row><cell>Convolutional</cell><cell>32</cell><cell>8 × 8</cell><cell>4</cell><cell>ReLU</cell><cell>/</cell></row><row><cell>Convolutional</cell><cell>64</cell><cell>4 × 4</cell><cell>2</cell><cell>ReLU</cell><cell>/</cell></row><row><cell>Convolutional</cell><cell>64</cell><cell>3 × 3</cell><cell>1</cell><cell>ReLU</cell><cell>/</cell></row><row><cell>Fully connected</cell><cell>/</cell><cell>/</cell><cell>/</cell><cell>ReLU</cell><cell>512</cell></row><row><cell>Fully connected (V)</cell><cell>/</cell><cell>/</cell><cell>/</cell><cell>Linear</cell><cell>1</cell></row><row><cell>Fully connected (A)</cell><cell>/</cell><cell>/</cell><cell>/</cell><cell>Linear</cell><cell>2</cell></row><row><cell cols="6">Table 2: Training hyperparameters</cell></row><row><cell>Hyperparameter</cell><cell>Symbol</cell><cell>Value</cell><cell>Hyperparameter</cell><cell>Symbol</cell><cell>Value</cell></row><row><cell>Training steps</cell><cell>𝑁</cell><cell>10⁶</cell><cell>Initial exploration rate</cell><cell>𝜀</cell><cell>1</cell></row><row><cell>Max episode steps</cell><cell>𝑀</cell><cell>10³</cell><cell>Exploration rate decay</cell><cell>𝜀 𝑑</cell><cell>10⁻⁶</cell></row><row><cell>Evaluation period</cell><cell>𝐸</cell><cell>5 ⋅ 10³</cell><cell>Buffer size</cell><cell>|𝒟 |</cell><cell>1.5 ⋅ 10⁵</cell></row><row><cell>Optimizer</cell><cell>/</cell><cell>RMSprop [35]</cell><cell>Discount factor</cell><cell>𝛾</cell><cell>0.99</cell></row><row><cell>Learning rate</cell><cell>𝑙𝑟</cell><cell>10⁻⁴</cell><cell>Batch size</cell><cell>|ℬ|</cell><cell>256</cell></row><row><cell>Target update period</cell><cell>𝛿</cell><cell>10³</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
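The two linear heads of the network in Table 1 implement a dueling decomposition: a scalar state value V and one advantage per action. The Q-values are then typically recovered with the mean-subtracted aggregation; a numpy sketch under that common assumption (the paper does not spell out the aggregation rule):

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine the dueling heads into Q-values:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    value: shape (batch, 1); advantages: shape (batch, n_actions).
    Subtracting the mean advantage makes the V/A split identifiable.
    """
    return value + advantages - advantages.mean(axis=1, keepdims=True)
```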
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Traffic lights parameters</figDesc><table><row><cell cols="5">Traffic light Red &amp; Green Yellow Min. Duration Max. Duration</cell></row><row><cell>FTTL1</cell><cell>25𝑠</cell><cell>5𝑠</cell><cell>/</cell><cell>/</cell></row><row><cell>FTTL2</cell><cell>32𝑠</cell><cell>8𝑠</cell><cell>/</cell><cell>/</cell></row><row><cell>FTTLOPT</cell><cell>15𝑠</cell><cell>2𝑠</cell><cell>/</cell><cell>/</cell></row><row><cell>ATL1</cell><cell>25𝑠</cell><cell>5𝑠</cell><cell>10𝑠</cell><cell>40𝑠</cell></row><row><cell>ATL2</cell><cell>32𝑠</cell><cell>8𝑠</cell><cell>15𝑠</cell><cell>50𝑠</cell></row></table></figure>
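The ATL baselines in Table 3 hold green between their minimum and maximum durations and switch phase once a sufficiently large gap between consecutive vehicles is detected. A hypothetical sketch of this gap-out rule, with default durations matching ATL1 in Table 3 and a gap threshold of our choosing:

```python
def atl_should_switch(green_elapsed_s, last_gap_s,
                      min_green_s=10.0, max_green_s=40.0, gap_threshold_s=3.0):
    """Gap-based actuated control: return True when the controller
    should move to the next phase. Green is never cut before the
    minimum duration, is always ended at the maximum duration, and in
    between ends as soon as the detected headway between consecutive
    vehicles exceeds the threshold (no more demand on this approach).
    """
    if green_elapsed_s < min_green_s:
        return False
    if green_elapsed_s >= max_green_s:
        return True
    return last_gap_s > gap_threshold_s
```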
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The Python code of our work can be found at https://github.com/mcederle99/MAD4QN-PS. The authors stress that reproducibility is a crucial issue in this field of research.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">This choice of action space was made because we observed that considering more velocity commands only introduced additional complexity into the system without improving the algorithm's performance.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The baseline simulations with traffic lights have been performed using Flow<ref type="bibr" target="#b35">[36]</ref>, another platform for interfacing with SUMO, which easily allows for the definition and control of traffic lights.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This study was carried out within the MOST (the Italian National Center for Sustainable Mobility), Spoke 8: Mobility as a Service and Innovative Services, and received funding from Next-GenerationEU (Italian PNRR -CN00000023 -D.D. 1033 17/06/2022 -CUP C93C22002750006).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Connected &amp; autonomous vehicles-environmental impacts-a review</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kopelias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Demiridi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Vogiatzis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Skabardonis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Zafiropoulou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science of the total environment</title>
		<imprint>
			<biblScope unit="volume">712</biblScope>
			<biblScope unit="page">135237</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Smart mobility implementation in smart cities: A comprehensive review on state-of-art technologies</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Savithramma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">P</forename><surname>Ashwini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sumathi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT)</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="10" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Evaluating the safety impact of connected and autonomous vehicles on motorways</title>
		<author>
			<persName><forename type="first">A</forename><surname>Papadoulis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Quddus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Imprialou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Accident Analysis &amp; Prevention</title>
		<imprint>
			<biblScope unit="volume">124</biblScope>
			<biblScope unit="page" from="12" to="22" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Statistical accident analysis supporting the control of autonomous vehicles</title>
		<author>
			<persName><forename type="first">S</forename><surname>Szénási</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kertész</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Felde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Nádai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computational Methods in Sciences and Engineering</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="85" to="97" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Understanding the power of control in autonomous vehicles for people with vision impairment</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">N</forename><surname>Brewer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kameswaran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility</title>
				<meeting>the 20th International ACM SIGACCESS Conference on Computers and Accessibility</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="185" to="197" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Systematic review: Automated vehicles and services for people with disabilities</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Dicianno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sivakanthan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Sundaram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satpute</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kulich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Powers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Deepak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Russell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Cooper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neuroscience Letters</title>
		<imprint>
			<biblScope unit="volume">761</biblScope>
			<biblScope unit="page">136103</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A fleet of miniature cars for experiments in cooperative driving</title>
		<author>
			<persName><forename type="first">N</forename><surname>Hyldmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Prorok</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2019 International Conference on Robotics and Automation (ICRA)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3238" to="3244" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Influence of connected and autonomous vehicles on traffic flow stability and throughput</title>
		<author>
			<persName><forename type="first">A</forename><surname>Talebpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Mahmassani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transportation research part C: emerging technologies</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="143" to="163" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Greenhouse gas emission impact of autonomous vehicle introduction in an urban network</title>
		<author>
			<persName><forename type="first">J</forename><surname>Conlon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transportation Research Record</title>
		<imprint>
			<biblScope unit="volume">2673</biblScope>
			<biblScope unit="page" from="142" to="152" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A review on energy, environmental, and sustainability implications of connected and automated vehicles</title>
		<author>
			<persName><forename type="first">M</forename><surname>Taiebat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Safford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Environmental science &amp; technology</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="11449" to="11465" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An introduction to multiagent reinforcement learning and review of its application to autonomous mobility</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Brosig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Plinge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Eskofier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mutschler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)</title>
				<imprint>
			<date type="published" when="2022">2022. 2022</date>
			<biblScope unit="page" from="1342" to="1349" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Multiagent traffic management: A reservation-based intersection control mechanism</title>
		<author>
			<persName><forename type="first">K</forename><surname>Dresner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Stone</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Autonomous Agents and Multiagent Systems, International Joint Conference on</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="530" to="537" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Autonomous intersection management by using reinforcement learning</title>
		<author>
			<persName><forename type="first">P</forename><surname>Karthikeyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-A</forename><surname>Hsiung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Algorithms</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">326</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Advantage actor-critic for autonomous intersection management</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ayeelyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G.-H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-C</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-A</forename><surname>Hsiung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Vehicles</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1391" to="1412" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Cooperative behavior planning for automated driving using graph neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Klimke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Völz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Buchholz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Intelligent Vehicles Symposium (IV)</title>
				<imprint>
			<date type="published" when="2022">2022. 2022</date>
			<biblScope unit="page" from="167" to="174" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">An enhanced graph representation for machine learning based automatic intersection management</title>
		<author>
			<persName><forename type="first">M</forename><surname>Klimke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gerigk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Völz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Buchholz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)</title>
				<imprint>
			<date type="published" when="2022">2022. 2022</date>
			<biblScope unit="page" from="523" to="530" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Safe and adaptive decision algorithm of automated vehicle for unsignalized intersection driving</title>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Brazilian Society of Mechanical Sciences and Engineering</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page">537</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Potential game-based decision-making for autonomous driving</title>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kolmanovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">E</forename><surname>Tseng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Filev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Girard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="8014" to="8027" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Multi-agent deep reinforcement learning to manage connected autonomous vehicles at tomorrow&apos;s intersections</title>
		<author>
			<persName><forename type="first">G.-P</forename><surname>Antonio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Maria-Dolores</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Vehicular Technology</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="7033" to="7043" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">3-d surround view for advanced driver assistance systems</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Intelligent Transportation Systems</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="320" to="328" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Multiagent learning in the presence of memory-bounded agents</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Stone</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Autonomous agents and multi-agent systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="182" to="213" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Schaul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Antonoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.05952</idno>
		<title level="m">Prioritized experience replay</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Smarts: An open-source scalable multi-agent rl training school for autonomous driving</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Villella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rusu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Miao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Fadakar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on Robot Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="264" to="285" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Reinforcement learning: An introduction</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Barto</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A Markovian decision process</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bellman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Mathematics and Mechanics</title>
		<imprint>
			<biblScope unit="page" from="679" to="684" />
			<date type="published" when="1957">1957</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Learning from delayed rewards</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J C H</forename><surname>Watkins</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Human-level control through deep reinforcement learning</title>
		<author>
			<persName><forename type="first">V</forename><surname>Mnih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Rusu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Veness</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Bellemare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Fidjeland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ostrovski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">518</biblScope>
			<biblScope unit="page" from="529" to="533" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Stochastic games</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">S</forename><surname>Shapley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="1095" to="1100" />
			<date type="published" when="1953">1953</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Albrecht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Christianos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schäfer</surname></persName>
		</author>
		<title level="m">Multi-agent reinforcement learning: Foundations and modern approaches</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>MIT Press</publisher>
			<pubPlace>Cambridge, MA, USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Deep reinforcement learning with double Q-learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Van Hasselt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI conference on artificial intelligence</title>
				<meeting>the AAAI conference on artificial intelligence</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">30</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Dueling network architectures for deep reinforcement learning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schaul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hessel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Van Hasselt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lanctot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>De Freitas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1995" to="2003" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Microscopic traffic simulation using SUMO</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Behrisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bieker-Walz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Erdmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-P</forename><surname>Flötteröd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hilbrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lücken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rummel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wagner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wießner</surname></persName>
		</author>
		<ptr target="https://elib.dlr.de/124092/" />
	</analytic>
	<monogr>
		<title level="m">The 21st IEEE International Conference on Intelligent Transportation Systems</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Gymnasium</title>
		<author>
			<persName><forename type="first">M</forename><surname>Towers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Terry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kwiatkowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">U</forename><surname>Balis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">D</forename><surname>Cola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Deleu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Goulão</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kallinteris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krimmel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Perez-Vicente</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pierré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schulhoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T J</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">G</forename><surname>Younis</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.8127026</idno>
		<ptr target="https://zenodo.org/record/8127025" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<pubPlace>Gymnasium</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Paszke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Devito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Desmaison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Antiga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lerer</surname></persName>
		</author>
		<title level="m">Automatic differentiation in PyTorch</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Swersky</surname></persName>
		</author>
		<ptr target="https://class.coursera.org/neuralnets-2012-001/lecture" />
	</analytic>
	<monogr>
		<title level="m">Lecture 6a: Overview of mini-batch gradient descent</title>
		<title level="s">Coursera Lecture slides</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Flow: A modular learning framework for mixed autonomy traffic</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Kreidieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Parvate</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Vinitsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Bayen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Robotics</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="1270" to="1286" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<title level="m" type="main">Traffic signal settings</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">V</forename><surname>Webster</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1958">1958</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Reinforcement learning for mixed autonomy intersections</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Intelligent Transportation Systems Conference (ITSC)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2089" to="2094" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Duckietown: An open, inexpensive and flexible platform for autonomy education and research</title>
		<author>
			<persName><forename type="first">L</forename><surname>Paull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Alonso-Mora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Carlone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">F</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dusek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Robotics and Automation (ICRA)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1497" to="1504" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
