<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Real Deal: A Review of Challenges and Opportunities in Moving Reinforcement Learning-Based Traffic Signal Control Systems Towards Reality</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rex Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fei Fang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norman Sadeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Software Research, School of Computer Science, Carnegie Mellon University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Traffic signal control (TSC) is a high-stakes domain that is growing in importance as traffic volume grows globally. An increasing number of works are applying reinforcement learning (RL) to TSC; RL can draw on an abundance of traffic data to improve signalling efficiency. However, RL-based signal controllers have never been deployed. In this work, we provide the first review of challenges that must be addressed before RL can be deployed for TSC. We focus on four challenges involving (1) uncertainty in detection, (2) reliability of communications, (3) compliance and interpretability, and (4) heterogeneous road users. We show that the literature on RL-based TSC has made some progress towards addressing each challenge. However, more work should take a systems thinking approach that considers the impacts of other pipeline components on RL.</p>
      </abstract>
      <kwd-group>
        <kwd>Traffic signal control</kwd>
        <kwd>Reinforcement learning</kwd>
        <kwd>Intelligent transportation system</kwd>
        <kwd>System deployment</kwd>
        <kwd>Review</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As the traffic volume of metropolitan areas continues to grow worldwide, gridlock is becoming
an increasingly prevalent concern. According to the 2021 Urban Mobility Report [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], gridlock led
to over 4 billion hours of travel delay and more than $100 billion in congestion costs across the United
States in 2021. This not only impacts commercial productivity but also has environmental
consequences. One important mechanism for alleviating gridlock is improving the timing of
trafﬁc signals [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Historically, most jurisdictions have used fixed timing plans based on traffic
models, which assume fixed values of factors such as lane volumes and arrival rates [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To
minimize implementation burden, traditional traffic signal control (TSC) either uses one fixed
plan throughout the entire day, or rotates through several plans depending on the time of the
day. However, fixed plans cannot respond in real time to changes in traffic demand [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Large traffic volumes also offer an abundance of data that can be used for real-time
optimization of signal timing plans. Many deployed systems combine logic-triggered state changes with
data-driven searches over sets of schedules [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, an increasing number of approaches
traverse larger search spaces using optimization and scheduling algorithms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Among these
approaches, reinforcement learning (RL) has yielded significant improvements over fixed and
actuated TSC algorithms in simulations [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. RL allows systems to learn from the consequences
of their decisions, which enables them to achieve continuous self-improvement. Deployments
of RL algorithms have achieved success in a variety of complex domains involving human
interaction, such as card games [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], real-time strategy games [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and other applications in
transportation such as dispatching for ride-hailing services [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        However, to our knowledge, RL-based TSC algorithms have never been deployed. This is in
spite of the fact that papers introducing novel algorithms in this area commonly list real-world
deployment as a goal for future work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We believe that this discrepancy has arisen due to a
focus on methodological contributions, instead of on a holistic systems thinking approach based
on the data-to-deployment pipeline [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. If RL-based signal controllers are to achieve success in
deployment, domain experts in TSC and in RL must have a shared view of the problem. We
take a step towards bridging the gap between research and deployment by providing the first
review of challenges that may arise from end-to-end deployments of RL-based TSC, which we
intend to serve as a common basis for collaboration between research in TSC and RL.
      </p>
      <p>We begin by describing our review methodology in Section 1.1. Then, we provide a high-level
review of the fields of TSC and RL in Section 2. Next, we explore four engineering challenges.
For each of these challenges, we will provide a review of (1) how these challenges are significant
concerns for the state of the art in RL-based TSC; (2) what practical considerations relevant to
these challenges have arisen in deployments of non-RL TSC systems; and (3) what progress has
been made in the RL-based TSC literature towards solving these challenges.
• Uncertainty in detection. (Section 3) Typically, RL-based TSC algorithms learn based
on metrics such as queue length or travel time. These require accurate vehicle detection
technologies, which may not always be available in the field. Strategies to deal with
detector uncertainty and failure are a prerequisite of deployment.
• Reliability of communications. (Section 4) Some decentralization is necessary for
RL-based TSC. Coordination between intersections is important for optimizing network-level
metrics, yet most work in RL-based TSC has not considered the practicalities of dealing
with failure and latency in inter-intersection communications.
• Compliance and interpretability. (Section 5) Jurisdictions will not have confidence
in RL-based signal controllers without assurances about compliance to standards (e.g.,
minimum green time) and safety requirements. The interpretability of models is important
for ensuring that signalling plans can be audited and adjusted by stakeholders.
• Heterogeneous road users. (Section 6) Most simulations for RL-based TSC assume that
all cars are the same size and have the same free-flow speed. However, cars share the
road with pedestrians, buses, emergency vehicles, and other road users. Algorithms must
detect and respond to the needs of different road users in a safe, equitable manner.</p>
      <p>Finally, we end with concluding thoughts and suggestions for future work in Section 7.</p>
      <sec id="sec-1-1">
        <title>1.1. Methodology</title>
        <p>
To obtain an overview of the domain of RL-based TSC, we conducted a targeted search on
Google Scholar with the keywords “traffic signal”/“traffic light”, “reinforcement learning”, and
“review”/“survey”. We identified the four challenges addressed in the following sections through
these reviews. From here, we conducted snowball sampling based on their citations to locate
papers in the RL literature that discuss these challenges. For RL papers, we focused on those
published after 2015, since this field has rapidly evolved over the past several years. We also
performed additional targeted Google Scholar searches to find literature which describes non-RL
deployments of TSC, by searching the keywords “traffic signal”/“traffic light” and “adaptive” in
conjunction with the following keywords:
• For Section 3, “uncertainty”, “noise”, “sensing error”, “accuracy”.
• For Section 4, “coordination”, “communication”, “closed loop”, “message”, “NTCIP”.
• For Section 5, “compliance”, “safety”, “accountability”, “interpretability”/“explainability”.
• For Section 6, “pedestrian”/“leading pedestrian interval”, “cyclist”, “transit”, “emergency
vehicle”, “priority”, “preempt”.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Traffic signal control</title>
        <p>
          Traffic signal control (TSC) aims to allocate green time at an intersection to traffic moving in
different directions. Every approach (roadway entering the intersection) is split into lanes for
forward, left-turn, and (possibly) right-turn movements (which may be assumed to always be
permissible) [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. For efficiency, pairs of compatible movements are often arranged into
phases and signalled simultaneously [
          <xref ref-type="bibr" rid="ref10 ref14 ref15">10, 14, 15</xref>
          ]. The task is to find some division of green time
between phases for each intersection in a road network, which maximizes metrics such as the
throughput of the network. We refer the reader to [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for details of the problem formulation.
        </p>
        <p>
          Different approaches to dividing green time include choosing phase durations or phase
sequences, or fixing a phase sequence within a cycle and choosing the length of the cycle or
the proportions of each phase within the cycle [
          <xref ref-type="bibr" rid="ref10 ref12 ref15">10, 12, 15</xref>
          ]. Three main types of algorithmic
approaches exist. In fixed-time control, which has historically been a popular strategy [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a
small number of fixed plans are optimized based on past trafic data under the assumption of
uniform demand. In actuated control, detector inputs (such as vehicle presence data from loop
detectors) are used in conjunction with a fixed set of logical rules. Finally, adaptive control uses
more complex prediction and optimization algorithms to control signalling plans [
          <xref ref-type="bibr" rid="ref12 ref16">12, 16</xref>
          ].
        </p>
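        <p>For illustration, the gap-out logic behind actuated control, as described above, can be sketched as follows. This is a simplified toy rule with hypothetical parameter values, not the logic of any specific deployed controller: green is extended by a passage time upon each detector actuation, bounded by minimum and maximum green times.</p>
        <preformat>
```python
def actuated_green_duration(actuation_times, min_green=5.0,
                            passage_time=3.0, max_green=30.0):
    """Toy gap-out rule: green ends `passage_time` seconds after
    the last actuation it served, bounded by min/max green."""
    green_end = min_green
    for t in sorted(actuation_times):
        if t > green_end:          # gap exceeded: phase already ended
            break
        green_end = max(green_end, t + passage_time)
    return min(green_end, max_green)

# Vehicles actuate the detector at t = 2, 4, and 9 seconds;
# the gap after t = 4 exceeds the passage time, so green ends at t = 7.
duration = actuated_green_duration([2.0, 4.0, 9.0])
```
        </preformat>
        <p>With continuous actuations, the same rule would hold green until the maximum green time is reached, which is the behaviour that adaptive and RL-based methods aim to improve upon.</p>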
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Reinforcement learning</title>
        <p>
          One emerging approach to adaptive control has been reinforcement learning (RL). RL is a
sequential decision-making paradigm wherein agents learn how to act through trial-and-error
interactions with an environment. The goal of RL is to learn policies, which describe how agents
should act given the state of the environment. Early work in reinforcement learning during
the 1980s and 1990s, which included the seminal Q-learning algorithm [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], relied on tabular
enumeration of environment states and agent actions. RL remained relatively difficult to scale
until the emergence of methods based on function approximation in the 2010s, specifically the
use of neural networks for deep RL [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Since then, the popularity and complexity of RL has
experienced explosive growth. Deep RL has also found novel applications in practical domains
such as robotics, natural language processing, finance, and healthcare [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Transportation has
been one of the most significant applications of deep RL, with tasks including autonomous
driving [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], vehicle dispatching [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and routing [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], and traffic signal control (see Section 2.3).
We refer the reader to [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] for an in-depth review of the history of reinforcement learning.
        </p>
        <p>
          The body of work that we review in this paper can be seen as a parallel to work in RL for
robotics that attempts to close the gap between simulations and reality. RL methods, especially
deep RL methods, require an abundance of data to learn from environmental interactions. Due
to the cost of real-world data collection, simulators are often employed instead to generate large
quantities of interactions. However, simulators can never perfectly emulate reality. This problem,
which is referred to as the reality gap [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], has been addressed by the sim-to-real literature.
Some sim-to-real methods employ randomization in sensors and controllers to learn robust
policies (domain randomization); some explicitly model the reality gap and try to unify the
feature spaces of the source and target environments (domain adaptation); some train policies
to generalize across different tasks (meta-RL); some attempt to learn from demonstrations of
behaviour in target environments (imitation learning); and others attempt to improve simulators.
We refer the reader to [
          <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
          ] for surveys of these methods. In this work, we draw parallels
between some of these methods and developments in RL-based TSC. However, at the same
time, TSC involves unique challenges that are usually not present in robotics. Environments in
robotics where sim-to-real methods have been applied (see [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]) are usually highly controlled
with well-defined objectives (e.g., [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]) and minimal interaction with other agents. However,
TSC may be affected by varying environmental conditions and large numbers of road users.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Related reviews</title>
        <p>Various reviews of applications of RL in TSC have been published. While each of the following
reviews captures distinct aspects of the field that are highly relevant to our work, none of
them have focused on the key issue of practical engineering challenges that present barriers to
deployment, and — crucially — how to solve them instead of leaving them as open problems.</p>
        <p>
          [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], and [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] provided brief syntheses of early RL-based TSC methods in reviews of
applications of AI in transportation. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] were the first to take a systematic approach to
reviewing RL-based TSC algorithms; the former performed the first experimental comparison of
RL algorithms with a synthetic network, while the latter addressed data sources such as models
of road networks and vehicle arrivals. Both reviewed state, action, and reward formulations.
These reviews considered traditional algorithms in RL such as Q-learning and SARSA.
        </p>
        <p>
          With the increasing popularity of deep learning to address challenges of scalability in RL,
[
          <xref ref-type="bibr" rid="ref32 ref4">4, 32</xref>
          ] (the latter a follow-up to [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]) both reviewed deep RL methods for TSC and provided
recommendations for designing novel deep RL-based TSC algorithms. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] focused on choosing
state, action, and reward representations, with some discussion of data processing, but did not
consider downstream challenges in deployment. [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] provided a broad overview of various
algorithm and architecture designs with less of a focus on practicalities.
        </p>
        <p>
          Both [
          <xref ref-type="bibr" rid="ref15 ref33">15, 33</xref>
          ] reviewed alternative state, action, and reward formulations among deep
RL-based TSC algorithms, as well as options for inter-agent coordination and simulation-based
evaluation. They outlined, but did not investigate, challenges to deployment. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] further
compared deep RL-based algorithms to traditional actuated and adaptive methods. Likewise,
as part of a wider review on deep RL for intelligent trafic systems, [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] reviewed problem
formulations and the history of algorithmic developments for RL-based TSC. Finally, [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
performed a highly systematic overview of the past 26 years of research in this domain that
provides quantitative support for some of the patterns that we identify.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Uncertainty in detection</title>
      <sec id="sec-3-1">
        <title>3.1. Significance of challenges</title>
        <p>
          Inputs to RL-based TSC algorithms describe states using abstracted features. These
include vehicles’ queue lengths, positions, and speeds [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Many works take for granted that
these state features are readily available [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. As reported by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], 67% of surveyed papers
did not envision any specific data sources. Even in papers where potential data sources were
specified, it is unclear how robust the methods would be to detector noise or failure. For
instance, among algorithms that use vehicle positions as state features, [
          <xref ref-type="bibr" rid="ref36 ref37 ref38 ref39">36, 37, 38, 39</xref>
          ] all
used the simulator SUMO to obtain noiseless images of single-intersection toy networks; [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]
extended this approach with a 3D simulator for images from the perspectives of traffic cameras;
and [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] used simulated traffic in SUMO based on flow rates from traffic camera footage. Each
of these methods provides a sanitized representation that may not necessarily be representative
of real-world conditions. Furthermore, the loss of information to noise may cause state aliasing
[
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], which hinders the generalizability of learned policies to different demand scenarios [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Lessons from deployments</title>
        <p>
          Types of instruments for traffic sensing include intrusive detectors (installed into the road
surface) and non-intrusive detectors (mounted above the road surface) [
          <xref ref-type="bibr" rid="ref44 ref45">44, 45</xref>
          ]. Among intrusive
detectors, loop detectors are relatively inexpensive, accurate, and robust to weather and time of
day, but they are also highly vulnerable to wear and tear [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. When they fail, loop detectors
are increasingly being replaced by non-intrusive detectors such as video-based and radar
detection systems [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ], which can be flexibly reconfigured to detect different road segments
and vehicle types. However, the accuracy of these systems degrades in inclement weather, and
video detectors are also inaccurate at night and on high-speed roads [
          <xref ref-type="bibr" rid="ref45 ref47">45, 47</xref>
          ]. RL-based signal
controllers must be designed with these limitations in mind; learning ensembles of models [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ]
to capture the strengths of different detectors may improve robustness. Although data about
speed and position from connected vehicles can be useful, penetration remains low, so such data must
be integrated with traditional detector data. [
          <xref ref-type="bibr" rid="ref49">49</xref>
          ] showed in simulations that connected vehicle
data could improve adaptive control even with limited penetration. Furthermore, agencies may
configure their detectors differently. To account for uncertainty in vehicle stopping positions,
for instance, the size of the detection zone behind the stop bar may vary [
          <xref ref-type="bibr" rid="ref50">50</xref>
          ]; detectors may
also report data at different frequencies [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. Thus, verifying the mapping from real detector
data to abstract state representations is an important task for RL-based TSC.
        </p>
        <p>
          Agencies often address problems in detection by modifying their detection setup [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ] or by
configuring parameters such as passage time (i.e., the amount of time by which a phase is extended
upon actuation) [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ]. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] explicitly addressed error in queue length detection for their adaptive
controller SURTRAC. To mitigate underestimation, they used heuristics based on differences in
vehicle counts reported by advance and stop bar detectors [
          <xref ref-type="bibr" rid="ref52">52</xref>
          ]. They considered overestimation
acceptable, as it provides the algorithm with buffer time; similarly, [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ] found that moderate
queue length overestimation significantly improves the performance of adaptive control.
        </p>
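          <p>A minimal sketch of the kind of count-differencing heuristic described above (the logic here is illustrative only, not SURTRAC's actual implementation): the number of vehicles queued between an advance detector and the stop bar can be approximated by the difference of their cumulative counts.</p>
          <preformat>
```python
def estimate_queue(advance_count, stopbar_count):
    """Estimate vehicles queued between the advance detector and
    the stop bar as the difference of their cumulative counts.
    Clipped at zero: detector miscounts can drive the difference
    negative even though a real queue cannot be."""
    return max(0, advance_count - stopbar_count)

# 42 vehicles have passed the advance detector; 37 have crossed
# the stop bar, so roughly 5 vehicles remain queued in between.
queue = estimate_queue(42, 37)
```
          </preformat>
          <p>Note that this estimate inherits any counting errors from both detectors; as discussed above, overestimation is often the more acceptable failure mode for adaptive control.</p>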
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Progress toward solutions</title>
        <p>Two lines of work within RL-based TSC have the potential to address detection uncertainty.</p>
        <p>
          First, various authors have investigated the effects of reducing the dimensionality of the state
space. In particular, [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] showed that complex image representations of intersection state achieve
inferior performance compared to a simple representation containing only vehicle counts and
phases. [
          <xref ref-type="bibr" rid="ref54">54</xref>
          ] reached similar conclusions with a state representation based on queue length.
Both papers also provided optimality results that connected these formulations to traditional
methods in TSC. Meanwhile, [
          <xref ref-type="bibr" rid="ref43 ref55">43, 55</xref>
          ] investigated the effects of switching to coarser state
representations with a single algorithm. [
          <xref ref-type="bibr" rid="ref55">55</xref>
          ] found that occupancy and speed data (e.g., from
loop detectors) yielded near-identical performance to high-fidelity position data (e.g., from
cameras). However, the experiments of [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] suggested that coarser state discretizations harm
generalization across sudden shifts in trafic flow. Regardless, simpler state representations
could facilitate identification and debugging of issues caused by detection uncertainty.
        </p>
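        <p>To make the contrast concrete, the simple representations favoured by these works might be built as follows; the function and feature names here are hypothetical, chosen only to illustrate a count-and-phase state vector as opposed to a raw image.</p>
        <preformat>
```python
def compact_state(lane_counts, current_phase, num_phases):
    """Build a low-dimensional state vector: per-lane vehicle
    counts concatenated with a one-hot encoding of the phase."""
    phase_onehot = [0.0] * num_phases
    phase_onehot[current_phase] = 1.0
    return [float(c) for c in lane_counts] + phase_onehot

# A four-lane intersection currently in phase 1 of 2 yields a
# six-dimensional state instead of a full image of the scene.
state = compact_state([3, 0, 5, 2], current_phase=1, num_phases=2)
```
        </preformat>
        <p>A state of this size is also far easier to inspect when debugging detection issues than a pixel-level observation.</p>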
        <p>
          Second, other work has attempted to imbue RL-based TSC algorithms with robustness to
detection uncertainty. Several methods are analogous to domain randomization in the
sim-to-real literature [
          <xref ref-type="bibr" rid="ref26 ref56">26, 56</xref>
          ]. The approach of [
          <xref ref-type="bibr" rid="ref57">57</xref>
          ] is closest to the sim-to-real literature: they
randomize weather and lighting conditions in their trafic simulator and train policies based on
the resulting images. [
          <xref ref-type="bibr" rid="ref58">58</xref>
          ] applied Dropout to neural network units to prevent overfitting and
thus to learn robust policies. They evaluated their algorithm with a simulation of probabilistic
detector failure. As is done in adversarial machine learning, [
          <xref ref-type="bibr" rid="ref59">59</xref>
          ] injected Gaussian noise into
queue length observations, and validated their approach with simulations where trucks cause
vehicle count overestimation. Meanwhile, to handle miscalibrated measurements, [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] combined
next-state prediction with imitation learning from a real traffic controller (SCOOT), [
          <xref ref-type="bibr" rid="ref60">60</xref>
          ] used
autoencoders to denoise input data, and [
          <xref ref-type="bibr" rid="ref61">61</xref>
          ] evaluated the effects of lane-blocking incidents
and detector noise on performance. Finally, in a growing body of work that uses connected
vehicle data for RL, [
          <xref ref-type="bibr" rid="ref62">62</xref>
          ] was the first to explicitly address partial observability by adding the
phase duration into the state space to learn its indirect impact on delay.
        </p>
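        <p>The noise-injection idea above can be sketched as follows; this is a hedged illustration of the general technique, not any cited paper's implementation. During training, Gaussian noise perturbs the queue-length observations so that the learned policy does not overfit to exact counts.</p>
        <preformat>
```python
import random

def noisy_queue_observation(true_queues, sigma=1.0, rng=None):
    """Perturb each lane's true queue length with Gaussian noise,
    clipping at zero since queue lengths cannot be negative."""
    rng = rng or random.Random(0)
    return [max(0.0, q + rng.gauss(0.0, sigma)) for q in true_queues]

# During training, the agent observes perturbed rather than
# exact queues, e.g. for three lanes with true queues 4, 0, 7:
obs = noisy_queue_observation([4, 0, 7], sigma=1.0)
```
        </preformat>
        <p>Tuning the noise scale to match the error characteristics of the deployed detectors, as suggested above, is what ties such a scheme to a specific deployment.</p>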
        <p>Overall, these methods help improve the robustness of RL-based TSC
to detection uncertainty. However, they should be designed and tuned to address the challenges
of specific deployments, leveraging past knowledge to identify and address potential causes of
detector noise or failure. It may also help to model partial observability as part of the problem.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Reliability of communications</title>
      <sec id="sec-4-1">
        <title>4.1. Significance of challenges</title>
        <p>
Some level of controller decentralization is often applied in RL-based TSC, because the
computational cost of RL may be prohibitive when the state and action space dimensionalities are
high. At the same time, to ensure that controllers take the traffic conditions of other
intersections into account for signalling decisions, a growing number of works have implemented
mechanisms for inter-intersection coordination [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]. Typical approaches involve sharing states
[
          <xref ref-type="bibr" rid="ref63 ref64 ref65 ref66 ref67 ref68">63, 64, 65, 66, 67, 68</xref>
          ], actions [
          <xref ref-type="bibr" rid="ref69">69</xref>
          ], or hidden state representations from neural networks [
          <xref ref-type="bibr" rid="ref70 ref71">70, 71</xref>
          ]
between controllers for neighbouring intersections. While much of this work has focused on
designing neural network architectures to leverage shared information (such as graph neural
networks [
          <xref ref-type="bibr" rid="ref66 ref67 ref70 ref71">66, 67, 70, 71</xref>
          ]), less attention has been devoted to the mechanisms by which
information must be exchanged in the first place. If there are inconsistencies in the availability of
communication infrastructure and detectors between intersections (see also Section 3), it is
unclear how they may affect the performance of RL-based TSC.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Lessons from deployments</title>
        <p>
          In practice, signal controllers are commonly deployed as part of closed-loop systems, where
control is distributed over three levels. At the top level, traffic management centres (TMCs)
make policy-based signalling decisions, often involving dialogue with other stakeholders. These
decisions are used to configure field master controllers (FMCs), which are installed on-site
and coordinate multiple local intersection controllers (LICs) [
          <xref ref-type="bibr" rid="ref72">72</xref>
          ]. Each FMC aggregates traffic
conditions reported by connected LICs to make signalling decisions over a small region; FMCs
also synchronize the clocks of LICs to ensure that they are coordinated [
          <xref ref-type="bibr" rid="ref12 ref14">12, 14</xref>
          ]. As 90% of TSC
systems in the United States are closed-loop [
          <xref ref-type="bibr" rid="ref73">73</xref>
          ], upgrades to adaptive control have largely
been implemented within this hierarchical organization [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. LICs may make some limited
decisions based on local traffic conditions, but coordination is still largely delegated to FMCs
even in adaptive control [
          <xref ref-type="bibr" rid="ref72">72</xref>
          ]. Transitioning to adaptive control has also required agencies to
update to Type 2070 or ATC controllers [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], but some controllers in road networks may retain
relatively outdated hardware [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. RL-based signal controllers will likely be deployed into such
ecosystems, where control is distributed hierarchically and different intersections have different
capabilities for control and/or detection. Thus, algorithms based on techniques for domain
adaptation from the sim-to-real literature may be helpful.
        </p>
        <p>
          Messages are sent between controllers and TMCs using multiple communication media in
modern TSC systems [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. For wired connections, fibre optic cables are increasingly replacing
traditional copper wires or coaxial cables. Wireless communication systems implemented using
radio or Wi-Fi are also becoming increasingly common [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ]. Thus, communication bandwidth
is not likely to be a concern, except in jurisdictions where fibre optic infrastructure is not readily
available. However, a major issue reported by agencies in [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ] was connection reliability: poor
signal strength often results in data loss or latency. In terms of data formatting, the NTCIP
1202 standard includes object definitions for actuated signal controllers and has
also been used for adaptive systems [
          <xref ref-type="bibr" rid="ref73">73</xref>
          ]. Communications for RL would need to fit into this
standard, at least until it is updated (as has already been done for connected vehicles) [
          <xref ref-type="bibr" rid="ref74">74</xref>
          ].
In SURTRAC, [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] encoded data for communication between neighbouring intersections using
JSON messages with standard types.
        </p>
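To make the idea of standardized inter-controller messages concrete, here is a minimal sketch of encoding and decoding a neighbour-to-neighbour message with JSON. The field names and schema are our own illustrative assumptions, not SURTRAC's actual message format:

```python
import json

# Hypothetical message an intersection might send a downstream
# neighbour: predicted vehicle arrivals over a short horizon.
# All field names here are illustrative assumptions.
def encode_outflow(intersection_id, horizon_s, arrivals):
    return json.dumps({
        "sender": intersection_id,
        "horizon_s": horizon_s,
        # each arrival: (expected arrival offset in seconds, vehicle count)
        "arrivals": [{"t": t, "count": n} for t, n in arrivals],
    })

def decode_outflow(message):
    data = json.loads(message)
    return data["sender"], [(a["t"], a["count"]) for a in data["arrivals"]]

msg = encode_outflow("int_12", 30, [(4.0, 2), (11.5, 3)])
sender, arrivals = decode_outflow(msg)
```

Using only standard JSON types keeps such messages portable across heterogeneous controller hardware.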
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Progress toward solutions</title>
        <p>
          One line of work in RL-based TSC has sought to learn more compact representations of
information. Although bandwidth is not a concern, reducing message dimensionality could still mitigate
the impact of communication failures. Several algorithms directly exchange state values of
learned policies instead of learning from exchanged state representations. In [
          <xref ref-type="bibr" rid="ref75 ref76">75, 76</xref>
          ], state
values are directly exchanged between neighbours and weighted; [
          <xref ref-type="bibr" rid="ref37 ref77 ref78">37, 77, 78</xref>
          ] leveraged the
max-plus algorithm for coordination graphs, which is known to converge to near-optimality
even for cyclic graphs [
          <xref ref-type="bibr" rid="ref79">79</xref>
          ]. Meanwhile, [
          <xref ref-type="bibr" rid="ref80">80</xref>
          ] designed an architecture to exchange information
from the previous time step to ensure robustness to latency, and showed that it asymptotically
reduces communication relative to neighbour-based approaches by 50%. [
          <xref ref-type="bibr" rid="ref81">81</xref>
          ] demonstrated that
cumulative rewards can be estimated based only on vehicle counts on inbound approaches.
        </p>
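The max-plus coordination referenced above can be sketched on a toy two-intersection graph. The payoff tables below are illustrative values we made up; real deployments iterate messages over larger, possibly cyclic, coordination graphs:

```python
# Max-plus message passing between two agents "a" and "b", each
# choosing one of two phases. f gives local payoffs, f_ab the
# pairwise coordination payoff (all values are illustrative).
ACTIONS = [0, 1]
f = {"a": [1.0, 0.0], "b": [0.0, 0.5]}
f_ab = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}

# Message from a to b: mu_ab(a_b) = max_{a_a} [f_a(a_a) + f_ab(a_a, a_b)]
mu_ab = {ab: max(f["a"][aa] + f_ab[(aa, ab)] for aa in ACTIONS) for ab in ACTIONS}
mu_ba = {aa: max(f["b"][ab] + f_ab[(aa, ab)] for ab in ACTIONS) for aa in ACTIONS}

# Each agent maximizes its local payoff plus incoming messages.
best_a = max(ACTIONS, key=lambda aa: f["a"][aa] + mu_ba[aa])
best_b = max(ACTIONS, key=lambda ab: f["b"][ab] + mu_ab[ab])
```

Here the agents jointly select phases (0, 0), which is the global optimum for these payoffs; each agent needed only the small message table from its neighbour, not the neighbour's full state.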
        <p>
          Some work has also focused on designing RL-based TSC algorithms for hierarchically
distributed frameworks of communication and control, which could improve RL’s robustness,
scalability, and applicability for deployment in closed-loop systems. [
          <xref ref-type="bibr" rid="ref82">82</xref>
          ] implemented a
two-level architecture where LICs can either act independently or receive joint actions from FMCs
based on predictions of the regional traffic state. [
          <xref ref-type="bibr" rid="ref63">63</xref>
          ] introduced a feudal RL algorithm, in
which “manager” controllers do not directly control the actions of “worker” controllers, but
instead set goals that influence their rewards. [
          <xref ref-type="bibr" rid="ref83">83</xref>
          ] trained multiple sub-policies that minimize
various proxy metrics such as queue length and waiting time, and a high-level controller that
adaptively delegates control to sub-policies to minimize the longer-term metric of travel time.
However, all of these architectures are conceptual and further work is needed to deploy them.
        </p>
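The feudal goal-setting scheme described above can be sketched as follows. This is our own illustration of the general idea; the goal encoding (target queue lengths) and shaping weight are assumptions, not the cited paper's formulation:

```python
# Feudal reward shaping: the manager never overrides the worker's
# action, it only sets a goal that is folded into the worker's reward.
def worker_reward(base_reward, observation, goal, shaping_weight=0.5):
    """Worker keeps its own objective but is pulled towards the
    manager's goal, here a target queue length per approach."""
    goal_error = sum(abs(obs - g) for obs, g in zip(observation, goal))
    return base_reward - shaping_weight * goal_error

# The manager observes regional congestion and asks the worker to
# empty its queues (target of 0 vehicles on each of two approaches).
r = worker_reward(base_reward=-1.0, observation=[4, 2], goal=[0, 0])
```

The design choice worth noting is that control authority stays local: the manager influences only the reward signal, which preserves the worker's ability to react to local conditions.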
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Compliance and interpretability</title>
      <sec id="sec-5-1">
        <title>5.1. Significance of challenges</title>
        <p>
          At the heart of why RL-based TSC algorithms have not been deployed are the potential
regulatory and safety risks introduced by RL [
          <xref ref-type="bibr" rid="ref15 ref34">15, 34</xref>
          ]. The issue of trust and safety for
RL is by no means exclusive to the domain of TSC [
          <xref ref-type="bibr" rid="ref84 ref85 ref86">84, 85, 86</xref>
          ], but in this case the stakes are
high because controllers must interact with a large number of human users and mistakes may
have fatal consequences. For RL-based signal controllers to be trusted, we need to assess, both
prospectively and retrospectively, whether their decisions comply with standards and reasonable
expectations [
          <xref ref-type="bibr" rid="ref87">87</xref>
          ]. However, the proliferation of deep RL algorithms based on complicated state
representations runs counter to this goal, as assessment of compliance is not possible if we
cannot understand or at least verify their decisions. At the same time, issues of interpretability
and safety have rarely been discussed in the literature on RL-based TSC [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and are more often
mentioned as desiderata for future work in reviews [
          <xref ref-type="bibr" rid="ref10 ref15 ref34">10, 15, 34</xref>
          ].
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Lessons from deployments</title>
        <p>
          In the real world, regulatory frameworks for traffic signalling are often scattershot. In the United
States, the federal Manual on Uniform Traffic Control Devices [
          <xref ref-type="bibr" rid="ref88">88</xref>
          ] includes standards about the
necessity, meaning, and placement of different traffic signals. Many of these standards involve
the control of individual movement signals, which would be abstracted away from RL through
phase-based action space definitions. However, factors such as yellow change and red clearance
intervals are left to “engineering judgement”. States may impose further requirements on signal
timing plans based on regional transportation policies [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In a review of signal timing policies
for 15 states, [
          <xref ref-type="bibr" rid="ref89">89</xref>
          ] found recommendations for factors such as minimum green, yellow change,
and red clearance intervals, as well as when to serve turn movements. Such recommendations
should be incorporated into the design of the RL action space, as was done by [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] who treated
safety constraints as inputs to SURTRAC. Yet, these recommendations can also be arbitrary and
dependent on data (e.g., vehicle and pedestrian clearing times [
          <xref ref-type="bibr" rid="ref89">89</xref>
          ]), and algorithmic approaches
to stakeholder preference learning [
          <xref ref-type="bibr" rid="ref90">90</xref>
          ] may help to find better values.
        </p>
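As a concrete example of the "engineering judgement" factors mentioned above, the change intervals are often computed from the kinematic formulas commonly attributed to ITE guidance. The default parameter values below are illustrative, not a regulatory requirement:

```python
# Commonly cited kinematic change-interval formulas:
#   yellow change      Y = t_r + v / (2a + 2Gg)
#   red clearance      R = (w + L) / v
# where t_r is perception-reaction time, v approach speed, a
# comfortable deceleration, G approach grade, w crossing width,
# and L vehicle length. Defaults here are illustrative assumptions.
def yellow_change_interval(speed_mps, reaction_s=1.0, decel_mps2=3.0,
                           grade=0.0, g=9.81):
    return reaction_s + speed_mps / (2 * decel_mps2 + 2 * grade * g)

def red_clearance_interval(speed_mps, crossing_width_m, vehicle_length_m=6.0):
    return (crossing_width_m + vehicle_length_m) / speed_mps

# A 50 km/h (~13.9 m/s) approach crossing a 24 m wide intersection:
y = yellow_change_interval(13.9)
r = red_clearance_interval(13.9, 24.0)
```

Values such as these could serve as hard bounds on an RL action space, so that learned timing plans never violate the clearance intervals a jurisdiction prescribes.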
        <p>
          One common strategy to ensure the safety of signal timing plans is to review common types
and causes of crashes in historical data [
          <xref ref-type="bibr" rid="ref89">89</xref>
          ]. Naturally, this is a reactive approach that requires
crashes to happen in the first place, and crash reports may also be biased by severity or by
environmental conditions [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Accident modification factors (AMFs) are a popular method of
quantitative analysis; they statistically estimate the effectiveness of particular changes to signal
timing plans based on their expected reductions in crash rate [
          <xref ref-type="bibr" rid="ref91 ref92 ref93">91, 92, 93</xref>
          ]. We are unaware of any
work in RL that estimates or uses AMFs, but they may be a valuable pathway to interpretability.
The Highway Safety Manual also provides standard crash risk assessment models, but these
models often require extensive tuning to local conditions [
          <xref ref-type="bibr" rid="ref94 ref95 ref96">94, 95, 96</xref>
          ].
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Progress toward solutions</title>
        <p>
          Some work has enhanced the interpretability of RL-based TSC through algorithm design. [
          <xref ref-type="bibr" rid="ref97">97</xref>
          ]
focused on learning surrogate policies that are regulatable, i.e. monotonic in state variables,
which allows parameters to be viewed as weights. [
          <xref ref-type="bibr" rid="ref98">98</xref>
          ] learned human-auditable decision tree
surrogates using VIPER, an algorithm that identifies critical states where suboptimality harms
future rewards. Closer to the literature on interpretability for machine learning, [
          <xref ref-type="bibr" rid="ref99">99</xref>
          ] used SHAP
values to analyze how induction loop detections contribute to choices of phases for a controller
in a simulated roundabout. They found that advance detectors have higher SHAP values as
they are more indicative of congestion. Similarly, [
          <xref ref-type="bibr" rid="ref57">57</xref>
          ] used Grad-CAM to generate heatmaps
for image-based inputs. Instead of directly interfacing with the simulator, [
          <xref ref-type="bibr" rid="ref100">100</xref>
          ] used logical
rules based on signal controllers to post-process RL policy outputs to ensure compliance.
        </p>
        <p>
          Further work has applied heuristic modifications to RL algorithms to enforce safety. [
          <xref ref-type="bibr" rid="ref101">101</xref>
          ]
prevented their system from taking actions when pedestrians are detected in crosswalks, and
enforced minimum green times for pedestrians. [
          <xref ref-type="bibr" rid="ref102">102</xref>
          ] drew on their models of rear-end conflict
rates (based on various observable intersection state features [
          <xref ref-type="bibr" rid="ref103">103</xref>
          ]) to design a reward
formulation that minimizes such conflicts. Similarly, [
          <xref ref-type="bibr" rid="ref104">104</xref>
          ] used a binary logistic crash risk model to
define crash penalties while also minimizing waiting time. Using a state formulation based on
individual signals, [
          <xref ref-type="bibr" rid="ref105">105</xref>
          ] regularized the red light duration of signalling plans to mitigate unsafe
behaviour caused by driver frustration with extended red lights. [
          <xref ref-type="bibr" rid="ref106">106</xref>
          ] included yellow change
intervals in their action space and added a penalty for emergency braking by vehicles.
        </p>
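The penalty-shaped safety rewards surveyed above share a common structure: an efficiency term traded off against one or more safety proxies. The following is a generic sketch of that structure; the weights and the form of the risk term are our assumptions, not any cited paper's exact formulation:

```python
# Generic safety-shaped reward for a signal controller: trade off
# efficiency (waiting time) against safety proxies (all weights
# are illustrative assumptions).
def safe_tsc_reward(total_wait_s, crash_risk, emergency_brakes,
                    w_wait=1.0, w_risk=10.0, w_brake=2.0):
    return -(w_wait * total_wait_s
             + w_risk * crash_risk          # e.g. from a logistic risk model
             + w_brake * emergency_brakes)  # hard-braking events this step

r = safe_tsc_reward(total_wait_s=30.0, crash_risk=0.1, emergency_brakes=2)
```

Because the safety terms enter only through the reward, such shaping provides no hard guarantees; that is precisely the gap the constrained-optimization literature discussed below aims to close.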
        <p>
          While we have reviewed many promising methods that have been developed for the
interpretability and safety of RL-based TSC, more work is still needed on determining which of
these methods correspond well to stakeholder requirements. Furthermore, there is a substantial
literature on safe reinforcement learning using constrained optimization [
          <xref ref-type="bibr" rid="ref107 ref108 ref109">107, 108, 109</xref>
          ], which
has hitherto not been applied to TSC; it is likely that such work can provide more rigorous
theoretical guarantees about algorithm behaviour. We also believe that, to deal with safety
failures ethically, work is needed in algorithmic accountability for RL-based signal controllers.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Heterogeneous road users</title>
      <sec id="sec-6-1">
        <title>6.1. Significance of challenges</title>
        <p>
          Traditional models of traffic flow used for TSC assume, simplistically, that all vehicles are
identical [
          <xref ref-type="bibr" rid="ref110 ref111">110, 111</xref>
          ]. In reality, the assumption of identical or even unimodal traffic is often
unrealistic, because many types of vehicles and road users, each with different needs and
behavioural patterns, interact with each other on roads. RL algorithms can still implicitly
encode these assumptions through simplistic state spaces, since common state variables such as
queue length and vehicle position [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] do not account for inter-vehicle variation. Although such
state formulations can be helpful for deriving optimality results based on traditional models
in TSC [
          <xref ref-type="bibr" rid="ref3 ref54">3, 54</xref>
          ], it is unclear how these assumptions may impact the performance and safety of
RL-based signal controllers in practice, especially because road users such as pedestrians and
cyclists may behave non-intuitively. Dedicated simulators developed for RL-based TSC likewise
abstract away inter-vehicle variation [
          <xref ref-type="bibr" rid="ref112">112</xref>
          ]. Surveying 160 papers on RL-based TSC, [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] found that only three accounted for non-private vehicle types, and only one
accounted for pedestrians.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Lessons from deployments</title>
        <p>
          In practice, agencies make a variety of adjustments to signalling plans to accommodate different
classes of road users other than regular passenger vehicles, including pedestrians, cyclists,
transit vehicles, and emergency vehicles [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In this section, we focus on current practice in
the field for pedestrians and transit/emergency vehicles. When balancing the needs of different
road user classes in RL-based signal controllers, stakeholders’ requirements should be taken
into account; in the US, for instance, agencies’ opinions differ on whether preemption for trains
should take priority over pedestrians [
          <xref ref-type="bibr" rid="ref89">89</xref>
          ].
        </p>
        <p>
          For pedestrians, the simplest option is for the pedestrian signal to be activated in the direction
of the through movement, as is implicitly assumed by many works in RL and made explicit in
some (e.g., [
          <xref ref-type="bibr" rid="ref113">113</xref>
          ]). However, doing so may cause pedestrians to impede the flow of left-turning
and right-turning traffic, which creates safety hazards. In practice, leading pedestrian intervals
(LePIs) mitigate this risk by allowing pedestrians to start crossing before cars are permitted to
make turns [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Alternative phase sequence designs add lagging pedestrian intervals (after
turning phases) or phases exclusively for pedestrians. [
          <xref ref-type="bibr" rid="ref114">114</xref>
          ] developed a benefit-cost model to
assess the safety-delay tradeoffs for LePIs at individual intersections. Beyond safety, additional
work has tried to minimize the delay of pedestrians so that they are treated equitably compared
to drivers, as codified by regulations in Germany, the UK, and China [
          <xref ref-type="bibr" rid="ref115">115</xref>
          ]. For the deployed
SURTRAC system, [
          <xref ref-type="bibr" rid="ref116">116</xref>
          ] adaptively set pedestrian walk intervals based on predicted phase
lengths to avoid cutting them short, while [
          <xref ref-type="bibr" rid="ref117">117</xref>
          ] considered using vehicular volumes and
pedestrian actuation frequencies to switch between controller modes. We are unaware of any
work in RL that has explicitly included LePIs as part of the action space formulation.
        </p>
        <p>
          As for handling transit and emergency vehicles, typical strategies include the prioritization
and preemption of signals. Prioritization handles requests made by vehicles through
vehicle-to-infrastructure (V2I) communications, and may or may not result in adjustments to signalling
plans. Meanwhile, preemption (often used for fire trucks or trains) deterministically replaces
the signal plan with a predefined routine that favours the preempting vehicle. Typically, signal
controllers need multiple cycles after preemption to recover from the interruption [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The
adaptive SCATS controller natively implements both prioritization and preemption; compared to
prior practice, [
          <xref ref-type="bibr" rid="ref118">118</xref>
          ] found that SCATS’ performance improvements were robust to prioritization,
and [
          <xref ref-type="bibr" rid="ref119">119</xref>
          ] found that it could reduce recovery time from preemption. These results suggest the
potential of implementing prioritization and preemption with RL-based methods; in particular,
explicit modelling of recovery from preemption may further improve recovery times. In addition
to interactions at intersections, RL-based signal controllers should also consider the effects
of transit and emergency vehicles on traffic between intersections. For instance, when buses
are stopped on roads, they may block other traffic from passing. As initial steps towards
implementing bus prioritization in the SURTRAC system, [
          <xref ref-type="bibr" rid="ref120">120</xref>
          ] delayed the allocation of green
time in intersections located downstream from stopped buses, and [
          <xref ref-type="bibr" rid="ref121">121</xref>
          ] predicted bus dwelling
times at stops by leveraging V2I communications.
        </p>
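The preemption behaviour described above, a predefined routine that overrides normal control and is followed by a recovery period, can be sketched as a wrapper around a learned policy. This is our own illustration, not SCATS' or any cited system's logic; in particular, real recovery involves gradually re-synchronizing with coordination plans rather than simply holding a phase:

```python
# Illustrative preemption wrapper around an RL policy (an assumption-
# laden sketch): a preemption request deterministically overrides the
# learned policy, and the controller then spends a fixed number of
# recovery steps before resuming normal control.
class PreemptingController:
    def __init__(self, policy, preempt_phase, recovery_steps=3):
        self.policy = policy
        self.preempt_phase = preempt_phase
        self.recovery_steps = recovery_steps
        self.recovering = 0

    def act(self, state, preempt_request=False):
        if preempt_request:
            self.recovering = self.recovery_steps
            return self.preempt_phase      # predefined routine wins
        if self.recovering > 0:
            self.recovering -= 1
            return self.preempt_phase      # hold during recovery
        return self.policy(state)          # normal RL control

ctrl = PreemptingController(policy=lambda s: s % 4, preempt_phase=0)
actions = [ctrl.act(5, preempt_request=(t == 0)) for t in range(5)]
```

Explicitly modelling the recovery window, as suggested above, would mean learning the post-preemption transition rather than hard-coding it as this sketch does.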
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Progress toward solutions</title>
        <p>
          One paper in RL-based TSC was cited by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] as explicitly modelling pedestrians: [
          <xref ref-type="bibr" rid="ref101">101</xref>
          ] defined
the reward using the weighted average of the local intersection’s vehicular queue length,
neighbouring intersections’ vehicular queue lengths, and the local intersection’s pedestrian
queue length. Beyond this paper, several other works have explicitly considered pedestrians as
part of the problem formulation. [
          <xref ref-type="bibr" rid="ref122">122</xref>
          ] likewise addressed joint vehicle-pedestrian control at
intersections, but made no assumptions about pedestrian detector capabilities. [
          <xref ref-type="bibr" rid="ref123">123</xref>
          ] used deep
RL to control a signalized crosswalk across a road (with the actions being to set the pedestrian
signal to green or red), and found that it outperformed actuation under moderate levels of
pedestrian demand in simulations. [
          <xref ref-type="bibr" rid="ref61">61</xref>
          ] analyzed the performance of RL-based TSC in the
presence of jaywalking pedestrians that cause vehicles to slow.
        </p>
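The weighted-average reward attributed to [101] above can be sketched in a few lines. The weights below are illustrative assumptions, not the paper's values:

```python
# Pedestrian-aware reward: weighted combination of the local
# intersection's vehicular queue, the average neighbouring vehicular
# queue, and the local pedestrian queue (weights are illustrative).
def ped_aware_reward(local_veh_q, neighbour_veh_qs, local_ped_q,
                     w_local=0.5, w_neigh=0.2, w_ped=0.3):
    neigh_avg = sum(neighbour_veh_qs) / len(neighbour_veh_qs)
    return -(w_local * local_veh_q + w_neigh * neigh_avg + w_ped * local_ped_q)

r = ped_aware_reward(local_veh_q=10, neighbour_veh_qs=[4, 6], local_ped_q=5)
```

The pedestrian weight directly encodes the equity tradeoff discussed earlier: raising it shifts green time towards clearing pedestrian queues at the expense of vehicular delay.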
        <p>
          Several works in RL-based TSC have also considered prioritization and preemption. For
prioritization, [
          <xref ref-type="bibr" rid="ref57">57</xref>
          ] upweighted buses and emergency vehicles in their throughput-based reward
formulation; [
          <xref ref-type="bibr" rid="ref124">124</xref>
          ] used a state representation based on the cell transmission traffic model and
modelled priority as a binary variable; [
          <xref ref-type="bibr" rid="ref125">125</xref>
          ] adopted an implicit approach based on minimizing
delay per person instead of per vehicle; [
          <xref ref-type="bibr" rid="ref126">126</xref>
          ] and [
          <xref ref-type="bibr" rid="ref127">127</xref>
          ] both considered prioritization for trams,
with the former’s rewards being based on tram schedule adherence and the latter using model
predictive control to model driver behaviour; and [
          <xref ref-type="bibr" rid="ref128">128</xref>
          ] adaptively altered vehicles’ priorities
depending on queue length, waiting time, and emergency vehicle presence. For preemption,
[
          <xref ref-type="bibr" rid="ref129">129</xref>
          ] learned TSC policies for emergency vehicle routing with rewards that encourage low
vehicle density, and [
          <xref ref-type="bibr" rid="ref130">130</xref>
          ] used RL to learn policies for notifying connected vehicles to clear out
lanes for emergency vehicles to pass.
        </p>
        <p>
          Lastly, [
          <xref ref-type="bibr" rid="ref100">100</xref>
          ] included demand data from the field for multiple types of road users — including
pedestrians, cyclists, motorcyclists, trucks, and buses — in their benchmark simulation for
RL-based TSC, LemgoRL, which is based on a real road network; they also included pedestrian
waiting times in rewards and enforced minimum pedestrian green times. There is a need to
connect high-fidelity simulations such as LemgoRL to the various approaches for handling
different road user classes that we outlined above, so as to ensure their ecological validity.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We have reviewed four barriers to the deployment of RL-based controllers for TSC. Each of these
barriers has been insufficiently addressed by the majority of new work in RL-based TSC, which
has focused on algorithmic contributions. However, TSC algorithms do not exist in a vacuum —
they must be trained based on data from detectors, interface with signals through controllers,
and control the movements of a variety of road users. Challenges both intrinsic to RL algorithms
and in other pipeline components may cascade into failures with significant implications for
the efficiency and safety of transportation infrastructure. Based on our literature review, we
suggested ways in which further work in RL-based TSC could address these challenges.</p>
      <p>
        Echoing the recommendations of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we emphasize the importance of RL practitioners engaging in
consultation with agency stakeholders and experts in TSC. This can break down
information silos that would otherwise prevent the recognition of issues during requirements
engineering and integration (cf. [
        <xref ref-type="bibr" rid="ref131">131</xref>
        ]); we could not have identified these challenges ourselves
without engaging with the literature on traditional TSC. Additionally, as we discussed, the
practicalities of these challenges — including the availability and configuration of detectors,
signalling constraints, and the priorities of different road users — will often vary depending on the
status of road networks and their responsible agencies. While benchmark simulations based
on synthetic networks facilitate evaluation, we advocate for the creation of more simulations
like [
        <xref ref-type="bibr" rid="ref100">100</xref>
        ] that incorporate realistic domain constraints. RL algorithms that are trained using
such benchmarks would likely have better generalizability and robustness in deployments.
      </p>
      <p>
        More generally, we uncovered a diversity of work that addresses each challenge, which
previous reviews of TSC have not comprehensively surveyed. This suggests that RL-based
TSC is closer to deployment than might be suggested by a review of state-of-the-art methods.
If future developments focus on combining algorithmic improvements with both real-world
considerations and reproducibility techniques to facilitate collaboration [
        <xref ref-type="bibr" rid="ref132">132</xref>
        ], we believe that
the integration of RL to improve real-world transportation infrastructure is within reach.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The authors thank Christian Kästner, Eunsuk Kang, Stephanie Milani, Peide Huang, Ryan Shi,
and Steven Jecmen for useful information and suggestions that they provided to support the
drafting of this review.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Schrank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Albert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eisele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lomax</surname>
          </string-name>
          ,
          <source>2021 Urban Mobility Report, Technical Report</source>
          ,
          <institution>Texas A&amp;M Transportation Institute</institution>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Franzese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gibson</surname>
          </string-name>
          ,
          <article-title>Temporary losses of highway capacity and impacts on performance: Phase 2</article-title>
          ,
          Technical Report ORNL/TM-2004/209, Oak Ridge National Laboratory,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gayah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Diagnosing reinforcement learning for traffic signal control</article-title>
          , arXiv preprint arXiv:1905.04716 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gregurić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vujić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alexopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Miletić</surname>
          </string-name>
          ,
          <article-title>Application of deep reinforcement learning in traffic signal control: An overview and impact of open traffic data</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <fpage>4011</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Barlow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-F.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Rubinstein</surname>
          </string-name>
          ,
          <article-title>Smart urban signal networks: Initial application of the SURTRAC adaptive traffic signal control system</article-title>
          ,
          <source>in: Proceedings of the 23rd International Conference on Automated Planning and Scheduling</source>
          ,
          <source>ICAPS '13</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>434</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 34th AAAI Conference on Artificial Intelligence</source>
          , AAAI '20
          ,
          <year>2020</year>
          , pp.
          <fpage>3414</fpage>
          -
          <lpage>3421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sandholm</surname>
          </string-name>
          ,
          <article-title>Superhuman AI for heads-up no-limit poker: Libratus beats top professionals</article-title>
          ,
          <source>Science</source>
          <volume>359</volume>
          (
          <year>2017</year>
          )
          <fpage>418</fpage>
          -
          <lpage>424</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Babuschkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathieu</surname>
          </string-name>
          ,
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Powell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ewalds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Georgiev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Horgan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kroiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Danihelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Agapiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaderberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Vezhnevets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leblond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pohlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dalibard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Budden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sulsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Molloy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Paine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pfaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wünsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schaul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Apps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <article-title>Grandmaster level in StarCraft II using multi-agent reinforcement learning</article-title>
          ,
          <source>Nature</source>
          <volume>575</volume>
          (
          <year>2019</year>
          )
          <fpage>350</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z. T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <article-title>Ride-hailing order dispatching at DiDi via reinforcement learning</article-title>
          ,
          <source>INFORMS Journal on Applied Analytics</source>
          <volume>50</volume>
          (
          <year>2020</year>
          )
          <fpage>272</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Noaeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Crebo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Abrar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. S. H.</given-names>
            <surname>Abad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Bazzan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Far</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning in urban network traffic signal control: A systematic literature review</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>199</volume>
          (
          <year>2022</year>
          )
          <fpage>116830</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Perrault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tambe</surname>
          </string-name>
          ,
          <article-title>AI for social impact: Learning and planning in the data-to-deployment pipeline</article-title>
          ,
          <source>arXiv preprint</source>
          (
          <year>2019</year>
          ). arXiv:2001.00088.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tighe</surname>
          </string-name>
          ,
          <source>Traffic Control Systems Handbook</source>
          , Federal Highway Administration,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Learning phase competition for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1963</fpage>
          -
          <lpage>1972</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Koonce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rodegerdts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Quayle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Beaird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Braud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bonneson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tarnoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Urbanik</surname>
          </string-name>
          ,
          <source>Traffic Signal Timing Manual</source>
          , Federal Highway Administration,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gayah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey on traffic signal control methods</article-title>
          ,
          <source>arXiv preprint</source>
          (
          <year>2019</year>
          ). arXiv:1904.08117.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-I.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>The traffic signal control problem for intersections: a review</article-title>
          ,
          <source>European Transport Research Review</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <fpage>50</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C. J. C. H.</given-names>
            <surname>Watkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dayan</surname>
          </string-name>
          ,
          <article-title>Q-learning</article-title>
          ,
          <source>Machine Learning</source>
          <volume>8</volume>
          (
          <year>1992</year>
          )
          <fpage>279</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mnih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Rusu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Veness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Bellemare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Fidjeland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ostrovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Beattie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sadik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kumaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wierstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Legg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <article-title>Human-level control through deep reinforcement learning</article-title>
          ,
          <source>Nature</source>
          <volume>518</volume>
          (
          <year>2015</year>
          )
          <fpage>529</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning</article-title>
          ,
          <source>arXiv preprint</source>
          (
          <year>2018</year>
          ). arXiv:1810.06339.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Kiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sobh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Talpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mannion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A. A.</given-names>
            <surname>Sallab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yogamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for autonomous driving: A survey</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nazari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oroojlooy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Takáč</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V.</given-names>
            <surname>Snyder</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning for solving the vehicle routing problem</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS '18</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>9861</fpage>
          -
          <lpage>9871</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Barto</surname>
          </string-name>
          ,
          <article-title>Early history of reinforcement learning</article-title>
          ,
          <source>in: Reinforcement Learning: An Introduction</source>
          , The MIT Press,
          <year>2018</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Mouret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chatzilygeroudis</surname>
          </string-name>
          ,
          <article-title>20 years of reality gap: a few thoughts about simulators in evolutionary robotics</article-title>
          ,
          <source>in: Proceedings of the 2017 Genetic and Evolutionary Computation Conference Companion, GECCO '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1121</fpage>
          -
          <lpage>1124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Queralta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Westerlund</surname>
          </string-name>
          ,
          <article-title>Sim-to-real transfer in deep reinforcement learning for robotics: a survey</article-title>
          ,
          <source>in: Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence, SSCI '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>737</fpage>
          -
          <lpage>744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dimitropoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Hatzilygeroudis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chatzilygeroudis</surname>
          </string-name>
          ,
          <article-title>A brief survey of Sim2Real methods for robot learning</article-title>
          ,
          <source>in: Proceedings of the 2022 International Conference on Robotics in Alpe-Adria Danube Region, RAAD '22</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Andrychowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chociej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Józefowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McGrew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pachocki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Petron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Plappert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Powell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sidor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tobin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaremba</surname>
          </string-name>
          ,
          <article-title>Learning dexterous in-hand manipulation</article-title>
          ,
          <source>The International Journal of Robotics Research</source>
          <volume>39</volume>
          (
          <year>2020</year>
          )
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>B.</given-names>
            <surname>Abdulhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kattan</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning: Introduction to theory and potential for transport applications</article-title>
          ,
          <source>Canadian Journal of Civil Engineering</source>
          <volume>30</volume>
          (
          <year>2003</year>
          )
          <fpage>981</fpage>
          -
          <lpage>991</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A. L. C.</given-names>
            <surname>Bazzan</surname>
          </string-name>
          ,
          <article-title>Opportunities for multiagent systems and multiagent reinforcement learning in traffic control</article-title>
          ,
          <source>Autonomous Agents and Multi-Agent Systems</source>
          <volume>18</volume>
          (
          <year>2009</year>
          )
          <fpage>342</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A. L. C.</given-names>
            <surname>Bazzan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Klügl</surname>
          </string-name>
          ,
          <article-title>A review on agent-based technology for traffic and transportation</article-title>
          ,
          <source>The Knowledge Engineering Review</source>
          <volume>29</volume>
          (
          <year>2013</year>
          )
          <fpage>375</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mannion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duggan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Howley</surname>
          </string-name>
          ,
          <article-title>An experimental review of reinforcement learning algorithms for adaptive traffic signal control</article-title>
          ,
          <source>in: Autonomic Road Transport Support Systems</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>K.-L. A.</given-names>
            <surname>Yau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Khoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Komisarczuk</surname>
          </string-name>
          ,
          <article-title>A survey on reinforcement learning models and algorithms for traffic signal control</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>50</volume>
          (
          <year>2017</year>
          )
          <fpage>34</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-L. A.</given-names>
            <surname>Yau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Noor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for traffic signal control: A review</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>208016</fpage>
          -
          <lpage>208044</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gayah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Recent advances in reinforcement learning for traffic signal control: A survey of models and evaluation</article-title>
          ,
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>12</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Haydari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for intelligent transportation systems: A survey</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>23</volume>
          (
          <year>2022</year>
          )
          <fpage>11</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Deep Q learning-based traffic signal control algorithms: Model development and evaluation with field data</article-title>
          ,
          <source>Journal of Intelligent Transportation Systems</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>W.</given-names>
            <surname>Genders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razavi</surname>
          </string-name>
          ,
          <article-title>Using a deep reinforcement learning agent for traffic signal control</article-title>
          , arXiv preprint (
          <year>2016</year>
          ). arXiv:1611.01142.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>E. van der</given-names>
            <surname>Pol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Oliehoek</surname>
          </string-name>
          ,
          <article-title>Coordinated deep reinforcement learners for traffic light control</article-title>
          ,
          <source>in: Proceedings of the 30th Conference on Neural Information Processing Systems, NIPS '16</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Mousavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schukat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Howley</surname>
          </string-name>
          ,
          <article-title>Traffic light control using deep policy-gradient and value-function based reinforcement learning</article-title>
          ,
          <source>IET Intelligent Transport Systems</source>
          <volume>11</volume>
          (
          <year>2017</year>
          )
          <fpage>417</fpage>
          -
          <lpage>423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>A deep reinforcement learning network for traffic light cycle control</article-title>
          ,
          <source>IEEE Transactions on Vehicular Technology</source>
          <volume>68</volume>
          (
          <year>2019</year>
          )
          <fpage>1243</fpage>
          -
          <lpage>1253</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vogiatzis</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for autonomous traffic light control</article-title>
          ,
          <source>in: Proceedings of the 2018 3rd International Conference on Intelligent Transportation Engineering, ICITE '18</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>IntelliLight: A reinforcement learning approach for intelligent traffic light control</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD '18</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2496</fpage>
          -
          <lpage>2505</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M. T. J.</given-names>
            <surname>Spaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vlassis</surname>
          </string-name>
          ,
          <article-title>A point-based POMDP algorithm for robot planning</article-title>
          ,
          <source>in: Proceedings of the 2004 IEEE International Conference on Robotics and Automation</source>
          ,
          <source>ICRA '04</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>2399</fpage>
          -
          <lpage>2404</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Alegre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Bazzan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <article-title>Quantifying the impact of non-stationarity in reinforcement learning-based traffic signal control</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <fpage>e575</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dodoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rubio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Penumala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pratt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sunkari</surname>
          </string-name>
          ,
          <source>Synthesis study of Texas signal control systems</source>
          , Technical Report FHWA/TX-13/0-6670-1, Texas A&amp;M Transportation Institute
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sunkari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bibeka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balke</surname>
          </string-name>
          ,
          <article-title>Impact of Traffic Signal Controller Settings on the Use of Advanced Detection Devices</article-title>
          ,
          <source>Technical Report FHWA/TX-18/0-6934-R1</source>
          ,
          Texas A&amp;M Transportation Institute
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gibson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K. P.</given-names>
            <surname>Mills</surname>
          </string-name>
          , D. R. Jr.,
          <article-title>Staying in the loop: The search for improved reliability of traffic sensing systems through smart test instruments</article-title>
          ,
          <source>Public Roads</source>
          <volume>62</volume>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rhodes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Bullock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Sturdevant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. T.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <article-title>Evaluation of Stop Bar Video Detection Accuracy at Signalized Intersections</article-title>
          ,
          <source>Technical Report FHWA/IN/JTRP-2005/28</source>
          , Joint Transportation Research Program, Indiana Department of Transportation and Purdue University,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Laskin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <article-title>SUNRISE: A simple unified framework for ensemble learning in deep reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 38th International Conference on Machine Learning, ICML '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>6131</fpage>
          -
          <lpage>6141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>S. M. A. B. A.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tajalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mohebifard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hajbabaie</surname>
          </string-name>
          ,
          <article-title>Effects of connectivity and traffic observability on an adaptive traffic signal control system</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>2675</volume>
          (
          <year>2021</year>
          )
          <fpage>800</fpage>
          -
          <lpage>814</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>A. M. T.</given-names>
            <surname>Emtenan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Day</surname>
          </string-name>
          ,
          <article-title>Impact of detector configuration on performance measurement and signal operations</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>2674</volume>
          (
          <year>2020</year>
          )
          <fpage>300</fpage>
          -
          <lpage>313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>F.</given-names>
            <surname>Luyanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gettman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Head</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shelby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bullock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mirchandani</surname>
          </string-name>
          ,
          <article-title>ACS-Lite algorithmic architecture: Applying adaptive control system technology to closed-loop traffic signal control systems</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>1856</volume>
          (
          <year>2003</year>
          )
          <fpage>175</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>X.-F.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Barlow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. B.</given-names>
            <surname>Rubinstein</surname>
          </string-name>
          ,
          <article-title>Accounting for Real-World Uncertainty in Real-Time Adaptive Traffic Control</article-title>
          ,
          <source>Technical Report ATCSTR12</source>
          , Carnegie Mellon University,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hengst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aydos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Geers</surname>
          </string-name>
          ,
          <article-title>On the performance of adaptive traffic signal control</article-title>
          ,
          <source>in: Proceedings of the Second International Workshop on Computational Transportation Science, ICWTS '09</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gayah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>PressLight: Learning max pressure control to coordinate traffic signals in arterial network</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1290</fpage>
          -
          <lpage>1298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>W.</given-names>
            <surname>Genders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razavi</surname>
          </string-name>
          ,
          <article-title>Evaluating reinforcement learning state representations for adaptive traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 9th International Conference on Ambient Systems, Networks and Technologies, ANT '18</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tobin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaremba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <article-title>Domain randomization for transferring deep neural networks from simulation to the real world</article-title>
          ,
          <source>in: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vogiatzis</surname>
          </string-name>
          ,
          <article-title>Fully-autonomous, vision-based traffic signal control: from simulation to reality</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '22</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>454</fpage>
          -
          <lpage>462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Azevedo</surname>
          </string-name>
          ,
          <article-title>Towards robust deep reinforcement learning for traffic signal control: Demand surges, incidents and sensor failures</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Intelligent Transportation Systems, ITSC '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3559</fpage>
          -
          <lpage>3566</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <article-title>Robust deep reinforcement learning for traffic signal control</article-title>
          ,
          <source>Journal of Big Data Analytics in Transportation</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>263</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A regional traffic signal control strategy with deep reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 37th Chinese Control Conference, CCC '18</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>7690</fpage>
          -
          <lpage>7695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aslani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seipel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Mesgari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiering</surname>
          </string-name>
          ,
          <article-title>Traffic signal optimization through discrete and continuous reinforcement learning with robustness analysis in downtown Tehran</article-title>
          ,
          <source>Advanced Engineering Informatics</source>
          <volume>38</volume>
          (
          <year>2018</year>
          )
          <fpage>639</fpage>
          -
          <lpage>655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Di</surname>
          </string-name>
          ,
          <article-title>CVLight: Decentralized learning for adaptive traffic signal control with connected vehicles</article-title>
          , arXiv preprint (
          <year>2021</year>
          ). arXiv:2104.10340.
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Feudal multi-agent deep reinforcement learning for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>816</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Codecà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Multi-agent deep reinforcement learning for large-scale traffic signal control</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1086</fpage>
          -
          <lpage>1095</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Network-wide traffic signal control based on the discovery of critical nodes and deep reinforcement learning</article-title>
          ,
          <source>Journal of Intelligent Transportation Systems</source>
          <volume>24</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Traffic signal control with reinforcement learning based on region-aware cooperative strategy</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <article-title>GraphLight: Graph-based reinforcement learning for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 6th International Conference on Computer and Communication Systems, ICCCS '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>645</fpage>
          -
          <lpage>650</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kangasharju</surname>
          </string-name>
          ,
          <article-title>ERL: Edge based reinforcement learning for optimized urban traffic light control</article-title>
          ,
          <source>in: Proceedings of the 3rd International Workshop on Smart Edge Computing and Networking</source>
          ,
          <source>SmartEdge '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>849</fpage>
          -
          <lpage>854</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          [69]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Cooperative deep Q-learning with Q-value transfer for multi-intersection signal control</article-title>
          ,
          <source>IEEE Access 7</source>
          (
          <year>2019</year>
          )
          <fpage>40797</fpage>
          -
          <lpage>40809</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          [70]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>CoLight: Learning network-level cooperation for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1913</fpage>
          -
          <lpage>1922</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          [71]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Otaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hayakawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yoshimura</surname>
          </string-name>
          ,
          <article-title>Traffic signal control based on reinforcement learning with graph convolutional neural nets</article-title>
          ,
          <source>in: Proceedings of the 2018 International Conference on Intelligent Transportation Systems, ITSC '18</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>877</fpage>
          -
          <lpage>883</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          [72]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Comparison of current practical adaptive traffic control systems</article-title>
          ,
          <source>in: Proceedings of the 10th International Conference of Chinese Transportation Professionals, ICCTP '10</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1611</fpage>
          -
          <lpage>1619</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref73">
        <mixed-citation>
          [73]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gettman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Shelby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Head</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Bullock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Soyke</surname>
          </string-name>
          ,
          <article-title>Data-driven algorithms for real-time adaptive tuning of offsets in coordinated traffic signal systems</article-title>
          ,
          <source>Transportation Research Record</source>
          <year>2035</year>
          (
          <year>2007</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref74">
        <mixed-citation>
          [74]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Leslie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Balse</surname>
          </string-name>
          ,
          <source>Infrastructure Connectivity Certification Test Procedures for Infrastructure-Based Connected Automated Vehicle Components: Test Procedures, Signal Phase and Timing - NTCIP 1202 v03, Technical Report FHWA-JPO-20-802</source>
          , Leidos,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref75">
        <mixed-citation>
          [75]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Intelligent transportation control based on proactive complex event processing</article-title>
          ,
          <source>in: Proceedings of the 3rd International Conference on Mechanics and Mechatronics Research</source>
          , ICMMR '16
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref76">
        <mixed-citation>
          [76]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Distributed cooperative reinforcement learning-based traffic signal control that integrates V2X networks' dynamic clustering</article-title>
          ,
          <source>IEEE Transactions on Vehicular Technology</source>
          <volume>66</volume>
          (
          <year>2017</year>
          )
          <fpage>8667</fpage>
          -
          <lpage>8681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref77">
        <mixed-citation>
          [77]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-S.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Cooperative traffic signal control using multi-step return and off-policy asynchronous advantage actor-critic graph algorithm</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>183</volume>
          (
          <year>2019</year>
          )
          <fpage>104855</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref78">
        <mixed-citation>
          [78]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Traffic signal control by distributed reinforcement learning with min-sum communication</article-title>
          ,
          <source>in: Proceedings of the 2017 American Control Conference, ACC '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5095</fpage>
          -
          <lpage>5100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref79">
        <mixed-citation>
          [79]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Kok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vlassis</surname>
          </string-name>
          ,
          <article-title>Using the max-plus algorithm for multiagent decision making in coordination graphs</article-title>
          ,
          <source>in: Proceedings of the Fourth Robot Soccer World Cup, RoboCup '05</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref80">
        <mixed-citation>
          [80]
          <string-name>
            <given-names>D.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <article-title>IEDQN: Information exchange DQN with a centralized coordinator for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 2020 International Joint Conference on Neural Networks, IJCNN '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref81">
        <mixed-citation>
          [81]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Multi-agent reinforcement learning for traffic signal control through universal communication method</article-title>
          ,
          <source>arXiv preprint arXiv:2204.12190</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref82">
        <mixed-citation>
          [82]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdoos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Bazzan</surname>
          </string-name>
          ,
          <article-title>Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>171</volume>
          (
          <year>2021</year>
          )
          <fpage>114580</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref83">
        <mixed-citation>
          [83]
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Hierarchically and cooperatively learning traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 35th AAAI Conference on Artificial Intelligence</source>
          , AAAI '21
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref84">
        <mixed-citation>
          [84]
          <string-name>
            <given-names>L.</given-names>
            <surname>Brunke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Greef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Panerati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Schoellig</surname>
          </string-name>
          ,
          <article-title>Safe learning in robotics: From learning-based control to safe reinforcement learning</article-title>
          ,
          <source>Annual Review of Control, Robotics, and Autonomous Systems</source>
          <volume>5</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref85">
        <mixed-citation>
          [85]
          <string-name>
            <given-names>J.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on safe reinforcement learning</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>16</volume>
          (
          <year>2015</year>
          )
          <fpage>1437</fpage>
          -
          <lpage>1480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref86">
        <mixed-citation>
          [86]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nemati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning in healthcare: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref87">
        <mixed-citation>
          [87]
          <string-name>
            <given-names>F. R.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Habli</surname>
          </string-name>
          ,
          <article-title>An assurance case pattern for the interpretability of machine learning in safety-critical systems</article-title>
          ,
          <source>in: Proceedings of the 2020 International Conference on Computer Safety, Reliability, and Security</source>
          ,
          <source>SAFECOMP '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>395</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref88">
        <mixed-citation>
          [88]
          <string-name>
            <surname>DOT</surname>
          </string-name>
          , Manual on Uniform Traffic Control Devices, revision 2 ed.,
          <source>US Department of Transportation</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref89">
        <mixed-citation>
          [89]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bonneson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pratt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zimmerman</surname>
          </string-name>
          ,
          <article-title>Development of a Traffic Signal Operations Handbook</article-title>
          ,
          <source>Technical Report FHWA/TX-09/0-5629-1</source>
          ,
          Texas A&amp;M Transportation Institute
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref90">
        <mixed-citation>
          [90]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kusbit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kahng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>See</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Noothigattu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Psomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Procaccia</surname>
          </string-name>
          ,
          <article-title>WeBuildAI: Participatory framework for algorithmic governance</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>3</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref91">
        <mixed-citation>
          [91]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bonneson</surname>
          </string-name>
          ,
          <article-title>Role and application of accident modification factors within highway design process</article-title>
          ,
          <source>Transportation Research Record</source>
          <year>1961</year>
          (
          <year>2006</year>
          )
          <fpage>65</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref92">
        <mixed-citation>
          [92]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Validation of crash modification factors derived from cross-sectional studies with regression models</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>2514</volume>
          (
          <year>2015</year>
          )
          <fpage>88</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref93">
        <mixed-citation>
          [93]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Fontaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Estimation of crash modification factors for an adaptive traffic-signal control system</article-title>
          ,
          <source>Journal of Transportation Engineering</source>
          <volume>142</volume>
          (
          <year>2016</year>
          )
          <fpage>04016061</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref94">
        <mixed-citation>
          [94]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Magri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Shirazi</surname>
          </string-name>
          ,
          <article-title>Application of Highway Safety Manual draft chapter: Louisiana experience</article-title>
          ,
          <source>Transportation Research Record</source>
          <year>1950</year>
          (
          <year>2006</year>
          )
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref95">
        <mixed-citation>
          [95]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Edara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carlos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nam</surname>
          </string-name>
          ,
          <article-title>Calibration of the Highway Safety Manual for Missouri</article-title>
          ,
          <source>Technical Report 25-1121-0003-177</source>
          , Mid-America Transportation Center,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref96">
        <mixed-citation>
          [96]
          <string-name>
            <given-names>F.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gladhill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Monsere</surname>
          </string-name>
          ,
          <article-title>Calibration of Highway Safety Manual predictive models for Oregon state highways</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>2241</volume>
          (
          <year>2011</year>
          )
          <fpage>19</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref97">
        <mixed-citation>
          [97]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Hanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sharon</surname>
          </string-name>
          ,
          <article-title>Learning an interpretable traffic signal control policy</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems</source>
          , AAMAS '20
          ,
          <year>2020</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref98">
        <mixed-citation>
          [98]
          <string-name>
            <given-names>V.</given-names>
            <surname>Jayawardana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Landler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Mixed autonomous supervision in traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 2021 International Conference on Intelligent Transportation Systems, ITSC '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1767</fpage>
          -
          <lpage>1773</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref99">
        <mixed-citation>
          [99]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning with explainability for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Intelligent Transportation Systems, ITSC '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3567</fpage>
          -
          <lpage>3572</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref100">
        <mixed-citation>
          [100]
          <string-name>
            <given-names>A.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rangras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schnittker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Waldmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Friesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ferfers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schreckenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hufen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jasperneite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiering</surname>
          </string-name>
          ,
          <article-title>Towards real-world deployment of reinforcement learning for traffic signal control</article-title>
          ,
          <source>in: Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, ICMLA '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>507</fpage>
          -
          <lpage>514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref101">
        <mixed-citation>
          [101]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Intelligent traffic light control using distributed multi-agent Q learning</article-title>
          ,
          <source>in: Proceedings of the 2017 International Conference on Intelligent Transportation Systems, ITSC '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref102">
        <mixed-citation>
          [102]
          <string-name>
            <given-names>M.</given-names>
            <surname>Essa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sayed</surname>
          </string-name>
          ,
          <article-title>Self-learning adaptive traffic signal control for real-time safety optimization</article-title>
          ,
          <source>Accident Analysis &amp; Prevention</source>
          <volume>146</volume>
          (
          <year>2020</year>
          )
          <fpage>105713</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref103">
        <mixed-citation>
          [103]
          <string-name>
            <given-names>M.</given-names>
            <surname>Essa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sayed</surname>
          </string-name>
          ,
          <article-title>Traffic conflict models to evaluate the safety of signalized intersections at the cycle level</article-title>
          ,
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>89</volume>
          (
          <year>2018</year>
          )
          <fpage>289</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref104">
        <mixed-citation>
          [104]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdel-Aty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <article-title>Multi-objective reinforcement learning approach for improving safety at intersections with adaptive traffic signal control</article-title>
          ,
          <source>Accident Analysis &amp; Prevention</source>
          <volume>144</volume>
          (
          <year>2020</year>
          )
          <fpage>105655</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref105">
        <mixed-citation>
          [105]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Time difference penalized traffic signal timing by LSTM Q-network to balance safety and capacity at intersections</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>80086</fpage>
          -
          <lpage>80096</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref106">
        <mixed-citation>
          [106]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <article-title>Smarter and safer traffic signal controlling via deep reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 29th ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3345</fpage>
          -
          <lpage>3348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref107">
        <mixed-citation>
          [107]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bohez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdolmaleki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neunert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buchli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Heess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <article-title>Value constrained model-free continuous control</article-title>
          , arXiv preprint (
          <year>2019</year>
          ). arXiv:1902.04623.
        </mixed-citation>
      </ref>
      <ref id="ref108">
        <mixed-citation>
          [108]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Basar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          ,
          <article-title>Natural policy gradient primal-dual method for constrained Markov decision processes</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Neural Information Processing Systems</source>
          ,
          <source>NeurIPS '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8378</fpage>
          -
          <lpage>8390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref109">
        <mixed-citation>
          [109]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Isenbaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Constrained variational policy optimization for safe reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 39th International Conference on Machine Learning, ICML '22</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref110">
        <mixed-citation>
          [110]
          <string-name>
            <given-names>D.</given-names>
            <surname>Branston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>van Zuylen</surname>
          </string-name>
          ,
          <article-title>Comparison of queue-length models at signalized intersections</article-title>
          ,
          <source>Transportation Research</source>
          <volume>12</volume>
          (
          <year>1978</year>
          )
          <fpage>47</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref111">
        <mixed-citation>
          [111]
          <string-name>
            <given-names>F.</given-names>
            <surname>Viloria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Courage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Avery</surname>
          </string-name>
          ,
          <article-title>Comparison of queue-length models at signalized intersections</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>1710</volume>
          (
          <year>2000</year>
          )
          <fpage>222</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref112">
        <mixed-citation>
          [112]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>CityFlow: A multi-agent reinforcement learning environment for large scale city traffic scenario</article-title>
          ,
          <source>in: Proceedings of the 2019 World Wide Web Conference, WWW '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3620</fpage>
          -
          <lpage>3624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref113">
        <mixed-citation>
          [113]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Askary</surname>
          </string-name>
          ,
          <article-title>A reinforcement learning approach for intelligent traffic signal control at urban intersections</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Intelligent Transportation Systems, ITSC '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4242</fpage>
          -
          <lpage>4247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref114">
        <mixed-citation>
          [114]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Smaglik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kothuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koonce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Leading pedestrian intervals: Treating the decision to implement as a marginal benefit-cost problem</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>2620</volume>
          (
          <year>2017</year>
          )
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref115">
        <mixed-citation>
          [115]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boltze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakamura</surname>
          </string-name>
          ,
          <article-title>Initial comparative analysis of international practice in road traffic signal control</article-title>
          ,
          <source>in: Global Practices on Road Traffic Signal Control, Elsevier</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>285</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref116">
        <mixed-citation>
          [116]
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Surtrac for the People: Upgrading the Surtrac Pittsburgh Deployment to incorporate Pedestrian Friendly Extensions and Remote Monitoring Advances</article-title>
          ,
          <source>Technical Report 01730614, Mobility21</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref117">
        <mixed-citation>
          [117]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kothuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kading</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Smaglik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sobie</surname>
          </string-name>
          ,
          <article-title>Improving Walkability Through Control Strategies at Signalized Intersections</article-title>
          ,
          <source>Technical Report NITC-RR-782, National Institute for Transportation and Communities</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref118">
        <mixed-citation>
          [118]
          <string-name>
            <given-names>C.</given-names>
            <surname>Slavin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Figliozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koonce</surname>
          </string-name>
          ,
          <article-title>Statistical study of the impact of adaptive traffic signal control on traffic and transit performance</article-title>
          ,
          <source>Transportation Research Record</source>
          <volume>2356</volume>
          (
          <year>2016</year>
          )
          <fpage>117</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref119">
        <mixed-citation>
          [119]
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pachman</surname>
          </string-name>
          ,
          <article-title>Memorandum: Farmington Road Adaptive Traffic Control Benefits Analysis</article-title>
          ,
          <source>Technical Report</source>
          , DKS Associates,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref120">
        <mixed-citation>
          [120]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahendran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hebert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-F.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Bus Detection for Adaptive Traffic Signal Control</article-title>
          ,
          <source>Technical Report</source>
          , Carnegie Mellon University,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref121">
        <mixed-citation>
          [121]
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Isukapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bronstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Igoe</surname>
          </string-name>
          ,
          <article-title>Integrating transit signal priority with adaptive signal control in a connected vehicle environment: Phase 1 Final Report</article-title>
          ,
          <source>Technical Report 01675986, Mobility21</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref122">
        <mixed-citation>
          [122]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Menendez</surname>
          </string-name>
          ,
          <article-title>A reinforcement learning method for traffic signal control at an isolated intersection with pedestrian flows</article-title>
          ,
          <source>in: Proceedings of the 19th COTA International Conference of Transportation Professionals, CICTP '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3123</fpage>
          -
          <lpage>3135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref123">
        <mixed-citation>
          [123]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fricker</surname>
          </string-name>
          ,
          <article-title>Investigating smart traffic signal controllers at signalized crosswalks: A reinforcement learning approach</article-title>
          ,
          <source>in: Proceedings of the 7th International Conference on Models and Technologies for Intelligent Transportation Systems, MT-ITS '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref124">
        <mixed-citation>
          [124]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chanloha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chinrungrueng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Usaha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aswakul</surname>
          </string-name>
          ,
          <article-title>Cell transmission model-based multiagent Q-learning for network-scale signal control with transit priority</article-title>
          ,
          <source>The Computer Journal</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>451</fpage>
          -
          <lpage>468</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref125">
        <mixed-citation>
          [125]
          <string-name>
            <given-names>S. M. A.</given-names>
            <surname>Shabestray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abdulhai</surname>
          </string-name>
          ,
          <article-title>Multimodal iNtelligent Deep (MiND) traffic signal controller</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Intelligent Transportation Systems, ITSC '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4532</fpage>
          -
          <lpage>4539</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref126">
        <mixed-citation>
          [126]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Schedule-driven signal priority control for modern trams using reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 17th COTA International Conference of Transportation Professionals, CICTP '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2122</fpage>
          -
          <lpage>2132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref127">
        <mixed-citation>
          [127]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>An integrated MPC and deep reinforcement learning approach to trams-priority active signal control</article-title>
          ,
          <source>Control Engineering Practice</source>
          <volume>110</volume>
          (
          <year>2021</year>
          )
          <fpage>104758</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref128">
        <mixed-citation>
          [128]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dhakad</surname>
          </string-name>
          ,
          <article-title>Fuzzy inference enabled deep reinforcement learning-based traffic light control for intelligent transportation system</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>4919</fpage>
          -
          <lpage>4928</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref129">
        <mixed-citation>
          [129]
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. D.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>EMVLight: A decentralized reinforcement learning framework for efficient passage of emergency vehicles</article-title>
          ,
          <source>in: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI '22</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref130">
        <mixed-citation>
          [130]
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <article-title>Dynamic queue-jump lane for emergency vehicles under partially connected settings: A multi-agent deep reinforcement learning approach</article-title>
          , arXiv preprint (
          <year>2021</year>
          ). arXiv:2003.01025.
        </mixed-citation>
      </ref>
      <ref id="ref131">
        <mixed-citation>
          [131]
          <string-name>
            <given-names>N.</given-names>
            <surname>Nahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kästner</surname>
          </string-name>
          ,
          <article-title>Collaboration challenges in building ML-enabled systems: Communication, documentation, engineering, and process</article-title>
          ,
          <source>in: Proceedings of the 44th International Conference on Software Engineering, ICSE '22</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref132">
        <mixed-citation>
          [132]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent-Lamarre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Larivière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>d'Alché-Buc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <article-title>Improving reproducibility in machine learning research (a report from the NeurIPS 2019 Reproducibility Program)</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>