<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>MARL and control barrier functions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bharathkumar Hegde</string-name>
          <email>hegdeb@tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Melanie Bouroche</string-name>
          <email>melanie.bouroche@tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Statistics, Trinity College Dublin</institution>, <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2085</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Connected and Autonomous Vehicles (CAVs) are expected to improve road safety and traffic efficiency in the near future. Recently, Multi-Agent Reinforcement Learning (MARL) algorithms have been applied to optimise lane change control decisions to improve the average speed of CAVs. MARL algorithms, however, are limited by a lack of safety guarantees. Control Barrier Functions (CBFs) have been used to ensure the safety of a Reinforcement Learning (RL) agent performing safety-critical control tasks such as robotic navigation and autonomous driving. In this work, a CBF is defined for a Multi-Agent System (MAS) of CAVs to ensure the safety of a MARL lane change controller, with three major contributions. The first is an architecture to integrate the high-level behavioural layer with a safe controller at the low-level motion planning layer. The high-level control layer implements a state-of-the-art MARL lane change controller, while the safe low-level motion planning layer constrains the vehicle to safe states using CBFs. Secondly, multi-agent actor dependencies are defined to ensure that control decisions are made by CAVs in a specific order. Finally, decentralised CBF constraint formulations are defined to comply with the safety specifications. The proposed design, CBF-CAV, can guarantee safe manoeuvres while executing a behavioural control decision made by the MARL controller.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Reinforcement Learning (MARL)</kwd>
        <kwd>Multi-Agent Systems (MAS)</kwd>
        <kwd>Deep learning</kwd>
        <kwd>Intelligent Transportation System (ITS)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As a result of the increasing trend in private vehicle ownership, there are currently over a billion vehicles in the
world’s motor fleet, and this number is expected to continue growing in the near future [1]. This
trend is likely to cause increased congestion and road accidents. According to a report from the World
Economic Forum, road congestion cost 87 billion dollars to the US economy in 2018 due to loss
of productivity [2]. Furthermore, a European Union (EU) report states that around 78% of road crashes
are considered to be a result of human errors [3]. To minimise congestion and improve traffic safety,
Autonomous Vehicles (AVs) are considered one of the main interventions in Intelligent Transportation
Systems (ITS) [4].</p>
      <p>AV technologies are evolving with the developments in communication technologies and Artificial
Intelligence (AI). Connected Autonomous Vehicles (CAVs) leverage recent advancements in
vehicular communication (V2X) technologies to make collaborative manoeuvres that improve traffic
safety and efficiency. AI has been a popular option to solve some of the complex problems in AV
technologies, such as localisation, mapping, perception, route planning, and motion control [5]. For
CAV motion controllers specifically, our previous work shows that Multi-Agent Reinforcement Learning
(MARL) is a popular choice [6]. Lane changing is one of the complex problems in motion control, as
improper lane change may cause a collision that could damage the costly components in AVs or even
cause loss of lives. Many forms of MARL using Deep Q-Networks (DQNs) [7, 8, 9] and Actor-Critic
Networks (ACNs) [10, 11, 12] have been applied to design lane change controllers. Among them,
MARL-CAV [12] is an open-source state-of-the-art MARL lane change controller designed for CAVs [13].</p>
      <p>MARL-CAV significantly improves traffic efficiency and safety. This approach, however, uses
a prediction-based priority assignment to avoid collisions and encourage safe behaviour, and therefore
safety is not guaranteed, which limits its applicability.</p>
      <p>Our previous work identifies that Control Barrier Functions (CBFs) are suitable for ensuring the safety
of CAV lane change controllers [13]. CBFs have been recently applied to ensure the safe operation
of Reinforcement Learning (RL) based single-agent AV controllers [14]. This CBF implementation
demonstrates a longitudinal safety constraint in a simple scenario. The CBF can be formulated by
considering dynamic safety constraints relative to the surrounding vehicles [15]. This single-agent
CBF, however, assumes that other agents make worst-case decisions. Such a safety constraint results in
conservative behaviour, negatively affecting traffic efficiency.</p>
      <p>Overall, CAV lane change controllers can be designed using MARL to improve traffic efficiency, but
they do not ensure safety. This design aims to integrate the CBF safety constraints with the MARL-based
lane change controller [12] to ensure safety by considering multi-agent vehicular dynamics to design
safety constraints. The main contributions of this work are:
• An architecture to integrate CBF constraints with high-level MARL-based lane change
controllers (Section 3).
• A structure for defining the dynamics of multi-agent interaction between CAVs (Section 4.2).
• The specifications and formulations of the CBF constraints to ensure the safety of CAVs (Section 4.1
and Section 4.3).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>The background details related to AV control hierarchy, vehicle dynamics, RL, and CBFs are provided
in this section. First, the scope of this research is explained based on control hierarchies. Next, the
kinematic bicycle model is explained, and the assumptions related to vehicle dynamics are outlined.
Then, notations used for RL formulations are discussed. Finally, a general form of a CBF is defined
along with an optimisation problem for evaluating the safe control inputs.</p>
      <sec id="sec-2-1">
        <title>2.1. Hierarchy of control layers</title>
        <p>The control decisions of AVs can be separated into four hierarchical levels: route planning, the
behavioural layer, motion planning, and local feedback control [16]. The route planning layer first
identifies a feasible route to the destination provided by the user using the road network information.
The route generated from this layer consists of a sequence of waypoints. While moving along these
waypoints, the behavioural layer makes high-level driving decisions such as following a lane, performing
a lane change, negotiating at the intersection, or moving in an unstructured environment. The motion
planning layer generates reference control actions, such as acceleration and steering, to execute a
specific manoeuvre from the high-level decision. In the last layer, a local feedback controller performs
necessary actuation, such as steering, throttling, and braking, to follow the control references.</p>
        <p>The lane change controller can be developed by engineering high-level behavioural and low-level
motion planning layers. Specifically, a behavioural layer can be designed to make discrete decisions to
change lanes or follow the current lane. Based on this decision, the motion planning layer can identify
the control references to execute the desired driving manoeuvre. Therefore, this article mainly focuses
on designing these two layers in the AV control hierarchy.</p>
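        <p>As an illustration of this two-layer split, the following minimal sketch maps a discrete behavioural decision to continuous control references. The names, thresholds, and decision rule are illustrative, not taken from the paper:</p>

```python
from dataclasses import dataclass

# High-level behavioural decisions made by the behavioural layer (illustrative).
BEHAVIOURS = ("follow_lane", "left_lane_change", "right_lane_change")

@dataclass
class ControlReference:
    acceleration: float       # reference passed to the local feedback controller
    steering_velocity: float

def behavioural_layer(obstruction_ahead: bool) -> str:
    """Toy discrete decision: change lane only when the current lane is obstructed."""
    return "left_lane_change" if obstruction_ahead else "follow_lane"

def motion_planning_layer(behaviour: str) -> ControlReference:
    """Map a discrete behaviour to continuous control references."""
    if behaviour == "follow_lane":
        return ControlReference(acceleration=0.0, steering_velocity=0.0)
    steer = 0.1 if behaviour == "left_lane_change" else -0.1
    return ControlReference(acceleration=0.0, steering_velocity=steer)
```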
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Vehicle dynamics</title>
        <p>In this article, the kinematic bicycle model is considered to define vehicle dynamics. This model
treats the two front wheels as a single wheel, and likewise for the back wheels, as illustrated in
Figure 1. The distance between the front and back wheels is denoted as l. The vehicle’s position is
defined using x and y, the longitudinal and lateral coordinates along the road. The vehicle’s velocity (v) is
controlled by adjusting the acceleration input (u₁), and the steering angle (δ) is controlled by adjusting
the steering velocity (u₂). The steering velocity represents the rate of change of the steering angle over
time [17]. The steering angle is considered the same as the angle of the front wheels with respect to
the current heading of the vehicle (θ). The equations for the kinematic bicycle model, assuming the
centre of gravity lies on the axle at equal distance from the front and back wheels, can be written as
[18], where β is the slip angle at the centre of gravity:</p>
        <p>ẋ = v cos(θ + β),  ẏ = v sin(θ + β),  v̇ = u₁,  δ̇ = u₂,  θ̇ = (2v / l) sin β   (1)</p>
        <p>β = tan⁻¹((1/2) tan δ)   (2)</p>
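        <p>These dynamics can be simulated numerically. The following sketch applies one forward-Euler integration step, using the symbol names above; the wheelbase and step-size values are illustrative assumptions:</p>

```python
import math

def bicycle_step(state, u1, u2, l=2.5, dt=0.05):
    """One forward-Euler step of the kinematic bicycle model (eqs. (1)-(2)).

    state = (x, y, v, delta, theta); u1 is the acceleration input and u2 the
    steering velocity. l (wheelbase) and dt (step size) are illustrative.
    """
    x, y, v, delta, theta = state
    beta = math.atan(0.5 * math.tan(delta))        # slip angle, eq. (2)
    x += v * math.cos(theta + beta) * dt
    y += v * math.sin(theta + beta) * dt
    theta += (2.0 * v / l) * math.sin(beta) * dt
    v += u1 * dt                                   # acceleration input
    delta += u2 * dt                               # steering-velocity input
    return (x, y, v, delta, theta)
```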
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Reinforcement learning</title>
        <p>RL is a computational approach for learning a sequence of actions to achieve a specific goal. The RL
problem is formulated using a Markov Decision Process (MDP) defined by the tuple (S, A, T, R, γ). S
is the state space, the set of state variables that an agent can observe. An agent observes a state sₜ ∈ S
at a time step t. A is the action space consisting of the set of actions that an agent can perform. At a
time step t, an agent performs an action aₜ ∈ A. T ∶ S × A × S → [0, 1] is the state transition function
that defines the likelihood of changes in the state observed from the environment based on an action
aₜ ∈ A. R ∶ S × A × S → ℝ</p>
        <p>is a reward function that defines the agent’s goal. At a time step t, the
agent receives a reward rₜ, which is a real number calculated for a transition from the previous state
to the current state through an action. The reward function formulation plays a vital role in defining
agents’ behaviour in the system. γ ∈ (0, 1] is a discount factor used to define the discounted reward Gₜ,</p>
        <p>Gₜ = ∑_{k=t+1}^{∞} γ^{k−t−1} rₖ</p>
        <p>=+1
The discounted reward can provide a measure to choose an action that has higher probability of getting
better rewards in the future. Using the discounted reward, a state-action value function, also known
as the Q-value, can be derived for a policy. A policy  is a mapping from states to the probability
of selecting possible actions. The Q-function under policy  provides an expected future reward by
choosing action   from state   . It can be defined as,</p>
        <p>Q^π(sₜ, aₜ) = E_π[Gₜ | sₜ, aₜ]</p>
        <p>For simple problems with a small number of possible states and actions, Q-values can be calculated
based on the transition probability T using the following equation:</p>
        <p>Q(sₜ, aₜ) = ∑_{sₜ₊₁} T(sₜ, aₜ, sₜ₊₁)[rₜ + γ max_{aₜ₊₁} Q*(sₜ₊₁, aₜ₊₁)]</p>
        <p>For complex tasks with large state and action spaces, computing Q-values exactly in this way becomes
infeasible. Therefore, approximation methods are
usually used to find a policy that achieves higher rewards in such tasks. These approximations can be
implemented using deep neural networks [19]. Deep RL (DRL) approximation algorithms have achieved
impressive results in playing Atari games [20]. Some of the recent RL approximation algorithms include
Deep Q-Networks (DQN) and policy gradient methods, such as Actor-Critic Networks (ACNs). Open-source
ACN algorithms such as PPO [21] and ACKTR [22] have been developed and published in
repositories like Stable-Baselines3 [23] and OpenAI Baselines [24]. These algorithms are applied to solve
optimisation problems in various research areas, including manufacturing, robotics, large language
models, and autonomous vehicles.</p>
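        <p>For instance, the discounted reward of a finite episode can be computed by folding the reward sequence backwards, a minimal sketch with illustrative reward values:</p>

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted reward G_t = r_{t+1} + gamma*r_{t+2} + ... over a finite episode."""
    g = 0.0
    for r in reversed(rewards):   # fold backwards: g = r_k + gamma * g
        g = r + gamma * g
    return g
```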
        <p>The RL algorithms extended to MAS considering various forms of learning and control components
are known as MARL [25]. The learning components learn an approximate optimal policy, and the
control components execute that policy. These components are integrated into an agent in single-agent
tasks such as a robot cleaning the house. Many real-world tasks, however, can be considered to be MAS,
as multiple agents may need to work together in the same environment. For example, systems such
as multiplayer online games, cooperative robots in factories, trafic control systems, and CAVs can be
considered as MAS [26]. However, MARL applications are limited to non-safety-critical tasks, as they
can not ensure safety because of the blackbox property [27]. To overcome this limitation, CBF safety
constraints can be used.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Control barrier functions</title>
        <p>Consider a discrete time nonlinear control system defined by the following transition dynamics
ṡ = f(sₜ) + g(sₜ)aₜ   (3)
where the change in state variables ṡ per unit time is defined using the unactuated dynamics f ∶ S → S and the
actuated dynamics g ∶ S → ℝⁿˣᵐ; n and m are the numbers of variables in the state space S and the action space
A respectively, sₜ ∈ S is the system state, and aₜ ∈ A is the control action at time step t. f and g are
defined based on known system dynamics, and they are locally Lipschitz continuous, in other words,
continuous functions limited by a maximum rate of change. For example, the kinematic bicycle model
defined in equation (1), with state s = [x, y, v, δ, θ]ᵀ and action a = [u₁, u₂]ᵀ, can be written as a discrete time
nonlinear control system (3) as follows,</p>
        <p>ṡ = [v cos(θ + β),  v sin(θ + β),  0,  0,  (2v / l) sin β]ᵀ + [[0, 0], [0, 0], [1, 0], [0, 1], [0, 0]] [u₁, u₂]ᵀ   (4)</p>
        <p>Consider a safe set C defined as the super-level set of the continuously differentiable function h ∶ S → ℝ,</p>
        <p>C ∶= {sₜ ∈ S ∶ h(sₜ) ≥ 0}   (5)</p>
        <p>To ensure the safety of the control system (3), the safe set C must be forward invariant. When C is
forward invariant, safe actions can be defined for each state sₜ ∈ C such that the system continues to
stay in C. The safe set C is considered invariant if the function h is a Control Barrier Function (CBF) such
that there exists η ∈ [0, 1] for all sₜ ∈ C satisfying the following condition (6),</p>
        <p>sup_{aₜ ∈ A} [h(f(sₜ) + g(sₜ)aₜ) + (η − 1)h(sₜ)] ≥ 0   (6)</p>
        <p>where η defines the magnitude at which the system is pushed within the safe set C [14]. Using smaller
values of η can enforce the constraints strictly, whereas higher values can relax the constraints. Therefore,
η represents how strongly the barrier function pushes the states inwards within C. The existence of a
CBF implies that for all sₜ ∈ C, there exists aₜ such that C is forward invariant [28]. Therefore, the goal is
to find a minimal safe action a_cbf that satisfies (6) to ensure the safety of the control system (3).</p>
        <p>Let us consider the affine barrier function of the form</p>
        <p>h(sₜ) = pᵀsₜ + q   (7)</p>
        <p>where p ∈ ℝⁿ and q ∈ ℝ are the parameters used to define a safety constraint h on the state sₜ. Combining
the affine barrier function with the condition (6), the following constraint can be defined for the control
action aₜ,</p>
        <p>−pᵀg(sₜ)aₜ ≤ pᵀf(sₜ) + pᵀ(η − 1)sₜ + ηq   (8)</p>
        <p>To consider multiple safety constraints defined using CBFs, C can be considered as the intersection of
half-spaces defined by k affine barrier functions [15]. The affine constraint on aₜ can be defined by
stacking all the constraints,</p>
        <p>M aₜ ≤ b,   (9)
where M = [m₁, m₂, ..., m_k]ᵀ, with mᵢ = −pᵢᵀg(sₜ), and b = [b₁, b₂, ..., b_k]ᵀ, with bᵢ = pᵢᵀf(sₜ) + pᵢᵀ(η − 1)sₜ + ηqᵢ.</p>
        <p>This constraint can be used to reformulate the CBF condition (6) as the following optimisation
problem,</p>
        <p>aₜ = arg min ||aₜ||²,  s.t.  M aₜ ≤ b,   (10)
which can be efficiently solved at each time step using a quadratic program [29].</p>
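        <p>For a single affine constraint, this quadratic program has a closed-form solution: the minimal-norm action is the projection of the origin onto the safe half-space. A minimal sketch (the function name is illustrative):</p>

```python
def cbf_safe_action(m, b):
    """Minimal-norm action for one affine CBF constraint m·a ≤ b.

    If the zero action is feasible it is returned; otherwise the origin is
    projected onto the boundary of the half-space {a : m·a ≤ b}.
    """
    if b >= 0.0:                      # zero action already satisfies m·0 ≤ b
        return [0.0] * len(m)
    scale = b / sum(mi * mi for mi in m)   # Lagrange multiplier for the active constraint
    return [scale * mi for mi in m]
```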
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Integrating CBF with MARL for highway merging</title>
      <p>In the AV control hierarchy (Section 2.1), the safety constraints can be integrated with the motion
planning layer to ensure safe lane change manoeuvres [30]. The safety constraints act as a shield to
override the control decisions from the motion planning layer to ensure that a vehicle stays in a safe
state [31]. The architecture to integrate the MARL behavioural layer, motion planning layer, and safety
constraints is presented in this section to develop a safe MARL lane change controller, illustrated in
Figure 2.</p>
      <sec id="sec-3-1">
        <title>3.1. MARL behavioural layer</title>
        <p>
          A vehicle, referred to as the ego vehicle, makes behavioural decisions based on its state information,
measured by onboard sensors such as LIDAR, RADAR, Camera, GPS, and IMU, as well as information
about the states of the surrounding N vehicles. The ego vehicle can decide whether to change lanes,
follow the lane, speed up, or slow down [12]. Since the ego vehicle can observe vehicles within the
range of vehicular communication (V2X), the previously defined MDP (in Section 2.3) can be extended
as a Partially Observable MDP (POMDP) for this MARL application. Moreover, V2X is assumed to be a
perfect communication interface without any delays or packet drops. The MARL formulation defined
in this section is similar to the MARL-CAV formulation [12], with minor changes in the state space and
reward function.
        </p>
        <p>The state space Sᵢ of a vehicle i consists of state variables including,
• x: The longitudinal position of the vehicle.
• y: The lateral position of the vehicle.
• v_x: The longitudinal velocity of the vehicle.
• v_y: The lateral velocity of the vehicle.</p>
        <p>• θ: The vehicle heading with respect to the road.</p>
        <p>The ego vehicle’s state variables are observed with respect to a global coordinate system, while the
observed vehicles’ state variables are relative to the ego vehicle. As the ego vehicle observes states from N surrounding
vehicles, the overall multi-agent state space is defined as a Cartesian product of the individual states,
S = S₀ × S₁ × S₂ × ... × S_N. N = 5 was observed to achieve the best performance [12]. We have added
the heading θ to the state variables considered by MARL-CAV to capture the lane change intentions of
the CAVs, which helps to ensure safety.</p>
        <p>The action space A is the same as defined in MARL-CAV, and consists of five discrete actions, each
representing a specific behaviour, namely right lane change, left lane change, follow lane, speed up, and slow
down. The behavioural layer chooses one of these high-level actions. The low-level controller explained
in Section 3.2 further executes these decisions.</p>
        <p>The reward function constitutes rewards for avoiding collisions r_c, maintaining a desirable speed r_v,
maintaining a desirable headway r_h, and feedback from the CBF evaluation r_cbf, along with an associated
weight w_∗ for each reward component. These weights can be tuned to prioritise the CAV objectives.
Therefore, the reward for a CAV i at time t is defined as</p>
        <p>r_{i,t} = w_c r_c + w_v r_v + w_h r_h + w_cbf r_cbf</p>
        <p>The feedback from the CBF evaluation, r_cbf, is an additional component added to the reward formulation
used in MARL-CAV. This can reward the agent for staying in the safe state, which minimises the control
overrides required from the CBF layer. Further, this reward encourages the agent to explore within the
safe states [14]. Note that the reward r_{i,t} is the reward associated with an individual agent. To achieve
collaborative goals, MARL-CAV combines rewards from the surrounding agents to define a local reward
as</p>
        <p>r̄_{i,t} = (1 / (N + 1)) ∑_{j=0}^{N} r_{j,t}</p>
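        <p>The reward composition can be sketched as follows; the weight values are illustrative placeholders, not the ones used by MARL-CAV:</p>

```python
def cav_reward(r_c, r_v, r_h, r_cbf, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted per-agent reward r_{i,t}; the weights are tunable placeholders."""
    w_c, w_v, w_h, w_cbf = weights
    return w_c * r_c + w_v * r_v + w_h * r_h + w_cbf * r_cbf

def local_reward(rewards):
    """Local reward: average over the ego's and its observed neighbours' rewards."""
    return sum(rewards) / len(rewards)
```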
        <p>MARL-CAV is demonstrated with multiple RL algorithms, such as multi-agent extensions (MA*)
of PPO [21], ACKTR [22], and DQN [20], namely MAPPO, MAACKTR, and MADQN. The multi-agent
extension of these algorithms is inspired by the parameter-sharing approach proposed in the Multi-Agent
Actor-Critic (MA2C) RL algorithm [32]. Among them, MAPPO performed best in the MARL benchmark
analysis of Chen et al. (2023) [12]. Therefore, this work uses the MAPPO algorithm to train the high-level
behavioural layer.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Motion planning layer</title>
        <p>For the low-level motion planning layer, a Proportional-Integral-Derivative (PID) controller can be
used to generate control actions, such as acceleration and steering velocity, to execute a behavioural
command (defined in Section 3.1). Because of its simplicity, the PID controller can generate control
actions in real time. Moreover, it does not require any pre-defined model. While it is possible to
integrate the behavioural and motion planning layers using learning-based methods to design end-to-end
controllers, these have been criticised for the difficulty of training policies to perform complex tasks
[33]. Especially for autonomous driving tasks with dynamic surroundings, end-to-end controllers
suffer from poor sample efficiency, resulting in high resource requirements [27]. Another option to
integrate the high-level and low-level control layers with learning-based methods is to use hierarchical
RL [34]. However, this approach is difficult to reproduce because of the complex training process. Other
model-based approaches, such as Model Predictive Control (MPC), require a model for generating
low-level control actions [35]. Estimating such a model for generating CAV control actions in a complex
scenario can be difficult. Therefore, the PID controller is a viable option for the low-level control layer
along with the high-level MARL controller.</p>
        <p>Since the high-level MARL controller is not guaranteed to make safe control decisions, the low-level
controller can generate unsafe control actions. The low-level control action a_ll at time t generated from
the PID controller aims to execute the high-level behavioural decision. Therefore, the control action a_ll
must be constrained to ensure safety.</p>
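        <p>A minimal discrete PID controller of the kind described here (the gains and step size below are illustrative assumptions):</p>

```python
class PID:
    """Discrete PID controller producing a control action from a reference error."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        """One control step; error = reference - measurement."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For example, feeding the controller the ego vehicle's speed error yields an acceleration reference for the local feedback layer.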
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Safety with CBF shield</title>
        <p>Safety of a control system can be ensured by overriding the possibly unsafe low-level control action
a_ll with a correction a_cbf to comply with safety constraints defined using CBFs [14]. Therefore, the final
control action aₜ can be defined as</p>
        <p>aₜ = a_ll + a_cbf   (11)</p>
        <p>With the updated definition for the action aₜ (11), the constraints defined in (9) can be updated to modify
the optimisation problem (10) as follows,</p>
        <p>a_cbf = arg min ||a_cbf||²,  s.t.  M a_cbf ≤ b_ll,   (12)
where b_ll = [b_ll,1, b_ll,2, ..., b_ll,k]ᵀ, with b_ll,i = pᵢᵀf(sₜ) + pᵢᵀ(η − 1)sₜ + ηqᵢ + pᵢᵀg(sₜ)a_ll.</p>
        <p>In the above optimisation problem (12), a_cbf is optimised to evaluate the minimal correction required to
ensure the safety of the system.</p>
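        <p>For a single affine constraint, this shielding step reduces to projecting the low-level action onto the safe half-space: the action is kept when already safe, and minimally corrected otherwise. A sketch with illustrative names:</p>

```python
def cbf_shield(a_ll, m, b):
    """Minimal correction of a low-level action for one constraint m·a ≤ b.

    Single-constraint illustration of the idea in (11)-(12): a_ll is kept
    if safe, otherwise projected onto the boundary of the safe half-space.
    """
    slack = b - sum(mi * ai for mi, ai in zip(m, a_ll))
    if slack >= 0.0:                  # a_ll already safe: zero correction
        return list(a_ll)
    scale = slack / sum(mi * mi for mi in m)
    return [ai + scale * mi for mi, ai in zip(m, a_ll)]
```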
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Decentralised CBF for CAVs</title>
      <p>In the previous sections, the CBF ℎ(  ) has been defined based only on the ego vehicle’s state. The
safety constraints defined in this section consider MAS dynamics because CAVs depend on the control
decisions of other vehicles to ensure their own safety. These safety constraints are defined for pure
CAV trafic. The extension of the safety constraints to mixed CAV trafic, consisting of vehicles with
varying levels of autonomy and connectivity, is left for future work. In this section, specifications for
decentralised CBFs are defined first (Section 4.1). Then, the multi-agent actor dependencies are defined
for CAVs (Section 4.2). Finally, CBF formulations are defined based on the actor dependencies to
comply with the specifications (Section 4.3).</p>
      <sec id="sec-4-1">
        <title>4.1. Specifications</title>
        <p>The following specifications are considered to formulate decentralised CBFs for CAVs:
1. Ensure the safety of all CAVs in a MAS.
2. Safe acceleration control to avoid collision with the preceding vehicle.
3. Safe steering control to avoid collisions during lane change manoeuvres.
4. Respect the CAV controller’s physical limits.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Multi-agent actor dependencies</title>
        <p>As CAVs can share their states with the surrounding vehicles, the state sₜ of the ego vehicle constitutes
its own ego states, s_e, and observed states, s_o, of observed vehicles. With this consideration, a dynamic
CBF h for two CAVs can be defined as</p>
        <p>h(sₜ) = h(s_e) + h(s_o)   (13)</p>
        <p>Notice that each term in the previously defined optimisation constraint (12) is defined based on the
state and the action variables from a single agent. For MAS, each term can be separated into variables
associated with the ego vehicle (∗_e) and the observed vehicle (∗_o) as follows,</p>
        <p>M_e a_e + M_o a_o ≤ b_e + b_o   (14)
where M_e and b_e are equivalent to M and b_ll (from (12)), but evaluated using the state and the action a_ll
associated with the ego vehicle. Similarly, M_o and b_o are derived from the state and the action variables
associated with the observed vehicle.</p>
        <p>The safe action for an ego vehicle can be obtained by optimising the minimum control correction a_e,
assuming that the observed vehicle shares its state variables and safe control decisions. Therefore, the
multi-agent constraint in (14) is modified to update the quadratic program (12) for optimising a_e,</p>
        <p>a_e = arg min ||a_e||²,  s.t.  M_e a_e ≤ b_ma,  where b_ma = b_e + b_o − M_o a_o   (15)</p>
        <p>An actor dependency exists between the ego vehicle and the observed vehicles, as the term b_ma in
equation (15) requires the observed vehicles to make their control decisions, a_o, before the ego vehicle.
Based on the ego vehicle’s high-level behaviour, the observed vehicles to be considered in the CBF
constraints are identified.</p>
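        <p>The decentralised bound b_ma and the resulting minimal-norm ego correction can be sketched for a single constraint row; the names are illustrative, and a full implementation would stack all rows into the quadratic program (15):</p>

```python
def ego_safe_action(m_e, b_e, m_o, b_o, a_o):
    """Decentralised CBF step (15) for one constraint row.

    The observed vehicle has already committed its action a_o; the ego then
    solves min ||a_e||^2 subject to m_e·a_e ≤ b_ma with b_ma = b_e + b_o - m_o·a_o.
    """
    b_ma = b_e + b_o - sum(mi * ai for mi, ai in zip(m_o, a_o))
    if b_ma >= 0.0:                   # zero correction is already feasible
        return [0.0] * len(m_e)
    scale = b_ma / sum(mi * mi for mi in m_e)
    return [scale * mi for mi in m_e]
```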
        <p>If the ego vehicle is following a lane, its control action depends on the immediate leading vehicle
within the communication range, as illustrated in Figure 3a. In this case, the longitudinal distance
between the ego and the leading vehicle, Δx_l, must be constrained to ensure safety. If a vehicle in the
adjacent lane intends to change lanes to the ego vehicle’s current lane behind the immediate leading
vehicle, the ego vehicle’s decision depends on the adjacent vehicle, as shown in Figure 3b. In this case,
the longitudinal and lateral distances, Δx_a and Δy_a, with the adjacent vehicle are constrained. As the ego
vehicle’s decision depends on the adjacent vehicle’s action, this dependency encourages collaborative
lane change behaviour among CAVs.</p>
        <p>[Figure 3: Actor dependencies: (a) follow lane; (b) follow lane with a lane changing vehicle; (c) lane changing vehicle.]</p>
        <p>While changing lanes, the ego vehicle depends on the actions of the immediate leading vehicles in the
current lane and the adjacent target lane. Similar to the previous case, the longitudinal distance from the
leading vehicle, Δx_l, is constrained. Moreover, the longitudinal and lateral distances from the adjacent
vehicle, Δx_a and Δy_a, are constrained. This dependency is illustrated in Figure 3c.</p>
        <p>Collectively, the safety of all the CAVs can be ensured by following the actor dependency as an
individual CAV ensures safety with respect to its preceding vehicles. To honour this dependency,
CAVs in a MAS are assumed to make control decisions sequentially in the decreasing order of their
longitudinal position. Moreover, this actor dependency is applicable only to roads with two lanes. These
limitations will be addressed in future work.</p>
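        <p>The assumed decision order can be sketched as a simple sort by longitudinal position (the names are illustrative):</p>

```python
def decision_order(cavs):
    """Order CAVs for sequential decisions: decreasing longitudinal position.

    `cavs` is a list of (vehicle_id, x_position) pairs; the front-most
    vehicle decides first, so each follower can use the committed actions
    of the vehicles ahead of it when evaluating its constraints.
    """
    return [vid for vid, _x in sorted(cavs, key=lambda c: c[1], reverse=True)]
```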
      </sec>
      <sec id="sec-4-3">
        <title>4.3. CBF formulation</title>
        <p>CBF constraints for CAVs, CBF-CAV, are defined to ensure safe longitudinal and lateral motion without
violating the physical control limits of the vehicles. This section defines CBFs for each type of safety
constraint, along with the conditions for their applicability.</p>
        <sec id="sec-4-3-1">
          <title>4.3.1. Longitudinal motion</title>
          <p>The safety constraint for longitudinal motion allows the ego vehicle to maintain a safe headway with a
preceding vehicle in the current lane. This constraint can be defined as,</p>
          <p>h_lon = Δx_l − d_safe   (16)</p>
          <p>where Δx_l is the longitudinal distance between the rear end of the preceding vehicle and the front of the
ego vehicle, and d_safe is the safe distance that must be maintained between two vehicles to ensure that
the following vehicle has enough time to slow down if the leading vehicle starts slowing down abruptly.
The safe distance threshold can be evaluated from the ego vehicle velocity v_e and the time headway τ as
shown below,</p>
          <p>d_safe = τ v_e   (17)</p>
          <p>Both lane following and lane changing vehicles use this constraint, as the ego vehicle is expected to
maintain a safe distance from the leading vehicle in all driving scenarios. Moreover, the leading vehicle
must be within the ego vehicle’s communication range to enforce this constraint.</p>
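          <p>A minimal evaluation of this constraint; the time-headway value below is an illustrative assumption:</p>

```python
def h_lon(dx_l, v_e, tau=1.5):
    """Longitudinal CBF (16)-(17): h_lon = dx_l - tau * v_e.

    dx_l is the gap to the preceding vehicle, v_e the ego velocity, and tau
    an illustrative time headway; h_lon ≥ 0 means the state is safe.
    """
    return dx_l - tau * v_e
```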
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Lateral motion</title>
          <p>This constraint ensures safety when a CAV moves laterally to change lanes. During the lane change,
a vehicle must maintain a safe distance d_safe with a leading vehicle in the same lane. This constraint
can be enforced with the previously defined longitudinal CBF (16). As the vehicle moves laterally, either a safe lateral
distance y_safe or a safe longitudinal distance d_safe must be maintained with a leading vehicle in the
adjacent lane. This constraint is defined as h_lat,</p>
          <p>h_lat = Δy_a / y_safe + Δx_a / d_safe − 1   (18)</p>
          <p>y_safe = w_l − w_v   (19)</p>
          <p>where d_safe is the same variable defined in equation (17), and y_safe is a constant defined based on the lane width
w_l and the vehicle width w_v to ensure a comfortable lateral distance when the adjacent leading vehicle is
moving in parallel.</p>
          <p>A safe distance can be maintained with a vehicle in the adjacent lane with this constraint. Before
changing a lane, this constraint ensures that the ego vehicle maintains a safe lateral distance from
the adjacent vehicle. During the lane change, this constraint allows partial violations of lateral and
longitudinal constraints while maintaining suficient distance to avoid collision. During the execution
of lane change manoeuvre, the partial violations allow a vehicle to gradually reduce the lateral distance
Δ a while gradually increasing the longitudinal distance Δ a with the adjacent vehicle. The gradual
increase in longitudinal distance ensures that a safe distance is maintained after completing the lane
change manoeuvre. For example, if a vehicle in the adjacent lane is parallel to the ego vehicle, then the
lateral safe distance must be maintained, Δ a ≥  safe. In another case, if the adjacent vehicle is about
to enter the ego vehicle’s lane, then the ego vehicle must gradually increase the longitudinal distance,
such that Δ a ≥  safe when the adjacent vehicle enters the current lane. This constraint is applied to
lane changing vehicles and lane following vehicles if they are obstructed by the adjacent leading vehicle
These constraints are applied to all CAVs, as they must be honoured in all scenarios.
(18)
(19)
(20)
(21)
changing lanes.</p>
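The trade-off between the lateral and longitudinal gaps can be sketched as below; this is a hypothetical reading of equations (18)–(19), with function and variable names chosen for illustration. A partial loss of lateral gap is acceptable as long as a proportional longitudinal gap is gained:

```python
def lateral_cbf(delta_d_a, delta_s_a, w_safe, d_safe):
    """h_lat = delta_d_a / w_safe + delta_s_a / d_safe - 1.

    h_lat >= 0 holds if the lateral gap alone is safe (delta_d_a >= w_safe),
    if the longitudinal gap alone is safe (delta_s_a >= d_safe), or if a
    partial violation of one is compensated by the other.
    """
    return delta_d_a / w_safe + delta_s_a / d_safe - 1.0

w_safe = 3.5 - 1.8   # assumed lane width minus vehicle width (m)
d_safe = 1.2 * 20.0  # tau * v_e (m), as in equation (17)

# Parallel adjacent vehicle, full lateral gap, no longitudinal gap: boundary
h0 = lateral_cbf(w_safe, 0.0, w_safe, d_safe)
# Mid lane change: half the lateral gap, 60% of the longitudinal gap
h1 = lateral_cbf(0.5 * w_safe, 0.6 * d_safe, w_safe, d_safe)  # about 0.1
```

The first evaluation sits exactly on the safe-set boundary (h_lat = 0), while the second stays safe because the growing longitudinal gap compensates for the shrinking lateral gap.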
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. Control limits</title>
<p>Given that vehicle control inputs are subject to physical limitations, they must be constrained. The
physical constraints are defined on the steering angle δ, which is restricted to the range
[−δmax, δmax]. Note that the steering angle is one of the state variables, hence these constraints are
defined using two CBFs, ℎmax and ℎmin:
ℎmax = δmax − δ (20)
ℎmin = δ + δmax (21)</p>
          <p>The acceleration range of the vehicle is defined as [−a1max, a1max]. This physical constraint can be
enforced by including the bound −a1max ≤ a1 ≤ a1max directly in the constraints of the optimisation
problem defined in (12).</p>
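As a minimal sketch of how these box constraints act on a learned action, the nominal MARL command can be projected back into the physical range; the limit values and function names are assumptions for illustration, and in the paper the bounds enter the optimisation problem (12) rather than being applied by direct clipping:

```python
def clamp(value, lower, upper):
    """Restrict a control input to its physical range [lower, upper]."""
    return max(lower, min(upper, value))

DELTA_MAX = 0.6  # assumed steering limit (rad)
A1_MAX = 3.0     # assumed acceleration limit (m/s^2)

def enforce_control_limits(steering, accel):
    """Apply the steering and acceleration box constraints.

    Equivalent to requiring h_max = delta_max - delta >= 0 and
    h_min = delta + delta_max >= 0, plus the acceleration range.
    """
    return (clamp(steering, -DELTA_MAX, DELTA_MAX),
            clamp(accel, -A1_MAX, A1_MAX))

# A nominal RL action outside the limits is projected back inside
delta, a1 = enforce_control_limits(0.9, -5.0)  # (0.6, -3.0)
```

Embedding the same bounds in the QP instead of clipping lets the optimiser trade steering against acceleration while satisfying all CBF constraints simultaneously.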
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
<p>The proposed CBF formulations, CBF-CAV, integrated at the behavioural layer with the MARL lane
change controller, have minimal impact on efficiency, because the CBF constraints override the agents’
actions only when a vehicle is about to move towards an unsafe state. Moreover, by restricting agents to
safe states, the MARL controller can be trained efficiently by exploring the safe states only. Furthermore,
actor dependencies are used to account for MAS dynamics in CAV traffic. Together, these constraints
ensure safe lateral and longitudinal motion for all CAVs in a traffic scenario.</p>
      <p>The provided formulations are suitable for pure CAV traffic, where a lower safe distance can be used.
In future work, they can be extended to mixed traffic with dynamic CBFs that maintain larger safe distances
with human-driven vehicles, under the assumption that these vehicles take worst-case control decisions.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors wish to thank the editors and anonymous reviewers for their valuable comments and
helpful suggestions which greatly improved the paper’s quality. This work was supported by the SFI
Centre for Research Training in Advanced Networks for Sustainable Societies (ADVANCE CRT), Ireland
under the Grant number 18/CRT/6222.
</p>
    </sec>
  </body>
  <back>
  </back>
</article>