<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Advancing Predictive Control: Insights from Maze Exploration Using Markov Decision Processes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robel Asgedom</string-name>
          <email>robelasgedom629@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Korobiichuk</string-name>
          <email>igor.korobiichuk@pw.edu.pl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Ukraine Independent Researcher</institution>
          ,
          <addr-line>Łódź</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Warsaw University of Technology</institution>
          ,
          <addr-line>plac Politechniki 1, 00-661, Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Predictive control plays a significant role in mobile robotics, especially in trajectory tracking, obstacle avoidance, and real-time decision-making. In this study, we explore how Markov Decision Processes (MDPs) can be integrated with predictive control to enhance navigation, particularly in maze-like environments. A case study on MDP-based maze exploration analyzes key system limitations, including computational complexity and real-time adaptability. While MDPs often struggle to adapt to dynamic environments, predictive techniques like Model Predictive Control (MPC) offer improvements in trajectory optimization and responsiveness. We also discuss practical applications in areas such as warehouse navigation and multi-robot coordination, showing the benefits of combining MDPs and predictive control for robust performance in real-world scenarios.</p>
      </abstract>
      <kwd-group>
<kwd>Predictive Control</kwd>
        <kwd>Markov Decision Processes</kwd>
        <kwd>Maze Exploration</kwd>
        <kwd>Mobile Robots</kwd>
<kwd>Trajectory Tracking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Autonomous navigation is a fundamental capability in mobile robotics, allowing robots to traverse
complex and dynamic environments efficiently. Achieving accurate trajectory tracking and efficient
maze exploration is still challenging due to uncertainties in the environment, sensor limitations, and
computational constraints. Addressing these challenges requires robust decision-making frameworks
and control techniques.</p>
      <p>
        Predictive control techniques, particularly Model Predictive Control (MPC), have demonstrated
significant advantages in trajectory tracking and obstacle avoidance by enabling real-time
adjustments based on predicted future states [
        <xref ref-type="bibr" rid="ref1">1,2</xref>
        ]. MPC has seen success in autonomous driving, industrial automation, and robotic path planning, offering a structured approach to real-time motion optimization while ensuring constraint satisfaction. In parallel,
Markov Decision Processes (MDPs) offer a robust mathematical foundation for decision-making
under uncertainty, widely applied in navigation and mapping tasks [3,4].
      </p>
      <p>Successes of MDPs and MPC are well documented, but their integration in mobile robotics is still
underexplored. Existing studies primarily focus on standalone MDPs for decision-making or MPC for
trajectory optimization, yet few works have attempted to bridge the gap between these two methods.
Most of the literature on MDPs addresses static environments with predefined state transitions,
limiting their real-time adaptability. Although MPC offers dynamic control, it lacks the high-level policy optimization capabilities of MDPs. To overcome these limitations, this article examines integrating MDPs with predictive control techniques. We aim to combine MDP-based decision-making with the real-time adaptability of MPC to enhance mobile robot trajectory tracking in dynamic and uncertain environments.</p>
      <p>This study extends previous work by analyzing the limitations of MDP-based maze exploration
and demonstrating how predictive control can address these challenges. We highlight the novelty of
our approach by reviewing existing literature and identifying gaps in current research. Researchers
have extensively studied individual applications of MDPs and MPC, yet their combined use to
enhance real-time adaptability and decision-making in maze exploration remains underexplored.
</p>
      <p>Primarily, this article aims to contribute to this area by presenting a structured approach for
integrating predictive control with MDP-based systems.</p>
<p>The rest of this paper is structured as follows. Section 2 reviews related work, analyzing existing MDP and predictive control approaches in mobile robotics. Section 3 presents the case study, discussing the implementation of MDPs for maze exploration. Section 4 explores the integration of predictive control techniques and their impact on real-time navigation. Section 5 outlines future research directions and potential improvements in hybrid MDP-MPC frameworks, and Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Markov Decision Processes in Robotics</title>
        <p>
          Markov Decision Processes (MDPs) provide a mathematical framework to model decision-making
problems in stochastic environments [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. An MDP is defined as a tuple (S, A, P, R, γ), where:
• S is a finite set of states representing the possible configurations of the environment.
• A is a finite set of actions available to the agent.
• P(s′|s, a) is the state transition probability, which defines the probability of reaching state s′ from state s after taking action a.
• R(s, a) is the reward function, which assigns a scalar reward to each state-action pair.
• γ ∈ [0, 1] is the discount factor, which determines the importance of future rewards.
        </p>
        <p>The objective in an MDP is to find an optimal policy π(s), which maps states to actions so as to maximize the expected cumulative reward:</p>
        <p>$$V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right],$$</p>
        <p>where V^π(s) is the value function representing the expected reward when following policy π from state s. The optimal policy π* maximizes this value and is often computed using the Value Iteration or Policy Iteration algorithms [15]:</p>
        <p>$$V^{\pi^{*}}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a) + \gamma V^{\pi^{*}}(s')\right].$$</p>
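        <p>For illustration, a minimal value iteration sketch in Python is given below. The grid layout, rewards, and deterministic transitions are illustrative assumptions in the spirit of the case study in Section 3, not the implementation used on the robot.</p>
        <preformat>
import numpy as np

# Minimal value iteration sketch for a small grid world (illustrative layout;
# transitions are deterministic here for brevity, unlike the probabilistic
# model discussed in Section 3).
ROWS, COLS = 3, 4
GOAL, TRAP, WALL = (0, 3), (1, 3), (1, 1)      # assumed cell layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, THETA = 0.9, 1e-6                       # discount factor, stop threshold

def reward(s):
    """+1 at the goal, -1 at the trap, 0 elsewhere."""
    return 1.0 if s == GOAL else -1.0 if s == TRAP else 0.0

def step(s, a):
    """Stay in place when a move would leave the grid or enter the wall."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 &lt;= r &lt; ROWS and 0 &lt;= c &lt; COLS and (r, c) != WALL else s

V = np.zeros((ROWS, COLS))
while True:
    delta = 0.0
    for r in range(ROWS):
        for c in range(COLS):
            s = (r, c)
            if s in (GOAL, TRAP, WALL):
                continue                       # terminal/inaccessible: value stays fixed
            # Bellman optimality backup: V(s) = max_a [R(s') + gamma V(s')]
            best = max(reward(step(s, a)) + GAMMA * V[step(s, a)] for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
    if delta &lt; THETA:
        break                                  # value function has converged

# Greedy policy extraction: pick the action with the best one-step lookahead
policy = {s: max(ACTIONS, key=lambda a, s=s: reward(step(s, a)) + GAMMA * V[step(s, a)])
          for s in [(r, c) for r in range(ROWS) for c in range(COLS)]
          if s not in (GOAL, TRAP, WALL)}
        </preformat>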
        <p>MDPs have been widely used in robotics for path planning, exploration, and navigation [3]. They
enable robots to compute optimal policies for sequential decision problems, making them particularly
effective for grid-world environments where the system must balance exploration and exploitation.</p>
        <p>However, one major limitation of MDPs is their computational complexity in real-time
applications, especially in large environments. Since MDPs rely on full knowledge of transition
probabilities and rewards, they struggle with dynamic environments where state transitions may
change unpredictably. This motivates the need for predictive control to enhance real-time
adaptability.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Predictive Control for Mobile Robots</title>
        <p>Predictive control, particularly Model Predictive Control (MPC), has emerged as a powerful approach
for real-time motion planning and trajectory tracking in robotics [2]. Unlike MDPs, which focus on
long-term reward optimization, MPC formulates an optimal control problem over a finite prediction
horizon and continuously updates actions based on real-time sensor data.</p>
        <p>MPC solves an optimization problem at each time step to minimize a cost function J while satisfying system constraints:</p>
        <p>$$J = \sum_{k=0}^{N} \left[ x_k^{T} Q x_k + u_k^{T} R u_k \right],$$</p>
        <p>where:
• J is the cost function,
• x_k is the state vector at time step k,
• u_k is the control input at time step k,
• Q and R are weight matrices that penalize state deviation and control effort, respectively,
• N is the prediction horizon.</p>
        <p>Following [14], we adapt the equation for this context.</p>
        <p>MPC predicts future states using the system dynamics:</p>
        <p>$$x_{k+1} = f(x_k, u_k),$$</p>
        <p>subject to the constraints:</p>
        <p>$$x_{\min} \le x_k \le x_{\max}, \qquad u_{\min} \le u_k \le u_{\max}.$$</p>
        <p>This predictive capability allows MPC to dynamically adjust robot actions, making it highly effective for applications such as:
• Obstacle avoidance in dynamic environments [12].
• Multi-robot coordination, ensuring collision-free paths [13].
• Real-time trajectory planning in complex terrains [11].</p>
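        <p>To make the receding-horizon idea concrete, the Python sketch below applies MPC to a simple double-integrator point robot. The dynamics, weight matrices, bounds, and solver choice are illustrative assumptions, not a specific published controller.</p>
        <preformat>
import numpy as np
from scipy.optimize import minimize

# Discrete double-integrator x_{k+1} = A x_k + B u_k in 2D (assumed model).
DT, N = 0.1, 10                                    # step size, prediction horizon
A = np.block([[np.eye(2), DT * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * DT**2 * np.eye(2), DT * np.eye(2)])
Q = np.diag([10.0, 10.0, 1.0, 1.0])                # penalize state deviation
R = 0.1 * np.eye(2)                                # penalize control effort
U_MAX = 1.0                                        # input bound |u| &lt;= U_MAX

def mpc_step(x0, x_ref):
    """Solve one finite-horizon problem and return only the first input."""
    def cost(u_flat):
        u = u_flat.reshape(N, 2)
        x, J = x0, 0.0
        for k in range(N):
            x = A @ x + B @ u[k]                   # roll the model forward
            e = x - x_ref
            J += e @ Q @ e + u[k] @ R @ u[k]       # stage cost x^T Q x + u^T R u
        return J
    bounds = [(-U_MAX, U_MAX)] * (2 * N)           # input constraints
    res = minimize(cost, np.zeros(2 * N), bounds=bounds, method="L-BFGS-B")
    return res.x[:2]

# Receding horizon: re-solve at every step with the newest measured state.
x, x_goal = np.zeros(4), np.array([1.0, 1.0, 0.0, 0.0])
for _ in range(50):
    u = mpc_step(x, x_goal)
    x = A @ x + B @ u
print("final position:", x[:2])
        </preformat>
        <p>In practice, dedicated QP solvers are used instead of a generic optimizer, but the loop structure (predict, optimize over a horizon, apply the first input, repeat) is the same.</p>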
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Combining MDPs and Predictive Control</title>
        <p>Although MDPs provide a structured approach for high-level decision-making, they lack adaptability
in real time. While MPC excels at short-term control and constraint handling, it lacks an inherent
ability to model long-term decision-making.</p>
        <p>Integrating MDPs with MPC leverages the advantages of both:
• MDPs generate an optimal policy for global navigation based on reward optimization.
• MPC executes the policy in real time while adapting to dynamic changes.</p>
        <p>This hybrid approach allows for robust decision-making and efficient trajectory execution, particularly in dynamic maze exploration and autonomous navigation scenarios. The following sections explore how this integration can enhance mobile robot performance.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Case Study: Maze Exploration with MDPs</title>
      <sec id="sec-3-1">
        <title>3.1. System Description</title>
        <p>A mobile robot explores a maze autonomously in a grid world environment, where each cell
represents a state. Markov Decision Process (MDP)-based algorithms define the transitions between
states, guiding the robot’s decision-making [4]. The objective is to enable efficient navigation from a
starting position to a goal while avoiding obstacles and optimizing movement based on predefined
rewards.</p>
        <p></p>
        <p>Hardware Setup: The physical robot consists of the following main components, as shown in Fig. 1:
• Microcontroller: The ESP32 microcontroller runs the MDP algorithm and controls the robot’s movement.
• Sensors:
1. Ultrasonic sensor: Obstacle detection relies on the HC-SR04 sensor.
2. Camera module: A separate Sony IMX298 camera module connects to the Raspberry Pi using the MIPI CSI-2 interface, then transmits data to the ESP32 microcontroller via Wi-Fi for processing.</p>
        <p>Object detection techniques were employed to distinguish the robot from the environment, and a localization module processed this data for accurate mapping [7].</p>
        <p>
          • Motor Driver: Dual-shaft DC motors with a motor driver provide precise motion control [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Two caster wheels, which rotate freely, support the robot’s weight and enable smooth, multi-directional movement.
Software Algorithm Implementation:
• MDP-Based Decision Making: The robot uses an MDP framework to determine optimal actions in each state.
• Policy Iteration and Value Iteration Algorithms: These methods compute the best navigation policy based on state transitions and rewards.
• Localization and Mapping: A vision-based system helps in state estimation and in tracking the robot’s movement.
• Maze Exploration: A left-hand rule maze exploration algorithm is combined with the MDP to optimize performance and minimize the robot’s rotation time.
Grid-World Representation:
        </p>
        <p>The environment is modeled as a 3x4 discrete grid-world maze with defined start, goal, and obstacle states, as shown in Fig. 2. The goal was to determine an optimal policy for the robot to navigate from the start state to the goal state while avoiding obstacles and maximizing rewards, where:
1. Each cell represents a state (position in the maze). State transitions are probabilistic, accounting for uncertainties in movement.
2. Rewards are assigned to the states:
• +1 for reaching the goal,
• -1 for entering an obstacle,
• 0 for intermediate steps.</p>
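        <p>The exact transition probabilities are not specified here, so the snippet below shows one common way to encode probabilistic movement in such a grid world: the robot moves in the intended direction with high probability and slips sideways otherwise. The 0.8/0.1/0.1 split is an assumption borrowed from classic grid-world examples [15].</p>
        <preformat>
# Illustrative slip model: intended move with probability 0.8, a slip to
# either side with probability 0.1 each (assumed values, not measured ones).
LEFT_OF = {"N": "W", "W": "S", "S": "E", "E": "N"}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def transition_probs(action):
    """Map an intended action to an {actual_action: probability} distribution."""
    return {action: 0.8, LEFT_OF[action]: 0.1, RIGHT_OF[action]: 0.1}
        </preformat>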
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental Results</title>
        <p>We conducted experiments in both physical and virtual environments to validate the implementation of MDP-based maze exploration. We tested the robot in a physical 3x4 grid-world maze and a larger virtual 6x8 grid-world environment. The key results are summarized below:</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1. 3x4 Grid-World Environment</title>
        <p>In the physical setup, the robot successfully navigated the 3x4 maze, which contains one obstacle (inaccessible) state and two terminal states where the episode ends (reward of +1 or -1) out of twelve states (grid cells) in total, achieving the following outcomes:
• Convergence of Policy: Figure 3 shows that the MDP policy iteration algorithm converged after 11 iterations, demonstrating efficient policy computation in small environments [2].
• Optimal Navigation Path: The robot followed the computed optimal policy, avoiding obstacles and reaching the goal state. The resulting path minimized cumulative costs and maximized rewards.
• Localization Performance: The vision-based localization module accurately identified the robot's position in most cases and facilitated smooth navigation.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2. 6x8 Grid-World Environment</title>
        <p>To evaluate the scalability of the proposed Markov Decision Process (MDP)-based exploration strategy, we tested the system in a larger 6×8 virtual maze. The environment consists of 48 states, incorporating:
• Nine obstacle (inaccessible) states, which refer to areas with obstacles (walls) that the robot cannot traverse.
• Three terminal states, each with an assigned reward: one positive goal state and two negative penalty states.</p>
        <p>In the virtual setup, the robot explored the maze using the MDP as its primary algorithm for deciding the motion from the current cell to the next potential cell, along with the left-hand rule maze exploration algorithm to guide the robot away from unwanted maneuvers. The maze exploration algorithm affects neither the optimal policy that emerges nor the efficiency metrics. Overall, after a short time, the optimal policy was generated, as shown in Fig. 4. The final policies for both setups are included to provide a visual understanding. The experiments demonstrated the following key observations:
• Policy Convergence: The optimal policy was computed after 19 iterations, indicating increased computational demands for larger environments [4]. Since the 6×8 grid-world is 4 times larger than the 3×4 grid-world (48 states vs. 12 states), linear scaling from the 11 iterations of the smaller maze would suggest roughly 44 iterations. With the help of the maze exploration algorithm, however, the system converged in 19 iterations instead of 44, a reduction of 56.82%.
• Optimal Policy Map: The computed policy effectively directed the robot to navigate the maze while avoiding prohibited cells. The policy map provided clear direction vectors for each state. As a result, the optimal policy represents the final, accepted flow that guides the robot to the goal state from any permissible cell in the maze.
• Efficiency Metrics: Increasing the maze size resulted in a corresponding increase in total computation time, highlighting the necessity of optimization techniques to enhance performance in larger-scale environments. The need to achieve optimal policy convergence in a maze containing various cell types motivates the search for better strategies.</p>
        <p>We included figures to illustrate the convergence plots, reward values, and final policies for both
setups, helping to provide a clear visual understanding of the results.</p>
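        <p>For reference, the left-hand rule used to guide exploration can be sketched as a simple wall-following step. This is an illustrative reconstruction, not the exact implementation; the is_free predicate is a hypothetical helper that reports whether a cell is traversable.</p>
        <preformat>
# Left-hand rule: prefer turning left, then straight, then right, then back,
# relative to the robot's current heading.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]    # N, E, S, W as (row, col) deltas

def left_hand_step(pos, heading_idx, is_free):
    """Return (new_pos, new_heading_idx) for one exploration step."""
    for turn in (-1, 0, 1, 2):                   # left, straight, right, U-turn
        h = (heading_idx + turn) % 4
        nxt = (pos[0] + HEADINGS[h][0], pos[1] + HEADINGS[h][1])
        if is_free(nxt):
            return nxt, h
    return pos, heading_idx                      # boxed in: stay put
        </preformat>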
      </sec>
      <sec id="sec-3-5">
        <title>3.3. Discussion</title>
        <p>The results demonstrate the effectiveness of MDP-based methods for maze exploration and navigation. However, several challenges and opportunities for improvement were identified.</p>
        <sec id="sec-3-5-1">
          <title>3.3.1. Strengths</title>
          <p>• Policy Accuracy: The MDP algorithms generated reliable policies that guided the robot effectively, even in complex environments.
• Scalability: The approach scaled well to larger mazes, demonstrating robustness in generating optimal policies for various grid sizes.
• Flexibility: Integrating vision-based localization and sensor data enabled the system to successfully facilitate real-world navigation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Connecting MDPs to Predictive Control</title>
      <sec id="sec-4-1">
        <title>4.1. Advantages of Predictive Control for Mobile Robots</title>
        <p>
          Predictive control techniques, such as Model Predictive Control (MPC), have demonstrated significant
advantages in addressing real-time adaptability and constraint handling in mobile robotics. Unlike
MDPs, which focus on long-term decision-making through reward optimization, MPC excels in
short-term trajectory planning by continuously predicting future states and adjusting control inputs
accordingly [
          <xref ref-type="bibr" rid="ref1">1,6</xref>
          ].
        </p>
        <p>Many regard MPC as one of the most effective methods for controlling autonomous systems under
constraints [9]. Its ability to incorporate physical limitations (e.g., motor torque, velocity) and
maintain smooth trajectories makes it a valuable complement to MDP-based approaches [2]. Its
predictive nature allows the system to compute optimal control actions at each step by solving a
constrained optimization problem [10]. It is particularly effective for dynamic environments where
robots must respond to changes such as moving obstacles or time-varying conditions [16].
Applications of MPC in mobile robotics include:
• Obstacle avoidance in dynamic environments, which plays a critical role in real-time navigation [11].
• Real-time trajectory planning for autonomous vehicles, crucial for ensuring safe and efficient navigation [12].
• Coordinated control, essential for multi-robot systems to function optimally [13].</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Challenges in MDP-Based Systems</title>
        <p>While MDPs provide an optimal policy for high-level decision-making, they encounter several
limitations when applied to real-world robotic systems:</p>
        <p>• Real-Time Constraints: The iterative computation of policies in MDPs can lead to delays,
especially in larger environments, limiting their applicability for fast-changing scenarios [7].</p>
        <p>• Dynamic Environments: MDPs lack an inherent design for handling dynamic changes, such
as moving obstacles or sudden environment updates [2].</p>
        <p>• Trajectory Execution: Translating discrete state-action policies into smooth, continuous
motion trajectories can be challenging without additional control layers [9].</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Proposed Integration of MDPs and Predictive Control</title>
        <p>Integrating MDPs with predictive control offers a promising approach to leverage the strengths of
both methods [16]. The proposed framework involves:
• MDP for High-Level Planning: Use MDPs to generate optimal policies based on long-term goals and rewards. These policies provide a high-level decision-making framework for robots [4].
• MPC for Low-Level Control: Employ MPC to execute the MDP-generated policies in real time, ensuring smooth trajectory tracking and adherence to system constraints [8].
• Feedback Loop: Integrate a feedback mechanism so that MPC informs the MDP of environmental changes, enabling policy adaptation. A sketch of this loop is given after this list.</p>
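        <p>A minimal sketch of the proposed loop follows. All names (discrete_state, solve_mpc, replan_mdp, and the robot object) are hypothetical interfaces used to illustrate the division of labor, not components of an existing implementation.</p>
        <preformat>
# Hybrid loop sketch: the MDP policy supplies the next waypoint, MPC tracks
# it, and environmental changes trigger MDP replanning (assumed interfaces).
def hybrid_control_loop(robot, mdp_policy, solve_mpc, replan_mdp, steps=100):
    for _ in range(steps):
        cell = robot.discrete_state()                      # high-level grid state
        waypoint = mdp_policy[cell]                        # MDP: long-term decision
        u = solve_mpc(robot.continuous_state(), waypoint)  # MPC: short-term control
        robot.apply(u)
        if robot.environment_changed():                    # feedback to the MDP
            mdp_policy = replan_mdp(robot.observed_map())
        </preformat>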
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Potential Benefits of Integration</title>
        <p>The integration of MDPs and MPC can address the limitations of standalone methods while enhancing
overall system performance:
• Real-Time Adaptability: MPC’s predictive capabilities enable rapid responses to dynamic changes, complementing MDPs’ high-level planning [10].
• Trajectory Optimization: MPC ensures smooth and efficient trajectory execution, translating discrete MDP policies into actionable continuous motion [9].
• Scalability and Robustness: The combined approach allows scalable application to complex environments while maintaining robustness to uncertainties and disturbances [11].</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Applications for Combined Methods</title>
        <p>The integration of MDPs and predictive control has broad applications in mobile robotics, including:
• Autonomous Navigation: Robots navigating warehouses, hospitals, or urban environments can benefit from the combined framework for efficient and adaptive path planning.
• Multi-Robot Coordination: Predictive control can optimize interactions between robots in collaborative tasks, while MDPs ensure high-level task allocation [13].
• Dynamic Obstacle Avoidance: The feedback mechanism between MDPs and MPC can handle real-time updates to avoid moving obstacles effectively.</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>5. Future Work</title>
      <p>
        The findings from this study highlight several key areas for further research and improvement. Future work should focus on addressing the current limitations of MDP-based maze exploration and predictive control integration, including the following aspects:
• Developing Hybrid MDP-MPC Systems: While MDPs provide an effective framework for high-level decision-making [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], they lack real-time adaptability. Conversely, Model Predictive Control (MPC) excels in trajectory tracking but does not inherently optimize long-term decision-making [12]. Future work should focus on designing hybrid systems that leverage MDPs for strategic planning and MPC for real-time control, ensuring a seamless balance between computational efficiency and adaptability in dynamic environments.
• Enhancing Localization Accuracy Through Sensor Fusion: One of the primary challenges observed in this study is the reliance on vision-based localization, which is susceptible to errors under poor lighting conditions. Future research should explore multi-sensor fusion techniques, incorporating data from LiDAR, inertial measurement units (IMUs), and ultrasonic sensors to improve localization robustness. Advanced filtering techniques, such as Kalman Filters or Particle Filters, can further enhance state estimation accuracy [11].
• Optimizing Computational Efficiency for Real-Time Applications: MDP-based decision-making suffers from scalability issues when applied to large or dynamic environments. Future efforts should explore reinforcement learning approaches, such as Q-learning or Deep Q-Networks (DQNs), to approximate value functions efficiently (see the sketch after this list). Additionally, parallel computing and GPU acceleration could be utilized to speed up policy computation and real-time adaptability [3].
• Application in Real-World Scenarios: Future studies should validate the proposed hybrid MDP-MPC system in real-world environments beyond simulated grid-world setups. Potential applications include warehouse automation, autonomous navigation in urban settings, and search and rescue missions, where adaptive decision-making and precise control are crucial [13].
• Improving Obstacle Avoidance Strategies: The current MDP framework assumes a static environment. However, real-world navigation often involves dynamic obstacles. Future research should focus on integrating dynamic obstacle avoidance mechanisms using predictive models and real-time environmental perception [11].
      </p>
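      <p>As a pointer toward the reinforcement learning direction mentioned above, a tabular Q-learning update is sketched below. The hyperparameters and the step, reward, and is_terminal helpers are illustrative assumptions; this paper only names Q-learning and DQNs as future work.</p>
      <preformat>
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1            # assumed learning hyperparameters

def q_learning_episode(Q, start, actions, step, reward, is_terminal, max_steps=200):
    """One epsilon-greedy episode; Q maps (state, action) to an estimated value."""
    s = start
    for _ in range(max_steps):
        if random.random() &lt; EPSILON:
            a = random.choice(actions)                          # explore
        else:
            a = max(actions, key=lambda a: Q.get((s, a), 0.0))  # exploit
        s_next = step(s, a)
        r = reward(s_next)
        best_next = max(Q.get((s_next, b), 0.0) for b in actions)
        # Temporal-difference update toward the bootstrapped one-step target
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)
        s = s_next
        if is_terminal(s):
            break
    return Q
      </preformat>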
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study explored the integration of Markov Decision Processes (MDPs) with predictive control
techniques for mobile robot trajectory tracking and maze exploration. Through a case study, we
demonstrated that while MDPs provide an effective framework for navigation in structured
environments [4], they face limitations in real-time adaptability. To address these challenges, we
examined how Model Predictive Control (MPC) can enhance trajectory tracking performance by
dynamically adjusting control actions in response to environmental changes [12].</p>
      <p>Our findings suggest that combining MDPs with predictive control can significantly improve the
efficiency and adaptability of autonomous navigation systems. By leveraging the strengths of both
methods, robots can achieve optimal decision-making while maintaining real-time responsiveness.
The proposed approach has potential applications in autonomous robotics, warehouse automation,
and dynamic path planning for mobile robots operating in uncertain environments [13].</p>
      <p>Future research should focus on enhancing localization accuracy, optimizing computational
efficiency, and applying the hybrid MDP-MPC framework in real-world robotic systems. The
integration of reinforcement learning techniques [2] and sensor fusion strategies [11] could further
improve performance, making mobile robots more capable of handling complex, real-world
navigation tasks.</p>
      <p>In conclusion, this work revisited MDP-based maze exploration and highlighted its potential when
combined with predictive control for mobile robot trajectory tracking.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly in order to: Grammar and spelling
check. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.
[2] R. S. Sutton and A. G. Barto. “Reinforcement Learning: An Introduction”, MIT Press, 2017. URL:
https://doi.org/10.5555/3312046
[3] E. Alpaydin. “Introduction to Machine Learning.”, MIT Press, 2014. URL:
https://ieeexplore.ieee.org/book/6267367
[4] N. Privault. “Understanding Markov Chains: Examples and Applications.”, Springer, 2013. URL:
https://doi.org/10.1007/ 978-981-13-0659-4
[5] Q. Hu and W. Yue. “Markov Decision Processes with Their Applications.”, Springer, 2008. URL:
http://dx.doi.org/10.1007/ 978-0-387-36951-8
[6] E. A. Feinberg and A. Shwartz. “Handbook of Markov Decision Processes.” Springer, 2002. URL:
http://dx.doi.org/10.1007/ 978-1-4615-0805-2
[7] X. Wang, X. Wang, and D. M. Wilkes. “Machine Learning-Based Natural Scene Recognition for
Mobile Robot Localization in an Unknown Environment.”, Springer, 2019. URL:
http://dx.doi.org/10.1007/ 978-981-13-9217-7
[8] [8] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. “Constrained model predictive
control: Stability and optimality.”, Automatica, 36(6):789–814, 2000. URL:
https://doi.org/10.1016/S0005-1098(99)00214-9
[9] E E. F. Camacho and C. Bordons. “Model Predictive Control.” Springer, 2013. URL:
https://doi.org/10.1007/978-0-85729-398-5
[10] [10] J. B. Rawlings, D. Q. Mayne, and M. M. Diehl. “Model Predictive Control: Theory and</p>
      <p>Design”. Nob Hill Publishing, 2009. URL: https://www.nobhillpublishing.com/mpc/
[11] X. Qian, J. R. Akella, and H. A. Ghasemi&gt; “Adaptive Model Predictive Control for Obstacle
Avoidance in Dynamic Environments.”, IEEE Transactions on Robotics, 2019, vol. 35, no. 2, pp.
431–446. URL: https://doi.org/10.48550/arXiv.2303.15869
[12] P. Falcone, F. Borrelli, J. Asgari, H. E. Tseng, and D. Hrovat. “Predictive Active Steering Control
for Autonomous Vehicle Systems,” IEEE Transactions on Control Systems Technology, 2007, vol.
15, no. 3, pp. 566–580. URL: https://doi.org/10.1109/TCST.2007.894653
[13] M. Turpin, N. Michael, and V. Kumar. “CAPT: Coordinated path planning for multiple robots.”,
The International Journal of Robotics Research, 2014, 33(9):980–999. URL:
https://doi.org/10.1177/0278364914525241.
[14] William C. Cohen. “Optimal control theory—an introduction. Control.”, Prentice-Hall, 1971, vol.</p>
      <p>17, pp. 1018. URL: https://doi.org/10.1002/aic.690170452.
[15] J. Russell and P. Norvig. “Artificial Intelligence: A Modern Approach (International Edition).”,</p>
      <p>Pearson, 2021. URL: https://elibrary.pearson.de/book/99.150005/9781292401171
[16] J. Li, J. Sun, L. Liu, and J. Xu. “Model predictive control for the tracking of autonomous mobile
robot combined with a local path planning,” Measurement and Control, 2021, vol. 54, no. 9-10, pp.
1319–1325. URL: https://doi.org/10.1177/00202940211043070</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.L.</given-names>
            <surname>Puterman</surname>
          </string-name>
          . “
          <article-title>Markov Decision Processes: Discrete Stochastic Dynamic Programming</article-title>
          .”, John Wiley &amp; Sons,
          <year>2005</year>
          . URL: https://doi.org/10.1002/9780470316887
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>