=Paper=
{{Paper
|id=Vol-3896/short14
|storemode=property
|title=Behavioral model of autonomous robotic systems using reinforcement learning methods
|pdfUrl=https://ceur-ws.org/Vol-3896/short14.pdf
|volume=Vol-3896
|authors=Oleksii Matsiievskyi,Igor Achkasov,Yevhenii Borodavka,Roman Mazurenko
|dblpUrl=https://dblp.org/rec/conf/ittap/MatsiievskyiABM24
}}
==Behavioral model of autonomous robotic systems using reinforcement learning methods==
Oleksii Matsiievskyi1, Igor Achkasov1, Yevhenii Borodavka1, Roman Mazurenko1
1 Kyiv National University of Construction and Architecture, 31, Air Force Avenue, Kyiv, 03037, Ukraine
Abstract
This study is devoted to modeling the behavior of autonomous robotic systems using reinforcement
learning (RL) methods. With the rapid development of robotics, computing, artificial intelligence, and
machine learning, it is becoming increasingly important to develop new approaches that allow
autonomous robots to adapt to dynamic and unpredictable environments. Unlike traditional control
methods, RL allows robots to autonomously learn optimal strategies through interaction with the
environment, receiving rewards for correct actions and penalties for mistakes. This study discusses
the key components and challenges of applying RL in real robotic systems, including environmental
complexity, efficient modeling, and scalability. The study also presents a neural network model
specifically designed for robotic agents and demonstrates its effectiveness through simulations. The
results confirm that RL-based models significantly increase the adaptability and reliability of
autonomous robots in achieving predefined goals, such as obstacle avoidance and target navigation.
Keywords
Autonomous Robotic Systems, Reinforcement Learning (RL), Neural Network Models, Artificial
Intelligence in Robotics, Dynamic Environments
Introduction
The behavior of autonomous robotic systems is one of the most promising and challenging
tasks of modern science and technology. The rapid development of technologies in the field of
robotics, computing, artificial intelligence, and machine learning is contributing to the
emergence of new methods and approaches to solve problems related to the autonomous
operation of robots in the real world. Modern robots must not only execute predefined
commands but also adapt their behavior to environmental conditions, make decisions in
complex and unpredictable situations, while ensuring high accuracy, reliability, and safety [1].
ITTAP’2024: 4th International Workshop on Information Technologies: Theoretical and Applied Problems, October 23–25, 2024, Ternopil, Ukraine / Opole, Poland
∗ Corresponding author.
† These authors contributed equally.
matsievskiyolexiy@gmail.com (O. Matsiievskyi); achckasov.i@ukr.net (I. Achkasov); yevgeniy.borodavka@gmail.com (Y. Borodavka); mazurenkodev@gmail.com (R. Mazurenko)
ORCID: 0009-0008-2341-8166 (O. Matsiievskyi); 0000-0002-7049-0530 (I. Achkasov); 0000-0002-7476-9387 (Y. Borodavka); 0000-0003-3954-9423 (R. Mazurenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
One of the key approaches to achieving this goal is the use of reinforcement learning (RL)
methods [2]. Reinforcement learning allows robots to independently learn optimal behavioral
strategies through interaction with the environment, receiving rewards for correct actions and
penalties for mistakes.
The essence of reinforcement learning is that the agent does not have predefined rules or
behavioral patterns [3]. Instead, it gradually accumulates knowledge about the environment,
determining which actions are best for achieving goals. The importance of this approach lies in
the ability of agents to adapt to changing conditions, which cannot be achieved using traditional
programming methods.
In the context of autonomous robotic systems, reinforcement learning is of particular
importance because it allows robots to interact with the physical world, taking into account its
dynamism and uncertainty [4]. For example, autonomous vehicles must not only follow traffic
rules, but also take into account the behavior of other road users, changes in weather conditions,
and road conditions.
Classical approaches to robot control [5], such as hard-coded rules or scheduling algorithms,
are often insufficient in complex dynamic environments. This is because real-world conditions
may differ significantly from those planned at the stage of algorithm development. This is where
reinforcement learning demonstrates its advantage, as the robot can learn from its own mistakes
and improve its behavioral strategy based on feedback.
The application of reinforcement learning to robotic systems also contributes to the
development of new methods and models of interaction with physical objects and people. For
example, autonomous robots can learn to recognize facial expressions, gestures, or other signs
that indicate human intentions and adjust their actions accordingly [6].
Despite the significant progress in reinforcement learning research, many aspects of this
approach remain an active area of research [7]. One of the main challenges is the large number of
iterations required to train agents in complex environments. Real-world robots often face time,
resource, and safety constraints, so modeling environments and algorithms in simulations is an
important part of research [8-11].
Thus, modeling the behavior of autonomous robotic systems using reinforcement learning is
an important area of modern science that allows for the creation of more flexible, reliable, and
adaptive systems [12]. This approach contributes to the development of artificial intelligence
and robotics, making innovative solutions possible for many areas of our lives [13].
To build a mathematical model of the behavior of autonomous robotic systems using
reinforcement learning methods, let us consider the main elements of this system. In general, RL
is described as the interaction between an agent and the environment through the Markov
Decision Process (MDP) [14]. The MDP is modeled as a five-tuple (S, A, P, R, γ): a set of states, a set of actions, transition probabilities, a reward function, and a discount factor. The optimal value function satisfies the Bellman optimality equation:
V(s) = max_a [ R(s, a) + γ Σ_s′ P(s′|s, a) V(s′) ] (1)
where V(s) is the expected cumulative reward for the state s; a is an action performed by the agent; R(s, a) is the reward received upon performing action a in state s; γ is the discount factor (from 0 to 1), which reflects the importance of future rewards; P(s′|s, a) is the probability of transition to state s′ from state s when performing action a; and s′ is the next state.
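To make equation (1) concrete, the sketch below runs value iteration on a tiny hand-made MDP; the three states, two actions, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy MDP with 3 states and 2 actions (illustrative values only).
n_states, n_actions = 3, 2
# P[s, a, s'] : probability of moving to state s' after action a in state s.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.6, 0.4], [0.0, 0.1, 0.9]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2 is absorbing
])
# R[s, a] : immediate reward for taking action a in state s.
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
gamma = 0.9

# Value iteration: repeatedly apply equation (1) until V stops changing.
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)   # Q[s, a] = R(s, a) + γ Σ_s' P(s'|s, a) V(s')
    V_new = Q.max(axis=1)     # V(s) = max_a Q[s, a]
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```

Since state 2 is absorbing with reward 1 per step, its value converges to 1/(1 − γ) = 10, and states closer to it inherit discounted fractions of that value.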
The main elements of RL:
Policy (π) is the agent's strategy: it determines which actions the agent performs in different states of the environment.
π(a|s) = P(a|s) (2)
where π(a|s) is the probability of choosing action a in state s. The goal of reinforcement learning is to find the optimal policy π that maximizes the expected reward for the agent.
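As an illustration of equation (2), a stochastic policy over a small discrete problem can be stored as a per-state probability table and sampled directly; the table below is a made-up example, not a policy from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical policy table π(a|s) for 3 states and 2 actions: each row is a
# probability distribution over the actions available in that state.
pi = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
])

def sample_action(state: int) -> int:
    """Draw action a with probability π(a|state), as in equation (2)."""
    return int(rng.choice(pi.shape[1], p=pi[state]))

# In state 0 the agent should pick action 1 roughly 10% of the time.
draws = [sample_action(0) for _ in range(1000)]
```

Learning then amounts to adjusting the entries of this table (or, for large state spaces, the parameters of a network that outputs these probabilities) to maximize expected reward.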
The Q-learning method is one of the most common reinforcement learning algorithms. It is based on updating the Q-function through the interaction of the agent with the environment:
Q(s, a) ← Q(s, a) + α (r + γ max_a′ Q(s′, a′) − Q(s, a)) (3)
where Q(s, a) is the current value of the function Q for state s and action a; it represents an estimate of the expected long-term reward if the agent starts from this state and performs action a. α is the learning rate, which determines how much the new value affects the old one; it varies from 0 to 1. r is the immediate reward that the agent receives after performing action a in state s. γ is the discount factor, which determines the importance of future rewards; γ also varies from 0 to 1. s′ is the next state the agent enters after action a is performed. max_a′ Q(s′, a′) is the maximum value of the function Q over all possible actions a′ in the following state s′; this is the maximum expected reward the agent can receive starting from state s′ and acting optimally. Finally, r + γ max_a′ Q(s′, a′) − Q(s, a) is the difference between the new estimate and the current estimate, also known as the temporal difference error.
The Q-learning algorithm is repeated until the Q-values are close to the optimal values. As a
result, the agent can choose actions based on maximizing the Q-value.
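The update rule of equation (3) can be sketched in a few lines of tabular Q-learning. The corridor world below (states 0–4 on a line, goal at state 4) and the hyperparameter values are illustrative assumptions, not the paper's simulation.

```python
import random

# Corridor world: states 0..4, action 0 = left, action 1 = right,
# reward +1 on reaching the goal state 4 (illustrative setup).
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

random.seed(0)

def step(s, a):
    """Deterministic transition; episode ends at the goal."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

def choose_action(Q, s):
    """ε-greedy selection with random tie-breaking."""
    if random.random() < EPSILON or Q[s][0] == Q[s][1]:
        return random.randrange(2)
    return 0 if Q[s][0] > Q[s][1] else 1

Q = [[0.0, 0.0] for _ in range(N_STATES)]
for episode in range(500):
    s, done = 0, False
    while not done:
        a = choose_action(Q, s)
        s_next, r, done = step(s, a)
        # Equation (3): Q(s,a) += α (r + γ max_a' Q(s',a') − Q(s,a))
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next
```

After training, the learned Q-values favor moving right in every non-goal state, and each value approximates the discounted reward γ^k for a goal k steps away.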
2. The main research
The task: an autonomous robotic system, a robot agent, must perform certain actions in the
environment in order to move to a given point, avoid obstacles, etc. The testing environment
will be a simulation of the real world in which the robot operates. The environment
determines the state in which the robot is located and the reward for each action it performs.
Figure 1 shows a neural network model for modeling the behavior of autonomous robotic
systems using Reinforcement Learning (RL) techniques. This diagram represents a simple
neural network consisting of three main blocks: an input layer, a hidden layer, and an output
layer.
Let's analyze each of these blocks separately:
Input Layer
• Description: The input layer is the first layer of a neural network. It is responsible for
receiving the input data.
• Function: Each node (neuron) in this layer represents one input parameter or feature
from the data set. For example, if a model uses five input parameters (such as sensor data
or image pixels), there will be five nodes in this layer.
• Transitions: The outputs of the input layer are passed to the hidden layer. Nodes in this
layer usually have no activation functions.
Hidden Layer
• Description: This is an intermediate layer between the input and output layers. In this
model, there is one hidden layer.
• Function: The hidden layer processes the input data using the Rectified Linear Unit
activation function.
• Transitions: The output from the hidden layer goes to the output layer. Each node in the
hidden layer processes the data it receives from the previous layer and passes it to the
next one.
Output Layer
• Description: The output layer is the final layer in a neural network model.
• Function: This layer is responsible for generating the final result or prediction. The
number of nodes in the output layer depends on the task. For example, there may be two
output nodes for a two-class classification, one for a regression.
• Transitions: The output layer takes the data from the hidden layer and uses it to generate
the final result by applying an activation function (e.g., Softmax for classification).
There are arrows between all the layers that symbolize the transfer of data between them.
These arrows show how data flows through the model sequentially: from the input layer to the
hidden layer and finally to the output layer. The connections between layers are fully connected,
which means that every node in one layer is connected to every node in the next layer.
Input data arrives at the input layer, where it is passed to the hidden layer for processing.
After processing, the output data is passed to the output layer, which generates the final result or
prediction.
Figure 1: A neural network model for an autonomous robotic system using reinforcement
learning methods.
The next step is to create the structure of the neural network. The development environment is PyCharm; using the Python programming language and the PyTorch library, we write a neural network structure for modeling an autonomous robotic system (Figure 2).
First, we import the torch library for working with tensors, torch.nn for creating neural networks, and torch.optim for training optimizers such as SGD and Adam.
Figure 2: Neural network for an agent with reinforcement learning on PyTorch.
In the class ‘RobotAgent’:
• __init__ (constructor): Initializes the layers of the neural network.
• self.fc1: The first fully connected layer that accepts the input size vector state_size and
transforms it into a vector with 128 features.
• self.fc2: The second fully connected layer that transforms a vector with 128 features
into another vector with 128 features.
• self.fc3: The third fully connected layer, which takes a vector with 128 features and
converts it into a vector of size action_size, which corresponds to the number of
possible actions.
Function ‘forward’:
• forward: Performs a direct pass through the neural network. This is the main function
that determines how data passes through the network layers.
• torch.relu: Applies the Rectified Linear Unit activation function after each of the first
two layers, which allows the model to detect non-linear dependencies.
• torch.softmax: An activation function that converts the output values of the last layer
into probabilities for each action. The outputs will reflect the probability of choosing
each of the possible actions.
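Since Figure 2 is not reproduced here, the RobotAgent class described above might look like the following sketch, based on the layer sizes given in the text; the state_size and action_size values used at the bottom are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class RobotAgent(nn.Module):
    """Network described in the text: two 128-unit hidden layers with ReLU,
    and a softmax over action probabilities at the output."""
    def __init__(self, state_size: int, action_size: int):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 128)   # state vector -> 128 features
        self.fc2 = nn.Linear(128, 128)          # 128 -> 128 features
        self.fc3 = nn.Linear(128, action_size)  # 128 -> one score per action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        # Softmax turns the last layer's scores into action probabilities.
        return torch.softmax(self.fc3(x), dim=-1)

# Illustrative sizes: a 4-dimensional state and 2 possible actions.
agent = RobotAgent(state_size=4, action_size=2)
probs = agent(torch.zeros(1, 4))
```

For any input state, the output row is a valid probability distribution over actions, which is what lets the agent sample actions stochastically during training.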
To train the model, reinforcement learning is used, which requires large computing
resources and an iterative approach, Figure 3.
Figure 3: Optimization of an agent neural network with reinforcement learning on PyTorch.
Where:
• optimizer = optim.Adam(agent.parameters(), lr=0.001): The Adam optimizer is used to
update the model parameters. The learning rate is set to 0.001.
• Learning cycle: In each episode, the agent interacts with the environment, choosing
actions based on probabilities computed by the network. It then receives a reward
from the environment, which is used to compute a loss function.
• optimizer.step(): Updates model parameters based on calculated gradients.
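The training cycle described above might look like the following REINFORCE-style sketch; since Figure 3 is not reproduced here, the toy corridor environment, the reward shaping, and the exact loss formulation are assumptions. The Adam optimizer with lr=0.001 follows the text.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Minimal stand-in environment (illustrative): the agent sits at position p
# in [0, 4]; action 1 moves right, action 0 moves left; reaching p = 4 ends
# the episode with reward +1, with a small per-step penalty otherwise.
class ToyCorridor:
    def reset(self):
        self.p = 0
        return [float(self.p)]
    def step(self, action):
        self.p = min(4, self.p + 1) if action == 1 else max(0, self.p - 1)
        done = self.p == 4
        return [float(self.p)], (1.0 if done else -0.01), done

# Same shape as the network described in the text: 128-128 hidden, softmax out.
policy = nn.Sequential(
    nn.Linear(1, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)

env = ToyCorridor()
optimizer = optim.Adam(policy.parameters(), lr=0.001)  # as in the text
gamma = 0.99

for episode in range(200):
    state, done, steps = env.reset(), False, 0
    log_probs, rewards = [], []
    while not done and steps < 100:          # cap episode length
        probs = policy(torch.tensor(state))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(action.item())
        rewards.append(reward)
        steps += 1
    # Discounted returns G_t, then a REINFORCE-style policy-gradient loss.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
    optimizer.zero_grad()
    loss.backward()       # gradients of the loss w.r.t. network parameters
    optimizer.step()      # Adam update of the model parameters
```

Each episode follows the cycle described above: sample actions from the network's probabilities, collect rewards, turn them into a loss, and let optimizer.step() update the parameters.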
This neural network model allows an autonomous robotic system to learn and improve its
behavior through interaction with its environment using reinforcement learning techniques.
The model adapts to new situations and gradually improves its skills to achieve specified goals,
such as moving to a point or avoiding obstacles.
Results
The study confirmed the effectiveness of reinforcement learning methods for modeling the
behavior of autonomous robotic systems. The developed neural network model allowed the
agent to successfully learn through interaction with the environment, demonstrating the ability
to adapt to changing conditions and improve its strategies to achieve its goals.
Table 1 and Figure 4 show the progress of the neural network training for the autonomous robotic system. The results show that as the number of episodes increased, the average reward of the agent gradually increased, as did the proportion of successfully completed episodes. From episode 1 to episode 50 the average reward remained negative, indicating the difficulty of the initial stages of learning. From episode 100 onwards, the average reward was positive and continued to grow, reaching a maximum value of +100 in episode 350, which is an indicator of successful training of the system.
A significant proportion of the episodes were completed successfully starting from episode
100, which confirms the gradual improvement of the agent's behavioral strategy. The obtained
results confirm the effectiveness of the developed neural network model and reinforcement
learning methods for autonomous robotic systems.
Figure 4: Performance During Training of a neural network for an autonomous robotic system.
Table 1
Progress of neural network training for an autonomous robotic system
Episode   Average reward   Number of steps   Successful episode (Yes/No)
1         -50              10                No
50        -20              30                No
100       10               50                Yes
150       30               60                Yes
200       50               80                Yes
250       70               90                Yes
300       90               100               Yes
350       100              110               Yes
Conclusions
The study confirmed that Reinforcement Learning methods are effective for modeling the
behavior of autonomous robotic systems. Thanks to these methods, the agent was able to learn
through interaction with the environment, adapt to changing conditions, and improve its
strategies to achieve its goals.
The developed neural network model, which consists of input, hidden, and output layers,
allowed the agent to gradually accumulate knowledge about the environment and determine the
optimal actions. This ensured the agent's ability to learn independently and improve behavioral
strategies in complex environments.
The use of simulations made it possible to quickly test new approaches, create accurate
models of the environment, and significantly accelerate the learning process of autonomous
systems.
References
[1] H. Yun, Y. Cho, J. Lee, A. Ha and J. Yun, “Generative Model-Based Simulation of Driver Behavior When Using Control Input Interface for Teleoperated Driving in Unstructured Canyon Terrains”, in Towards Autonomous Robotic Systems. Cham: Springer Nature Switzerland, 2023, pp. 482–493. https://doi.org/10.1007/978-3-031-43360-3_39.
[2] P. Fergus and C. Chalmers, “Deep Reinforcement Learning”, in Computational Intelligence Methods and Applications. Cham: Springer International Publishing, 2022, pp. 255–264. https://doi.org/10.1007/978-3-031-04420-5_11.
[3] Lorenz U. Bestärkendes Lernen als Teilgebiet des Maschinellen Lernens. Reinforcement
Learning. Berlin, Heidelberg, 2020. P. 1–11. URL: https://doi.org/10.1007/978-3-662-61651-
2_1.
[4] S. Hai-Jew, “Finding Automated (Bot, Sensor) or Semi-Automated (Cyborg) Social Media
Accounts Using Network Analysis and NodeXL Basic”, Robotic Systems. IGI Glob., 2020, pp.
1250–1289. https://doi.org/10.4018/978-1-7998-1754-3.ch060.
[5] J. A. Abdulsaheb та D. J. Kadhim, “Classical and Heuristic Approaches for Mobile Robot
Path Planning: A Survey”, Robotics, vol. 12, no. 4, pp. 93, June 2023.
https://doi.org/10.3390/robotics12040093.
[6] G. Ryzhakova, O. Malykhina, V. Pokolenko, O. Rubtsova, O. Homenko, I. Nesterenko, T. Honcharenko, “Construction Project Management with Digital Twin Information System”, International Journal of Emerging Technology and Advanced Engineering, 2022, Vol. 12, Issue 10, pp. 19–28. https://doi.org/10.46338/ijetae1022_03.
[7] I. Sung, B. Choi and P. Nielsen, “Reinforcement Learning for Resource Constrained
Project Scheduling Problem with Activity Iterations and Crashing”, IFAC-
PapersOnLine, vol. 53, no. 2, pp. 10493–10497, 2020.
https://doi.org/10.1016/j.ifacol.2020.12.2794.
[8] S. Dolhopolov, T. Honcharenko, O. Terentyev, K. Predun and A. Rosynskyi. Information
system of multi-stage analysis of the building of object models on a construction site, IOP
Conference Series: Earth and Environmental Science, 1254
(2023) 012075, doi:10.1088/1755-1315/1254/1/012075.
https://iopscience.iop.org/article/10.1088/1755-1315/1254/1/012075/pdf
[9] D. Chernyshev; S. Dolhopolov; T. Honcharenko; H. Haman; T. Ivanova; M. Zinchenko.
Integration of Building Information Modeling and Artificial Intelligence Systems to Create
a Digital Twin of the Construction Site, 2022 IEEE 17th
International Conference on Computer Sciences and Information Technologies (CSIT), pp.
36-39, 2022. DOI: 10.1109/CSIT56902.2022.10000717
[10] T. Honcharenko, R. Akselrod, A. Shpakov, O. Khomenko. Information system based on
multi-value classification of fully connected neural network for construction management,
IAES International Journal of Artificial Intelligence, 2023, no. 12(2), pp. 593–601.
https://ijai.iaescore.com/index.php/IJAI/article/view/21864
[11] Yeremenko, B., Mazurenko, R., Stetsyk, O. & Buhrov, A. (2023). Intelligent Management of
Traffic Flows in Large Cities. In TRANSBALTICA XIII: Transportation Science and
Technology (pp. 33–42). Springer International Publishing. URL:
https://doi.org/10.1007/978-3-031-25863-3_4.
[12] Z. Kakish, K. Elamvazhuthi and S. Berman, “Using Reinforcement Learning to Herd a Robotic
Swarm to a Target Distribution”, in Distributed Autonomous Robotic
Systems. Cham: Springer Int. Publishing, 2022, pp. 401–
414. https://doi.org/10.1007/978-3-030-92790-5_31.
[13] Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Ruizhi
Chen, Ling Li, Yunji, “Learning Controllable Elements Oriented Representations for
Reinforcement Learning”, Neurocomputing, pp. 126455, June
(2023) https://doi.org/10.1016/j.neucom.2023.126455.
[14] R. Sivaraman, “MARKOV PROCESS AND DECISION ANALYSIS”, J. MECHANICS
CONTINUA MATH. SCI., vol. 15, no. 7, July 2020.
https://doi.org/10.26782/jmcms.2020.07.00002