Behavioral model of autonomous robotic systems using reinforcement learning methods

Oleksii Matsiievskyi1, Igor Achkasov1, Yevhenii Borodavka1, Roman Mazurenko1

1 Kyiv National University of Construction and Architecture, 31, Air Force Avenue, Kyiv, 03037, Ukraine

Abstract
This study is devoted to modeling the behavior of autonomous robotic systems using reinforcement learning (RL) methods. With the rapid development of robotics, computing, artificial intelligence, and machine learning, it is becoming increasingly important to develop new approaches that allow autonomous robots to adapt to dynamic and unpredictable environments. Unlike traditional control methods, RL allows robots to autonomously learn optimal strategies through interaction with the environment, receiving rewards for correct actions and penalties for mistakes. This study discusses the key components and challenges of applying RL in real robotic systems, including environmental complexity, efficient modeling, and scalability. The study also presents a neural network model specifically designed for robotic agents and demonstrates its effectiveness through simulations. The results confirm that RL-based models significantly increase the adaptability and reliability of autonomous robots in achieving predefined goals, such as obstacle avoidance and target navigation.

Keywords
Autonomous Robotic Systems, Reinforcement Learning (RL), Neural Network Models, Artificial Intelligence in Robotics, Dynamic Environments

Introduction
Modeling the behavior of autonomous robotic systems is one of the most promising and challenging tasks of modern science and technology. The rapid development of robotics, computing, artificial intelligence, and machine learning is contributing to the emergence of new methods and approaches for solving problems related to the autonomous operation of robots in the real world.
Modern robots must not only execute predefined commands but also adapt their behavior to environmental conditions and make decisions in complex and unpredictable situations while ensuring high accuracy, reliability, and safety [1].

ITTAP’2024: 4th International Workshop on Information Technologies: Theoretical and Applied Problems, October 23-25, 2024, Ternopil, Ukraine, Opole, Poland
∗ Corresponding author.
† These authors contributed equally.
matsievskiyolexiy@gmail.com (O. Matsiievskyi); achckasov.i@ukr.net (I. Achkasov); yevgeniy.borodavka@gmail.com (Y. Borodavka); mazurenkodev@gmail.com (R. Mazurenko)
0009-0008-2341-8166 (O. Matsiievskyi); 0000-0002-7049-0530 (I. Achkasov); 0000-0002-7476-9387 (Y. Borodavka); 0000-0003-3954-9423 (R. Mazurenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

One of the key approaches to achieving this goal is the use of reinforcement learning (RL) methods [2]. Reinforcement learning allows robots to independently learn optimal behavioral strategies through interaction with the environment, receiving rewards for correct actions and penalties for mistakes. The essence of reinforcement learning is that the agent does not have predefined rules or behavioral patterns [3]. Instead, it gradually accumulates knowledge about the environment, determining which actions are best for achieving its goals. The importance of this approach lies in the ability of agents to adapt to changing conditions, which cannot be achieved using traditional programming methods.

In the context of autonomous robotic systems, reinforcement learning is of particular importance because it allows robots to interact with the physical world, taking into account its dynamism and uncertainty [4].
For example, autonomous vehicles must not only follow traffic rules but also take into account the behavior of other road users, changes in weather, and road conditions.

Classical approaches to robot control [5], such as hard-coded rules or planning algorithms, are often insufficient in complex dynamic environments. This is because real-world conditions may differ significantly from those anticipated when the algorithm was developed. This is where reinforcement learning demonstrates its advantage, as the robot can learn from its own mistakes and improve its behavioral strategy based on feedback.

The application of reinforcement learning to robotic systems also contributes to the development of new methods and models of interaction with physical objects and people. For example, autonomous robots can learn to recognize facial expressions, gestures, or other signs that indicate human intentions and adjust their actions accordingly [6].

Despite the significant progress in reinforcement learning, many aspects of this approach remain an active area of research [7]. One of the main challenges is the large number of iterations required to train agents in complex environments. Real-world robots often face time, resource, and safety constraints, so modeling environments and algorithms in simulations is an important part of research [8-11].

Thus, modeling the behavior of autonomous robotic systems using reinforcement learning is an important area of modern science that allows for the creation of more flexible, reliable, and adaptive systems [12]. This approach contributes to the development of artificial intelligence and robotics, making innovative solutions possible in many areas of our lives [13].

To build a mathematical model of the behavior of autonomous robotic systems using reinforcement learning methods, let us consider the main elements of this system.
In general, RL is described as the interaction between an agent and the environment through a Markov Decision Process (MDP) [14]. The Markov decision process is modeled as a five-tuple of states, actions, transition probabilities, rewards, and a discount factor, and the value of a state satisfies:

$$V(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \right] \tag{1}$$

where
$V(s)$ is the expected cumulative reward for the state $s$;
$a$ is an action performed by the agent;
$R(s, a)$ is the reward received for performing the action $a$ in the state $s$;
$\gamma$ is the discount factor (from 0 to 1), which reflects the importance of future rewards;
$P(s' \mid s, a)$ is the probability of transition to the state $s'$ from the state $s$ when performing the action $a$;
$s'$ is the next state.

The main elements of RL:

Policy ($\pi$) is the agent's strategy, which determines what actions it performs in different states of the environment:

$$\pi(a \mid s) = P(a \mid s) \tag{2}$$

where $\pi(a \mid s)$ is the probability of choosing the action $a$ in the state $s$.

The goal of reinforcement learning is to find the optimal policy $\pi$ that maximizes the expected reward for the agent.

The Q-learning method is one of the most common reinforcement learning algorithms. This method is based on updating the Q-function through the interaction of the agent with the environment:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) \tag{3}$$

where
$Q(s, a)$ is the current value of the function $Q$ for the state $s$ and action $a$. It represents an estimate of the expected long-term reward if the agent starts from this state and performs the action $a$;
$\alpha$ is the learning rate, which determines how much the new value affects the old one. It varies from 0 to 1;
$r$ is the immediate reward that the agent receives after performing the action $a$ in the state $s$;
$\gamma$ is the discount factor, which determines the importance of future rewards. The value of $\gamma$ also varies from 0 to 1;
$s'$ is the next state the agent enters after the action $a$ is performed;
$\max_{a'} Q(s', a')$ is the maximum value of the function $Q$ over all possible actions $a'$ in the next state $s'$. This is the maximum expected reward that an agent can receive starting from the state $s'$ and acting optimally;
$r + \gamma \max_{a'} Q(s', a') - Q(s, a)$ is the difference between the new estimate and the current estimate, also known as the temporal difference error.

The Q-learning update is repeated until the Q-values are close to the optimal values. As a result, the agent can choose actions by maximizing the Q-value.

2. The main research
The task: an autonomous robotic system (a robot agent) must perform certain actions in the environment in order to move to a given point, avoid obstacles, etc. The testing environment is a simulation of the real world in which the robot operates. The environment determines the state in which the robot is located and the reward for each action it performs.

Figure 1 shows a neural network model for modeling the behavior of autonomous robotic systems using reinforcement learning (RL) techniques. The diagram represents a simple neural network consisting of three main blocks: an input layer, a hidden layer, and an output layer. Let us analyze each of these blocks separately.

Input Layer
• Description: The input layer is the first layer of a neural network. It is responsible for receiving the input data.
• Function: Each node (neuron) in this layer represents one input parameter or feature from the data set. For example, if a model uses five input parameters (such as sensor data or image pixels), there will be five nodes in this layer.
• Transitions: The outputs of the input layer are passed to the hidden layer. Nodes in this layer usually have no activation functions.

Hidden Layer
• Description: This is an intermediate layer between the input and output layers. In this model, there is one hidden layer.
• Function: The hidden layer processes the input data using the Rectified Linear Unit (ReLU) activation function.
• Transitions: The output from the hidden layer goes to the output layer. Each node in the hidden layer processes the data it receives from the previous layer and passes it to the next one.

Output Layer
• Description: The output layer is the final layer in a neural network model.
• Function: This layer is responsible for generating the final result or prediction. The number of nodes in the output layer depends on the task. For example, there may be two output nodes for two-class classification and one for regression.
• Transitions: The output layer takes the data from the hidden layer and uses it to generate the final result by applying an activation function (e.g., Softmax for classification).

The arrows between the layers symbolize the transfer of data between them. They show how data flows through the model sequentially: from the input layer to the hidden layer and finally to the output layer. The layers are fully connected, which means that every node in one layer is connected to every node in the next layer. Input data arrives at the input layer and is passed to the hidden layer for processing. After processing, the data is passed to the output layer, which generates the final result or prediction.

Figure 1: A neural network model for an autonomous robotic system using reinforcement learning methods.

The next step is to create the structure of the neural network. The development environment is PyCharm. Using the Python programming language and the PyTorch library, we define a neural network structure for modeling an autonomous robotic system (Figure 2). First, we import the 'torch' library for working with tensors, 'torch.nn' for creating neural networks, and 'torch.optim' for training optimizers such as SGD and Adam.

Figure 2: Neural network for an agent with reinforcement learning on PyTorch.

In the class ‘RobotAgent’:
• __init__ (constructor): Initializes the layers of the neural network.
• self.fc1: The first fully connected layer, which accepts an input vector of size state_size and transforms it into a vector with 128 features.
• self.fc2: The second fully connected layer, which transforms a vector with 128 features into another vector with 128 features.
• self.fc3: The third fully connected layer, which takes a vector with 128 features and converts it into a vector of size action_size, which corresponds to the number of possible actions.

Function ‘forward’:
• forward: Performs a forward pass through the neural network. This is the main function that determines how data passes through the network layers.
• torch.relu: Applies the Rectified Linear Unit (ReLU) activation function after each of the first two layers, which allows the model to capture non-linear dependencies.
• torch.softmax: An activation function that converts the output values of the last layer into probabilities, one for each possible action.

To train the model, reinforcement learning is used, which requires large computing resources and an iterative approach (Figure 3).

Figure 3: Optimization of an agent neural network with reinforcement learning on PyTorch.

Where:
• optimizer = optim.Adam(agent.parameters(), lr=0.001): The Adam optimizer is used to update the model parameters. The learning rate is set to 0.001.
• Training loop: In each episode, the agent interacts with the environment, choosing actions based on the probabilities computed by the network. It then receives a reward from the environment, which is used to compute a loss function.
• optimizer.step(): Updates the model parameters based on the calculated gradients.

This neural network model allows an autonomous robotic system to learn and improve its behavior through interaction with its environment using reinforcement learning techniques.
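The network and training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of Figures 2 and 3: the layer sizes, the Softmax output, and the Adam optimizer with lr=0.001 follow the text, while the toy environment `LineWorld`, the REINFORCE-style policy-gradient loss, the discount factor `gamma`, and the episode count are hypothetical choices introduced only to make the example runnable.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class RobotAgent(nn.Module):
    """Policy network: maps a state vector to action probabilities."""

    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 128)   # state -> 128 features
        self.fc2 = nn.Linear(128, 128)          # 128 -> 128 features
        self.fc3 = nn.Linear(128, action_size)  # 128 -> one score per action

    def forward(self, state):
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        return torch.softmax(self.fc3(x), dim=-1)


class LineWorld:
    """Hypothetical toy environment (not from the paper): the robot starts
    in cell 0 of a corridor and must reach the last cell."""

    def __init__(self, size=5, max_steps=20):
        self.size, self.max_steps = size, max_steps

    def _obs(self):
        obs = torch.zeros(self.size)  # one-hot encoding of the position
        obs[self.pos] = 1.0
        return obs

    def reset(self):
        self.pos, self.steps = 0, 0
        return self._obs()

    def step(self, action):
        # action 1 moves right, any other action moves left (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)
        self.steps += 1
        reached = self.pos == self.size - 1
        reward = 10.0 if reached else -1.0  # assumed reward scheme
        done = reached or self.steps >= self.max_steps
        return self._obs(), reward, done


def train(agent, env, episodes=200, gamma=0.99, lr=0.001):
    """REINFORCE-style training (an assumption; the paper does not name the loss)."""
    optimizer = optim.Adam(agent.parameters(), lr=lr)
    episode_rewards = []
    for _ in range(episodes):
        state, done = env.reset(), False
        log_probs, rewards = [], []
        while not done:
            # Sample an action from the probabilities computed by the network
            dist = torch.distributions.Categorical(probs=agent(state))
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            state, reward, done = env.step(action.item())
            rewards.append(reward)
        # Discounted return from each time step to the end of the episode
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        episode_rewards.append(sum(rewards))
    return episode_rewards


if __name__ == "__main__":
    agent = RobotAgent(state_size=5, action_size=2)
    history = train(agent, LineWorld())
    print("mean reward over last 20 episodes:", sum(history[-20:]) / 20)
```

Sampling from the Softmax output, rather than always taking the most probable action, keeps the agent exploring during training; a real robotic task would replace `LineWorld` with a simulator exposing comparable `reset` and `step` methods.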
The model adapts to new situations and gradually improves its skills to achieve specified goals, such as moving to a point or avoiding obstacles.

Result
The study confirmed the effectiveness of reinforcement learning methods for modeling the behavior of autonomous robotic systems. The developed neural network model allowed the agent to successfully learn through interaction with the environment, demonstrating the ability to adapt to changing conditions and improve its strategies to achieve its goals.

Table 1 and Figure 4 show the progress of the neural network training for the autonomous robotic system. As the number of episodes increased, the average reward of the agent gradually increased, while the number of steps per episode recorded in Table 1 grew from 10 to 110. In episodes 1 and 50 the average reward was still negative (-50 and -20) and the episodes were unsuccessful, indicating the difficulty of the initial stages of learning. From episode 100 onwards, the average reward was positive and continued to increase, and in episode 350 it reached a maximum value of +100, which is an indicator of successful training of the system. All recorded episodes from episode 100 onwards were completed successfully, which confirms the gradual improvement of the agent's behavioral strategy. The obtained results confirm the effectiveness of the developed neural network model and reinforcement learning methods for autonomous robotic systems.

Figure 4: Performance during training of a neural network for an autonomous robotic system.

Table 1
Progress of neural network training for an autonomous robotic system

Episode   Average reward   Number of steps   Successful episode (Yes/No)
1         -50              10                No
50        -20              30                No
100       10               50                Yes
150       30               60                Yes
200       50               80                Yes
250       70               90                Yes
300       90               100               Yes
350       100              110               Yes

Conclusions
The study confirmed that reinforcement learning methods are effective for modeling the behavior of autonomous robotic systems.
Thanks to these methods, the agent was able to learn through interaction with the environment, adapt to changing conditions, and improve its strategies to achieve its goals. The developed neural network model, which consists of input, hidden, and output layers, allowed the agent to gradually accumulate knowledge about the environment and determine the optimal actions. This ensured the agent's ability to learn independently and improve behavioral strategies in complex environments. The use of simulations made it possible to quickly test new approaches, create accurate models of the environment, and significantly accelerate the learning process of autonomous systems.

References
[1] H. Yun, Y. Cho, J. Lee, A. Ha and J. Yun, “Generative Model-Based Simulation of Driver Behavior When Using Control Input Interface for Teleoperated Driving in Unstructured Canyon Terrains”, in Towards Autonomous Robotic Systems. Cham: Springer Nat. Switz., 2023, pp. 482–493. https://doi.org/10.1007/978-3-031-43360-3_39.
[2] P. Fergus and C. Chalmers, “Deep Reinforcement Learning”, in Computational Intelligence Methods and Applications. Cham: Springer Int. Publishing, 2022, pp. 255–264. https://doi.org/10.1007/978-3-031-04420-5_11.
[3] U. Lorenz, “Bestärkendes Lernen als Teilgebiet des Maschinellen Lernens”, in Reinforcement Learning. Berlin, Heidelberg, 2020, pp. 1–11. https://doi.org/10.1007/978-3-662-61651-2_1.
[4] S. Hai-Jew, “Finding Automated (Bot, Sensor) or Semi-Automated (Cyborg) Social Media Accounts Using Network Analysis and NodeXL Basic”, in Robotic Systems. IGI Glob., 2020, pp. 1250–1289. https://doi.org/10.4018/978-1-7998-1754-3.ch060.
[5] J. A. Abdulsaheb and D. J. Kadhim, “Classical and Heuristic Approaches for Mobile Robot Path Planning: A Survey”, Robotics, vol. 12, no. 4, p. 93, June 2023. https://doi.org/10.3390/robotics12040093.
[6] G. Ryzhakova, O. Malykhina, V. Pokolenko, O. Rubtsova, O. Homenko, I. Nesterenko and T. Honcharenko, “Construction Project Management with Digital Twin Information System”, International Journal of Emerging Technology and Advanced Engineering, vol. 12, no. 10, pp. 19-28, 2022. https://doi.org/10.46338/ijetae1022_03.
[7] I. Sung, B. Choi and P. Nielsen, “Reinforcement Learning for Resource Constrained Project Scheduling Problem with Activity Iterations and Crashing”, IFAC-PapersOnLine, vol. 53, no. 2, pp. 10493–10497, 2020. https://doi.org/10.1016/j.ifacol.2020.12.2794.
[8] S. Dolhopolov, T. Honcharenko, O. Terentyev, K. Predun and A. Rosynskyi, “Information system of multi-stage analysis of the building of object models on a construction site”, IOP Conference Series: Earth and Environmental Science, vol. 1254, 012075, 2023. doi:10.1088/1755-1315/1254/1/012075. https://iopscience.iop.org/article/10.1088/1755-1315/1254/1/012075/pdf
[9] D. Chernyshev, S. Dolhopolov, T. Honcharenko, H. Haman, T. Ivanova and M. Zinchenko, “Integration of Building Information Modeling and Artificial Intelligence Systems to Create a Digital Twin of the Construction Site”, in 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), 2022, pp. 36-39. doi:10.1109/CSIT56902.2022.10000717.
[10] T. Honcharenko, R. Akselrod, A. Shpakov and O. Khomenko, “Information system based on multi-value classification of fully connected neural network for construction management”, IAES International Journal of Artificial Intelligence, vol. 12, no. 2, pp. 593-601, 2023. https://ijai.iaescore.com/index.php/IJAI/article/view/21864
[11] B. Yeremenko, R. Mazurenko, O. Stetsyk and A. Buhrov, “Intelligent Management of Traffic Flows in Large Cities”, in TRANSBALTICA XIII: Transportation Science and Technology. Springer International Publishing, 2023, pp. 33–42. https://doi.org/10.1007/978-3-031-25863-3_4.
[12] Z. Kakish, K. Elamvazhuthi and S. Berman, “Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution”, in Distributed Autonomous Robotic Systems. Cham: Springer Int. Publishing, 2022, pp. 401–414. https://doi.org/10.1007/978-3-030-92790-5_31.
[13] Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Ruizhi Chen, Ling Li, Yunji, “Learning Controllable Elements Oriented Representations for Reinforcement Learning”, Neurocomputing, p. 126455, June 2023. https://doi.org/10.1016/j.neucom.2023.126455.
[14] R. Sivaraman, “Markov Process and Decision Analysis”, J. Mechanics Continua Math. Sci., vol. 15, no. 7, July 2020. https://doi.org/10.26782/jmcms.2020.07.00002