<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IAES International Journal of Artificial Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Behavioral model of autonomous robotic systems using reinforcement learning methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksii Matsiievskyi</string-name>
<xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Achkasov</string-name>
<xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yevhenii Borodavka</string-name>
<xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Mazurenko</string-name>
<xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Autonomous Robotic Systems</institution>
          ,
          <addr-line>Reinforcement Learning (RL), Neural Network Models, Artificial Intelligence in Robotics, Dynamic Environments</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyiv National University of Construction and Architecture</institution>
          ,
          <addr-line>31, Air Force Avenue, Kyiv, 03037</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>12</volume>
      <issue>2</issue>
      <fpage>593</fpage>
      <lpage>601</lpage>
      <abstract>
        <p>This study is devoted to modeling the behavior of autonomous robotic systems using reinforcement learning (RL) methods. With the rapid development of robotics, computing, artificial intelligence, and machine learning, it is becoming increasingly important to develop new approaches that allow autonomous robots to adapt to dynamic and unpredictable environments. Unlike traditional control methods, RL allows robots to autonomously learn optimal strategies through interaction with the environment, receiving rewards for correct actions and penalties for mistakes. This study discusses the key components and challenges of applying RL in real robotic systems, including environmental complexity, efficient modeling, and scalability. The study also presents a neural network model specifically designed for robotic agents and demonstrates its effectiveness through simulations. The results confirm that RL-based models significantly increase the adaptability and reliability of autonomous robots in achieving predefined goals, such as obstacle avoidance and target navigation.</p>
      </abstract>
      <kwd-group>
        <kwd>Autonomous Robotic Systems</kwd>
        <kwd>Reinforcement Learning (RL)</kwd>
        <kwd>Neural Network Models</kwd>
        <kwd>Artificial Intelligence in Robotics</kwd>
        <kwd>Dynamic Environments</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The behavior of autonomous robotic systems is one of the most promising and challenging
tasks of modern science and technology. The rapid development of technologies in the field of
robotics, computing, artificial intelligence, and machine learning is contributing to the
emergence of new methods and approaches to solve problems related to the autonomous
operation of robots in the real world. Modern robots must not only execute predefined
commands but also adapt their behavior to environmental conditions, make decisions in
complex and unpredictable situations, while ensuring high accuracy, reliability, and safety [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        One of the key approaches to achieving this goal is the use of reinforcement learning (RL)
methods [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Reinforcement learning allows robots to independently learn optimal behavioral
strategies through interaction with the environment, receiving rewards for correct actions and
penalties for mistakes.
      </p>
      <p>
        The essence of reinforcement learning is that the agent does not have predefined rules or
behavioral patterns [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Instead, it gradually accumulates knowledge about the environment,
determining which actions are best for achieving goals. The importance of this approach lies in
the ability of agents to adapt to changing conditions, which cannot be achieved using traditional
programming methods.
      </p>
      <p>
        In the context of autonomous robotic systems, reinforcement learning is of particular
importance because it allows robots to interact with the physical world, taking into account its
dynamism and uncertainty [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For example, autonomous vehicles must not only follow traffic
rules, but also take into account the behavior of other road users, changes in weather conditions,
and road conditions.
      </p>
      <p>
        Classical approaches to robot control [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], such as hard-coded rules or scheduling algorithms,
are often insufficient in complex dynamic environments. This is because real-world conditions
may differ significantly from those planned at the stage of algorithm development. This is where
reinforcement learning demonstrates its advantage, as the robot can learn from its own mistakes
and improve its behavioral strategy based on feedback.
      </p>
      <p>
        The application of reinforcement learning to robotic systems also contributes to the
development of new methods and models of interaction with physical objects and people. For
example, autonomous robots can learn to recognize facial expressions, gestures, or other signs
that indicate human intentions and adjust their actions accordingly [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Despite the significant progress in reinforcement learning research, many aspects of this
approach remain an active area of research [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. One of the main challenges is the large number of
iterations required to train agents in complex environments. Real-world robots often face time,
resource, and safety constraints, so modeling environments and algorithms in simulations is an
important part of research [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8-11</xref>
        ].
      </p>
      <p>Thus, modeling the behavior of autonomous robotic systems using reinforcement learning is
an important area of modern science that allows for the creation of more flexible, reliable, and
adaptive systems [12]. This approach contributes to the development of artificial intelligence
and robotics, making innovative solutions possible for many areas of our lives [13].</p>
      <p>To build a mathematical model of the behavior of autonomous robotic systems using
reinforcement learning methods, let us consider the main elements of this system. In general, RL
is described as the interaction between an agent and the environment through the Markov
Decision Process (MDP) [14]. The Markov decision process is modeled as a five-tuple
(S, A, P, R, γ), and the optimal value function satisfies:
V(s) = max_a Σ_s′ P(s′ | s, a) [R(s, a) + γ V(s′)] (1)</p>
      <p>where V(s) is the expected cumulative reward for the state s; a is an action performed
by the agent; R(s, a) is the reward received upon performing the action a in the state s;
γ is the discount factor (from 0 to 1), which reflects the importance of future rewards;
P(s′ | s, a) is the probability of transition to the state s′ from the state s when performing the
action a; s′ is the next state.</p>
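      <p>As a concrete illustration of equation (1), the following is a minimal value-iteration sketch
in Python over a toy MDP. The state and action counts, the random transition tensor P, and the
reward matrix R are illustrative assumptions, not values from this study.</p>
      <preformat>
import numpy as np

# Toy MDP: 3 states, 2 actions (sizes are illustrative assumptions)
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(s'|s,a)
R = np.random.rand(n_states, n_actions)                                 # R(s,a)

V = np.zeros(n_states)
for _ in range(100):
    # For every (s, a): immediate reward plus discounted value of successors
    Q_sa = R + gamma * (P @ V)      # shape (n_states, n_actions)
    V_new = Q_sa.max(axis=1)        # maximize over actions, as in (1)
    if np.allclose(V_new, V, atol=1e-6):
        break                       # values have converged
    V = V_new
      </preformat>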
      <p>The main elements of RL are:</p>
      <p>Policy π is the agent's strategy that determines which actions it performs in different states
of the environment.</p>
      <p>π(a | s) = P(a | s)
(2)
where π(a | s) is the probability of choosing an action a in the state s. The goal of reinforcement
learning is to find the optimal policy that maximizes the expected reward for the agent.</p>
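      <p>As a small illustration of equation (2), a stochastic policy can be stored as a table of action
probabilities per state and sampled directly; the probability values below are arbitrary
assumptions.</p>
      <preformat>
import numpy as np

# Illustrative policy pi(a|s): each row s is a distribution over actions
pi = np.array([[0.7, 0.3],
               [0.2, 0.8],
               [0.5, 0.5]])  # 3 states x 2 actions, rows sum to 1

def act(state):
    # Sample an action a with probability pi(a|state), as in equation (2)
    return np.random.choice(len(pi[state]), p=pi[state])
      </preformat>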
      <p>The Q-learning method is one of the most common reinforcement learning algorithms. This
method is based on updating the Q-function through the interaction of the agent with the
environment:</p>
      <p>Q(s, a) = Q(s, a) + α (r + γ max_a′ Q(s′, a′) − Q(s, a)) (3)</p>
      <p>where Q(s, a) is the current value of the function Q for the state s and action a; it represents
an estimate of the expected long-term reward if the agent starts from this state and performs the
action a; α is the learning rate, which determines how much the new value affects the old one
and varies from 0 to 1; r is the immediate reward that the agent receives after performing the
action a in the state s; γ is the discount factor, which determines the importance of future
rewards and also varies from 0 to 1; s′ is the next state the agent enters after the action is
performed; max_a′ Q(s′, a′) is the maximum value of the function Q over all possible actions a′
in the following state s′, i.e., the maximum expected reward the agent can receive starting from
the state s′ and acting optimally; r + γ max_a′ Q(s′, a′) − Q(s, a) is the difference between the
new estimate and the current estimate, also known as the temporal difference error.</p>
      <p>The Q-learning algorithm is repeated until the Q-values are close to the optimal values. As a
result, the agent can choose actions based on maximizing the Q-value.</p>
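      <p>The following is a minimal tabular sketch of update rule (3), assuming a discrete state and
action space; the table sizes and the hyperparameter values are illustrative assumptions.</p>
      <preformat>
import numpy as np

n_states, n_actions = 16, 4             # illustrative sizes
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def q_update(s, a, r, s_next):
    # Temporal difference error: target minus current estimate, as in (3)
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * td_error

def choose_action(s):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore
    if epsilon > np.random.rand():
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))
      </preformat>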
    </sec>
    <sec id="sec-2">
      <title>2. The main research</title>
      <p>The task: an autonomous robotic system (a robot agent) must perform certain actions in the
environment in order to move to a given point, avoid obstacles, etc. The testing environment
will be a simulation of the real world in which the robot operates. The environment
determines the state in which the robot is located and the reward for each action it performs.</p>
      <p>Figure 1 shows a neural network model for modeling the behavior of autonomous robotic
systems using Reinforcement Learning (RL) techniques. This diagram represents a simple
neural network consisting of three main blocks: an input layer, a hidden layer, and an output
layer.</p>
      <p>Let's analyze each of these blocks separately:</p>
      <p>Input Layer
• Description: The input layer is the first layer of a neural network. It is responsible for
receiving the input data.
• Function: Each node (neuron) in this layer represents one input parameter or feature
from the data set. For example, if a model uses five input parameters (such as sensor data
or image pixels), there will be five nodes in this layer.
• Transitions: The outputs of the input layer are passed to the hidden layer. Nodes in this
layer usually have no activation functions.</p>
      <p>Hidden Layer
• Description: This is an intermediate layer between the input and output layers. In this
model, there is one hidden layer.
• Function: The hidden layer processes the input data using the Rectified Linear Unit
(ReLU) activation function.
• Transitions: The output from the hidden layer goes to the output layer. Each node in the
hidden layer processes the data it receives from the previous layer and passes it to the
next one.</p>
      <p>Output Layer
• Description: The output layer is the final layer in a neural network model.
• Function: This layer is responsible for generating the final result or prediction. The
number of nodes in the output layer depends on the task: for example, there may be two
output nodes for a two-class classification, or one for regression.
• Transitions: The output layer takes the data from the hidden layer and uses it to generate
the final result by applying an activation function (e.g., Softmax for classification).</p>
      <p>There are arrows between all the layers that symbolize the transfer of data between them.
These arrows show how data flows through the model sequentially: from the input layer to the
hidden layer and finally to the output layer. The connections between layers are fully connected,
which means that every node in one layer is connected to every node in the next layer.</p>
      <p>Input data arrives at the input layer, where it is passed to the hidden layer for processing.
After processing, the output data is passed to the output layer, which generates the final result or
prediction.</p>
      <p>The next step is to create the structure of the neural network. The development environment
will be PyCharm; using the Python programming language and the 'PyTorch' library, we will
write a neural network structure for modeling an autonomous robotic system, Figure 2.</p>
      <p>First, we import the 'torch' library for working with tensors, 'torch.nn' for creating neural
networks, and 'torch.optim' for training optimizers such as SGD and Adam.
• __init__ (constructor): Initializes the layers of the neural network.
• self.fc1: The first fully connected layer that accepts the input size vector state_size and
transforms it into a vector with 128 features.
• self.fc2: The second fully connected layer that transforms a vector with 128 features
into another vector with 128 features.
• self.fc3: The third fully connected layer, which takes a vector with 128 features and
converts it into a vector of size action_size, which corresponds to the number of
possible actions.</p>
      <p>Function ‘forward’:
• forward: Performs a direct pass through the neural network. This is the main function
that determines how data passes through the network layers.
• torch.relu: Applies the Rectified Linear Unit activation function after each of the first
two layers, which allows the model to detect non-linear dependencies.
• torch.softmax: An activation function that converts the output values of the last layer
into probabilities for each action. The outputs will reflect the probability of choosing
each of the possible actions.</p>
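      <p>Based on this description, a minimal sketch of the network structure shown in Figure 2 might
look as follows; the class name RobotAgent is an assumption for illustration, while state_size,
action_size, the three fully connected layers, ReLU, and softmax follow the text.</p>
      <preformat>
import torch
import torch.nn as nn

class RobotAgent(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 128)   # state vector -> 128 features
        self.fc2 = nn.Linear(128, 128)          # 128 -> 128 features
        self.fc3 = nn.Linear(128, action_size)  # 128 -> number of actions

    def forward(self, x):
        # ReLU after each of the first two layers captures non-linearities
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # Softmax turns the outputs into action-selection probabilities
        return torch.softmax(self.fc3(x), dim=-1)

agent = RobotAgent(state_size=4, action_size=2)  # illustrative sizes
      </preformat>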
      <p>To train the model, reinforcement learning is used, which requires large computing
resources and an iterative approach, Figure 3.
• optimizer = optim.Adam(agent.parameters(), lr=0.001): The Adam optimizer is used to
update the model parameters. The learning rate is set to 0.001.
• Learning cycle: In each episode, the agent interacts with the environment, choosing
actions based on probabilities computed by the network. It then receives a reward
from the environment, which is used to compute a loss function.</p>
      <p>• optimizer.step(): Updates model parameters based on calculated gradients.</p>
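      <p>Putting these pieces together, below is a minimal REINFORCE-style sketch of the learning
cycle described above (Figure 3). The Adam optimizer with lr=0.001 and probability-based
action selection follow the text; the Gym-like env interface, the number of episodes, and the
discount value are assumptions.</p>
      <preformat>
import torch
import torch.optim as optim

# Assumed: agent is the RobotAgent sketched above; env exposes a Gym-like
# reset()/step() interface (an assumption, not specified in the paper)
optimizer = optim.Adam(agent.parameters(), lr=0.001)
gamma, num_episodes = 0.99, 500  # illustrative values

for episode in range(num_episodes):
    state = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        probs = agent(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()                   # action from network probabilities
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # Discounted return for each step of the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)

    # Policy-gradient loss computed from the collected rewards
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # update model parameters from calculated gradients
      </preformat>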
      <p>This neural network model allows an autonomous robotic system to learn and improve its
behavior through interaction with its environment using reinforcement learning techniques.
The model adapts to new situations and gradually improves its skills to achieve specified goals,
such as moving to a point or avoiding obstacles.</p>
      <p>The study confirmed the effectiveness of reinforcement learning methods for modeling the
behavior of autonomous robotic systems. The developed neural network model allowed the
agent to successfully learn through interaction with the environment, demonstrating the ability
to adapt to changing conditions and improve its strategies to achieve its goals.</p>
      <p>Table 1 and Figure 4 show the progress of the neural network training for the autonomous
robotic system. The results show that as the number of episodes increased, the average reward
of the agent gradually increased and the number of steps required to complete tasks decreased.
From episode 1 to episode 50, there was a significant decrease in the average reward, indicating
the difficulty of the initial stages of learning. However, from episode 100 onwards, the average
reward began to increase, and in episode 350 it reached a maximum value of +100, which is an
indicator of successful training of the system.</p>
      <p>A significant proportion of the episodes were completed successfully starting from episode
100, which confirms the gradual improvement of the agent's behavioral strategy. The obtained
results confirm the effectiveness of the developed neural network model and reinforcement
learning methods for autonomous robotic systems.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>The study confirmed that Reinforcement Learning methods are effective for modeling the
behavior of autonomous robotic systems. Thanks to these methods, the agent was able to learn
through interaction with the environment, adapt to changing conditions, and improve its
strategies to achieve its goals.</p>
      <p>The developed neural network model, which consists of input, hidden, and output layers,
allowed the agent to gradually accumulate knowledge about the environment and determine the
optimal actions. This ensured the agent's ability to learn independently and improve behavioral
strategies in complex environments.</p>
      <p>The use of simulations made it possible to quickly test new approaches, create accurate
models of the environment, and significantly accelerate the learning process of autonomous
systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ha та J.</given-names>
            <surname>Yun</surname>
          </string-name>
          , “
          <article-title>Generative Model-Based Simulation of Driver Behavior When Using Control Input Interface for Teleoperated Driving in Unstructured Canyon Terrains”, у Towards Autonomous Robotic Systems</article-title>
          . Cham: Springer Nat. Switz.,
          <year>2023</year>
          , pp.
          <fpage>482</fpage>
          -
          <lpage>493</lpage>
          . https://doi.org/10.1007/978-3-
          <fpage>031</fpage>
          -433603_
          <fpage>39</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fergus та C. Chalmers</surname>
          </string-name>
          , “Deep Reinforcement Learning”,
          <source>у Computational Intelligence Methods and Applications</source>
          . Cham: Springer Int. Publishing,
          <year>2022</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>264</lpage>
          : https://doi.org/10.1007/978-3-
          <fpage>031</fpage>
          -04420-5_
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Lorenz</surname>
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Bestärkendes</surname>
          </string-name>
          <article-title>Lernen als Teilgebiet des Maschinellen Lernens</article-title>
          .
          <source>Reinforcement Learning</source>
          . Berlin, Heidelberg,
          <year>2020</year>
          . P. 1-
          <fpage>11</fpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>662</fpage>
          -61651-
          <issue>2</issue>
          _
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hai-Jew</surname>
          </string-name>
          , “
          <article-title>Finding Automated (Bot, Sensor) or Semi-Automated (Cyborg) Social Media Accounts Using Network Analysis and NodeXL Basic”</article-title>
          ,
          <source>Robotic Systems. IGI Glob</source>
          .,
          <year>2020</year>
          , pp.
          <fpage>1250</fpage>
          -
          <lpage>1289</lpage>
          . https://doi.org/10.4018/978-1-
          <fpage>7998</fpage>
          -1754-3.
          <year>ch060</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Abdulsaheb та D. J. Kadhim</surname>
          </string-name>
          , “
          <article-title>Classical and Heuristic Approaches for Mobile Robot Path Planning: A Survey”</article-title>
          ,
          <source>Robotics</source>
          , vol.
          <volume>12</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>93</fpage>
          ,
          <year>June 2023</year>
          . https://doi.org/10.3390/robotics12040093.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ryzhakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Malykhina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pokolenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rubtsova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Homenko</surname>
          </string-name>
          , I. Nesterenko,
          <string-name>
            <given-names>T.</given-names>
            <surname>Honcharenko</surname>
          </string-name>
          .
          <article-title>Construction Project Management with Digital Twin Information System”</article-title>
          ,
          <source>International Journal of Emerging Technology and Advanced Engineering</source>
          ,
          <year>2022</year>
          , Vol.
          <volume>12</volume>
          , Issue 10, pp.
          <fpage>19</fpage>
          -
          <lpage>28</lpage>
          . https://doi.org/10.46338/ijetae1022_
          <fpage>03</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Choi та P.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          , “
          <article-title>Reinforcement Learning for Resource Constrained Project Scheduling Problem with Activity Iterations and Crashing”</article-title>
          ,
          <source>IFACPapersOnLine</source>
          , vol.
          <volume>53</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>10493</fpage>
          -
          <lpage>10497</lpage>
          ,
          <year>2020</year>
          . https://doi.org/10.1016/j.ifacol.
          <year>2020</year>
          .
          <volume>12</volume>
          .2794.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dolhopolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Honcharenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Terentyev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Predun</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosynskyi</surname>
          </string-name>
          .
          <article-title>Information system of multi-stage analysis of the building of object models on a construction site</article-title>
          ,
          <source>IOP Conference Series: Earth and Environmental Science</source>
          ,
          <volume>1254</volume>
          (
          <year>2023</year>
          ) 012075, doi:10.1088/
          <fpage>1755</fpage>
          -1315/1254/1/012075. https://iopscience.iop.org/article/10.1088/
          <fpage>1755</fpage>
          -1315/1254/1/012075/pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chernyshev</surname>
          </string-name>
          ; S. Dolhopolov; T. Honcharenko; H. Haman; T. Ivanova;
          <string-name>
            <given-names>M.</given-names>
            <surname>Zinchenko</surname>
          </string-name>
          .
          <source>Integration of Building Information Modeling and Artificial Intelligence Systems to Create a Digital Twin of the Construction Site</source>
          ,
          <source>2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT)</source>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>2022</year>
          . DOI:
          <volume>10</volume>
          .1109/CSIT56902.
          <year>2022</year>
          .10000717
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Honcharenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Akselrod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shpakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khomenko</surname>
          </string-name>
          .
          <article-title>Information system based on multi-value classification of fully connected neural network for construction management,</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>