Optimal Planning in Systems Consisting of Rational Agents

Igor Sinitsyn, Anatoliy Doroshenko and Serhii Pashko*

Institute of Software Systems of the National Academy of Sciences of Ukraine, Glushkov prosp. 40, build. 5, Kyiv, 03187, Ukraine

Abstract
The paper is devoted to mathematical methods of planning in systems consisting of rational agents. An agent is an autonomous object that has sources of information about the environment and influences this environment. A rational agent is an agent who has a goal and uses optimal behavioral strategies to achieve it. It is assumed that there is a utility function, defined on the set of possible sequences of the agent's actions and taking values in the set of real numbers. The goal of a rational agent is to maximize the utility function. If rational agents form a system, they have a common goal and act optimally to achieve it. Agents use the optimal solution of the optimization problem that corresponds to the goal of the system. A linear programming problem is considered in which the number of product sets produced by the system is maximized. To solve the nonlinear problem of optimizing the production plan, the conditional gradient method is used, which at each iteration uses an a posteriori estimate of the solution error and can stop the computation once the required accuracy is reached. Since the rational agents that are part of the system can have separate optimality criteria, multi-criteria optimization problems appear. The article discusses methods for solving such problems, among which is a human-machine procedure that combines the conditional gradient method with information obtained from the decision maker (DM) at each iteration. The difficulty of this approach is that the DM is not able to make decisions many times when the nonlinear programming method performs a significant number of iterations.
The article proposes to replace the DM with an artificial neural network. Nonlinear and stochastic programming methods are used to find the optimal parameters of this network.

Keywords
Method, planning, rational agent, system, multi-criteria optimization, neural network

14th International Scientific and Practical Conference from Programming UkrPROG'2024, May 14-15, 2024, Kyiv, Ukraine
* Corresponding author.
† These authors contributed equally.
ips2014@ukr.net (I. Sinitsyn); a-y-doroshenko@ukr.net (A. Doroshenko); pashko1955@gmail.com (S. Pashko)
0000-0002-4120-0784 (I. Sinitsyn); 0000-0002-8435-1451 (A. Doroshenko); 0000-0002-0453-4128 (S. Pashko)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

In recent years, an agent-oriented approach has been successfully developed within the framework of the theory of artificial intelligence. Agents are considered rational or intelligent; they can form multi-agent systems and act to achieve a common goal. Agent systems are studied from different points of view: from the standpoint of the theory of conflict processes, information theory, social psychology, software engineering, and concepts of electronics. The present paper considers the aspect of optimizing the actions of agents as part of a multi-agent system.

The paper [1] shows that the main types of activities related to the management of systems of rational agents and individual agents are cooperation (formation of agent systems), planning and coordination of agent actions, system placement, and recognition. Problems of cooperation, planning, and coordination of agents' actions are studied in detail in the monograph [2]. Recognition tasks are studied in [3-6], and placement tasks in [7, 8]. In [9], the concept of a rational agent is defined and used. To solve optimization planning problems, that is, to find optimal action plans in multi-agent systems consisting of rational agents, mathematical programming methods are used, in particular methods of linear, nonlinear, stochastic, and discrete programming [10-14]. Multi-criteria optimization problems and corresponding mathematical methods are important for multi-agent systems [15, 16]. In [16], human-machine procedures are used to solve multi-criteria problems, in which a nonlinear programming method is combined with the work of a DM. The difficulty of this approach is that the DM is not able to make decisions multiple times when the nonlinear programming method performs a significant number of iterations.

This paper considers the problem of optimal planning of the actions of a multi-agent system consisting of rational agents. Planning optimization problems and corresponding mathematical methods for their solution are described. In multi-criteria optimization problems, it is proposed to use an artificial neural network instead of the DM, and methods of determining the optimal values of the parameters of such a network are proposed.

2. Agents and multi-agent systems

The concept of a rational agent is widely used in the theory of artificial intelligence, economics, and game theory, and has a unifying meaning, making it possible to define the main tasks facing agents and multi-agent systems, the relationships between these tasks, etc. In [9], an autonomous object that perceives the environment with the help of sensors and influences this environment with the help of executive mechanisms is considered an agent. The agent's program runs on a computing device; it receives data from sensors, recognizes and analyzes the data, calculates the optimal strategy of the agent's behavior, and issues commands to executive mechanisms. An agent can be a computer program, a robot, or a person.
The following definition of a rational agent is given in [9]: "For each possible perceptual sequence, a rational agent must choose the actions that are expected to maximize its performance, given the facts provided by the perceptual sequence and all the built-in knowledge the agent has." Usually, there are different permissible sequences of actions of an agent that lead to a goal. In this case, it is appropriate to consider that there is a utility function defined on the set of action sequences of an agent (or a system of agents), which takes values from the set of real numbers. A rational agent is an agent who, in order to achieve a goal, uses the optimal behavior strategy, maximizing the utility function. A multi-agent system is a system consisting of rational agents who have a common goal and use an optimal strategy to achieve it. It can be assumed that an optimization problem is formulated for the system and the agents form a behavior strategy using the optimal solution of this problem. Let us give examples of systems consisting of rational agents:

• the system of state administration bodies, enterprises, institutions, and organizations of the country's defense-industrial complex;
• the system of economies of different countries developing in cooperation;
• a group of drones chasing a target.

Note that the system of agents may include a central agent that performs some of the management functions. In the literature, there are definitions of agents of other types that differ from rational agents. In [2], the concept of an intelligent agent is defined, which should have the following properties:

• reactivity, i.e. the ability to perceive the state of the surrounding environment and act accordingly;
• proactivity, i.e. the ability to show one's own initiative;
• social activity, i.e. the ability to interact with other agents to achieve a goal.

Agents are considered autonomous.
Autonomy means that the agent's behavior is determined not only by the environment, but also, to a large extent, by the properties of the agent. The definition of an agent is called "weak" if it contains only the described features, that is, autonomy, reactivity, proactivity, and social activity [17]. If, in addition to the above features, there are additional ones in the definition of the agent, then the definition is called "strong". Additional properties may include:

• desires, i.e. situations desirable for the agent;
• intentions, i.e. what needs to be done to satisfy desires or to fulfill obligations to other agents;
• goals, i.e. a set of intermediate and final goals of the agent;
• obligations to other agents;
• knowledge, i.e. the part of the agent's knowledge that does not change during the agent's existence;
• beliefs, i.e. the part of the agent's knowledge that can change.

We assume that agents have the above, and possibly other, properties to the extent necessary to solve problems and use the resulting solutions.

Planning is the development of a method of action of a multi-agent system and individual agents in the future depending on the situations that may arise, the choice of an effective method of action, and the optimal allocation of resources. Centralized planning, distributed development of a centralized plan, and distributed development of a distributed plan are possible [18]. Centralized planning is performed by a dedicated agent that has the necessary resources and information. Distributed development of a distributed plan means that there is no dedicated agent: agents build individual plans while interacting with each other. In this case, it is considered that the agents have limited computing capabilities, information about the local environment only, and limited communication capabilities. Examples include groups of autonomous vehicles, mobile sensor networks, routing in data networks, transportation systems, multiprocessor computing, and energy systems [19]. Centralized planning is discussed next.
3. Optimization planning models in multi-agent systems and corresponding numerical methods

Consider optimization problems of planning in multi-agent systems. Such problems can be applied, in particular, in the process of planning the activities of the defense-industrial complex of Ukraine. The defense-industrial complex is a system consisting of interrelated research centers, state organizations, and industrial enterprises that produce goods for military purposes. We consider each element of this system to be a rational agent. Suppose there is a dedicated agent (for example, a ministry) that has the information needed for planning.

Let the system contain $m$ rational agents that produce several types of products. Some agents may not produce final products while performing managerial functions. Let $n$ be the number of product types, let $x_{ij}$ denote the value of the $j$-th product produced by the $i$-th agent, let $\alpha_j$ be the given specific weight of the $j$-th product in the set of final products, and let $x = (x_{11}, x_{12}, \ldots, x_{mn})$. It is necessary to maximize the number of manufactured sets of final products:

$$\min_{j=1,\ldots,n} \frac{1}{\alpha_j} \sum_{i=1}^{m} x_{ij} \to \max_{x} \qquad (1)$$

subject to

$$\sum_{j=1}^{n} a_{ijk} x_{ij} \le b_{ik}, \quad i = 1,\ldots,m, \; k = 1,\ldots,l, \qquad (2)$$

$$x_{ij}^{\min} \le x_{ij} \le x_{ij}^{\max}, \quad i = 1,\ldots,m, \; j = 1,\ldots,n. \qquad (3)$$

Here $a_{ijk}$ is the cost of the $k$-th resource for the production of the $j$-th product of unit value at the $i$-th enterprise, $b_{ik}$ is the stock of the $k$-th resource at the $i$-th enterprise, $l$ is the number of resource types, and $x_{ij}^{\min}, x_{ij}^{\max}$ are the minimum and maximum permissible quantities of the $j$-th product produced at the $i$-th enterprise, $0 \le x_{ij}^{\min} \le x_{ij}^{\max} < \infty$.

Let $X$ be the set of vectors $x$ that satisfy conditions (2), (3). Using an additional variable $y$, we write problem (1)-(3) as follows:

$$y \to \max_{x,\,y}, \qquad (4)$$

$$\frac{1}{\alpha_j} \sum_{i=1}^{m} x_{ij} \ge y, \quad j = 1,\ldots,n, \qquad (5)$$

$$x \in X. \qquad (6)$$

Problem (4)-(6) is a linear programming problem and is solved by linear programming methods.
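As an illustration, problem (4)-(6) can be handed to any LP solver once the variables $x_{ij}$ and $y$ are stacked into one vector. The sketch below is a minimal example, not taken from the paper: the instance sizes ($m = 2$ agents, $n = 2$ products, $l = 1$ resource) and all numerical data are invented for illustration, and SciPy's `linprog` is assumed to be available.

```python
# Minimal sketch of problem (4)-(6): maximize y subject to
# (1/alpha_j) * sum_i x_ij >= y and the resource constraints (2)-(3).
# The instance data below are invented for illustration.
from scipy.optimize import linprog

m, n = 2, 2                      # agents, product types
alpha = [0.5, 0.5]               # specific weights of products in a set
a = [[1.0, 2.0],                 # a[i][j]: cost of the single resource for
     [2.0, 1.0]]                 # product j at agent i (l = 1)
b = [10.0, 10.0]                 # resource stocks b_i

# Variable vector z = (x_11, x_12, x_21, x_22, y); linprog minimizes, so use -y.
c = [0.0, 0.0, 0.0, 0.0, -1.0]

A_ub, b_ub = [], []
# Resource constraints (2): sum_j a[i][j] * x_ij <= b_i.
A_ub.append([a[0][0], a[0][1], 0, 0, 0]); b_ub.append(b[0])
A_ub.append([0, 0, a[1][0], a[1][1], 0]); b_ub.append(b[1])
# Set constraints (5), rewritten as y - (1/alpha_j) * sum_i x_ij <= 0.
A_ub.append([-1 / alpha[0], 0, -1 / alpha[0], 0, 1]); b_ub.append(0.0)
A_ub.append([0, -1 / alpha[1], 0, -1 / alpha[1], 1]); b_ub.append(0.0)

bounds = [(0, 10)] * 4 + [(0, None)]   # box (3) plus y >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:4], -res.fun)   # optimal plan and number of complete sets
```

For these data each agent specializes in its cheaper product, so the plan yields 10 units of each product and, with $\alpha_j = 0.5$, a total of 20 complete sets.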
Note that the numbers appearing in problems (1)-(3) and (4)-(6) represent part of the agents' knowledge and beliefs, and the objective function expresses the intentions of the system.

Suppose that in problem (1)-(3), instead of the objective function (1), some objective function $f(x)$ is used. It is necessary to solve the problem

$$f(x) \to \max_{x \in X}. \qquad (7)$$

We consider the function $f(x)$ to be concave (that is, convex upwards) on the set $X$ and smooth on some neighborhood of the set $X$. Smoothness means that the function is defined and has continuous partial derivatives with respect to all variables; an open set containing $X$ is called a neighborhood of $X$. To solve this problem, it is advisable to use the conditional gradient method (the Frank-Wolfe method) [10, 11], which consists of the following. Choose $x^0 \in X$. At the $s$-th step ($s = 0, 1, 2, \ldots$) we calculate

$$\bar{x}^s = \arg\max_{x \in X} \langle \nabla f(x^s), x \rangle, \qquad (8)$$

$$t_s = \arg\max_{0 \le t \le 1} f(x^s + t(\bar{x}^s - x^s)), \qquad (9)$$

$$x^{s+1} = x^s + t_s (\bar{x}^s - x^s). \qquad (10)$$

Here $\nabla f$ denotes the gradient of the function $f$, and $\langle a, b \rangle$ is the scalar product of the vectors $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$, i.e. $\langle a, b \rangle = \sum_{j=1}^{n} a_j b_j$.

We denote by $x^*$ the optimal solution of problem (7). The following inequality holds:

$$f(x^*) - f(x^s) \le \langle \nabla f(x^s), \bar{x}^s - x^s \rangle. \qquad (11)$$

Indeed, the concavity of the function $f(x)$ implies the inequality $f(x^*) - f(x^s) \le \langle \nabla f(x^s), x^* - x^s \rangle$. Obviously, $\langle \nabla f(x^s), x^* \rangle \le \langle \nabla f(x^s), \bar{x}^s \rangle$. The last two inequalities imply (11). The limit points of the sequence $\{x^s\}$ are maximum points of the function $f(x)$ on the set $X$. In [10] it is proved that

$$f(x^*) - f(x^s) = O(1/s); \qquad (12)$$

this estimate cannot be improved. It follows from (12) and the results of [12] that the conditional gradient method is not an optimal method.
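The steps (8)-(10) together with the stopping rule based on (11) can be sketched in a few lines. In the toy run below, everything concrete is an illustrative assumption: $X$ is a box (so the linear subproblem (8) is solved vertex-wise), $f$ is the concave quadratic $f(x) = -(x_1 - 0.3)^2 - (x_2 - 0.7)^2$ (so the line search (9) has a closed form), and the tolerance is arbitrary.

```python
# Conditional gradient (Frank-Wolfe) method (8)-(10) on a toy problem:
# maximize f(x) = -(x1-0.3)^2 - (x2-0.7)^2 over the box X = [0,1]^2.
# The a posteriori bound (11), gap = <grad f(x^s), xbar^s - x^s>, stops the run.
c = (0.3, 0.7)                       # maximizer of f, invented for the demo

def grad_f(x):                       # gradient of the concave quadratic
    return [-2 * (xi - ci) for xi, ci in zip(x, c)]

x = [0.0, 0.0]                       # starting point x^0 in X
eps = 1e-3
for s in range(10000):
    g = grad_f(x)
    # Step (8): the argmax of <g, x> over a box is attained at a vertex.
    xbar = [1.0 if gi > 0 else 0.0 for gi in g]
    d = [xb - xi for xb, xi in zip(xbar, x)]
    gap = sum(gi * di for gi, di in zip(g, d))
    if gap <= eps:                   # by (11), f(x*) - f(x^s) <= gap
        break
    # Step (9): exact line search; for this quadratic, t* = <g,d> / (2<d,d>).
    t = min(1.0, max(0.0, gap / (2 * sum(di * di for di in d))))
    # Step (10).
    x = [xi + t * di for xi, di in zip(x, d)]

print(x)    # close to the maximizer (0.3, 0.7)
```

For this instance the iteration terminates after a few steps near $(0.3, 0.7)$; in general the method only guarantees the $O(1/s)$ rate (12), and the gap from (11) certifies the accuracy actually achieved.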
It is advisable to use this method for solving problem (7) because of the possibility of applying estimate (11), which allows stopping the computational process after the required accuracy is reached, and because of its simplicity and satisfactory speed of convergence in practice. The conditional gradient method is used in the next section as a component of multi-criteria optimization.

If the function $f$ is not smooth, or if instead of the gradient (generalized gradient) of the function $f$ only its stochastic analogue is known, non-smooth optimization and stochastic programming methods [11-13] are used to solve problem (7). If, instead of problem (7), a discrete programming problem is considered, branch-and-bound methods, heuristic algorithms, random search methods with local optimization, and polynomial approximation schemes can be used [14].

The system of economies of different countries interacting with each other through the export and import of goods, services, and capital can be considered a system of rational agents, provided that the governing bodies of the countries make rational economic decisions in market conditions. There are problems of optimizing the interaction of a separate economy with other economies and with the world market. The corresponding mathematical models and numerical methods are considered in [20, 21].

4. Multi-criteria optimization and human-machine procedures

In addition to a common goal, each rational agent of a multi-agent system can have its own optimality criterion. The problem is to find a quality solution that satisfies all agents. Let the extremal problem for the $i$-th agent have the form

$$f_i(x) \to \max_{x \in X}, \qquad (13)$$

where $i \in I$, $I = \{1, \ldots, m\}$, $x = (x_1, \ldots, x_n)$, and $X$ is a convex closed bounded set of $n$-dimensional Euclidean space. We assume that the criteria are non-negative and normalized, that is, each criterion $f_i(x)$ is replaced by $f_i(x)/f_i^*$, where $f_i^* = \max_{x \in X} f_i(x)$.

To solve a multi-criteria extremal problem, the convolution method can be applied, replacing the set of criteria (13) with a single criterion. Such a replacement can be performed in several ways [15]:

• The minimum of the values $\alpha_i f_i(x)$ is maximized: $\min_{i \in I} \alpha_i f_i(x) \to \max_{x \in X}$.
• The weighted sum of the values $f_i(x)$ is maximized: $\sum_{i=1}^{m} \alpha_i f_i(x) \to \max_{x \in X}$.
• The product is maximized: $\prod_{i=1}^{m} \alpha_i f_i(x) \to \max_{x \in X}$.

Here $\alpha_i$ are given non-negative numbers, $\sum_{i=1}^{m} \alpha_i = 1$.

The method of the main criterion, the method of goal programming, and the method of concessions are described in [15]. The main criterion method consists in maximizing one selected criterion, while all the others are considered bounded from below:

$$f_k(x) \to \max_{x}, \quad f_i(x) \ge f_i^{\min}, \; i \in I \setminus \{k\}, \quad x \in X.$$

Here $f_i^{\min}$ are given numbers, $k \in I$. The goal programming method consists in setting the desired values $\bar{f}_1, \ldots, \bar{f}_m$ of the criteria and solving the extremal problem

$$\left( \sum_{i=1}^{m} \alpha_i \left| f_i(x) - \bar{f}_i \right|^p \right)^{1/p} \to \min_{x \in X}.$$

Here $p \ge 1$ is a given number. In the concession method, the criteria are numbered in order of decreasing importance. In the first step, the first criterion is maximized: $f_1^* = \max_{x \in X_1} f_1(x)$, where $X_1 = X$. In the second step, the set $X_2 = \{x : x \in X_1, \; f_1(x) \ge f_1^* - \Delta_1\}$ is determined, where $\Delta_1$ is the amount of concession for the first criterion, and the problem $f_2^* = \max_{x \in X_2} f_2(x)$ is solved. In the third step, the set $X_3 = \{x : x \in X_2, \; f_2(x) \ge f_2^* - \Delta_2\}$ is determined, where $\Delta_2$ is the amount of concession for the second criterion, and the problem $f_3^* = \max_{x \in X_3} f_3(x)$ is solved. The process continues until the last, $m$-th step.

Consider the human-machine method of multi-criteria optimization [16]. This method uses a nonlinear programming method in which the directions of movement and the step sizes are determined with the help of a DM. We consider the multi-criteria optimization problem in the form

$$f(x) = u(f_1(x), \ldots, f_m(x)) \to \max_{x \in X}. \qquad (14)$$

Here $u$ is the utility function and $X$ is a convex closed bounded set of $n$-dimensional Euclidean space. We assume that the function $f(x)$ is concave on $X$. The following conditions [16] are sufficient for the concavity of the function $f$ on the set $X$:

• the function $u$ is concave, and the functions $f_i$, $i = 1, \ldots, m$, are linear;
• the functions $u$ and $f_i$, $i = 1, \ldots, m$, are concave, and the function $u$ is nondecreasing in each variable.

Let us assume that the conditions are fulfilled that guarantee the existence of continuous partial derivatives of the function $u(f_1(x), \ldots, f_m(x))$ and the validity of the chain rule for their calculation [22]. To solve problem (14), we will use the conditional gradient method (8)-(10). According to the chain rule, we have

$$\nabla f(x^s) = \nabla u(f_1(x^s), \ldots, f_m(x^s)) = \sum_{i=1}^{m} \left( \frac{\partial u}{\partial f_i} \right)^{\!s} \nabla f_i(x^s).$$

Here $(\partial u / \partial f_i)^s$, the $i$-th partial derivative of $u$, is calculated at the point $(f_1(x^s), \ldots, f_m(x^s))$, and the gradient $\nabla f_i(x^s)$ is calculated at the point $x^s$. The linear function $\langle \nabla f(x^s), x \rangle$ appearing in (8) is not fully known, since the values of $(\partial u / \partial f_i)^s$ are not known. Note that in (8) the value of $\langle \nabla f(x^s), x \rangle$ can be multiplied by a positive number. In particular, it can be divided by any positive coefficient $(\partial u / \partial f_i)^s$. (A negative coefficient can also be chosen; such a case is treated similarly.) Without loss of generality, we consider that the first coefficient $(\partial u / \partial f_1)^s$ is chosen as the divisor. Let us write (8) as follows:

$$\bar{x}^s = \arg\max_{x \in X} \sum_{i=1}^{m} w_i^s \langle \nabla f_i(x^s), x \rangle, \qquad (15)$$

where

$$w_i^s = \frac{(\partial u / \partial f_i)^s}{(\partial u / \partial f_1)^s}.$$

In (15), all quantities except $w_i^s$ are considered known. The values of $w_i^s$, showing the relative importance of the first and the $i$-th criteria, are determined with the help of the DM, after which the optimization (15) is performed. In [16], two ways of determining $w_i^s$ are proposed.
Suppose the function $u$ satisfies the conditions of the theorem on the total increment of a function of several variables [22]. Let us assume that at the point $(f_1(x^s), \ldots, f_m(x^s))$ the criterion $f_1$ receives an increment $\Delta_1$, the criterion $f_i$ receives an increment $\Delta_i$, and the values of the remaining criteria do not change. We have

$$\Delta u = u(f_1(x^s) + \Delta_1, f_2(x^s), \ldots, f_{i-1}(x^s), f_i(x^s) + \Delta_i, f_{i+1}(x^s), \ldots, f_m(x^s)) - u(f_1(x^s), \ldots, f_m(x^s)) = (\partial u / \partial f_1)^s \Delta_1 + (\partial u / \partial f_i)^s \Delta_i + o\!\left(\sqrt{(\Delta_1)^2 + (\Delta_i)^2}\right).$$

Let the DM choose the values $\Delta_1, \Delta_i$ so that $\Delta u = 0$. In this case, we have approximately $(\partial u / \partial f_1)^s \Delta_1 + (\partial u / \partial f_i)^s \Delta_i = 0$, that is,

$$w_i^s = -\frac{\Delta_1}{\Delta_i}. \qquad (16)$$

The second way of calculating $w_i^s$ is based on the fact that the gradient of a function indicates the direction of its fastest growth. Let all criteria except the first and the $i$-th remain unchanged, let the first and the $i$-th grow by the amounts $\delta_1, \delta_i$ respectively, and let the vector $(\delta_1, 0, \ldots, 0, \delta_i, 0, \ldots, 0)$ indicate the direction of the fastest growth. In this case, approximately

$$w_i^s = \frac{\delta_i}{\delta_1}. \qquad (17)$$

At step (9) of the conditional gradient method, the problem

$$u(f_1(x^s + t(\bar{x}^s - x^s)), \ldots, f_m(x^s + t(\bar{x}^s - x^s))) \to \max_{0 \le t \le 1} \qquad (18)$$

is solved. Here the objective function is a function of one variable $t$. The curves $f_j(x^s + t(\bar{x}^s - x^s))$, $j = 1, 2, \ldots, m$, are plotted on one graph over the segment $[0, 1]$. Analyzing the constructed graphs, the DM selects the best value of $t$ from the segment $[0, 1]$.

So, the DM chooses the values of $\Delta_i$ or $\delta_i$ and the optimal value of $t$ in problem (18), and possibly selects the starting point $x^0$. All other calculations are performed by a computer program. In [16], an example is given of the application of the described human-machine method of multi-criteria optimization to the organization of the educational process at one of the faculties of a university.
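One iteration of the scheme (15)-(16) can be mimicked in a few lines. All concrete data below are invented for illustration: two criteria, a DM answer of $\Delta_1 = -2$, $\Delta_2 = 1$ (losing 2 units of the first criterion is exactly compensated by gaining 1 unit of the second), made-up gradient values at $x^s$, and a box-shaped $X$ so that the argmax in (15) is attained at a vertex.

```python
# Step (15) with the weight w_2^s obtained from the DM via formula (16).
# Invented data: the DM reports that the increments (Delta_1, Delta_2) = (-2, 1)
# leave the utility u unchanged, so w_2^s = -Delta_1 / Delta_2 = 2.
delta_1, delta_2 = -2.0, 1.0
w2 = -delta_1 / delta_2                      # formula (16)

grad_f1 = [1.0, -4.0]                        # grad f_1(x^s), invented values
grad_f2 = [-1.0, 3.0]                        # grad f_2(x^s), invented values
# Weighted gradient sum_i w_i^s * grad f_i(x^s), with w_1^s = 1 by construction.
g = [g1 + w2 * g2 for g1, g2 in zip(grad_f1, grad_f2)]

# On the box X = [0,1]^2 the argmax in (15) is the vertex selected by sign(g).
xbar = [1.0 if gi > 0 else 0.0 for gi in g]
print(w2, g, xbar)
```

The resulting $\bar{x}^s$ then feeds the line search (18); replacing the hard-coded `delta_1, delta_2` with answers produced by a trained network is exactly the substitution proposed in the next sections.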
Human-machine procedures can be combined not only with the conditional gradient method, but also with other methods of nonlinear programming.

Obviously, the DM cannot always perform the tasks assigned to it in the process of multi-criteria optimization with sufficient quality. In addition, the nonlinear programming method can perform a large number of iterations, which is unacceptable for the DM. Therefore, it is advisable to use an artificial intelligence system instead of the DM. In particular, it is possible to use an artificial neural network that calculates the values of $w_i^s$ using formulas (16) and (17), or calculates the gradient of the function $f$ in another way, as well as the optimal value of $t$ in problem (18). Such networks are considered in the next section.

5. Use of artificial neural networks in optimization methods

Artificial neural networks (hereinafter referred to as neural networks) have become widespread as part of artificial intelligence systems. We consider the function

$$y = f\!\left( \sum_{j=1}^{r} w_j x_j \right)$$

to be a mathematical model of an artificial neuron, where the real numbers $x_1, \ldots, x_r$ are the inputs of the neuron, the real number $y$ is the output, and the real numbers $w_1, \ldots, w_r$ are weighting coefficients. The scheme of an artificial neuron is shown in Figure 1. A neural network combining a large number of artificial neurons is capable of solving complex problems. We assume that the network has $n$ inputs and one output and calculates some function $f(x_1, \ldots, x_n)$.

Let $I^n = [0;1]^n$ be the $n$-dimensional unit cube. The following theorem was proved in [23].

Theorem 1. For any integer $n \ge 2$ there exist continuous real functions $\psi_{ij}(x)$, defined on the unit segment $I^1 = [0;1]$, such that each continuous real function $f(x_1, \ldots, x_n)$ defined on the $n$-dimensional unit cube $I^n$ can be represented in the form

$$f(x_1, \ldots, x_n) = \sum_{i=1}^{2n+1} \varphi_i \!\left( \sum_{j=1}^{n} \psi_{ij}(x_j) \right), \qquad (19)$$

where the functions $\varphi_i(y)$ are real and continuous.

Figure 1: Scheme of an artificial neuron.
Note that the functions $\psi_{ij}(x_j)$ do not depend on the function $f$. Formula (19) can be represented in the form of a neural network (Figure 2), where artificial neurons are shown as rectangles; in this case, the weighting coefficients are equal to one. Thus, the values of any continuous real function $f(x_1, \ldots, x_n)$ defined on the $n$-dimensional unit cube $I^n$ can be calculated using a neural network containing $2n^2 + 3n + 2$ artificial neurons, in which the values of the functions $\psi_{ij}$, $\varphi_i$, $f$ are calculated.

Figure 2: Scheme of a neural network for calculating the values of a continuous real function $f(x_1, \ldots, x_n)$.

We call a function $\sigma(t)$ a sigmoid function if the condition

$$\sigma(t) \to \begin{cases} 1, & t \to +\infty, \\ 0, & t \to -\infty \end{cases}$$

is satisfied. For example, $\sigma(t) = 1/(1 + e^{-t})$ is a sigmoid function. Let $C(I^n)$ denote the space of continuous functions on $I^n$. The following theorem was proved in [24].

Theorem 2. Let $\sigma$ be any continuous sigmoid function. Finite sums of the form

$$G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma(\langle y_j, x \rangle + \theta_j) \qquad (20)$$

(where $\alpha_j, \theta_j$ are real numbers and $y_j = (y_{j1}, \ldots, y_{jn})$ are vectors with real components) are dense in $C(I^n)$. In other words, for any $f \in C(I^n)$ and $\varepsilon > 0$ there is a sum $G(x)$ of the form (20) for which $|G(x) - f(x)| < \varepsilon$ for all $x \in I^n$.

Formula (20) can be represented in the form of a neural network (Figure 3) containing $N + 1$ artificial neurons, which are used to calculate the values of the function $G$.

Theorems 1 and 2 justify the suitability of neural networks for calculating arbitrary continuous functions. The neural network that is built on the basis of Theorem 1 and exactly calculates the values of the function $f$ contains a limited number of neurons. The functions $\varphi_i, \psi_{ij}$, whose values are calculated in the neurons, can be different and need not have continuous derivatives.
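The intuition behind Theorem 2 can be seen in a two-term sum of the form (20): with steep weights, a difference of shifted sigmoids builds an approximate indicator of an interval, and finite sums of such "bumps" can approximate any continuous function. The concrete numbers below ($n = 1$, $N = 2$, slope $k = 200$, interval $(0.3, 0.6)$) are invented for illustration.

```python
# A two-term instance of (20): G(x) = sigma(k(x-0.3)) - sigma(k(x-0.6))
# approximates the indicator of the interval (0.3, 0.6) for a steep slope k.
import math

def sigma(t):                        # the sigmoid 1 / (1 + e^{-t})
    return 1.0 / (1.0 + math.exp(-t))

k = 200.0
# In the notation of (20): N = 2, alpha = (1, -1), y = (k, k),
# theta = (-0.3 k, -0.6 k).
def G(x):
    return sigma(k * (x - 0.3)) - sigma(k * (x - 0.6))

print(G(0.45), G(0.1), G(0.8))   # ~1 inside the interval, ~0 outside
```

Increasing $k$ sharpens the bump; combining many such terms with suitable coefficients $\alpha_j$ is what the density statement of Theorem 2 guarantees.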
The neural network that is built on the basis of Theorem 2 and approximately calculates the values of the function $f$ contains artificial neurons that use a single sigmoid function $\sigma$. The number of artificial neurons in such a network is not limited. If a sigmoid function $\sigma(t)$ with a continuous derivative is chosen, then the quantity $G$ has continuous partial derivatives with respect to all quantities $\alpha_j, \theta_j$ and all components of the vectors $y_j$. This circumstance can play an important role in the process of finding the optimal values of these parameters.

Figure 3: Scheme of a neural network for calculating the values of the function $G(x_1, \ldots, x_n)$.

Suppose a probability distribution is given on the set $X$ and $x$ is a random element [25]; here $x \in X$, and $X$ is a closed bounded set of $n$-dimensional Euclidean space. Let $\alpha = (\alpha_1, \ldots, \alpha_N)$, $y = (y_{11}, \ldots, y_{N1}, y_{12}, \ldots, y_{Nn})$, $\theta = (\theta_1, \ldots, \theta_N)$, and

$$q(\alpha, y, \theta, x) = \sum_{j=1}^{N} \alpha_j \, \sigma(\langle y_j, x \rangle + \theta_j) - f(x).$$

Let us choose the parameters $\alpha, y, \theta$ as the optimal solution of the problem

$$F(\alpha, y, \theta) = \mathbf{E} \, q^2(\alpha, y, \theta, x) \to \min_{\alpha, y, \theta}. \qquad (21)$$

Here $\mathbf{E}$ is the sign of mathematical expectation. Suppose that the sigmoid function $\sigma(t)$ has a continuous derivative and the conditions sufficient for differentiation under the sign of mathematical expectation are met. We have

$$\frac{\partial}{\partial \alpha_j} F(\alpha, y, \theta) = 2 \mathbf{E} \{ q \, \sigma(\langle y_j, x \rangle + \theta_j) \},$$

$$\frac{\partial}{\partial y_{jk}} F(\alpha, y, \theta) = 2 \mathbf{E} \{ q \, \alpha_j x_k \, \sigma'(\langle y_j, x \rangle + \theta_j) \},$$

$$\frac{\partial}{\partial \theta_j} F(\alpha, y, \theta) = 2 \mathbf{E} \{ q \, \alpha_j \, \sigma'(\langle y_j, x \rangle + \theta_j) \}, \quad j = 1, \ldots, N, \; k = 1, \ldots, n.$$

Here $\sigma'(t)$ is the derivative of the function $\sigma(t)$. Thus, it is easy to calculate a vector $g(\alpha, y, \theta, x)$ whose mathematical expectation is equal to the gradient $\nabla F(\alpha, y, \theta)$ of the function $F(\alpha, y, \theta)$; therefore, stochastic programming methods [11-13] can be used to solve problem (21). In the case when the objective function of problem (21) is not convex, these methods yield a local minimum.
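A stochastic gradient descent sketch for problem (21) can be written directly from the three derivative formulas above, used as the per-sample stochastic gradient $g(\alpha, y, \theta, x)$. Everything concrete here is an assumption for illustration: $n = 1$, $N = 4$ neurons, the uniform distribution on $X = [0, 1]$, the target $f(x) = x$, and the step size and iteration count.

```python
# Stochastic gradient method for (21): draw x, form the stochastic gradient
# whose expectation is grad F, and take a small step. Illustrative setup:
# n = 1, N = 4, X = [0,1] with the uniform distribution, target f(x) = x.
import math
import random

random.seed(0)
N, lr, steps = 4, 0.05, 20000
f = lambda x: x                                  # target function, invented
sigma = lambda t: 1.0 / (1.0 + math.exp(-t))

alpha = [random.uniform(-0.5, 0.5) for _ in range(N)]
y = [random.uniform(-1.0, 1.0) for _ in range(N)]
theta = [random.uniform(-1.0, 1.0) for _ in range(N)]

def q(x):                                        # residual of formula (20)
    return sum(a * sigma(yj * x + t) for a, yj, t in zip(alpha, y, theta)) - f(x)

def mse():                                       # empirical estimate of F
    return sum(q(i / 100) ** 2 for i in range(101)) / 101

start = mse()
for _ in range(steps):
    x = random.random()                          # x ~ uniform on X
    r = q(x)
    for j in range(N):
        s = sigma(y[j] * x + theta[j])
        ds = s * (1.0 - s)                       # sigma'(t) = sigma (1 - sigma)
        alpha[j] -= lr * 2 * r * s               # d q^2 / d alpha_j
        y[j]     -= lr * 2 * r * alpha[j] * x * ds   # d q^2 / d y_j1
        theta[j] -= lr * 2 * r * alpha[j] * ds   # d q^2 / d theta_j

print(start, mse())   # the empirical error decreases after training
```

The run is deterministic for the fixed seed; with a non-convex $F$ it converges only to a local minimum, which is exactly the situation addressed next.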
In this case, the method of random search with local optimization [14] should be used, in which local optima are calculated using the stochastic programming method. Another approach to determining the parameters $\alpha, y, \theta$ consists in solving the minimax problem [26]

$$F_1(\alpha, y, \theta) = \max_{x \in X} \left| q(\alpha, y, \theta, x) \right| \to \min_{\alpha, y, \theta}.$$

Suppose that the objective function $u(f_1(x), \ldots, f_m(x))$ of problem (14) can be calculated by a computer program. Let us choose $f(x) = u(f_1(x), \ldots, f_m(x))$ and calculate the parameters $\alpha, y, \theta$ of function (20) using one of the described methods. We can then use the corresponding neural network instead of the DM to solve the multi-criteria optimization problem (14).

6. Conclusions

The article deals with the problem of optimal planning of actions in multi-agent systems consisting of rational agents. Definitions of the concepts of an agent, a rational agent, and a system consisting of rational agents are given. Properties of agents and ways of developing action plans of a multi-agent system are considered. The focus is on centralized planning. Optimization problems of planning and the conditional gradient method are described. Methods for solving multi-criteria optimization problems are considered, including a human-machine procedure in which the DM takes part. It is proposed to use an artificial neural network instead of the DM. Methods are proposed for determining the optimal values of the parameters of such a network.

References

[1] S.V. Pashko, I.P. Sinitsyn, Optimal solutions in systems consisting of rational agents, Artificial Intelligence 2 (2023) 16-26. doi:10.15407/jai2023.02.016. (in Ukrainian).
[2] M. Wooldridge, An introduction to multiagent systems, John Wiley & Sons, Ltd, United Kingdom, 2009.
[3] M.I. Shlesinger, V.A. Glavach, Ten lectures on statistical and structural recognition, Nauk. Dumka, Kyiv, 2004. (in Russian).
[4] A.M. Gupal, I.V. Sergienko, Optimal recognition procedures, Nauk. Dumka, Kyiv, 2008. (in Russian).
[5] A.M. Gupal, S.V. Pashko, I.V. Sergienko, Efficiency of Bayesian classification procedure, Cybernetics and Systems Analysis 31 (1995) 543-554.
[6] I.V. Sergienko, A.M. Gupal, S.V. Pashko, Complexity of classification problems, Cybernetics and Systems Analysis 32 (1996) 519-533.
[7] S.V. Pashko, Optimal placement of multi-sensor system for threat detection, Cybernetics and Systems Analysis 54 (2018) 249-257. doi:10.1007/s10559-018-0026-z.
[8] S. Pashko, A. Molyboha, M. Zabarankin, S. Gorovyy, Optimal sensor placement for underwater threat detection, Naval Research Logistics 55 (2008) 684-699. doi:10.1002/nav.20311.
[9] S. Russell, P. Norvig, Artificial intelligence: a modern approach, 4th ed., Pearson, Hoboken, NJ, 2021.
[10] B.T. Polyak, Introduction to optimization, Nauka, Moscow, 1979. (in Russian).
[11] V.S. Mikhalevich, A.M. Gupal, V.I. Norkin, Methods of non-convex optimization, Nauka, Moscow, 1987. (in Russian).
[12] A.S. Nemirovskii, D.B. Yudin, Complexity of problems and effectiveness of optimization methods, Nauka, Moscow, 1979. (in Russian).
[13] Yu.M. Ermoliev, Methods of stochastic programming, Nauka, Moscow, 1976. (in Russian).
[14] C. Papadimitriou, K. Steiglitz, Combinatorial optimization: algorithms and complexity, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982.
[15] M.E. Kondruk, M.M. Malyar, Multi-criteria optimization of linear systems, DVNZ «Uzhgorod National University», Uzhhorod, 2019. (in Ukrainian).
[16] A.M. Geoffrion, J.S. Dyer, A. Feinberg, An interactive approach for multi-criterion optimization, with an application to the operation of an academic department, Management Science 19 (1972) 357-368.
[17] V.I. Gorodetsky, M.S. Grushinsky, A.V. Khabalov, Multiagent systems: review of the current state of theory and practice, 1998. URL: https://www.slideshare.net/rudnichenko/mas-10320580. (in Russian).
[18] E.H. Durfee, Distributed problem solving and planning, in: ECCAI Advanced Course on Artificial Intelligence, Springer, Berlin, Heidelberg, 2001, pp. 118-149.
[19] J. Shamma, Cooperative control of distributed multi-agent systems, John Wiley & Sons, 2008.
[20] S.V. Pashko, Optimization of capital investment distribution in an open economy on the basis of the "input-output" model, Cybernetics and Systems Analysis 58 (2022) 225-232. doi:10.1007/s10559-022-00454-1.
[21] A.M. Gupal, S.V. Pashko, Optimizing the capital investment distribution based on a dynamic mathematical model, Cybernetics and Systems Analysis 60 (2024) 375-382. doi:10.1007/s10559-024-00678-3.
[22] G.M. Fikhtengolts, Course of differential and integral calculus, Vol. 1, Nauka, Moscow, 1969. (in Russian).
[23] A.N. Kolmogorov, On the representation of continuous functions of several variables in the form of superpositions of continuous functions of one variable and addition, DAN USSR 114 (1957) 953-956. (in Russian).
[24] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (1989) 303-314.
[25] A.N. Shiryaev, Probability, Nauka, Moscow, 1980. (in Russian).
[26] V.F. Demyanov, V.N. Malozemov, Introduction to minimax, Nauka, Moscow, 1972. (in Russian).