=Paper= {{Paper |id=Vol-3426/paper29 |storemode=property |title=The Stochastic Game Model for Solving Self-Organization Problem of the Hamilton Graph Cycle |pdfUrl=https://ceur-ws.org/Vol-3426/paper29.pdf |volume=Vol-3426 |authors=Petro Kravets,Mykola Prodaniuk,Volodymyr Pasichnyk |dblpUrl=https://dblp.org/rec/conf/momlet/KravetsPP23 }} ==The Stochastic Game Model for Solving Self-Organization Problem of the Hamilton Graph Cycle== https://ceur-ws.org/Vol-3426/paper29.pdf

The Stochastic Game Model for Solving Self-organization
Problem of the Hamilton Graph Cycle
Petro Kravets, Mykola Prodaniuk and Volodymyr Pasichnyk
Lviv Polytechnic National University, 12 Bandera Street, Lviv, 79013, Ukraine

Abstract
A new application of a stochastic game model for solving the problem of self-organization of
the Hamiltonian graph cycle is proposed. To do this, game agents are placed at the vertices of
an undirected graph, the pure strategies of which are options for choosing one of the incident
edges. All agents’ random choice of strategies forms a set of local paths that begin at each
vertex of the graph. Current player payouts are defined as loss-making functions that depend
on the strategies of neighboring players who control adjacent vertices of the graph. These
functions are formed from a penalty for choosing opposing strategies by neighboring players
and a penalty for strategies that have shortened the length of the local path.
Computer simulation of a stochastic game provided a pattern of self-organization of agents'
strategies in the form of several local cycles or a global Hamiltonian cycle into the graph,
depending on the ways in which the current losses of players are formed.
The study results can be used in practice for game solutions of NP-complete problems,
transport, and communication problems, building authentication protocols in distributed
information systems, and for collective decision-making under uncertainty.
Keywords
Self-organization, Behavioral pattern, graph, Hamiltonian cycle, stochastic agent game,
Markov recurrent method.

1. Introduction
The rational functioning of natural and artificially distributed systems is possible due to the
coordinated work of their active elements or agents. In such systems, the agent has a local behavior
strategy determined by the survival and reproduction criteria in a changing environment. In the absence
of coordination, the agents' actions will be chaotic, and their populations have no chance of survival.
Coordination of agents' actions should be aimed at forming a global strategy for the behaviors of the
system as a whole – achieving a certain balance of local goals of the system of agents, the deviation
from which becomes disadvantageous to individual agents or groups of agents [1, 2].
Coordination can be centralized (vertical) or decentralized (horizontal). Vertical coordination has
one center or hierarchically subordinate centers, and horizontal coordination is realized through the
coordination of actions directly between all agents or within small groups of agents. Hierarchical
centralized coordination requires a high level of the social organization of agents. In centralized
systems, information flows flow rapidly, mostly from top to bottom, entropy and reaction time in such
systems are minimal, but the cost of maintaining existence can be high. Decentralized coordination does
not require a high level of the social organization of agents and is more democratic since information
flows are mainly formed at the grassroots levels of the system, costs are minimized, but the entropy and
time to produce a coordinated response will be much higher than in centralized systems. Decentralized
coordination is a more natural form of coexistence of agents during their organizational evolution [3,
4].
Decentralized coordination of agents' actions is formed during their direct local interaction and
multi-round exchange of information among themselves. The result of purposeful decentralized
coordination is the unauthorized achievement of global coordination or the self-organization of agents,
which manifests itself as established behavioral patterns in the horror of the entire decentralized system.
MoMLeT+DS 2023:5th International Workshop on Modern Machine Learning Technologies and Data Science, June 3, 2023, Lviv, Ukraine
EMAIL: Petro.O.Kravets@lpnu.ua, (P. Kravets); Mykola.M.Prodaniuk@lpnu.ua (M. Prodaniuk); Volodymyr.V.Pasichnyk@lpnu.ua (V.
Pasichnyk)
ORCID: 0000-0001-8569-423X (P. Kravets); 0000-0001-9544-3792 (M. Prodaniuk); 0000-0002-5231-6395 (V. Pasichnyk)
©️ 2023 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
Self-organization is a purposeful spatiotemporal process of creating, ordering, or improving the
structure and functions of a complex open dynamic system due to its internal factors without organizing
external influence. The necessary conditions for the self-organization of complex systems are
multivariance, alternativeness, and randomness of agents' actions, guided by the process of self-learning
and adaptation to uncertainties [5–7].
The structure of self-organization is a regular (nonchaotic) distribution of structural elements of a
system or their states in time and space, which allows a holistic description of the system without
characterizing all its constituent elements. Patterns are manifested at the macro level and are a
consequence of local interactions of elements at the micro level [8].
Behavioral patterns determine the form of orderly actions of agents that arise from chaos through a
process of coordination. The behavioral pattern is a systemic, globally coordinated behavior of a group
of agents, which arises based on their local interaction during multi-step adaptive learning [9, 10].
The structural model of a decentralized system is conveniently represented as a graph, the vertices
of which denote agents, and the edges determine the possibility of observing the internal states of
neighboring agents and the local exchange of information between them. Then, depending on the
strategies of agents' behavior, self-organization patterns can manifest themselves in the form of
combinatorial entities within the graph – skeleton trees, painted vertices, Eulerian or Hamiltonian
cycles, etc. [11, 12].
In practice, those combinatorial formations on graphs whose construction problem belongs to the
class of NP-complete have been used, that is, it does not allow finding an exact solution by full
enumeration of variants for large-order graphs in an acceptable polynomial time [13,14]. One such
problem is the search for Hamiltonians on the cycle of the graph. A Hamiltonian cycle in a graph is a
continuous path that passes through all vertices of a graph only once. For example, the travelling
salesman problem (the Travelling Salesman Problem) is to find the minimal Hamiltonian cycle in the
loaded graph [14, 15].
Not every graph has a Hamiltonian cycle. Unlike Euler cycles, no reliable conditions are known as
to whether a given graph has a Hamiltonian cycle. In some cases, they are guided by statements about
the dependence of the existence of Hamiltonian cycles on the powers of vertices of the graph [14, 16].
Precise or approximate (heuristic) methods can be used to search for Hamiltonian cycles. Although
heuristic methods do not guarantee an optimal solution, they translate the problem into a polynomial
complexity class, optimizing the enumeration of possible options.
This class also includes the strode of a stochastic game described in this article for the self-
organization of the Hamiltonian cycle of an undirected graph. Its possible advantage is that thanks to
self-learning and adaptation to uncertainties, it can find the Hamiltonian cycle in deterministic and
random graphs. The disadvantages include a relatively small, power order of the rate of convergence,
due to a priori uncertainty of a distributed system.

2. Practical applications of Hamiltonian cycles
Hamiltonian graph cycles are widely used in various fields [17]. They are used to solve various
variants of the travelling salesman problem for optimal routing [18–22], modelling assemblies and
genomes according to its selected fragments [23], identification by fingerprints [24], communication
management [25], construction of cryptographic authentication protocols with zero knowledge
disclosure [26] and others.
Zero-knowledge disclosure is an interactive method by which one party (provides evidence) can
prove to another party (verifying evidence) that it knows some meaning (statement, object, or secret
key) without telling the party. This method was first proposed in 1985 by authors S. Goldwasser, S.
Micali, and C. Rackoff in “The Complexity of Knowledge of interactive evidence systems”.
Zero-knowledge proof has the following properties:
1. Completeness: if the statement 𝑥 is true, then A will convince B of this.
2. Correctness: if the statement 𝑥 is false, even a dishonest A cannot convince, except for 𝑥 a very
small probability.
3. Zero Disclosure: If a statement 𝑥 is true, then anyone even dishonest A will know nothing,
except the fact, that the statement 𝑥 is true.
Let a graph consisting of n-vertices numbered 1, 2 , .., 𝑛 and it has a Hamiltonian cycle. Then,
enumerating all permutations in a symmetric group, we can be found in the Hamiltonian cycle. Since
then, such a method of complete enumeration of options can hardly be realized in a reasonable time. It
is known that from luck finding the Hamiltonian cycle in a graph is NP-complete.

3. Overview of methods for finding Hamiltonian graph cycles

The search for the Hamiltonian cycle is a special case of the travelling salesman problem, which
finds the shortest Hamiltonian path of the loaded graph.
Combinatorial nodes of the search for Hamiltonian cycles of the graph are divided into those that
provide exact solutions, and heuristics, which give approximate solutions [27, 28].
The methods of the first group are based on a complete enumeration of options or on deterministic
calculations and can be applied to problems of relatively small dimensions. Heuristic methods use some
improvements, and intuitive guesses to reduce the number of enumerated options. Some of them are
probabilistic methods that rely on a random selection of options with the ability to select and memorize
better solutions. Heuristic methods provide an approximate optimal solution to the problem in an
acceptable polynomial time.
The formulation of the travelling salesman problem as a linear programming problem was first
performed by N. Deo and S. L. Hakimi in 1965.
Let [𝑐𝑖,𝑗 > 0|𝑖, 𝑗 = 1. . 𝑛] be a matrix of weights (or distances) of edges of order graph 𝑛, and
[𝑥𝑖,𝑗 |𝑖, 𝑗 = 1. . 𝑛] be a matrix of displacements between vertices of the graph. The element 𝑥𝑖,𝑗 = 1, if
an edge between vertices 𝑖 and 𝑗 (𝑖 ≠ 𝑗) is included in the Hamiltonian path, and 𝑥𝑖,𝑗 = 0 – otherwise.
The traveling salesman’s tasks can be formulated as follows.
The objective function that minimizes the length of the Hamiltonian cycle is:

𝑓(𝑥) = ∑𝑛𝑖=1 ∑𝑛𝑗=1 𝑐𝑖,𝑗 𝑥𝑖,𝑗 → 𝑚𝑖𝑛. (1)
𝑥
System of restrictions:
∑𝑛𝑖=1 𝑥𝑖,𝑗 = 1 (1 ≤ 𝑗 ≤ 𝑛) there can be only one entrance to each vertex;
∑𝑛𝑗=1 𝑥𝑖,𝑗 = 1 (1 ≤ 𝑖 ≤ 𝑛) there can be only one exit from each vertex;
∑𝑖∈𝑆 ∑𝑗∉𝑆 𝑥𝑖,𝑗 ≥ 1 (𝑆 ≠ ∅, 𝑆 ⊂ {1, . . . , 𝑛}) is the exclusion of isolated cycles that together cover all
vertices of the graph, there must be only one (Hamiltonian) cycle; for any subset of vertices 𝑆, there
exists at least one arc that leads to other vertices of the graph.
𝑥𝑖,𝑗 ∈ {0,1} (1 ≤ 𝑖, 𝑗 ≤ 𝑛) is the desired matrix, the elements of which denote the Hamiltonian path.
The formalization of the problem (1), although it provides an exact solution by integer programming
methods, for example, by the method of branches and boundaries, is cumbersome, contains many
variables and constraints, and requires many calculations. A number of other methods can be used to
accurately solve the traveling salesman problem, for example, dynamic programming, Lagrange
multipliers, the modified method of branches and boundaries (Little's method), clipping planes
(Danzing's-Falkerson-Johnson method).
Heuristic methods provide a solution close to optimal for large-dimensional problems within
acceptable time limits. These include methods of nearest neighbor, greedy, insertion, local optimization
of k-opt to improve the initial solution, Lin-Kernighan, decomposition, and cross-linking [29], elastic
network, artificial neural networks, genetic, ant, and others.
Similar exact and approximate methods are used to solve the problem of finding Hamiltonian cycles
of an unloaded graph.
Accurate methods include brute force, backtracking, algebraic method, Roberts-Flores method,
linear programming, integer programming (Gomori, branch and boundary method), dynamic
programming and others.
Since the problem of determining Hamiltonian cycles is NP-complete, the application of the brute
force method can only be used for small-order graphs.
To speed up the enumeration of options, you can use the backtracking method (search with return),
which excludes from consideration a significant number of options by one check, building a decision
tree and traversing it in depth [30].
First, select an arbitrary vertex of the graph, for example, the first and one of the edges incidents to
it, along which we proceed to the next vertex. We remember the vertices passed. Suppose that the k -
th vertex of the loop has already been found. If is equal to the number 𝑛 of vertices of the graph and
there is an edge that connects the 𝑘 -th vertex to the first, then the loop is found. If there are edges
extending from the 𝑘 -th vertex to the still unviewed vertices, then we include one of them in the solution
and continue the search from it. If such edges do not exist, then go back one step to the (𝑘 − 1)-th vertex
and continue the search. So, all the vertices of the Hamiltonian cycle will be found.
In [31] a new method and corresponding polynomial algorithm for solving the problem of finding
the Hamiltonian cycle of a graph are proposed. Based on a given graph, another graph of the shortest
paths is constructed. To construct a graph of shortest paths, Dijkstra's algorithm is used. The search for
the Hamiltonian cycle in the graph is reduced to the search for a closed path in the shortest path graph.
The enumeration space for solutions to a problem consists of solutions that are constructed from each
vertex of the graph in the shortest path graph.
An algebraic method based on multiple symbolic multiplications of a modified adjacency matrix 𝐵
of order 𝑛 (single elements of an adjacency matrix are replaced by the literal notation of graph vertices)
by a matrix of sums of internal (without extreme vertices) products of notation of vertices of nonzero
chains between vertices of a graph: 𝑃𝑖+1 = 𝐵 ⋅ 𝑃𝑖 [11]. Multiplication is performed until a matrix Pn−1
containing all Hamiltonian chains between all pairs of vertices is computed. Hamiltonian cycles are
obtained from those chains whose edges 𝑃𝑛−1 or vertices are connected by arcs. The disadvantage of
this method is that for large orders of the graph, each element of the matrix 𝑃𝑖 will consist of many
constituents’ sub-elements, which require large amounts of memory to store.
Unlike the algebraic method, which ensures obtaining all existing Hamiltonian cycles of a graph,
the Roberts-Flores enumeration method works with only one path, which is extended until a
Hamiltonian cycle is found, or it turns out that this path cannot lead to a Hamiltonian cycle [11]. After
that, the path is modified in some systematic way and the cycle search continues for it. By reducing the
number of computations required, this method is efficient for large graphs.
For practical applications, heuristic methods for finding Hamiltonian cycles are more effective,
which reduces the complete enumeration of options. These include the nearest neighbor, greedy,
annealing simulations, banned search, Hopfield's artificial neural network, evolutionary, genetic, ant
colony, and others.
In the context of self-organization, those heuristic methods that have the property of self-learning
are important. Among these, it is worth noting the methods of neural networks, genetics, ant colony,
and stochastic play, which have polynomial complexity.
The problem of finding the Hamiltonian cycle can be solved using the Hopfield neural network,
which implements learning without a teacher [32]. To do this, the conditions of problem (1) must be
translated into parameters of connections between neurons.
For the Hopfield network to determine the Hamiltonian cycle, the following requirements must be
imposed:
1. The neural network should consist of 𝑁 = 𝑛 × 𝑛 neurons, which can be considered as square
matrix of 𝑛 rows and 𝑛 columns, where 𝑛 is the order of the graph.
2. There are connections between all pairs (𝑘, 𝑙) of neurons to which weightings are attributed 𝑊𝑘,𝑙 .
3. Not all grid weights can be negative at the same time.
4. The active neuron in each column specifies the corresponding vertex of the graph (or another
route city for the traveling salesman problem).
5. The network response must contain only one active neuron in each row and each column. For this,
the network weights must be constructed so that each neuron interferes with the activation of other
neurons in its row and in its column.
6. To minimize the path length, it is necessary that the neuron in the 𝑗 -th column the more actively
interferes with the activation of neurons in the (𝑗 + 1) -th and (𝑗 − 1) -th columns, the greater the
distance between them (required to solve the traveling salesman problem).
All these conditions are satisfied by the following formula for calculating the weight between the
neuron corresponding to the x -city (row) at the 𝑖 -th position (column) and the neuron corresponding
to the y -city (row) at the j -th position (column):
𝑊𝑘,𝑙 = 𝑊𝑥𝑖,𝑦𝑗 = −𝐴𝛿𝑥𝑦 (1 − 𝛿𝑖𝑗 ) − 𝐵𝛿𝑖𝑗 (1 − 𝛿𝑥𝑦 ) − 𝐶𝑑(𝑥, 𝑦)(𝛿𝑖,𝑗+1 + 𝛿𝑖,𝑗−1 ) + 𝐷,
where 𝑘 = (𝑥, 𝑖), 𝑙 = (𝑦, 𝑗) are the coordinates of the connections between the neurons of the matrix;
𝐴, 𝐵, 𝐶, 𝐷 are some constants; 𝑑(𝑥, 𝑦) is the distance between cities 𝑥 and 𝑦; 𝛿𝑥𝑦 is the Kronecker
symbol taking the value 1 if 𝑥 = 𝑦 and the value 0 is otherwise. As it is easy to see, the first term is
equal − A for all connections in the same line (𝑥 = 𝑦), except the connection of the neuron with itself
(at 𝑖 = 𝑗). The second term is equal −B for all relationships in the same column (𝑖 = 𝑗), except the
relation to itself (𝑥 = 𝑦). The third term is proportional to the distance between cities x and y if these
cities are adjacent in the route (𝑖 = 𝑗 − 1 or 𝑖 = 𝑗 + 1).
Starting from the initial random state, the Hopfield network can provide a suboptimal solution to the
problem (for the traveling salesman problem, the resulting cycle may differ from the optimal one). The
experiment can be conducted several times and choose the best solution.
Heuristic nodes based on genetic algorithms and models are the evolutionary process of natural
reflection, inheritance, and mutation. Each Hamiltonian cycle is treated as a chromosome. At the initial
stage, there is a set of such chromosomes (the initial population), and at each subsequent cycle, a new
one is produced from the existing population by pairwise crossing and mutations [27, 33].
A simplified genetic algorithm consists of the following steps:
1. Creation of an initial population with 𝑁 chromosomes of length 𝑛 + 1, where 𝑛 is the order of
the graph.
2. Calculation of fitness functions (route lengths) 𝑆𝑖 (𝑖 = 1 … 𝑁) for all individuals of the population.
3. Realization of the original selection of the parents of chromosomes (with the best value of 𝑆𝑖 ),
they’re pairwise crossing, random mutation, and the formation of a new generation of chromosomes.
4. If at least one Hamiltonian path is found, then go to step 5, otherwise return to step 2.
5. Choose the best solution found to the problem (for a loaded graph in the traveling salesman
problem).
The genetic algorithm allows you to get more than one solution in a single application and select the
best one. You can run several runs of the algorithm and calculate the average value of the best results.
As the graph order increases, the probability of obtaining an optimal solution decrease.
The ant algorithm is based on the behavior of an ant colony and simulates the evaporation of ovation
of pheromones. Ants find their way between an anthill and a food source by the smell of pheromones
they leave behind. The more ants use the same path, the higher the concentration of pheromones on it.
This “ant logic” allows you to choose a shorter path between the end points of the route [27, 34].
For each ant, the transition from point 𝑖 to point 𝑗 is determined by its memory 𝑀 (the list of points
that can still be visited), the visibility 𝜇𝑖,𝑗 between points (the presence of an edge 𝑒𝑖,𝑗 in the graph),
and the pheromone trail 𝜏𝑖,𝑗 . The selection of the next vertex of the graph is carried out with probability.
𝛼 𝛽
𝜇𝑖,𝑗 𝜏𝑖,𝑗
𝑝 = 𝜒(𝑗 ∈ 𝑀) ⋅ 𝛼 𝜏 𝛽 ,
∑𝑘∈𝑀 𝜇𝑖,𝑘 𝑖,𝑘

where 𝜒(𝑗) ∈ {0,1} is the indicator function of the event; 𝛼, 𝛽 are the weight parameters.
For the considered neural, ant, and genetic algorithms, it is problematic to choose the initial values
of the parameters that will ensure their convergence to the optimal solution.
Even though the described heuristic methods provide the search for an approximate optimal
Hamiltonian cycle, their methodological value is in demonstrating different implementations of self-
learning of active systems for solving NP-complete problems without the need to use classical
algorithms. Such methods implement “soft” calculations that do not require traditional programming
and can be easily adapted to solve other problems. The stochastic game method also has similar
properties.
In this article, we propose a new application of the stochastic game method [35] for the self-
organization of Hamiltonian cycles of an unloaded graph.
4. The purpose of the work
The aim of the work is to solve the stochastic game problem of self-organization of strategies for
constructing a Hamiltonian cycle of an unloaded undirected graph. Comprehension of the goal is
provided using the adaptive method in solving a stochastic game, proper adjustment of its parameters,
planning a computer experiment, developing a software model and a stochastic game, analyzing the
results and making recommendations for their practical application.

5. Formulation of the game problem of finding Hamiltonian cycles

Let the undirected graph 𝐺 = (𝑉, 𝐸) be given by a finite set of vertices V and edges E .The graph
is ordered, that is, it’s edges is numbered in ascending order, for example, clockwise. We assume that
the graph is simple (has no loops and multiple edges) and is connected (does not contain isolated
vertices). Furthermore, given the possibility of covering the cycle of all vertices without loss of
generality, it can be assumed that the minimum degree of vertices of a deterministic graph is greater
than 1 (graph without leaves).
At each vertex of the graph, we place the game agent 𝐴𝑖 , who can choose one of the incident vertices
of 𝑖 ∈ 𝑉 edges belonging to the local set 𝐸𝑖 ⊆ 𝐸. To do this, each agent with a number 𝑖 ∈ 𝑉 has 𝑁𝑖 ≥
2 pure strategies 𝑋𝑖 = (𝑥 𝑖 [1], 𝑥 𝑖 [2], … , 𝑥 𝑖 [𝑁𝑖 ]), where 𝑥 𝑖 ∈ 𝐸𝑖 is the edge number and the 𝑁𝑖 value is
determined by the power of the corresponding vertex. Hereinafter, superscript is not a power of a
number, except for symbols with a minus sign.
The variants of the possible choice can be given by an oriented graph of players' strategies. Let the
arcs coming out of the vertex with the number 𝑖, denote the strategies of the i -th player regarding the
choice of one of the neighboring players, and those arcs that enter the 𝑖 -th vertex denote the dependence
of the winnings or losses of the 𝑖 -th player on the strategies of neighboring players. The correspondence
of given undirected graph and formed on its basis-oriented graph of strategies is shown in Figure 1.

A1
1

4 A4

2 3 A2 A3

The given graph Graph of strategies

Figure 1: Correspondence of graphs

The choice of pure strategies is made by the players independently and synchronously at discrete
moments in time 𝑛 = 1,2, …. The choice made is estimated by the current loss 𝜁𝑛𝑖 , which is formed
depending on how much the agent's action led to an approach to the target solution of 𝑛 = 1,2, … the
problem.
To do this, each game agent 𝑖 ∈ 𝑉 makes an independent choice of one of 𝑁𝑖 own pure strategies
𝑥𝑛 = 𝑥 𝑖 ∈ 𝑋𝑖 according to conditional probabilities.
𝑖

𝑝𝑛𝑖 (𝑗) = 𝑃{𝑥𝑛𝑖 = 𝑥 𝑖 (𝑗)|𝑥𝑡𝑖 , 𝜁𝑡𝑖 (𝑡 = 1,2, … , 𝑛 − 1)}, 𝑗 = 1. . 𝑁𝑖 ,

where {𝑥𝑡𝑖 |𝑡 = 1,2, … , 𝑛 − 1} is the background of the strategies chosen by the player with the
number 𝑖; {𝜁𝑡𝑖 |𝑡 = 1,2, … , 𝑛 − 1} is the background of the losses received for this.
The probability set 𝑝𝑛𝑖 = (𝑝𝑛𝑖 (1), 𝑝𝑛𝑖 (2), … , 𝑝𝑛𝑖 (𝑁𝑖 )) ∀𝑖 ∈ 𝑉 specifies the vectors of mixed
strategies of players who take values 𝑝𝑛𝑖 ∈ 𝑆𝑁𝑖 on 𝑁𝑖 -dimensional unit simplexes:
𝑁
𝑆𝑁𝑖 = {𝑝 |∑𝑗=1
𝑖
𝑝(𝑗) = 1; 𝑝(𝑗) ≥ 0 (𝑗 = 1. . 𝑁𝑖 )}.

To select pure strategies {𝑥𝑛𝑖 }, a random mechanism is used, built on a discrete dynamic distribution,
which is given by vectors of mixed strategies 𝑝𝑛𝑖 :

𝑥𝑛𝑖 = {𝑥 𝑖 (𝑙)|𝑙 = 𝑎𝑟𝑔 𝑚𝑖𝑛 ∑𝑙𝑘=1 𝑝𝑛𝑖 [𝑘] > 𝜔 (𝑘, 𝑙 = 1. . 𝑁𝑖 )}, (2)
𝑙

where 𝜔 ∈ [0, 1] is a real random variable with uniform distribution.
After completing the selection of pure strategies by all players, they receive random current losses
𝑉
𝜁𝑛𝑖 = 𝜁𝑛𝑖 (𝑥𝑛 𝑖 ), whose values are a function of the combined strategies 𝑥 𝑉𝑖 ∈ 𝑋 𝑉𝑖 = × 𝑋𝑗 of
𝑗∈𝐷𝑖
neighboring players from subsets 𝑉𝑖 ⊆ 𝑉 (𝑉𝑖 ≠ ∅) ∀𝑖 ∈ 𝑉. It is assumed that current losses have a
constant mathematical expectation and a limited second point, which are not known to agents a priori.
Depending on the selected criteria and initial parameters of the game method, several autonomous
loops or one global loop may occur in the graph.
To create autonomous disjoint cycles of a graph, the condition must be satisfied that the choice of
an edge in the direction of the 𝑖 -th agent (∀𝑖 ∈ 𝑉) can be carried out only by one of its neighbors agent
from the set 𝑉𝑖 .
𝑗
Let 𝑘𝑛𝑖 = ∑𝑗∈𝑉𝑖 𝜒(𝑥𝑛 → 𝑥𝑛𝑖 ) be the current number of neighboring agents whose strategies are
directed to the vertex of the graph where the 𝑖 -th agent is located (in the direction of the 𝑖 -th agent).
The heterogeneity of directing strategies within a local subset Vi is considered by the difference penalty:
𝑗
𝜉𝑛𝑖 [1] = 𝐶𝑖−1 ∑𝑗∈𝑉𝑖|𝑘𝑛𝑖 − 𝑘𝑛 |, (3)

where 𝜉𝑛𝑖 [1] ∈ 𝑅+1 is a real number; 𝐶𝑖 = |𝑉𝑖 | is the number of agents of the local subset 𝑉𝑖 .
This criterion provides the possibility of forming cycles of movement of agents on the graph.
To form a Hamiltonian cycle, each agent must choose a strategy (incident to its vertex edge) to
maximize the length of the local path emanating from the agent-controlled vertex of the graph. Let 𝑟𝑛𝑖
be a set of vertices (or sequence of numbers of vertices) that defines such a path with a root vertex with
a number 𝑖, formed at the current time n by the strategies (directions of movement) 𝑥𝑛𝑖 of players. The
path 𝑟𝑛𝑖 consists of the values 𝑚 ≔ 𝑥𝑛𝑚 determined by the players' pure strategies 𝑥𝑛𝑚 . The recurrent
procedure for determining the current path that starts at the vertex with number 𝑖, will be:
{𝑚}∩𝑟 𝑖 (𝑡−1)≠∅
𝑟𝑛𝑖 (𝑡) = 𝑟𝑛𝑖 (𝑡 − 1) ⋃𝑚=𝑖 𝑛 {𝑚| 𝑚 ≔ 𝑥𝑛𝑚 }, ∀𝑖 ∈ 𝑉, (4)
where 𝑟𝑛𝑖 (0) = ∅.
The notation of the operation of combining sets into (4) should be read so that this operation is
performed repeatedly, starting with the value 𝑚 = 𝑖 until the element 𝑟𝑛𝑖 reappears during the formation
of the path 𝑚 ∈ 𝑟𝑛𝑖 (𝑡 − 1). For a connected graph with vertex powers greater than 1, this condition will
always hold – following the players' strategies from the 𝑖 -th vertex (∀𝑖 ∈ 𝑉), entry into a local loop or
a global Hamiltonian cycle will occur. The minimal local loop covers two edge-connected vertices of
the graph and is formed by the opposite strategies of two neighboring players. The maximum cycle
covers all vertices of the graph and is one of the Hamiltonian cycles.
The deviation of the length of the local path from the maximum possible is considered by the penalty:

𝜉𝑛𝑖 [2] = 𝐶 −1 (𝐶 − 𝑆(𝑟𝑛𝑖 )), (5)
where 𝜉𝑛𝑖 [2] ∈ 𝑅+1 is a real number; 𝐶 = |𝑉| is a number of game agents; 𝑆(𝑟𝑛𝑖 ) = |𝑟𝑛𝑖 | is the length of
the local path (the number of vertices of the graph on the way to entering the cycle), starting at the 𝑖 -
th vertex (0 ≤ 𝑆(𝑟𝑛𝑖 ) ≤ 𝐶).
This criterion approximates the length of the path 𝑟𝑛𝑖 , leaving the vertex 𝑖, to the number of vertices
𝑉 of the graph 𝐺 (to the length of the Hamiltonian cycle).
Penalties (3) and (5) can be used individually or comprehensively. The total current losses of players
are calculated after the selection of moving options 𝑥𝑛𝑖 ∀𝑖 ∈ 𝑉 is completed:
𝜁𝑛𝑖 = 𝜆𝜉𝑛𝑖 [1] + (1 − 𝜆)𝜉𝑛𝑖 [2], (6)
where 𝜆 ∈ [0,1] is the weighting factor that determines the share of penalty criteria in the formation of
agents' losses.
Due to the random selection of pure strategies, current losses 𝜁𝑛𝑖 ∀𝑖 ∈ 𝑉 are random variables with
a priori unknown stochastic characteristics. To formulate criteria for player behavior, time-averaged
losses are used:
1
𝛧𝑛𝑖 = ∑𝑛𝑡=1 𝜁𝑡𝑖 , 𝑛 = 1,2, … , ∀𝑖 ∈ 𝑉 . (7)
𝑛

Strategies of players should be aimed at minimizing their own functions of average losses (7):
𝑙𝑖𝑚 Ζ𝑖𝑛 → 𝑚𝑖𝑛
𝑖
∀𝑖 ∈ 𝑉. (8)
𝑛→∞ 𝑥𝑛
The stochastic game of finding a Hamiltonian cycle is that each player 𝑖 ∈ 𝑉 must learn to choose
pure strategies {𝑥𝑛𝑖 } (2) at points in time 𝑛 = 1,2, … based on the observation of locally determined
current losses {𝜁𝑛𝑖 } (6) so as to ensure that the system of criteria (8) is met.
The method of forming a sequence of strategies {𝑥𝑛𝑖 } ∀𝑖 ∈ 𝑉 (𝑛 = 1,2, …) of stochastic game will
determine the fulfillment of one of the conditions of collective optimality, for example, Nash, Pareto or
another [36].
The learning process of a stochastic game, which leads to self-organization of strategies for moving
game agents, is evaluated by the following characteristics.
1. The system function of average losses, which is the averaging of individual functions of average
losses of players (7):
𝛧𝑛 = 𝐶 −1 ∑𝑖∈𝑉 𝛧𝑛𝑖 . (9)
2. The strategy coordination coefficient, which is the relative number of coordinated strategies of
the players:
𝐾𝑛 = (𝑛𝐶)−1 ∑𝑛𝑡=1 ∑𝑖∈𝑉 𝜒(|𝜁𝑖𝑡 | ≤ 𝛿), (10)
where 𝜒(∗) ∈ {0,1} is the indicator function of the event: if the condition holds, then 𝜒(∗) = 1,
otherwise – 𝜒(∗) = 0; 0 < 𝛿 << 1 is a small positive real number.
The self-organization of the Hamiltonian cycle will be indicated by a decrease in the function 𝛧𝑛 ≥
0 of system losses and an increase in the coordination coefficient 𝐾𝑛 ∈ [0,1] of agents' strategies.

6. Method for solving a stochastic game.

Formation of sequences {𝑥𝑛𝑖 } with the desired properties will be performed using the recurrent
method in changing the vectors of mixed strategies [37, 38]:
𝑖 𝑁𝑖
𝑝𝑛+1 = 𝜋𝜀𝑛+1 {𝑝𝑛𝑖 − 𝛾𝑛 𝑅(𝑝𝑛𝑖 , 𝑥𝑛𝑖 , 𝜁𝑛𝑖 )}, (11)
is a projector on unit  -simplex 𝑆𝜀 𝑖 ⊆ 𝑆𝑁𝑖 [37]; 𝛾𝑛 is a monotonically descending
𝑁 𝑖 𝑁
where 𝜋𝜀𝑛+1
sequence of positive values, which regulates the step size of the method; 𝑅 is a method step; 𝜀𝑛 is a
monotonically decreasing sequence of positive quantities that regulates the rate of expansion of 𝜀 -
simplex.
The change in the elements of the vector of mixed strategies is constructed in such a way that when
choosing a strategy 𝑥𝑛𝑖 (𝑗), the element 𝑝𝑛𝑖 (𝑗) decreases in proportion to the amount of the current loss
𝜁𝑛𝑖 . The other elements of the vector of mixed strategies do not change or grow proportionally of 𝜁𝑛𝑖 .
After recalculating the vectors of mixed strategies, they are normalized by the method of projection on
the unit  -simplex. As a result, smaller displacements of the vectors of mixed states on the unit  -
simplex.
For theses in the recurrent method, we use the results of the theory of stochastic approximation [38].
For this, suppose that the mathematical expectation of random losses 𝑀{𝜁𝑛𝑖 (𝑥)} = 𝑙𝑖 (𝑥) is constant for
all 𝑥 ∈ 𝑋 = ⊗ 𝑋𝑖 . Then the average loss function of the matrix game is calculated as:
𝑖∈𝑉
𝐿𝑖 (𝑝𝐷𝑖 ) = ∑𝑥 𝑉𝑖 ∈𝑋𝐷𝑖 𝑙𝑖 (𝑥 𝑉𝑖 ) ∏𝑗∈𝑉𝑖;𝑥 𝑗∈𝑥 𝑉𝑖 𝑝 𝑗 (𝑥 𝑗 ) (12)

where 𝑝 𝑉𝑖 ∈ 𝑆 𝑉𝑖 = ∏𝑗∈𝑉𝑖 𝑆𝑁𝑗 ; 𝑝 𝑖 ∈ 𝑆𝑁𝑖 .
The goal of the players is to minimize their average loss functions for mixed strategies 𝑝 𝑖 :
𝐿𝑖 (𝑝𝑉𝑖 ) → 𝑚𝑖𝑛
𝑖
.
𝑝
Let the expectation of the motion vector of method (11) be the gradient of the mean loss function
(12):
𝑀{𝑅(𝑝𝑛𝑖 , 𝑥𝑛𝑖 , 𝜁𝑛𝑖 )} = 𝛻𝑝𝑖 𝐿𝑖 (𝑝𝑉𝑖 ).
Considering that
𝜁𝑛𝑖
𝛻𝑝𝑖 𝐿𝑖 = 𝑀 { 𝛵 𝑖 )𝑝𝑖 𝑒(𝑥𝑛𝑖 )|𝑝𝑛𝑖 = 𝑝𝑖 },
𝑒 (𝑥𝑛 𝑛

where 𝑒(𝑥𝑛𝑖 ) is the unit vector–indicator of choice of pure strategy 𝑥𝑛𝑖 ∈ 𝑋𝑖 , based on stochastic
approximation we obtain a gradient method for solving the game problem:

𝑁 𝜁 𝑖 𝑒(𝑥 𝑖 )
𝑖
𝑝𝑛+1 𝑖
= 𝜋𝜀𝑛+1 {𝑝𝑛𝑖 − 𝛾𝑛 т𝑛 𝑖 𝑛 𝑖 }. (13)
𝑒 (𝑥𝑛 )𝑝𝑛
The parameter 𝛾𝑛 reduces the step size of the method to achieve optimal collective solutions of the
game: ‖𝑝𝑛𝑖 − 𝑝∗𝑖 ‖ → 0, and the parameter  n extends the  -simplex: 𝑆𝜀𝑛+1
𝑁𝑖
→ 𝑆𝑁𝑖 .
The values of these parameters can be calculated as follows:
𝛾𝑛 = 𝛾𝑛−𝛼 , 𝜀𝑛 = 𝜀𝑛−𝛽 , (14)
where 𝛾 > 0, 𝛼 ∈ (0,1], 𝜀 > 0, 𝛽 > 0.
The convergence of stochastic game strategies to collective-optimal values 𝑝∗𝑖 is determined by the
ratios of parameters  n and  n (14), which must meet the fundamental conditions of the stochastic
approximation [37, 38].
As shown in [39], to ensure the root mean square (rms) convergence of the game method (13) to the
Nash point in the positive environment, the following relations must be fulfilled:
0 < 𝛽 < 𝛼 < 1. (15)
The theoretical order of the asymptotic rate of convergence of method (13) is 𝑛−𝜃 , where 𝜃 =
1
𝑚𝑖𝑛{ 𝛽, 𝛼 − 𝛽, 1 − 𝛼} is the order parameter. The maximum value of parameter 𝜃 = is reached for
2
1
𝛼−𝛽 = .
2
1 1
The stochastic game begins with untrained mixed agent strategies: 𝑝0𝑖 = ( , … , ) ∀𝑖 ∈ 𝑉. In
𝑁𝑖 𝑁𝑖
moments of time 𝑛 = 1, 2, … mixed strategies are dynamically rearranged according to (13) for adaptive
selection of pure strategies.
One step of repeating stochastic play is that at a point in time 𝑛 each player 𝑖 ∈ 𝑉 chooses a pure
strategy 𝑥𝑛𝑖 (2) and by time 𝑛+1 receives a current loss 𝜁𝑛𝑖 (6), which is used to calculate the new mixed
𝑖
strategy 𝑝𝑛+1 (13).
The stochastic agent game implements adaptive learning by trial and error, which requires a
significant number of trials to find the desired solution. The learning process can be significantly
accelerated by using a computer implementation of a stochastic game with the appropriate adjustment
of its parameters.

7. A game algorithm for finding the Hamiltonian cycle of a graph.

Step 1. Set initial values for parameters:
𝐶 = |𝑉 | – the number of players, which is equal to the number of vertices of the graph;
𝑀𝐶×𝐶 – a matrix of adjacencies of the graph;
𝑁𝑖 – the number of pure strategies of the 𝑖 -th player, which is equal to the power of the 𝑖 -th vertex
of the graph;
𝑋𝑖 = (𝑥 𝑖 [1], 𝑥 𝑖 [2], … , 𝑥 𝑖 [𝑁𝑖 ]), 𝑖 = 1. . 𝐶 – vectors of players' pure strategies, where is the number
of the edge incident to the vertex of the graph;
𝑖 1 1
𝑝0 = ( , … , ), 𝑖 = 1. . 𝐶 – initial mixed player strategies.
𝑁𝑖 𝑁𝑖
𝛾 > 0 – a parameter of the learning step;
𝛼 ∈ (0,1] – an order of the learning step;
𝜀 – a parameter of 𝜀 -simplex;
𝛽 > 0 – an order of the expansion rate of 𝜀 -simplex;
𝜆 ∈ [0,1] – a weighting factor that determines the share of penalty criteria in the formation of player
losses.
𝑛 = 0 – an initial time moment;
𝑛𝑚𝑎𝑥 – a maximum number of the method steps.
Step 2. Select action options 𝑥𝑛𝑖 ∈ 𝑋𝑖 , 𝑖 = 1. . 𝐶 according to (2).
Step 3. Get the value of current losses 𝜁𝑛𝑖 according to (6).
Step 4. Calculate the parameter values 𝛾𝑛 , 𝜀𝑛 according to (14).
Step 5. Calculate the elements of the mixed strategy vectors 𝑝𝑛𝑖 , 𝑖 = 1. . 𝐶 according to (13).
Step 6. Calculate the self-organization characteristics of the stochastic game 𝑍𝑛 according to (9) and
𝐾𝑛 according to (10).
Step 7. Set the next time moment 𝑛: = 𝑛 + 1.
Step 8. If 𝑛 < 𝑛𝑚𝑎𝑥 , then go to step 2, otherwise the algorithm stops.

8. Results of a computer experiment

First, consider an undirected full-connected graph 𝐺 = (𝑉, 𝐸) without loops. In the program, the
graph is conveniently specified either by the matrix of adjacencies or by the incident matrix. In each
vertex of the graph, we place one agent. In this case, the count will control 𝐶 = |𝑉| the agents. In a
repetitive stochastic game, one of the agents can knockout and rat one of the incident edges of the graph.
Linked by strategies with the combined choice of several agents, forms a local path. The task of the
game is the adaptive fusion of local paths of agents into one global Hamiltonian cycle.
To solve a stochastic game, we use the stochastic game method (13) with the following parameters:
0.999
𝜆 = 0.5, 𝛾 = 1, 𝜀 = , 𝛼 = 0.5, 𝛽 = 0.25. In addition to the values of parameters satisfying
𝑁𝑖
condition (15), for a stochastic game to converge, it is necessary that the graph has a Hamiltonian cycle.
The number of vertices of the graph and the powers of its vertices will significantly affect the order
of the rate of convergence of the graph and the rate of convergence of the game method. The functions
of average player losses 𝛧𝑛 (9) and strategy coordination ratio 𝐾𝑛 (10) characterize the course of a
stochastic game of agent movement. Graphs of these functions for different values of order 𝐶 = |𝑉 |
(number of vertices) of a full-connected graph are shown in Figure 2 on a logarithmic scale. The image
is bounded by the coordinates of the rectangular output area of the graphs.

𝐶=5 𝐶=8 𝐶=10

a) b) c)
Figure 2: Self-organization indicators of a stochastic game for different values of the order of the full
graph
A decrease in the average loss function 𝛧𝑛 indicates that the target conditions (6) of the convergence
of the stochastic game are met. The approximation of the value of the coordination coefficient 𝐾𝑛 to 1
(that is, to logarithmic zero) indicates the self-organization of the players' strategies. For the given
parameters of the game method, increasing the order C of a full-connected graph leads to a significant
increase in the number of learning steps of the stochastic game. Thus, for 𝐶 = 5, the rapid growth of
the coordination coefficient begins at about 𝑛 = 103 steps of learning the stochastic game, for 𝐶 = 10
it takes a little more than 𝑛 = 104 steps, and for 𝐶 = 15 it takes about 𝑛 = 105 steps.
In addition to the order of a Hamiltonian graph, the convergence time of a stochastic game will also
be determined by its connectivity and the parameters of the game method. Reducing the connectivity
of the graph accelerates the self-organization of cycles.
Depending on the selected criteria and initial parameters of the game method, several autonomous
loops or one global loop may appear in the graph.
If the parameter 𝜆 of the complex penalty (6) takes values close to 1, then the dominant influence
on the course of the game will be the penalty criterion (3), which minimizes the selection of each vertex
of the graph by neighboring players. The application of this criterion can lead to the formation of several
separate (autonomous) cycles in the graph, as shown in Figure 3. For the parameter values given above,
we obtain solutions of a stochastic game in pure strategies. The stochastic game provides a multivariate
solution from the problem of constructing graph cycles.

{2, 3, 2}, {1, 4, 5, 1} {1, 2, 3, 1}, {4, 5, 8, 7, 9, 6, 4}
1 1

2 5
7

8 9

5 6

2 3
3 4

a) b)

{1, 2, 6, 10, 11, 12, 9, 5, 1}, {3, 4, 8, 7, 3 } {1, 2, 3, 4, 5, 1}, {6, 7, 12, 11, 6}, {8, 9, 10, 15, 14,
1 4 13, 8}
1
5 8
6
9 12
2 11 5
7 10
12 15
10 11

6 7 13 14

8 9
2 3
3 4

c) d)
Figure 3: Self-organized isolated cycles of graphs

In Figure 3 a shows two possible variants of self-organizing isolated loops for a full-connected
graph. An edge (2, 3) is a degenerate variant of a loop (minimal loop) where agent 2 has chosen vertex
3 and agent 3 has chosen vertex 2. Figures 3b – 3d shows variants of isolated cycles for inferior graphs
of different structures.
To determine the Hamiltonian cycle, you must set the parameter close to 0. Then criterion (5) will
have a predominant effect on the course of the game 𝜆, which maximizes the lengths of local paths and
leads to their asymptotic fusion in time into one global, Hamiltonian cycle.
Variants of Hamiltonian cycles, as a result of self-organization of the stochastic game, are shown in
Figure 4 for different graphs. The game problem has a multivariate solution in pure strategies.

{1, 2, 5, 3, 4, 1} {1, 2, 5, 8, 9, 7, 4, 6, 3, 1}
1 1

2 5
7

8 9

5 6

3 4 2 3

a) b)

{1, 2, 6, 5, 9, 10, 11, 12, 8, 7, 3, 4, 1} {1, 2, 3, 8, 7, 12, 13, 14, 15, 11, 6, 10, 9, 4, 5, 1}
1 4 1

5 8 6

9 12 2 11 5
7 10
12 15

10 11
13 14

6 7
8 9

2 3 3 4

c) d)
Figure 3: Self-organized Hamiltonian graph cycles

From the obtained results the simple interaction of agents within local subsets in the course of
learning a stochastic game leads to a complex, coordinated behavior of the system, resulting in self-
organizing patterns in the form of several autonomous or one Hamilton in the cycle of the graph. In the
process of learning, the stochastic game moves from the initial chaotic choice of agents' strategies to
their purposeful choice in the form of cycles.
Complicate the problem by expanding it to random graphs [40]. We apply the game method (13) to
find the Hamiltonian cycle of a random graph. To do this, we assign to each vertex (or game agents)
the probability 𝑞 𝑖 of failure. Restorative failure of the vertex leads to the temporary loss of all its
incident ribs. The agent corresponding to such a vertex skips the current move of the stochastic game,
and neighboring players do not consider his choice strategy (since it is absent) to calculate their own
current losses.
Criterion (3) is calculated only for those players (or vertices of the graph) and adjacent to them who
have not failed for the current step of the game. Criterion (5) for determining the length of the local
path, in addition to the condition of entering the cycle, additionally considers the condition of reaching
the player who refused and was the first to happen on the way.
Several implementations of player strategies for a random full-connected graph are shown in Figure
5. In addition to vertex failures, it is similarly possible to introduce failures of the edges of the graph.
In case of failures, a temporary violation of the connectivity of the graph is allowed. Dashed lines with
arrows indicate strategies that participants in the game can choose when forming a path towards the
rejected players. Used as a limiting condition for calculating current penalties (5).
In Figure 6 the graphs of the coefficient 𝐾𝑛 of coordination of players' strategies in the course of
game self-organization of Hamilton cycles of a random graph, which is an implementation of a full-
connected graph with |𝑉| = 5 vertices, are given. The weighting factor of criteria (3) and (5) in
convolution (6) is 𝜆 = 0.5. The failure probabilities 𝑞 𝑖 = 𝑞∀𝑖 ∈ 𝑉 are given the same for all vertices
of the graph.
The graphs in Figure 6 are obtained for the following values of failure probabilities 𝑞 ∈
{0; 0.05; 0.1; 0.15; 0.2}. For higher probability failures, the convergence time of the game method
increases significantly.

1 1

2 5 2 5

3 4 3 4

a) b)

1 1

2 5 2 5

3 4 3 4

c) d)
Figure 5: Implementations of a random graph of player strategies

Figure 6: The dependence of the coordination coefficient on the probability of failure of vertices of a
random graph

As can be seen in Figure 6, inmove players lead to an increase in the number of searches for their
steps necessary for self-organization of the Hamilton cycle. This is due to the fact that the game method
must additionally adapt to random implementations of a given graph.
To isolate the Hamiltonian cycle of a random graph foreach vertex, the conditional expectation of
the edge number chosen by the corresponding agent at the current time is determined rounded to an
integer value:
𝑁
𝑥̄ 𝑛𝑖 = 𝑖𝑛𝑡(∑𝑖=1
𝑖
𝑝𝑛𝑖 𝑥𝑛𝑖 |𝑥𝑡𝑖 , 𝜁𝑡𝑖 (𝑡 = 1,2, … , 𝑛 − 1)).
During the adaptive game, at the n →  limit values of the mathematical expectations, the edge
numbers are sent to one of the deterministic Hamiltonian cycles of the graph, for example, shown in
Figure 4a.
The stochastic game method cannot compete in efficiency with the well-known meta odes of
constructing Hamiltonian cycles and determinates graphs. It has other advantages and purposes – it can
find Hamiltonian cycles of random graphs under conditions of uncertainty (the probability of failure of
vertices of a graph is not known a priori). On the other hand, the method of stochastic play is a good
illustration of self-organization Hamiltonian cycle based on the collection and processing of local data
without exchanging information between all players. The solution of a stochastic game manifests itself
in the form of a global pattern of self-organization of player strategies, which is one of the Hamiltonian
cycles of the graph. The slow (power) convergence of the game is explained by its stochastic nature
and the lack of information among players about the full structure of the graph or its random
implementations. A stochastic game simulates the evolutionary process of self-organization of the
Hamiltonian cycle through self-training of game agents.
Consequently, the self-organization of the considered stochastic game consists in the formation of
patterns of strategies of game agents in the form of Hamiltonian cycles that arise in a deterministic or
random graph in the process of learning the recurrent method (13) based on local interaction between
agents, which leads to global coordination of the entire distributed system.

9. Conclusions

1. The complex problem of self-organization of Hamiltonian cycles of undirected graph based on
model of stochastic game is solved.
2. Global Hamiltonian cycles arise because of purposeful locally conditioned choice of conflict-free
strategies during adaptive learning of a stochastic game.
3. Self-organization of Hamiltonian cycles of a graph is possible if the constraints on the parameters
of the game method are derived from the general conditions of stochastic approximation.
4. The considered game method provides a power order of the speed of convergence, requires a
significant number of learning steps, since it works in conditions of incomplete a priori information.
5. Increasing the graph order leads to the deployment of the search process over a longer period and
requires proper configuration of the parameters of the stochastic game.
6. The method of stochastic play as a method of random tests with adaptive data processing requires
more game steps than known deterministic methods, but it can work with random graphs with a
priori unknown distributions.
7. Compared to deterministic graphs, the search for Hamiltonian cycles in random graphs requires
more steps of the game method, since at each step of the game another implementation of the
connections between vertices of the graph is possible.
8. The stochastic game method of self-organization of Hamiltonian cycles can be used to build
cryptographic protocols for exchanging keys, in systems of proof with non-disclosure of knowledge,
for solving distributed flow and transport problems and collective decision-making under
uncertainty.
9. A promising study in this direction is the game simulation of self-organization of Sensory
networks, ring oscillation of signals in neural networks for the application of results in artificial
intelligence systems.

10. References

[1] S. Gamazine, J.-L. Deneubourg, N. R. Frank, J. Sneyd, G. Theraula, E. Bonabeau, Self-
Organization in Biological Systems, Princeton University Press, 2020.
[2] Z. Sun, Cooperative Coordination and Formation Control for Multi-agent Systems, Springer, 2018.
[3] P. Kravets, Game strategies for decision making in hierarchical systems: I. Mathematical model of
the stochastic game, System Research, and Information Technologies, 3 (2019) 63–75. doi:
10.20535/SRIT.2308-8893.2019.3.06.
[4] P. Kravets, Game strategies for decision making in hierarchical systems: II. Computer simulation
of the stochastic game, System Research, and Information Technologies, 4 (2019) 105–118. doi:
10.20535/SRIT.2308-8893.2019.4.11.
[5] W.J. Zhang (Ed.), Self-organization: Theories and Methods, USA: Nova Science Publishers, 2013.
[6] P. Kravets, Game model of self-organizing of multiagent systems, Bulletin of “Lviv polytechnic
national university”, Series: “Information systems and networks”, 829 (2015) 161–176.
[7] P. Kravets, Game self-organization of agent’s system with an individual estimation of strategies,
Bulletin of “Lviv Polytechnic national university”, Series: “Computer systems and networks”, 546
(2005) 75–85.
[8] F. Schweisguth, F. Corson, Self-Organization in Pattern Formation: Review, Developmental Cell,
49(5) (2019) 659–677. doi: 10.1016/j.devcel.2019.05.019.
[9] P. Kravets, R. Jurinets, Y. Kis, Patterns of self-organizing strategies in the game of mobile agents,
Bulletin of “Lviv polytechnic national university”, Series: “Information systems and networks”, 7
(2020) 24–34. doi: 10.23939/sisn2020.07.024.
[10] P. Kravets, Self-organizing strategies in the game of agent movement, Bulletin of “Lviv
polytechnic national university”, Series: “Information systems and networks”, 9 (2021) 131–141.
doi: 10. 23939.sisn2021.09.131.
[11] N. Christofides, Graph theory: an algorithmic approach, New York: Academic Press, 1975.
[12] K. R. Saoub, Graph Theory: An Introduction to Proofs, Algorithms, and Applications, Chapman
and Hall/CRC, 2021.
[13] M. R. Garey, D. S. Johnson, R. Endre, The Planar Hamiltonian Circuit Problem is NP-Complete,
SIAM Journal on Computing, 5(4) (1976) 704–714. doi: 10.1137/0205049.
[14] W. Alhalabi, O. Kitanneh, A. Alharbi, Z. Balfakih, A. Sarirete, Efficient solution for finding
Hamilton cycles in undirected graphs, SpringerPlus, 5(1192) (2016) 1–14. doi 10.1186/s40064-
016-2746-8.
[15] B. Korte, J. Vygen (Eds.), The Traveling Salesman Problem, in: Combinatorial Optimization:
Algorithm and Combinatorics, Springer, Berlin, Heidelberg, 21 (2008) 527–562. doi:
10.1007/978-3-540-71844-4_21.
[16] A. Kumar, D. Mishra, A survey on existing conditions of Hamiltonian graphs, Journal of
Mathematical Control Science and Applications, 7(1) (2021) 28–35.
[17] Ł. Waligóra, Application of Hamilton’s graph theory in new technologies, World Scientific News,
89 (2017) 71–81.
[18] J.H. Seo, H. Lee, M.S. Jang, Optimal Routing and Hamiltonian Cycle in Petersen-Torus Networks,
in: Proceedings of the Third 2008 International Conference on Convergence and Hybrid
Information Technology, 2008, pp. 303–308.
[19] F. Sharifov, G. Jun, G. Kandiba, Optimization of routes of aircraft performing Argo-aviation
works, Science-intensive technology, 3(23) (2014) 319–325.
[20] V. Lytvyn, D. Ugrin, Methods of solving problems of finding optimal tourist routes by ant colony
imitation algorithms, Bulletin of the National Technical University “Kharkiv Polytechnic Institute”
Collection of scientific works, Series: Computer Science and Modeling, Kharkiv: NTU “Kharkiv
Polytechnic Institute”, 21(1193) (2016) 47–60.
[21] Q. Zhang, R. Cheng, Z. Zheng, Energy-efficient renewable scheme for rechargeable sensor
networks, EURASIP Journal on Wireless Communications and Networking, 74 (2020) 1–13. doi:
10.1186/s13638-020-01687-4.
[22] N. Xiong, W. Wu, C. Wu, An improved Routing Optimization Algorithm Based on Travelling
Salesman Problem for Social Networks, Sustainability, 9 (2017) 1–15. doi: 10.3390/su9060985.
[23] P. Medvedev, M. Pop, What do Eulerian and Hamiltonian cycles have to do genome assembly?
PLoS Computational Biology, 17(5):e1008928 (2021) 1–5. doi: 10.1371/journal.pcbi.1008928.
[24] O. Melkozerova, S. Rassomakhin, Identification of fingerprints based on Hamiltonian cycle of
distribution of local features, Bulletin of V. N. Karazin Kharkiv National University, Series:
Mathematical modelling, Information technology, Automated control systems, 44 (2019) 51–65.
doi: 10.26565/2304-6201-2019-44-06.
[25] S. Kavun, I. Revak, Application of graph theory in communication management problems,
Scientific Bulletin of Lviv State University of Internal Affairs, 2 (2015) 225–240.
[26] B. Barak, An Intensive Introduction to Cryptography, Chapter 13: Zero knowledge proofs, 2021.
URL: https://intensecrypto.org/public/lec_14_zero_knowledge.html.
[27] L. Gulyanytsky, O. Mulesa, Applied methods of combinatorial optimization: Tutorial, Kyiv:
Publishing and printing center “Kyiv University”, 2016.
[28] Y. Peng, B. Choi, J. Xu, Graph Learning for Combinatorial Optimization: A Survey of State-of-
the-Art, Data Science and Engineering, 6 (2021) 119–141. doi: 10.1007/s41019-021-00155-3.
[29] R. Kutelmakh, B. Uhrynovskyi, Investigation of the efficiency of common edges decomposition
algorithm for solving large size traveling salesman problem, Young Scientist, 12(52) (2017) 1–5.
[30] J. Sleegers, D. Berg, Backtracking (the) Algorithms on the Hamiltonian Cycle Problem,
arXiv:2107.00314v1 [cs.DS], 2021. URL: https://arxiv.org/pdf/ 2107.00314v1.pdf.
[31] V. Prokopenkov, A new method for finding a Hamiltonian cycle on a graph, Bulletin of the
National Thecnical University “Kharkiv Polytechnic Institute”, Series: Strategic management,
portfolio management, and projects, 2 (2020) 43–49. doi: 10.20998/2413-3000.2020.2.6.
[32] T. Tambouratzis, Solving the Hamiltonian cycle problem via an artificial neural network,
Information Processing Letters, 75(6) (2000) 237–242. doi: 10.1016/S0020-0190(00)00116-2.
[33] E. Ponce-de-Leon, A. Ochoa, R. Santana, A genetic Algorithm for a Hamiltonian Path Problem,
in: Industrial and Engineering Application of Artificial Intelligence and Expert Systems, 2020, pp.
13–19.
[34] O. Muliarevych, V. Golembo, A modification of the ant colony method for solving the problem
of a salesman by a team of autonomous agents, Bulletin of” Lviv polytechnic national university”,
Series: “Computer systems and networks”, 717 (2011) 24 – 30.
[35] B.-S. Chen, Stochastic Game Strategies and their Applications, CRC Press, 2020.
[36] V. Ungureanu, Pareto-Nash-Stackelberg Game and Control Theory: Intelligent Paradigms and
Applications, Springer, 2018.
[37] K. Najim, A.S. Poznyak, Learning Automata: Theory and Applications, Elsevier, 2014.
[38] H.J. Kushner, G.G. Yin, Stochastic Approximation Algorithms and Recursive Algorithms and
Applications, Springer, 2013.
[39] P. Kravets, Convergence of the game gradient method in sign-positive environments, Bulletin of
” Lviv polytechnic national university”, Series: “Computer systems and networks”, 438 (2001) 83
– 89.
[40] A. Frieze, Hamilton Cycles in Random Graphs: a bibliography, arXiv:1901.07139v20 [math.CO],
2021. URL: https://arxiv.org/pdf/1901.07139v20.pdf