Anytime Action Selection Algorithms in Virtual
                   Soccer

Daniil Kivarin[0000−0002−0495−9994] and Michail Panteleyev[0000−0002−6077−760X]

Saint Petersburg Electrotechnical University ’LETI’, 197376, Saint-Petersburg, Russia
                       {dkivarin, mpanteleyev}@gmail.com


      Abstract. The article is devoted to the problem of action planning in
      dynamic multi-agent worlds. Multi-agent worlds or systems are sets of
      interacting intelligent agents performing targeted actions in a dynamic
      environment. They are used for many applications including transporta-
      tion, logistics, graphics, manufacturing etc. One of key objectives of intel-
      ligent agents in multi-agent worlds is real time planning in a constantly
      changing environment, because depth and completeness of the analysis of
      possible actions are limited and must adapt to current time limitations.
      The aims of the study are to consider existing approaches to action plan-
      ning and to experimentally analyze their work in typical situations from
      multi-agent system representing the counteraction of agents in virtual
      soccer match. The main approach to action planning considered in this
      article is the model of advanced iterative action planning for real time
      intelligent agents. This approach is based on using any-time algorithms
      to avoid problems with time restrictions. The proposed algorithms and
      models for action planning are considered taking into account decision-
      making under real-time constraints. They work is illustrated with an
      example of a concrete situation from virtual soccer. The operating time
      of the presented algorithms was experimentally measured and compared
      with theoretical values. According to analysis, complexity of algorithms
      used in assessment was estimated. These results allow us to develop an
      action planning algorithm as an any-time algorithm so that the agent
      can effectively use all available time in the situation.

      Keywords: intelligent agent · multi-agent systems · action planning ·
      utility assessment · virtual soccer · robocup


Copyright c 2019 for this paper by its authors. Use permitted under Creative Com-
mons License Attribution 4.0 International (CC BY 4.0).
2       D. Kivarin and M. Panteleyev

1    Introduction
Autonomous intelligent agents (IAs) capable of goal-directed acting in complex
dynamic multi-agent environments have been of great interest to researchers in
recent years [1]. When constructing such a systems one of the key problems is
an action selection in a constantly changing multi-agent worlds [2, 3].
    Robocup Soccer Server is currently the most popular platform for the study
IAs and multi-agent systems [4]. The platform allows to simulate team acting
scenarios in real time under limited perception of the world. Therefore, this
article discusses the task of evaluating the actions utility in virtual football.
    In [5] the action selection algorithm based on forward simulations is pre-
sented. The approach made it possible to optimize the time for the player to
turn body before kick in order to reduce the risk of the opponent stealing the
ball before the kick. However this algorithm belongs to a fairly narrow scenario
of the kick selection in RoboCup.
    A case-based reasoning approach for cooperative action selection, based on
the storage, retrieval, and adaptation of example cases is presented in [6]. A
retrieval technique that weighs up the similarity of a situation in terms of the
ball positional features, the uncontrollable features, and the cost of moving the
robots from the current situation is proposed.
    In [7] biologically inspired action control architecture as a combination of
hierarchical-vertical and non-hierarchical horizontal mechanisms is considered.
Motivational model of action selection for autonomous virtual humans is de-
scribed in [8]. The model provides overlapping hierarchical classifier systems,
working in parallel to generate coherent behavioral plans, which are associated
with the functionalities of a free flow hierarchy to give reactivity to the hierar-
chical system.
    The known approaches do not take into account the requirements of real
time and do not allow to adapt the decision-making process to changing time
constraints. In a dynamic multi-agent environment, the depth and completeness
of the analysis of possible options for action are limited and must adapt to
current time constraints.
    A promising approach to solving the action selection problem in real-time
is a model of advanced iterative action planning [9, 10]. In accordance with this
model, the refinement of the options for action possible in the situation under
consideration and the assessment of their utility are carried out within the frame-
work of dynamically identified time constraints. The approach involves the use
of anytime algorithms (ATA) [11], in which the quality of results increases with
increasing execution time.
    The purpose of this article is to refine the models and algorithms for actions
utility evaluating in virtual soccer, as well as their analysis for typical situations
that arise during a soccer match. Based on the analysis, dynamically changing
(situational) factors affecting the time complexity of the utility assessment are
identified. This allow to construct utility evaluating algorithm as ATA, so that
agent can effectively use all the time available in a particular situation for action
selection.
                      Anytime Action Selection Algorithms in Virtual Soccer         3

2   Algorithms for actions utility assessment
Let us consider the main concepts of approach according to [9, 10]:
 1. Current situation S ∗ – the world state, at the time IA chooses next actions;
 2. Set of possible agent actions {Acti } – actions the considered IA can perform
    in the current situation;
 3. Set of possible situations {Sn∗∗ } – possible world states resulting from the
    performing of selected action by the considered IA;
 4. Set of actual actors {AAk } – all possible IAs which can significantly affect
    the world state after performing of selected action by the considered IA;
 5. Set of partners {Tm } – the subset of actual actors achieving the same goals
    as considered IA;
 6. Set of possible partners’ actions {T Actm i } – actions the m-th partner from
    set of partners can perform in the current situation;
 7. Set of opponents {Ok } – the subset of actual actors opposing the goals of
    considered IA;
 8. Set of possible opponents’ actions {OActki } – actions the k -th opponent from
    set of opponents can perform in the current situation;
 9. Action utility U (Acti ) – the assessment shows how profitable the performing
    of this action in the current situation to achieve the goals of considered IA;
10. Situation utility U (S ∗∗ ) – the assessment which shows how profitable this
    situation to achieve the goals of considered IA.
The generalized algorithm for assessing the utility of action Acti in current
situation according to [9, 10] contains the following steps:
 1. Identification of sets {Tm } and {Ok } of actual actors in current situation;
 2. Identification of sets {T Actm                k
                                  i } and {OActi } of possible actions for each agent
    from {Tm } and {Ok };
 3. Identification of set {Sn∗∗ } of possible situations. Possible situations are pre-
    dicted as world states at the time then action Acti and all possible combi-
    nations of actions {T Actm                  k
                                i } and {OActi } are completed;
 4. Assessing the utility U (S ∗∗ ) of all possible situations from {Sn∗∗ };
 5. Assessing the utility U (Acti ) of action Acti as minimum value of utilities
    {Un (Sn∗∗ )} of possible situations.
The generalized algorithm for determining the most useful action in the current
situation according to [9, 10] contains the following steps:
 1. Generating the basic set of possible agent actions {Acti };
 2. Assessing the utility U (Acti ) of all actions from the basic set {Acti };
 3. The most useful agent action is defined as action from {Acti } with maximum
    value of utility U (Acti ).
    Each action has an a priory utility assessment dependent on action type and
its main parameters. For example, shot for the goal is more preferred than pass,
dribble forward is more preferred than dribble diagonally etc. A priory utility
4       D. Kivarin and M. Panteleyev

assessment practically doesn’t require time for computation (since it is retrieved
from IA’s memory) which corresponds to the ATA-based approach.
    After the considered IA has selected a specific action, in determining the set
of actual actors, the possibility of their influence on the situation development is
assessed. For example, if the IA is considering passing the ball to the left, then
players who are to his left and are near to the probably pass trajectory will be
included in the set of actual actors for this action. At the same time, the players
who are to the right of the player who has the ball in this case cannot directly
influence the situation development. So they will not be included in the set of
actual actors for this action (pass to the left).


3   Implementation of actions utility assessment by
    any-time algorithms

To illustrate the proposed approach, let’s consider the situation when the player
who owns the ball is close to the opponent’s goal (see Fig. 1). Designations of
objects in the figure:

 1. A – player, who owns the ball and assesses the utility of his next actions;
 2. B – ball;
 3. T 1 – teammate of player A;
 4. O1, O2 – players of opponent team;
 5. G – goalie of opponent team.


              Fig. 1. Players arrangement in action utility assessment.


  The basic set of considered player actions with their a priory utility assess-
ments is presented in Table 1.
                      Anytime Action Selection Algorithms in Virtual Soccer      5

                  Table 1. Basic set of actions of player with ball.

Player action            A priory utility
Act1 = Shot for the goal 90%
Act2 = Pass              60%
Act3 = Dribble           50%


   The conditions for including players in the set of actual actors for teammates
and opponents are presented in Table 2.


                 Table 2. Conditions for selection of actual actors.

Player action     Partners {Tm }                      Opponents {Ok }
Shot for the goal -                                   Opponents who can stop the shoot
                                                      for the goal
Pass             Partners who can take the pass       Opponents who can stop the pass
Dribble          Partners who can lead defenders away Opponents who can block the player,
                                                      take away the ball


    Possible actions of players included in the set of actual actors are presented
in Table 3.


                       Table 3. Set of actual actors actions.

Player action     Partner action             Opponent action
Shot for the goal -                          OAct1 = Try to take away the ball
                                             OAct2 = Stay on the defensive
Pass             T Act1 = open to pass       OAct1 = Try to take away the ball
                                             OAct2 = Stay on the defensive
Dribble          T Act1 = lead defender away OAct1 = Block player
                                             OAct2 = Try to take away the ball


    The transition from the current world state to the set of possible states after
agents perform certain actions can be represented in the form of bipartite graph
(see Fig. 2) [10].
6       D. Kivarin and M. Panteleyev


            Fig. 2. Tree of predicted situations (of possible world states).


    The root node of the graph corresponds to the current world state S*. Out-
going edges show the possible actions {Acti } of player who owns the ball. The
second stage of the tree contains splitter-nodes, from which originate edges that
denote combinations of actual actors possible actions when player performed
action Acti corresponding to the edge entering the node.
    Action utility is calculated on the basis of the utilities of possible situations
according to the minimax principle.
    Utility of situations achieved as a result of certain action performed by players
is evaluated on the basis of a set of many factors. The set of these factors
depends on the specific situation and can vary widely. This allows us to construct
utility assessment algorithms as ATAs. For example, in the simplest case the
situation utility can be assessment based on the distance between the ball and
the opponent’s goal. In more complex cases the following factors can be taken
into account: the numerical advantage of the players of our team in a certain
area of the field, their relative position, individual capabilities of the players and
other factors. In this way, the computational complexity of the algorithm for
actions utility evaluating depends on the number of relevant actors, as well as
on the number of actions that they can take.
    Since the actions of different agents can be performed in various combi-
nations, the number of possible situations (leaves in a tree) reached from one
splitter-node in the worst case is mn , where m is the maximum number of actions
of one actor, and n is the number of actual actors.
    The computational complexity of algorithm for determining the most useful
action in the current situation depends on two factors:
- Number of actions for the agent possible in current situation;
- Complexity of utility value calculating for one action.
    In the worst case, it can be estimated as O(mn ∗ k), where k is the number
of possible actions of the agent.
    Taking into account such high computational complexity and hard real-time
limitations for action selection, the utility calculation algorithm is constructed
as ATA. The construction of a possible worlds tree is performed in descending
                     Anytime Action Selection Algorithms in Virtual Soccer        7

order of the a priory (and then the current) utility of actions within the available
time.


4   Approach experimental assessment


The operating time of action utility assessment algorithm is determined by the
following formula:


                             T = taAA ∗ kA + taU ∗ ks                           (1)


    where taAA - upper estimate of the time to determine whether the agent
in the considered situation is an actual actor; kA - number of potential actual
actors; taU - time for predicting and assessing of one situation; kS - number of
predicted situations.
   The values taAA and taU should be calculated for certain computing plat-
form by experimental measurement of the execution time of the corresponding
program fragments.
    The value kA depends on the specific considered situation and is known before
the algorithm starts. The value kS is calculated after determining of actual actors
set.
    In this way, estimated time to action utility assessment can be calculated
at the beginning of the algorithm work. This will allow agents to rationally
distribute time.
    An experimental study for the situation discussed above was carried out on
a computer with an AMD A4-9120 processor, clock frequency 2.20 GHz. The
results of measuring the times taU and taAA of the computation of the key
fragments of the algorithm are presented in Table 4.


        Table 4. The execution time of the key fragments of the algorithm.

Fragment                                   Execution time, ms
Predicting and assessing of situation, taU 0,22
Actual actor determining, taAA             0,06


    Comparative estimates of theoretical and experimental time for calculating
utility assessment for a different number of actual actors are presented in Table 5.
8       D. Kivarin and M. Panteleyev

          Table 5. Comparison of theoretical and experimental estimates.

Number of actual actors Texp ,ms Ttheor ,ms
0                       0,00     0,00
1                       0,25     0,28
2                       0,50     0,56
3                       1,06     1,06
4                       1,16     1,13
5                       2,06     2,06
6                       4,09     3,88
7                       3,91     3,94
8                       6,91     7,50
9                       12,71    14,56
10                      13,44    14,63
11                      26,44    28,69
12                      52,72    56,75
13                      60,69    56,81
14                      104,75 112,87
15                      215,25 224,94
16                      210,71 225,00
17                      420,97 449,06
18                      810,28 897,13
19                      821,19 897,72
20                      1675,16 1793,25
21                      3466,31 3585,31


    The results of action utility assessment (in percent) for the considered exam-
ple (see Fig. 2) are presented in Table 6.


                         Table 6. Actions utility assessment.

Player action      Predicted situations Situation utility Action utility
Shoot for the goal S1∗∗                 65%               65%
                   S2∗∗                 75%
Pass               S3∗∗                 56%               56%
                   S4∗∗                 63%
Dribble            S5∗∗                 46%               46%
                   S6∗∗                 53%


5   Conclusions

Utility assessment of possible actions is the key task when an intelligent agent is
choosing next its actions. The algorithm for action utility assessment in multi-
agent worlds has high computational complexity which increases rapidly de-
                      Anytime Action Selection Algorithms in Virtual Soccer           9

pending on the number of actors and their possible actions in the considered
situation.
    In the course of the research, the model for action utility assessment proposed
in the model of advanced iterative action planning for real time intelligent agents
was clarified. The work of the proposed model was analyzed on the example of
a typical situation from the virtual soccer. Based on this analysis, conclusions
were drawn about the complexity of the used algorithms. These results allow us
to develop a utility assessment algorithm as an ATA so that intelligent agent
can effectively use all available time in the current situation.
    The obtained experimental results are focused on the environment of virtual
soccer. However, the approach can be extended to other applications of agent
systems, which constitutes the direction of further research.


References
1. Panteleyev M. G., Puzankov D. V. Intelligent agents and multi-agent systems: a
   monograph.: Publishing house SPbGETU ”LETI”, 2015. - 215 p. (in Russian)
2. Tyrrell T. Computational Mechanisms for Action Selection// PhD Thesis, Univer-
   sity of Edinburg, 1993, 212 P.
3. Modelling Natural Action Selection// Seth A.K., Prescott T.J., Bryson J.J. (eds.),
   Cambridge University Press, 2012, 560 P.
4. Official RoboCup site [Electronic resource]: access mode- http://www.robocup.org.
5. Mellmann H., Schlotter B. Advances on Simulation Based Selection of Actions for
   a Humanoid Soccer-Robot// In Proc. of the 12th Workshop on Humanoid Soccer
   Robots, 17th IEEE-RAS Int. Conf. on Humanoid Robots, Madrid, Spain.
6. Ros R., Arcos J. L., de Mantaras R.L., Veloso M. A case-based approach for co-
   ordinated action selection in robot soccer// Artificial Intelligence, 2009, V. 173, N
   9–10, pp. 1014-1039.
7. Öztürk P. Levels and Types of Action Selection: The Action Selection Soup// Adap-
   tive Behavior, 2009, Vol 17(6), pp. 537–554.
8. de Sevin, E. Thalmann, D. A motivational Model of Action Selection for Virtual
   Humans// Computer Graphics International (CGI), IEEE Computer Society Press,
   New York 2005.
9. Panteleyev M.G. Advanced Iterative Action Planning for Intelligent Real-Time
   Agents// Procedia Computer Science, 2019, Vol. 150, pp. 244-252
10. Panteleyev M.G. The Formal Model of Advanced Iterative Real-Time Action Plan-
   ning for Intelligent Agents// Proc. of the 14th National Conf. on AI KII-2014. Vol.
   1. – Kazan: RIC ”School”, 2014, pp. 323-333.
11. Zilberstein, S. Using Anytime Algorithms in Intelligent Systems// AI Magazine,
   1996, 17(3), pp. 73-83.