
Anytime Action Selection Algorithms in Virtual Soccer

Saint Petersburg Electrotechnical University "LETI", 197376, Saint Petersburg, Russia

The article is devoted to the problem of action planning in dynamic multi-agent worlds. Multi-agent worlds, or multi-agent systems, are sets of interacting intelligent agents performing goal-directed actions in a dynamic environment. They are used in many applications, including transportation, logistics, graphics, and manufacturing. One of the key tasks of intelligent agents in multi-agent worlds is real-time planning in a constantly changing environment, because the depth and completeness of the analysis of possible actions are limited and must adapt to current time limitations. The aims of the study are to consider existing approaches to action planning and to analyze experimentally how they work in typical situations from a multi-agent system representing the competition between agents in a virtual soccer match. The main approach to action planning considered in this article is the model of advanced iterative action planning for real-time intelligent agents. This approach uses anytime algorithms to cope with time restrictions. The proposed algorithms and models for action planning are considered with regard to decision-making under real-time constraints. Their work is illustrated with a concrete situation from virtual soccer. The operating time of the presented algorithms was measured experimentally and compared with theoretical values. Based on this analysis, the complexity of the algorithms used in the assessment was estimated. These results allow us to develop an action planning algorithm as an anytime algorithm so that the agent can effectively use all the time available in the situation.

Keywords: intelligent agent, multi-agent systems, action planning, utility assessment, virtual soccer, RoboCup

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

Autonomous intelligent agents (IAs) capable of goal-directed action in complex dynamic multi-agent environments have been of great interest to researchers in recent years [1]. When constructing such systems, one of the key problems is action selection in constantly changing multi-agent worlds [2, 3].

RoboCup Soccer Server is currently the most popular platform for the study of IAs and multi-agent systems [4]. The platform makes it possible to simulate team scenarios in real time under limited perception of the world. Therefore, this article discusses the task of evaluating the utility of actions in virtual soccer.

An action selection algorithm based on forward simulation is presented in [5]. The approach made it possible to optimize the time the player spends turning its body before a kick in order to reduce the risk of an opponent stealing the ball before the kick. However, this algorithm addresses a fairly narrow scenario: kick selection in RoboCup.

A case-based reasoning approach to cooperative action selection, based on the storage, retrieval, and adaptation of example cases, is presented in [6]. It proposes a retrieval technique that weighs the similarity of a situation in terms of the positional features of the ball, the uncontrollable features, and the cost of moving the robots from the current situation.

In [7], a biologically inspired action control architecture is considered as a combination of hierarchical (vertical) and non-hierarchical (horizontal) mechanisms. A motivational model of action selection for autonomous virtual humans is described in [8]. The model provides overlapping hierarchical classifier systems working in parallel to generate coherent behavioral plans; these are combined with the functionalities of a free-flow hierarchy to give reactivity to the hierarchical system.

The known approaches do not take real-time requirements into account and do not allow the decision-making process to adapt to changing time constraints. In a dynamic multi-agent environment, the depth and completeness of the analysis of possible options for action are limited and must adapt to current time constraints.

A promising approach to solving the action selection problem in real time is the model of advanced iterative action planning [9, 10]. In accordance with this model, the refinement of the options for action possible in the situation under consideration and the assessment of their utility are carried out within dynamically identified time constraints. The approach involves the use of anytime algorithms (ATA) [11], in which the quality of results increases with increasing execution time.

The purpose of this article is to refine the models and algorithms for evaluating the utility of actions in virtual soccer and to analyze them for typical situations that arise during a soccer match. Based on the analysis, dynamically changing (situational) factors affecting the time complexity of the utility assessment are identified. This makes it possible to construct the utility evaluation algorithm as an ATA, so that the agent can effectively use all the time available in a particular situation for action selection.

Algorithms for actions utility assessment

Let us consider the main concepts of the approach according to [9, 10]:

1. Current situation $S$: the world state at the time the IA chooses its next actions.
2. Set of possible agent actions $\{Act_i\}$: actions the considered IA can perform in the current situation.
3. Set of possible situations $\{S_n\}$: possible world states resulting from the performance of the selected action by the considered IA.
4. Set of actual actors $\{AA_k\}$: all IAs that can significantly affect the world state after the considered IA performs the selected action.
5. Set of partners $\{T_m\}$: the subset of actual actors pursuing the same goals as the considered IA.
6. Set of possible partners' actions $\{TAct_{im}\}$: actions the $m$-th partner from the set of partners can perform in the current situation.
7. Set of opponents $\{O_k\}$: the subset of actual actors opposing the goals of the considered IA.
8. Set of possible opponents' actions $\{OAct_{ik}\}$: actions the $k$-th opponent from the set of opponents can perform in the current situation.
9. Action utility $U(Act_i)$: an assessment of how profitable performing this action in the current situation is for achieving the goals of the considered IA.
10. Situation utility $U(S)$: an assessment of how profitable this situation is for achieving the goals of the considered IA.

The generalized algorithm for assessing the utility of action $Act_i$ in the current situation according to [9, 10] contains the following steps:

1. Identification of the sets $\{T_m\}$ and $\{O_k\}$ of actual actors in the current situation.
2. Identification of the sets $\{TAct_{im}\}$ and $\{OAct_{ik}\}$ of possible actions for each agent from $\{T_m\}$ and $\{O_k\}$.
3. Identification of the set $\{S_n\}$ of possible situations. Possible situations are predicted as the world states at the time when action $Act_i$ and each possible combination of actions from $\{TAct_{im}\}$ and $\{OAct_{ik}\}$ have been completed.
4. Assessment of the utility $U(S_n)$ of every possible situation from $\{S_n\}$.
5. Assessment of the utility $U(Act_i)$ of action $Act_i$ as the minimum of the utilities $\{U(S_n)\}$ of the possible situations.
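The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [9, 10]: the function name `action_utility` and the two callbacks `predict` and `situation_utility` are hypothetical, and the sets of actual actors and their possible actions (steps 1 and 2) are assumed to be given as input.

```python
from itertools import product


def action_utility(action, partner_actions, opponent_actions,
                   predict, situation_utility):
    """Worst-case utility of `action` over all combinations of actors' actions.

    partner_actions / opponent_actions: one list of possible actions per
        actual actor (assumed to be already identified, steps 1-2).
    predict(action, combo): hypothetical callback returning the predicted
        situation once `action` and the combination `combo` are completed.
    situation_utility(situation): hypothetical callback returning U(S_n).
    """
    per_actor = list(partner_actions) + list(opponent_actions)
    # Step 3: every combination of one action per actual actor.
    # With no actual actors, product() yields a single empty combination.
    utilities = [
        situation_utility(predict(action, combo))   # step 4
        for combo in product(*per_actor)
    ]
    # Step 5: the action is only as good as its worst outcome.
    return min(utilities)
```

Taking the minimum makes the assessment pessimistic: an action is rated by the worst situation that the other actors can bring about.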

The generalized algorithm for determining the most useful action in the current situation according to [9, 10] contains the following steps:

1. Generation of the basic set of possible agent actions $\{Act_i\}$.
2. Assessment of the utility $U(Act_i)$ of all actions from the basic set $\{Act_i\}$.
3. Selection of the most useful agent action, defined as the action from $\{Act_i\}$ with the maximum utility $U(Act_i)$.
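A minimal sketch of this selection step is shown below. The function name `select_action` and the `utility` callback are hypothetical; the callback stands for whatever procedure computes $U(Act_i)$.

```python
def select_action(actions, utility):
    """Return (best_action, best_utility) for the action maximizing `utility`.

    actions: the basic set of possible agent actions.
    utility: hypothetical callback computing U(Act_i) for one action.
    """
    actions = list(actions)
    if not actions:
        raise ValueError("the basic set of actions is empty")
    # Step 2: assess every action; step 3: keep the maximum.
    scored = [(utility(a), a) for a in actions]
    best_utility, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_utility
```

For example, with invented utilities `{"shot": 0.4, "pass": 0.7, "dribble": 0.5}`, calling `select_action(table, table.get)` would pick `"pass"`.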

Each action has an a priori utility assessment that depends on the action type and its main parameters. For example, a shot for the goal is preferred to a pass, and dribbling forward is preferred to dribbling diagonally. The a priori utility assessment requires practically no computation time (since it is retrieved from the IA's memory), which fits the ATA-based approach.

After the considered IA has selected a specific action, the set of actual actors is determined by assessing whether each player can influence the development of the situation. For example, if the IA is considering passing the ball to the left, then players who are to its left and near the probable pass trajectory are included in the set of actual actors for this action. Players who are to the right of the player with the ball cannot directly influence the development of the situation in this case, so they are not included in the set of actual actors for this action (pass to the left).
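One possible way to express such a test is a distance check against the pass trajectory. The sketch below is an assumption made for illustration only: the article does not give the actual inclusion conditions here, and the names `distance_to_segment`, `actual_actors`, and the threshold `max_distance` are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:                      # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def actual_actors(players, ball, target, max_distance):
    """Names of players close enough to the pass trajectory to matter.

    players: dict mapping a player name to its (x, y) position.
    ball, target: start and end points of the probable pass trajectory.
    max_distance: hypothetical threshold for "near the trajectory".
    """
    return [name for name, pos in players.items()
            if distance_to_segment(pos, ball, target) <= max_distance]
```

With the ball at `(0, 0)` and a pass to the left toward `(-10, 0)`, a player at `(8, 0)` on the right is 8 units from the trajectory and is filtered out for any smaller threshold.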

Implementation of actions utility assessment by any-time algorithms

To illustrate the proposed approach, let us consider the situation in which the player who owns the ball is close to the opponent's goal (see Fig. 1). The objects in the figure are designated as follows:

1. A: the player who owns the ball and assesses the utility of his next actions.
2. B: the ball.
3. T1: a teammate of player A.
4. O1, O2: players of the opponent team.
5. G: the goalie of the opponent team.

The basic set of considered player actions with their a priori utility assessments is presented in Table 1.

The conditions for including players in the set of actual actors for teammates and opponents are presented in Table 2.

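[Table 2: conditions for including players in the set of actual actors. Columns include the player action and the partners $\{T_m\}$; rows: shot for the goal, pass, dribble.]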

Possible actions of players included in the set of actual actors are presented in Table 3.

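[Table 3: possible actions of the players included in the set of actual actors. Columns include the player action and the partner action; rows: shot for the goal, pass, dribble.]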

The transition from the current world state to the set of possible states after the agents perform certain actions can be represented as a bipartite graph (see Fig. 2) [10].

The root node of the graph corresponds to the current world state $S^*$. Its outgoing edges show the possible actions $\{Act_i\}$ of the player who owns the ball. The second level of the tree contains splitter nodes. The edges leaving a splitter node denote the combinations of possible actions of the actual actors, given that the player has performed the action $Act_i$ corresponding to the edge entering that node.

Action utility is calculated on the basis of the utilities of possible situations according to the minimax principle.

The utility of a situation reached as a result of certain actions performed by the players is evaluated on the basis of a set of factors. This set depends on the specific situation and can vary widely, which allows us to construct utility assessment algorithms as ATAs. For example, in the simplest case the situation utility can be assessed from the distance between the ball and the opponent's goal. In more complex cases, further factors can be taken into account: the numerical advantage of the players of our team in a certain area of the field, their relative positions, the individual capabilities of the players, and others. In addition, the computational complexity of the algorithm for evaluating action utility depends on the number of relevant actors and on the number of actions that they can take.

Since the actions of different agents can be performed in various combinations, the number of possible situations (leaves in the tree) reached from one splitter node is, in the worst case, $m^n$, where $m$ is the maximum number of actions of one actor and $n$ is the number of actual actors.

The computational complexity of the algorithm for determining the most useful action in the current situation depends on two factors:

- the number of actions possible for the agent in the current situation;
- the complexity of calculating the utility value for one action.

In the worst case, it can be estimated as $O(m^n k)$, where $k$ is the number of possible actions of the agent.
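To give a sense of scale, the worst-case count (m to the power n leaves per splitter node, times k agent actions) can be evaluated for small values. The numbers m = 3, n = 4 and k = 3 below are assumptions chosen for illustration, not values taken from the experiment.

```latex
m^n = 3^4 = 81 \quad \text{(leaves per splitter node)}, \qquad
m^n k = 81 \cdot 3 = 243 \quad \text{(situations in total)}.
% Adding one more actual actor (n = 5) triples the work:
m^n k = 3^5 \cdot 3 = 729.
```

Each additional actual actor multiplies the number of situations by m, which is why the number of actual actors dominates the cost.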

Given such high computational complexity and the hard real-time limitations on action selection, the utility calculation algorithm is constructed as an ATA. The tree of possible worlds is built in descending order of the a priori (and then the current) utility of the actions, within the available time.
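The anytime behavior can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from [9, 10]: each action is fully assessed once, in descending order of its a priori utility, and the names `anytime_select`, `a_priori`, `evaluate`, `budget_s` and the injectable `clock` are hypothetical. Actions are assumed to be hashable.

```python
import time


def anytime_select(actions, a_priori, evaluate, budget_s, clock=time.monotonic):
    """Pick the best action found within `budget_s` seconds.

    a_priori(action): cheap stored utility, available immediately.
    evaluate(action): expensive refined utility (e.g. worst-case assessment).
    clock: time source in seconds; injectable so the loop can be tested.
    """
    deadline = clock() + budget_s
    ordered = sorted(actions, key=a_priori, reverse=True)
    if not ordered:
        raise ValueError("no actions to choose from")
    # A usable answer exists from the start: the a priori estimates.
    estimate = {a: a_priori(a) for a in ordered}
    for action in ordered:
        if clock() >= deadline:
            break                      # out of time: keep what we have
        estimate[action] = evaluate(action)
    best = max(ordered, key=lambda a: estimate[a])
    return best, estimate[best]
```

If the budget is zero, the function still returns the action with the highest a priori utility; with more time, more of the estimates are replaced by refined values, so the quality of the answer does not depend on finishing the whole loop.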

Approach experimental assessment

The operating time of the action utility assessment algorithm is determined by the following formula:

$$T = t_{aAA} k_A + t_{aU} k_S \tag{1}$$

where

- $t_{aAA}$ is the upper estimate of the time needed to determine whether an agent is an actual actor in the considered situation;
- $k_A$ is the number of potential actual actors;
- $t_{aU}$ is the time needed to predict and assess one situation;
- $k_S$ is the number of predicted situations.

The values $t_{aAA}$ and $t_{aU}$ should be determined for a specific computing platform by experimentally measuring the execution time of the corresponding program fragments.

The value $k_A$ depends on the specific situation considered and is known before the algorithm starts. The value $k_S$ is calculated after the set of actual actors has been determined.

In this way, the estimated time for action utility assessment can be calculated at the beginning of the algorithm's work. This allows agents to distribute their time rationally.

An experimental study for the situation discussed above was carried out on a computer with an AMD A4-9120 processor with a clock frequency of 2.20 GHz. The measured times $t_{aU}$ and $t_{aAA}$ for the key fragments of the algorithm are presented in Table 4.

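Table 4. Execution time of the key fragments of the algorithm

| Fragment | Execution time, ms |
|---|---|
| Predicting and assessing one situation, $t_{aU}$ | 0.22 |
| Determining an actual actor, $t_{aAA}$ | 0.06 |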
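Formula (1) with the measured values from Table 4 can be turned into a small planning helper. The sketch below is illustrative: the function names and the example inputs (5 potential actors, 9 situations, a 10 ms budget) are assumptions, and only the two timing constants come from the measurements.

```python
import math

T_AAA_MS = 0.06   # measured time to test one potential actual actor (Table 4)
T_AU_MS = 0.22    # measured time to predict and assess one situation (Table 4)


def estimated_time_ms(k_a, k_s, t_aaa=T_AAA_MS, t_au=T_AU_MS):
    """Formula (1): T = t_aAA * k_A + t_aU * k_S, in milliseconds."""
    return t_aaa * k_a + t_au * k_s


def max_situations(budget_ms, k_a, t_aaa=T_AAA_MS, t_au=T_AU_MS):
    """Largest k_S that still fits into `budget_ms` after the actor tests."""
    remaining = budget_ms - t_aaa * k_a
    return max(0, math.floor(remaining / t_au))
```

For instance, `estimated_time_ms(5, 9)` gives about 2.28 ms, and `max_situations(10, 5)` indicates that 44 situations would fit into a hypothetical 10 ms budget on the measured platform.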

Comparative estimates of the theoretical and experimental time for calculating the utility assessment for different numbers of actual actors are presented in Table 5.

The results of action utility assessment (in percent) for the considered example (see Fig. 2) are presented in Table 6.

Conclusion

Utility assessment of possible actions is the key task when an intelligent agent is choosing its next actions. The algorithm for action utility assessment in multi-agent worlds has high computational complexity, which grows rapidly with the number of actors and their possible actions in the considered situation.

In the course of the research, the action utility assessment model proposed within the model of advanced iterative action planning for real-time intelligent agents was refined. The work of the proposed model was analyzed using a typical situation from virtual soccer. Based on this analysis, conclusions were drawn about the complexity of the algorithms used. These results allow us to develop a utility assessment algorithm as an ATA so that an intelligent agent can effectively use all the time available in the current situation.

The obtained experimental results are focused on the environment of virtual soccer. However, the approach can be extended to other applications of agent systems, which constitutes the direction of further research.

1. Panteleyev M.G., Puzankov D.V. Intelligent agents and multi-agent systems: a monograph. Publishing house SPbGETU "LETI", 2015. 215 p. (in Russian)

2. Tyrrell. Computational Mechanisms for Action Selection // PhD Thesis, University of Edinburgh, 1993, 212 p.

3. Modelling Natural Action Selection // Seth A.K., Prescott T.J., Bryson J.J. (eds.), Cambridge University Press, 2012, 560 p.

4. Official RoboCup site [Electronic resource]: access mode: http://www.robocup.org.

5. Mellmann, Schlotter B. Advances on Simulation Based Selection of Actions for a Humanoid Soccer-Robot // In Proc. of the 12th Workshop on Humanoid Soccer Robots, 17th IEEE-RAS Int. Conf. on Humanoid Robots, Madrid, Spain.

6. Ros, Arcos J.L., de Mantaras R.L., Veloso. A case-based approach for coordinated action selection in robot soccer // Artificial Intelligence, 2009, Vol. 173, N 9-10, pp. 1014-1039.

7. Öztürk P. Levels and Types of Action Selection: The Action Selection Soup // Adaptive Behavior, 2009, Vol. 17 (6), pp. 537-554.

8. de Sevin E., Thalmann D. A Motivational Model of Action Selection for Virtual Humans // Computer Graphics International (CGI), IEEE Computer Society Press, New York, 2005.

9. Panteleyev M.G. Advanced Iterative Action Planning for Intelligent Real-Time Agents // Procedia Computer Science, 2019, Vol. 150, pp. 244-252.

10. Panteleyev M.G. The Formal Model of Advanced Iterative Real-Time Action Planning for Intelligent Agents // Proc. of the 14th National Conf. on AI KII-2014. Vol. 1. Kazan: RIC "School", 2014, pp. 323-333.

11. Zilberstein S. Using Anytime Algorithms in Intelligent Systems // AI Magazine, 1996, 17 (3), pp. 73-83.