<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Anytime Action Selection Algorithms in Virtual Soccer</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Saint Petersburg Electrotechnical University 'LETI'</institution>
          ,
          <addr-line>197376, Saint-Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The article is devoted to the problem of action planning in dynamic multi-agent worlds. Multi-agent worlds or systems are sets of interacting intelligent agents performing targeted actions in a dynamic environment. They are used for many applications including transportation, logistics, graphics, manufacturing etc. One of key objectives of intelligent agents in multi-agent worlds is real time planning in a constantly changing environment, because depth and completeness of the analysis of possible actions are limited and must adapt to current time limitations. The aims of the study are to consider existing approaches to action planning and to experimentally analyze their work in typical situations from multi-agent system representing the counteraction of agents in virtual soccer match. The main approach to action planning considered in this article is the model of advanced iterative action planning for real time intelligent agents. This approach is based on using any-time algorithms to avoid problems with time restrictions. The proposed algorithms and models for action planning are considered taking into account decisionmaking under real-time constraints. They work is illustrated with an example of a concrete situation from virtual soccer. The operating time of the presented algorithms was experimentally measured and compared with theoretical values. According to analysis, complexity of algorithms used in assessment was estimated. These results allow us to develop an action planning algorithm as an any-time algorithm so that the agent can e ectively use all available time in the situation.</p>
      </abstract>
      <kwd-group>
        <kwd>intelligent agent</kwd>
        <kwd>multi-agent systems</kwd>
        <kwd>action planning utility assessment</kwd>
        <kwd>virtual soccer</kwd>
        <kwd>robocup</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright c 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        Autonomous intelligent agents (IAs) capable of goal-directed acting in complex
dynamic multi-agent environments have been of great interest to researchers in
recent years [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. When constructing such a systems one of the key problems is
an action selection in a constantly changing multi-agent worlds [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        Robocup Soccer Server is currently the most popular platform for the study
IAs and multi-agent systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The platform allows to simulate team acting
scenarios in real time under limited perception of the world. Therefore, this
article discusses the task of evaluating the actions utility in virtual football.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] the action selection algorithm based on forward simulations is
presented. The approach made it possible to optimize the time for the player to
turn body before kick in order to reduce the risk of the opponent stealing the
ball before the kick. However this algorithm belongs to a fairly narrow scenario
of the kick selection in RoboCup.
      </p>
      <p>
        A case-based reasoning approach for cooperative action selection, based on
the storage, retrieval, and adaptation of example cases is presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A
retrieval technique that weighs up the similarity of a situation in terms of the
ball positional features, the uncontrollable features, and the cost of moving the
robots from the current situation is proposed.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] biologically inspired action control architecture as a combination of
hierarchical-vertical and non-hierarchical horizontal mechanisms is considered.
Motivational model of action selection for autonomous virtual humans is
described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The model provides overlapping hierarchical classi er systems,
working in parallel to generate coherent behavioral plans, which are associated
with the functionalities of a free ow hierarchy to give reactivity to the
hierarchical system.
      </p>
      <p>The known approaches do not take into account the requirements of real
time and do not allow to adapt the decision-making process to changing time
constraints. In a dynamic multi-agent environment, the depth and completeness
of the analysis of possible options for action are limited and must adapt to
current time constraints.</p>
      <p>
        A promising approach to solving the action selection problem in real-time
is a model of advanced iterative action planning [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. In accordance with this
model, the re nement of the options for action possible in the situation under
consideration and the assessment of their utility are carried out within the
framework of dynamically identi ed time constraints. The approach involves the use
of anytime algorithms (ATA) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], in which the quality of results increases with
increasing execution time.
      </p>
      <p>The purpose of this article is to re ne the models and algorithms for actions
utility evaluating in virtual soccer, as well as their analysis for typical situations
that arise during a soccer match. Based on the analysis, dynamically changing
(situational) factors a ecting the time complexity of the utility assessment are
identi ed. This allow to construct utility evaluating algorithm as ATA, so that
agent can e ectively use all the time available in a particular situation for action
selection.</p>
    </sec>
    <sec id="sec-3">
      <title>Algorithms for actions utility assessment</title>
      <p>
        Let us consider the main concepts of approach according to [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]:
1. Current situation S { the world state, at the time IA chooses next actions;
2. Set of possible agent actions fActig { actions the considered IA can perform
in the current situation;
3. Set of possible situations fSn g { possible world states resulting from the
performing of selected action by the considered IA;
4. Set of actual actors fAAkg { all possible IAs which can signi cantly a ect
the world state after performing of selected action by the considered IA;
5. Set of partners fTmg { the subset of actual actors achieving the same goals
as considered IA;
6. Set of possible partners' actions fT Actimg { actions the m-th partner from
set of partners can perform in the current situation;
7. Set of opponents fOkg { the subset of actual actors opposing the goals of
considered IA;
8. Set of possible opponents' actions fOActikg { actions the k -th opponent from
set of opponents can perform in the current situation;
9. Action utility U (Acti) { the assessment shows how pro table the performing
of this action in the current situation to achieve the goals of considered IA;
10. Situation utility U (S ) { the assessment which shows how pro table this
situation to achieve the goals of considered IA.
      </p>
      <p>
        The generalized algorithm for assessing the utility of action Acti in current
situation according to [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] contains the following steps:
1. Identi cation of sets fTmg and fOkg of actual actors in current situation;
2. Identi cation of sets fT Actimg and fOActikg of possible actions for each agent
from fTmg and fOkg;
3. Identi cation of set fSn g of possible situations. Possible situations are
predicted as world states at the time then action Acti and all possible
combinations of actions fT Actimg and fOActikg are completed;
4. Assessing the utility U (S ) of all possible situations from fSn g;
5. Assessing the utility U (Acti) of action Acti as minimum value of utilities
fUn(Sn )g of possible situations.
      </p>
      <p>
        The generalized algorithm for determining the most useful action in the current
situation according to [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] contains the following steps:
1. Generating the basic set of possible agent actions fActig;
2. Assessing the utility U (Acti) of all actions from the basic set fActig;
3. The most useful agent action is de ned as action from fActig with maximum
value of utility U (Acti).
      </p>
      <p>Each action has an a priory utility assessment dependent on action type and
its main parameters. For example, shot for the goal is more preferred than pass,
dribble forward is more preferred than dribble diagonally etc. A priory utility
assessment practically doesn't require time for computation (since it is retrieved
from IA's memory) which corresponds to the ATA-based approach.</p>
      <p>After the considered IA has selected a speci c action, in determining the set
of actual actors, the possibility of their in uence on the situation development is
assessed. For example, if the IA is considering passing the ball to the left, then
players who are to his left and are near to the probably pass trajectory will be
included in the set of actual actors for this action. At the same time, the players
who are to the right of the player who has the ball in this case cannot directly
in uence the situation development. So they will not be included in the set of
actual actors for this action (pass to the left).
3</p>
    </sec>
    <sec id="sec-4">
      <title>Implementation of actions utility assessment by any-time algorithms</title>
      <p>To illustrate the proposed approach, let's consider the situation when the player
who owns the ball is close to the opponent's goal (see Fig. 1). Designations of
objects in the gure:
1. A { player, who owns the ball and assesses the utility of his next actions;
2. B { ball;
3. T 1 { teammate of player A;
4. O1; O2 { players of opponent team;
5. G { goalie of opponent team.</p>
      <p>The basic set of considered player actions with their a priory utility
assessments is presented in Table 1.</p>
      <p>The conditions for including players in the set of actual actors for teammates
and opponents are presented in Table 2.</p>
      <p>Player action Partners fTmg
Shot for the goal
Pass
Dribble</p>
      <p>Possible actions of players included in the set of actual actors are presented
in Table 3.</p>
      <p>Player action Partner action
Shot for the goal
Pass
Dribble</p>
      <p>
        The transition from the current world state to the set of possible states after
agents perform certain actions can be represented in the form of bipartite graph
(see Fig. 2) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>The root node of the graph corresponds to the current world state S*.
Outgoing edges show the possible actions fActig of player who owns the ball. The
second stage of the tree contains splitter-nodes, from which originate edges that
denote combinations of actual actors possible actions when player performed
action Acti corresponding to the edge entering the node.</p>
      <p>Action utility is calculated on the basis of the utilities of possible situations
according to the minimax principle.</p>
      <p>Utility of situations achieved as a result of certain action performed by players
is evaluated on the basis of a set of many factors. The set of these factors
depends on the speci c situation and can vary widely. This allows us to construct
utility assessment algorithms as ATAs. For example, in the simplest case the
situation utility can be assessment based on the distance between the ball and
the opponent's goal. In more complex cases the following factors can be taken
into account: the numerical advantage of the players of our team in a certain
area of the eld, their relative position, individual capabilities of the players and
other factors. In this way, the computational complexity of the algorithm for
actions utility evaluating depends on the number of relevant actors, as well as
on the number of actions that they can take.</p>
      <p>Since the actions of di erent agents can be performed in various
combinations, the number of possible situations (leaves in a tree) reached from one
splitter-node in the worst case is mn, where m is the maximum number of actions
of one actor, and n is the number of actual actors.</p>
      <p>The computational complexity of algorithm for determining the most useful
action in the current situation depends on two factors:
- Number of actions for the agent possible in current situation;
- Complexity of utility value calculating for one action.</p>
      <p>In the worst case, it can be estimated as O(mn k), where k is the number
of possible actions of the agent.</p>
      <p>Taking into account such high computational complexity and hard real-time
limitations for action selection, the utility calculation algorithm is constructed
as ATA. The construction of a possible worlds tree is performed in descending
order of the a priory (and then the current) utility of actions within the available
time.
4</p>
    </sec>
    <sec id="sec-5">
      <title>Approach experimental assessment</title>
      <p>The operating time of action utility assessment algorithm is determined by the
following formula:</p>
      <p>T = taAA
kA + taU
ks
(1)
where taAA - upper estimate of the time to determine whether the agent
in the considered situation is an actual actor; kA - number of potential actual
actors; taU - time for predicting and assessing of one situation; kS - number of
predicted situations.</p>
      <p>The values taAA and taU should be calculated for certain computing
platform by experimental measurement of the execution time of the corresponding
program fragments.</p>
      <p>The value kA depends on the speci c considered situation and is known before
the algorithm starts. The value kS is calculated after determining of actual actors
set.</p>
      <p>In this way, estimated time to action utility assessment can be calculated
at the beginning of the algorithm work. This will allow agents to rationally
distribute time.</p>
      <p>An experimental study for the situation discussed above was carried out on
a computer with an AMD A4-9120 processor, clock frequency 2.20 GHz. The
results of measuring the times taU and taAA of the computation of the key
fragments of the algorithm are presented in Table 4.</p>
      <p>Fragment Execution time, ms
Predicting and assessing of situation, taU 0,22
Actual actor determining, taAA 0,06</p>
      <p>Comparative estimates of theoretical and experimental time for calculating
utility assessment for a di erent number of actual actors are presented in Table 5.</p>
      <p>The results of action utility assessment (in percent) for the considered
example (see Fig. 2) are presented in Table 6.
Utility assessment of possible actions is the key task when an intelligent agent is
choosing next its actions. The algorithm for action utility assessment in
multiagent worlds has high computational complexity which increases rapidly
depending on the number of actors and their possible actions in the considered
situation.</p>
      <p>In the course of the research, the model for action utility assessment proposed
in the model of advanced iterative action planning for real time intelligent agents
was clari ed. The work of the proposed model was analyzed on the example of
a typical situation from the virtual soccer. Based on this analysis, conclusions
were drawn about the complexity of the used algorithms. These results allow us
to develop a utility assessment algorithm as an ATA so that intelligent agent
can e ectively use all available time in the current situation.</p>
      <p>The obtained experimental results are focused on the environment of virtual
soccer. However, the approach can be extended to other applications of agent
systems, which constitutes the direction of further research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Panteleyev</surname>
            <given-names>M. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puzankov</surname>
            <given-names>D. V.</given-names>
          </string-name>
          <article-title>Intelligent agents and multi-agent systems: a monograph.: Publishing house SPbGETU "</article-title>
          <source>LETI"</source>
          ,
          <year>2015</year>
          . - 215 p.
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Tyrrell</surname>
            <given-names>T</given-names>
          </string-name>
          . Computational Mechanisms for Action Selection//
          <source>PhD Thesis</source>
          , University of Edinburg,
          <year>1993</year>
          , 212 P.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Modelling</given-names>
            <surname>Natural</surname>
          </string-name>
          Action Selection// Seth A.K.,
          <string-name>
            <surname>Prescott</surname>
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <source>Bryson J.J. (eds.)</source>
          , Cambridge University Press,
          <year>2012</year>
          , 560 P.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>O</given-names>
            <surname>cial RoboCup</surname>
          </string-name>
          <string-name>
            <surname>site</surname>
          </string-name>
          [Electronic resource]: access mode- http://www.robocup.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mellmann</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <source>Schlotter B. Advances on Simulation Based Selection of Actions for a Humanoid Soccer-Robot// In Proc. of the 12th Workshop on Humanoid Soccer Robots, 17th IEEE-RAS Int. Conf. on Humanoid Robots</source>
          , Madrid, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ros</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arcos</surname>
            <given-names>J. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Mantaras</surname>
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>A case-based approach for coordinated action selection in robot soccer//</article-title>
          <source>Arti cial Intelligence</source>
          ,
          <year>2009</year>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <year>173</year>
          <source>, N</source>
          <volume>9</volume>
          {
          <issue>10</issue>
          , pp.
          <fpage>1014</fpage>
          -
          <lpage>1039</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>O</given-names>
            <surname>ztu</surname>
          </string-name>
          <string-name>
            <given-names>rk P.</given-names>
            <surname>Levels</surname>
          </string-name>
          and Types of Action Selection: The Action Selection Soup// Adaptive Behavior,
          <year>2009</year>
          , Vol
          <volume>17</volume>
          (
          <issue>6</issue>
          ), pp.
          <volume>537</volume>
          {
          <fpage>554</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>de Sevin</surname>
            , E. Thalmann,
            <given-names>D.</given-names>
          </string-name>
          <article-title>A motivational Model of Action Selection for Virtual Humans/</article-title>
          / Computer Graphics International (CGI), IEEE Computer Society Press, New York
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Panteleyev M.G. Advanced Iterative Action Planning for Intelligent</surname>
          </string-name>
          Real-Time Agents// Procedia Computer Science,
          <year>2019</year>
          , Vol.
          <volume>150</volume>
          , pp.
          <fpage>244</fpage>
          -
          <lpage>252</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Panteleyev</surname>
            <given-names>M.G.</given-names>
          </string-name>
          <article-title>The Formal Model of Advanced Iterative Real-Time Action Planning for Intelligent Agents//</article-title>
          <source>Proc. of the 14th National Conf. on AI KII-2014</source>
          . Vol.
          <volume>1</volume>
          . { Kazan:
          <source>RIC "School"</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zilberstein</surname>
          </string-name>
          , S. Using Anytime Algorithms in Intelligent Systems// AI Magazine,
          <year>1996</year>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>73</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>