=Paper= {{Paper |id=None |storemode=property |title=A Survey on Coordination Methodologies for Simulated Robotic Soccer Teams |pdfUrl=https://ceur-ws.org/Vol-627/mass_7.pdf |volume=Vol-627 |dblpUrl=https://dblp.org/rec/conf/mallow/AlmeidaLR10 }} ==A Survey on Coordination Methodologies for Simulated Robotic Soccer Teams== https://ceur-ws.org/Vol-627/mass_7.pdf
         A Survey on Coordination Methodologies for
              Simulated Robotic Soccer Teams
                                     Fernando Almeida∗‡ , Nuno Lau†‡ , Luı́s Paulo Reis§¶
                                       falmeida@di.estv.ipv.pt, lau@det.ua.pt, lpreis@fe.up.pt

                      ∗ DI/IPV - Department of Informatics, Polytechnic Institute of Viseu, Viseu, Portugal
      † DETI/UA - Electronics, Telecommunications and Informatics Department, University of Aveiro, Aveiro, Portugal
                    ‡ IEETA - Institute of Electronics and Telematics Engineering of Aveiro, Aveiro, Portugal
     § DEI/FEUP - Department of Informatics Engineering, Faculty of Engineering, University of Porto, Porto, Portugal
           ¶ LIACC - Artificial Intelligence and Computer Science Laboratory, University of Porto, Porto, Portugal



Abstract: Multi-agent systems (MAS) are a research topic with ever-increasing importance. This is due to their inherently distributed organization that copes more naturally with real-life problems whose solution requires people to coordinate efforts.

One of their most prominent challenges consists of creating efficient coordination methodologies to enable the harmonious operation of teams of agents in adversarial environments. This challenge has been promoted by the Robot World Cup (RoboCup) international initiative every year since 1995.

RoboCup provides a pragmatic testbed based on standardized platforms for the systematic evaluation of developed MAS coordination techniques. This initiative encompasses a simulated robotic soccer league in which 11 against 11 simulated robots play a realistic soccer game that is particularly suited for researching coordination methodologies.

This paper presents a comprehensive overview of the most relevant coordination techniques proposed to date in the simulated robotic soccer domain.

Index Terms: Coordination methodologies, MAS, simulated robotic soccer, RoboCup.

I. INTRODUCTION

The development of efficient methodologies (e.g. languages, models) for MAS coordination in adversarial environments is one of the most interesting scientific challenges promoted by RoboCup [33] and is mainly supported by its soccer simulation leagues. The main goal of coordination mechanisms in these leagues is to adequately control a team of players and an optional coach to win matches against adversary teams.

Soccer is an inherently coordinated game in which team fitness directly relates to how well players can synchronize to perform tasks (e.g. passing). However, team coordination can be complex to achieve, mostly due to the multitude of variables (e.g. player and ball positions) players must consider to make the best decision at each instant. Moreover, measuring its success quantitatively is difficult as it does not necessarily relate to the final match score (e.g. a team might play better than its opponent but still lose), thus more data must be considered to perform an accurate assessment (e.g. ball possession).

The rest of the paper is organized as follows. Section II describes the RoboCup initiative and its physical soccer simulator. Section III presents a general definition of coordination and its related issues in the robotic soccer domain. Sections IV, V, VI and VII provide a discussion of developed techniques for simulated robotic soccer organized in different perspectives. Section VIII addresses the lessons learned from the survey.

II. ROBOCUP: A TESTBED FOR COORDINATION

RoboCup was designed to meet the requirements of handling real complexities in a restricted world and provides standard challenges in a common platform to foster Artificial Intelligence and Intelligent Robotics research [17].

Its most pragmatic goal is to develop a team of fully autonomous humanoid robot soccer players capable of winning a soccer game against the winner of the World Cup by 2050. This ambition, although difficult to achieve, will surely drive significant technological breakthroughs along the way [33].

The main focus of RoboCup is Robotic Soccer (RoboCupSoccer), although other application domains exist focusing on different scopes like disaster rescue, robotics education for young students and human assistance in everyday life tasks.

The RoboCupSoccer domain has 5 leagues [11]: one virtual (Simulation League) and four hardware (Small-Size, Medium-Size, Standard Platform and Humanoid) leagues.

This paper focuses on the RoboCupSoccer 2D Simulation League (RoboCupSoccer2D) although other simulation subleagues (3D, 3D Development and Mixed Reality) exist. This league enables a virtual soccer match between 2 teams of 11 simulated agents each with an optional online coach using a physical soccer simulation system. Agents have an environment-aware body and can act autonomously to perform reactive or pro-active actions in an individual or social manner, although interaction is highly constrained as described in Section III. The environment is partially observable through non-symbolic sensors, stochastic, sequential, dynamic and multi-agent without centralized control [11].

This league presents 3 strategic research challenges for multi-agent interaction [33]:
   • Multi-agent learning of individuals (e.g. ball interception) and teams (e.g. adapt player positioning to opponents);
   • Teamwork to enable real-time planning, replanning and execution of tasks in a dynamic adversarial environment;
   • Agent modelling to reason about others (e.g. intentions).
Soccer Server is an open-source client/server physical soccer simulation system [36][7] used in RoboCupSoccer2D. It uses well-defined protocols to enable communication between clients (players and coaches) and itself to manage connections, gather world perceptions and control clients’ actions.

First, all clients connect to the server and send introductory initialization data, to which the server replies with the current simulation settings (e.g. player characteristics). These settings can be tweaked in order to enhance the simulation.

During the match, each team can have an online coach that receives global error-free information about world objects and all the messages sent from the players and the referee. All communication is done exclusively via the server and coach-to-players communication is highly restricted.

The simulator provides a set of players with distinct capabilities (heterogeneous players) from which the coach must build a team to play a soccer match. During the match players receive tailored multimodal sensor information (aural, vision and body) according to their standpoint. This information is received through messages (hear, see and sense body) sent regularly from the simulator, which can be inaccurate (e.g. vision accuracy varies inversely with object distance). Based on these perceptions, players can act upon the world to effect changes in it using the core actions depicted in Table I.

                                  TABLE I
            LIST OF SOCCER SERVER CORE ACTIONS BY CATEGORY

   Category              Actions
   Movement              Dash*, Turn, Move
   Ball control          Kick, Catch, Tackle
   Perception control    Turn neck, Change view, Attention to
   Communication         Point to, Say
   Match information     Score

   *Dash impacts players’ stamina, which is continuously assessed through their energy (liveness), effort (movement efficiency) and recovery (energy renewal rate).

Also during the match, a referee (automated or human) can make rulings that change the play mode (e.g. free-kick) and are immediately relayed to all clients. The human referee is used to judge situations driven by players’ intentions (e.g. player obstruction), which are still difficult to evaluate automatically.

The simulation executes in discrete time steps (cycles). Throughout each step players can take actions, restricted in number and by play mode (e.g. one kick per cycle), that will be applied to objects (players and the ball) at the end of the step. The next step is simulated by applying only the allowed actions to the state information (e.g. updating object positions) and, when necessary, by resolving conflicting situations (e.g. several players might kick the ball simultaneously).

Some of the research developed has shown that robotic soccer [1] and consequently RoboCup [35][34] can be used effectively to study MAS and coordination techniques in particular. In most cases these techniques can be generalized to other domains [6] (e.g. network routing [53]).

III. COORDINATION PROBLEMS IN SIMULATED ROBOTIC SOCCER

Robotic Soccer is an instance of Periodic Team Synchronization (PTS) domains [52] in which players have sporadic opportunities to communicate fully in a safe offline situation (e.g. in the locker-room) while being able to act autonomously in real-time with little or no communication.

One of the most important tasks for players is to select and initiate an appropriate (possibly cooperative) behavior in a given context, using (or not) knowledge from past experiences in order to help their team to win. Good coordination methodologies can help achieve this goal, although their success is still highly dependent on players’ individual abilities (low-level skills) to execute adequate competitive decisions.

The coordination difficulties imposed by the simulator are:
   • A large amount of multimodal information can be sensed at once, making it difficult to process;
   • The environment’s unpredictability makes it difficult to predict future states;
   • Clients cannot rely on message reception due to communication unreliability;
   • Low bandwidth makes it difficult to convey meaningful knowledge in messages;
   • Uncertainty in perceived world information may lead to conflicting behaviors between agents [39], due to invalid state knowledge representations.

More specifically, the simulated robotic soccer domain presents researchers with the following types of challenges:
   • Perception: Where, when and how should players use their vision? To whom should they listen? How to estimate information of others?
   • Communication: What, when and how should players exchange information? How should exchanged information be used?
   • Action: Which action should the player perform that is best for the team? How to evaluate different types of actions (e.g. pass vs dribble)? How to execute a given elementary (e.g. kick) or compound action (e.g. dribble)?
   • Coordination: How to structure coordination dependencies between players? With whom should a player coordinate his actions? How should actions be coordinated with others? How to adapt coordination in real-time? How can the coach be used to coordinate team players?

The answers to some of these questions, and to other more specific ones, will be discussed in the remaining sections.

IV. TECHNOLOGIES FOR COORDINATION

A. Coordination by Communication

Sharing pertinent world information can be useful to achieve team coordination. In earlier Soccer Server versions communication constraints were relaxed and allowed the transmission of long messages. This extremely permissive condition motivated the development of techniques that relied on sharing large amounts of meaningful information about the world’s state among teammates to make better informed decisions.

Currently, message size is restricted to a minimum, which poses a new challenge that requires the cautious selection of pertinent information to convey at each instant. To circumvent the previous constraint an Advanced Communications
framework [42] was proposed in which a player maintains a communicated world state (separated from his perceived world state) using only information from teammates, without any prediction or perception information of his own. By comparing both worlds, a player assesses the interest of items of his perceived world state to his teammates and selects the most useful information (e.g. object positions) to share. Information utility metrics were based on domain-specific heuristics but were later extended to accommodate the current situation and the estimated knowledge of teammates [12].

Other techniques were proposed that use little or no communication by adding knowledge assumptions (e.g. Locker-Room Agreements discussed in Section VI-A) to reason over players’ intentions based on assigned roles [20] (combined with Coordination Graphs discussed in Section VII-A), offline learned prediction models [54] and players’ beliefs [38][16] to adapt to their actions.

The trend in this domain will be towards little or no communication due to the constraints mentioned in Section III and also because communication introduces an overhead and delay that can degrade player performance. The combination of implicit coordination with belief exchange yields better performance under communication loss than explicit coordination with communication of intentions alone [16]. The exchange of beliefs among teammates allows a more coherent and complete global belief about the world. This global belief can then be used to predict players’ utilities and adapt actions to players’ predicted intentions to achieve the best (joint) action. As state estimation accuracy reaches an acceptable upper bound it will eventually replace explicit communication.

B. Coordination by Intelligent Perception

The smart usage of player sensors can be an efficient way to leverage coordination with other players, by collecting the most valuable information at each instant.

During the match players can assume three types of visualizations. These are chosen using a strategic looking mechanism based on their internal world state information and the current match situation [42]:
   • Ball-centered: look at the ball to react quickly to its sudden velocity changes (e.g. kick by a player);
   • Active: look at the target location of a desired action (e.g. a pass to perform);
   • Strategic: look at a strategic location to improve the world’s state accuracy (e.g. find an open space for a pass).

The usefulness of the information gathered using the previous approaches is different and can be classified based on its intended usage scope, validity over time and motivation for player behavior in future actions as depicted in Table II.

                                  TABLE II
            COMPARISON OF DIFFERENT VISUALIZATION APPROACHES

   Approach        Usage scope                 Information validity period   Target behavior
   Ball-centered   Individual                  Short                         Reactive
   Active          Individual or Collective    Short to Medium               Reactive or Deliberative
   Strategic       Collective                  Medium to Long                Deliberative

Ultimately, this information can be combined to enhance the player’s world state accuracy and empower better decisions.

V. POSITIONING

A. Coordination for General Positioning

The selection of a good position to move into during the match is a challenging task for players due to the unpredictable behavior of other players and the ball. The likelihood of collaboration in a soccer match is directly related to the adequacy of a player’s position (e.g. open pass lines for attack).

During a match, at most one player can carry the ball at each instant. For this reason, players will spend most of their time without the ball, trying to figure out where to move.

The first positioning techniques proposed allowed players to situate themselves in an anticipated useful way for the team in different contexts [48]:
   • Opponent marking: player moves next to a given opponent rather than staying at his default home position;
   • Ball-dependent: player adjusts his location, within a given movement range, based on the ball’s current position;
   • Strategic Positioning using Attraction and Repulsion [48] (SPAR): player tries to maximize the distance to all players and minimize the distance to the opponent goal, the active teammate and the ball. This algorithm enables players to anticipate the collaborative needs of their teammates by positioning themselves to open pass lines for the teammate with the ball.

The previous techniques are rather reactive and demand fast responses from players according to the target object behavior. This quickly wears out stamina because the current match situation is not adequately considered. To solve these issues, techniques were proposed that distinguish between active (e.g. ball possession) and strategic match situations [42]:
   • Simple Active Positioning: players always assume an active and non-strategic position (e.g. ball recovery);
   • Active Positioning with Static Formation: extends the previous technique so that players can return to their default home position in the static formation, if there is no good enough active action to perform;
   • Simple Strategic Positioning: uses only one situation and one dynamic formation;
   • Situation Based Strategic Positioning [44] (SBSP): defines team strategy as a set of player roles (defining their behavior) and a set of tactics composed of several formations. Each formation is used for a different strategic situation and assigns each player a default spatial positioning and a role. In contrast to SPAR, it allows the team to have completely diverse but suitable shapes (e.g. compact for defending) for different situations and teammates to have different positional behaviors;
   • Delaunay Triangulation (DT): similar in idea to SBSP, it divides the soccer field into triangles according to training data [2] and builds a map from a focal point (e.g. ball position) to a desirable positioning of each player. It
     also allows the use of constraints to fix topological relations between different sets of training data to compose more flexible team formations, Unsupervised Learning Methods (e.g. Growing Neural Gas) to cope with large or noisy datasets and Linear Interpolation methods (e.g. Gouraud Shading) to circumvent unknown inputs. Despite its simplicity, DT has a good approximation accuracy, is locally adjustable, fast running, scalable and can reproduce results for identical training data. On the other hand, it requires much memory to store all training data and has a high cost to maintain its consistency.

Another task addressed in a soccer match is the dynamic (or flexible) positioning of team players, which consists of switching players’ positions within a formation [48] to improve the team’s performance (e.g. save players’ energy for quicker responses). However, if misused it can increase players’ movement (e.g. a player moves across the field to occupy its new position).

The methods proposed to help players weigh the cost/benefit ratio of switching positions are based on:
   • Role Exchange: continuously assesses the usefulness of exchanging positions based on tactical gains [42] (e.g. distance to a strategic position, adequacy of next versus current position and coverage of important positions). It extends previous work that used flexible player roles with protocols for switching among them [52] to accommodate the exchange of players’ positions and types in the formation and has been used in conjunction with SBSP;
   • Voronoi Cells: distributes players across the field and uses Attraction Vectors to reflect players’ tendency towards specific objects based on the current match situation and players’ roles [8]. It claims to have solved a few restrictions in SBSP (e.g. obligation to use home positions and fixed number of players for each role);
   • Partial (Approximate) Dominant Regions [31]: divides the field into regions based on the players’ time of arrival (similar to a Voronoi diagram, but based on time of arrival instead of distance), each of which shows an area that a player can reach faster than others. It has been used for marked teammates to find a good run-away position.

B. Defensive Coordination

The main goal of a defending team, without ball possession, is to stop the opposing team’s attack and create conditions to launch its own. In general, defensive behaviors (e.g. marking) involve positioning decisions (e.g. move to intercept the ball). Defensive positioning is an essential aspect of the game, as players without the ball will spend most of their time moving somewhere rather than trying to intercept it.

Collaborative defensive positioning has been described as a multi-criteria assignment problem where n defenders are assigned to m attackers, each defender must mark at most one attacker and each attacker must be marked by no more than one defender [23]. The Pareto Optimality principle was used to improve the usefulness of the assignments by simultaneously minimizing the required time to execute an action and maximizing the threat prevented by taking care of an attacker [24]. Threats are considered preemptive over time and are prevented using a heuristic criterion that considers:
   • Angular size of own goal from the opponent’s location;
   • Distance from the opponent’s location to own goal;
   • Distance between the ball and opponent’s location.

This technique can achieve good performance while gracefully balancing the costs and rewards involved in defensive positioning, but it does not seem to deal adequately with uneven defensive situations:
   • Outnumbered defenders: should not mark specific attackers but rather position themselves so as to hinder the attackers’ progression towards the goal’s center;
   • Outnumbered attackers: more than one defender should mark an attacker (e.g. ball owner), pursuing a strategy to quickly intercept the ball or compel the opponent to make a bad decision and lose the ball.

Marking consists of guarding an opponent to prevent him from advancing the ball towards the goal, making a pass or getting the ball. Its goal is to seize the ball and start an attack.

The opponent to mark can be chosen by the player (e.g. closest opponent), by the team captain following a preset algorithm (e.g. as part of the Locker-Room Agreement [48] discussed in Section VI-A), using matching algorithms [47] or Fuzzy Logic [46]. Choosing the opponent to mark based only on its proximity is not suitable as it disregards relevant information (e.g. teammates nearby) and will lead to poor decisions. Also, the use of a fixed centralized mediator (e.g. coach) to assign opponents to teammates, although faster to compute, has a negative impact on players’ autonomy. With the exception of PTS periods, this approach is not robust enough due to the communication constraints mentioned in Section III and because it introduces a single point of failure.

A Neural Network trained with a back-propagation algorithm that uses a linear transfer function was proposed to decide the type of marking to perform based on the distance from the player to the ball, the number of opponents and teammates within the player’s field of view (FoV) and the distance from the player to his own goal [46]. The output accuracy of this method could be improved by considering other relevant information that lies outside the player’s FoV (e.g. nearby opponents behind the player).

Aggressive marking behavior can also be learned using a NeuroHassle policy [14] based on a neural network trained with a back-propagation variant of the Resilient Propagation (RPROP) reinforcement learning technique.

C. Offensive Coordination

To improve position selection during offensive situations (e.g. the team owns the ball) players should find the best reachable position to receive a pass or score a goal.

The Pareto Optimality Principle was applied to enable systematic decision-making regarding offensive positioning [25] based on the following set of partially conflicting criteria for simultaneous optimization [41]:
   • Players must preserve formation and open spaces;
   • Attackers must be open for a direct pass, keep an open path to the opponent’s goal and stay near the opponent’s offside line to be able to penetrate the defense;
   • Non-attackers should create chances to launch the attack.
   A Simultaneous Perturbation Stochastic Approximation (SPSA) combined with a RPROP learning technique (RSPSA) was proposed to Overcome the Opponent’s Offside Trap (OOOT) by coordinated passing and player movements [13]. The receiver of the OOOT pass should start running in the correct direction at the right point in time, preferably being positioned right before the offside line while running at his maximal velocity when the pass is executed.
   Another method proposed for high-level coordination and description of team strategies is Hierarchical Task Network (HTN) planning [37], which is to be embedded in each player. It combines high-level plans (making use of previous domain knowledge to speed up the planning process) with reactive basic operators, so that players can pursue a global strategy while staying reactive to changes in the environment. This method separates the expert knowledge specified as team strategies from the player implementation, making it easier to maintain. The objective of HTN is to perform tasks which can be either complex or primitive. Complex tasks are expanded into subtasks until they become primitive.
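   The HTN expansion described above (complex tasks expanded into subtasks until only primitive ones remain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the planner of [37]: the task names, the METHODS table, the expand function and the boolean state flags are all hypothetical, and the effects of operators on the state are not simulated.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
# A method pairs a precondition on the state with an ordered list of subtasks.
Method = Tuple[Callable[[State], bool], List[str]]

# Hypothetical primitive tasks (basic operators), invented for this illustration.
PRIMITIVE = {"dribble", "pass_ball", "shoot", "move_to_position"}

# Hypothetical complex tasks; methods are tried in order, first match wins.
METHODS: Dict[str, List[Method]] = {
    "play_soccer": [
        (lambda s: s["has_ball"], ["attack"]),
        (lambda s: not s["has_ball"], ["move_to_position"]),
    ],
    "attack": [
        (lambda s: s["goal_open"], ["shoot"]),
        (lambda s: s["teammate_open"], ["pass_ball", "move_to_position"]),
        (lambda s: True, ["dribble"]),
    ],
}


def expand(task: str, state: State) -> List[str]:
    """Recursively expand a task into a flat list of primitive tasks."""
    if task in PRIMITIVE:
        return [task]
    for precondition, subtasks in METHODS[task]:
        if precondition(state):
            plan: List[str] = []
            for sub in subtasks:
                plan.extend(expand(sub, state))
            return plan
    raise ValueError(f"no applicable method for task {task!r}")


state = {"has_ball": True, "goal_open": False, "teammate_open": True}
plan = expand("play_soccer", state)
print(plan)  # ['pass_ball', 'move_to_position']
```

   Keeping the method table separate from the expansion routine mirrors the separation between strategy knowledge and player implementation noted above: changing the strategy only requires editing the table.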
                VI. TEAM COORDINATION
A. Coordination for Strategic Actions
   In real soccer, team strategies are rehearsed during mundane training of team players and applied during a match. The same strategies are often used in matches, but for some opponents they must be swapped to adapt to their unexpected behavior.
   Strategies typically consist of a set of tactics composed of formations that map a strategic position and a distinguished role to each player to guide his behavior.
   To deal with the challenges of PTS domains, a Locker Room Agreement (LRA), based on the definition of a flexible team structure (consisting of roles, formations and set-plays), can be used for players to agree on globally accessible environmental cues as triggers for changes in strategy [48]. Team strategies are communicated with a timestamp for players to recognize changes and always keep the most recent ones to disseminate to others. The team’s formation can be either static or change dynamically during the match on team synchronization opportunities (e.g. kick-in) or via triggered communication where one teammate (e.g. team captain) makes a decision and broadcasts it to his teammates.
   Set-plays are predefined plans for structuring a team’s behavior depending on the situation. A high-level generic and flexible framework that defines a language for set-play definition, management and execution was proposed in [29]. A set-play involves players’ references (individual or role based) and steps (states of execution) that can have conditions to be carried out. Each step is led by the ball carrying player (in charge of making the most important decisions) and can have several transitions (possibly with conditions) for subsequent steps. The main transition of a step defines a list of directives consisting of actions that should (or should not) be performed. The execution of a set-play requires a tight synchronization between all participants to enable a successful cooperation. To cope with the simulator communication restrictions, only the lead player is allowed to send messages. This technique could be improved to achieve implicit coordination through a kind of belief state exchange, because the player that owns the ball decides when to start the set-play and informs the involved parties. From that moment on and while the set-play follows its default path, communication among players could be dropped until a deviation is decided by the ball owner because all involved parties know the steps.
B. Hierarchical Coordination
   In real life soccer, natural hierarchical relations exist among different team members and imply a leadership connotation (e.g. a coach instructs strategy to players).
   A coach and trainer are privileged agents used to advise players during online games and offline work out (training) situations respectively. The need for communication from coach to players motivated the definition of coaching languages.
   CLang [7] is the standard coaching language used in RoboCup since 2001 to promote a new RoboCup competition focused only on coaching techniques, but it lacks the ability to specify a team’s complete behavior with sufficient detail.
   Coach Unilang [43] was proposed to enable the communication of behavioral changes to players during games using different kinds of strategic information (e.g. instructions, statistics, opponent’s information and definitions) based on real soccer concepts. Players can ignore received messages, interpret them as orders (must be used and will replace knowledge) or as advice (can be used with a given trust level).
   Strategy Formalization Language [32] extends CLang by representing team behavior in a human-readable format easily modifiable in real-time by abstracting low-level concepts.
   The main coaching techniques developed make use of:
   • Neural Networks (previously trained with adequate data) to recognize the opponent team’s formation and provide an appropriate counter formation to players [55];
   • Matching Algorithms that continuously build a table that assigns a preliminary opponent to mark to each teammate and briefs all players periodically [47].
   The ability to recognize tactics and formations used by opponent teams reveals part of their strategy and can be used to implement counter strategies. To address this opportunity, training techniques make use of:
   • Sequential Pattern Data Mining using Unsupervised Symbolic Learning of Prediction Rules for situations and behavior during matches [26];
   • Triangular Planar Graphs to build topological structures for discovering tactical behavior patterns [40].
                VII. LOCAL COORDINATION
A. Coordination for Action Selection
   Deciding what the player should do at a given moment in a soccer game is critical. A player’s individual decision
should depend on the actions performed by (or expected of) other players and balance their risks and rewards. However, these dependencies can change rapidly in a dynamic environment as a result of the continuously changing state, thus efficient and scalable methods must be developed to solve this issue.
   The action selection mechanisms proposed make use of:
   • An idealized world model combined with observed player’s state information to predict the best action [50];
   • An option-evaluation architecture for different actions with comparable probabilistic scores [49];
   • Player roles and a measurement of opponents’ interference in the current situation using a multi-layer perceptron [18];
   • Coordination Graphs (CGs) [19] where each node represents a player and its edges (possibly directed) define dependencies between nodes that have to coordinate their actions. This approach is based on the assumption that in most situations only a few players (typically nearby) need to coordinate their actions, while the remaining are capable of acting individually. To solve coordination dependencies in CGs, algorithms like Variable Elimination (VE) [17], Max-Plus (MP) [21] and Simulated Annealing (SA) [9] were proposed. VE requires communication to always find an optimal solution but only upon termination and with a high computational cost (due to its action enumeration behavior for neighbors). MP addresses VE’s high computational cost and makes the solution available at anytime, but it can only find near optimal solutions (except for tree-structured CGs) and restricts coordination to pairs of players. SA improves on MP by being able to work without communication and not restricting coordination to pairs, but it can only find approximate solutions with an associated confidence;
   • Fuzzy logic and bidirectional neural networks to determine the odds and priorities of action selection based on human knowledge [57];
   • Case-Based Reasoning to explicitly distinguish between controllable and uncontrollable indexing features, corresponding to players’ positions [45].
B. Coordination for Behavior Acquisition
   Teams often use flexible (to some extent) predefined strategies set on the LRA. However, they can prove fruitless when playing against opponents that exhibit incompatible behaviors. Modelling the opponent’s behavior thus becomes a necessity to allow convenient adaptation. However, as most players are unseen for quite some time this task becomes a challenge.
   With adequate models of players’ behavior, a player can improve his world model accuracy and consequently make better decisions by anticipating collaborative needs of teammates (e.g. open a line of pass).
   Machine learning techniques have been proposed to address the issue of player adaptation to unforeseen situations [3][1].
   Layered learning [48] has been proposed to enable learning low-level skills and ultimately use them to train higher-level skills that can involve coordination. The highest layer of the previous approach uses a Team-Partitioned Opaque-Transition Reinforcement Learning (TPOT-RL) technique to allow team players to learn effective policies and thus cooperate to achieve a specific goal. This technique divides the learning task among teammates, using coarse action-dependent features and gathers rewards directly from environmental observations. It is particularly suitable for this domain which presents huge state spaces (most of them hidden) and limited training opportunities.
   Policy gradient RL was proposed to coordinate decision making between a kicker and a receiver in free-kicks [30][15].
   Two other important subtasks of a soccer game, Keepaway and Breakaway, have been used to study specific behavioral coordination issues. Keepaway is a game situation where one team (the keepers) tries to maintain ball possession within a limited region, while the opposing team (the takers) attempts to gain possession. Breakaway is another game situation in which the attackers try to score goals against defenders. RL techniques have proven their usefulness to improve decision-making in these tasks [28][51]. The recognition of the potential of RL techniques led to the proposal of the following methods to accelerate them:
   • Preference Knowledge-Based Kernel Regression (KBKR) to give advice about preferred actions [28];
   • Heuristic Accelerated Reinforcement Learning (HARL): using predefined heuristic information based on Minimax-Q [4] and Q-Learning [6];
   • Case Based-HARL: heuristics are derived from a case base using Q-Learning [5].
C. Ball Passing Coordination
   Passing is a crucial skill in soccer and it reflects the cooperative nature of the game. Without sophisticated passing skills, it will be difficult for a team to win a match. The number of passing possibilities for the ball carrying player can be overwhelming and thus efficient methods must be employed for real-time decision-making.
   The main criteria used to decide where to pass the ball are:
   • Tactical value of the pass destination;
   • Chance of opponent intercepting the pass;
   • Confidence in the receiver’s position and interception;
   • Location and orientation upon ball reception;
   • Situations originated if the ball is intercepted;
   • Passing travel distance;
   • Initial and final player congestion on pass execution;
   • Chance of providing a shooting opportunity.
   Instead of relying on the previous predefined criteria that embed the passing strategy, this strategy can be learned using Q-Learning [27].
   To balance the implicit risks and gains of the previous criteria with the costs and real-time constraints of adequate decision-making, the developed techniques apply a weighted sum based on the player’s type [42], Fuzzy logic [46] and the Pareto Optimality Principle [22].
   To improve the efficiency of the previous position searching methods, a Rational Passing Decision based on Regions [56] classification (e.g. tactical, dominant, passable and falling) was proposed. Each region captures qualitative knowledge of
passing in a natural and efficient way. This technique has a low computational complexity, allows the player to decide rationally without precise information and balances success and reward of passing. However, these pros depend highly on the regions’ characteristics, specifically their dimension.
   Voronoi Diagrams [10] were proposed to limit the number of possible meaningful passes, but are unable to find (or learn) the selection of an optimal pass.
                VIII. CONCLUSION
   Since the start of the RoboCup initiative, several coordination techniques were proposed that tackle core MAS coordination issues in simulated robotic soccer.
   The majority of these techniques have dealt with the problem of adequate player positioning, due to its impact on the successful execution of other actions (e.g. passing) during a match. Also, many of the presented techniques are interdependent (e.g. CG and VE) and rely heavily on coordination technologies. In general, positioning techniques have evolved from reactive to more deliberative approaches, meaning that players now put the team’s goals in front of their own because it is the only way for successful coordination to be achieved. Due to its complexity, this problem has been studied in narrower scopes (e.g. defensive and offensive situations like opponent marking and ball passing respectively) with good results. However, situations where the number of teammates and opponents is uneven still do not seem to be adequately addressed by any of these.
   Besides positioning, other techniques were proposed to cope with the remaining player actions (e.g. marking).
   Coordination technologies have evolved a lot since the start of RoboCup mostly due to added functionalities and constraints in the latest simulator releases. Although the use of communication and intelligent perception can assist team coordination through the sharing of pertinent world information and enhance the player’s world state accuracy respectively, the simulator constraints discourage relying solely on them.
   Team strategies are usually very complex and are typically embedded into players’ knowledge prior to a game (e.g. using LRA). The strategic approaches have also evolved from fixed policies to more flexible and dynamic policies that are based on real-time match information and previous opponent knowledge. Coaching was used to tweak team strategy mostly by giving advice to players, allowing a quicker adaptation to the opponent’s behavior. Training methods have been used as a foundation to build into team members effective knowledge that can accelerate team coordination during real-time match situations (e.g. learning opponent behavior).
   Action selection and behavior acquisition must rely on a good understanding of what can be achieved by intelligent perception and communication techniques.
   Machine learning techniques (e.g. Q-Learning) were successfully used for behavior acquisition and adaptive coordination when faced with unpredicted constraints or situations. Due to their high computational cost and thus infeasibility for real-time decision making, acceleration techniques must be used to increase their efficiency and make them adequate for online usage (e.g. HARL, KBKR). It can be argued that machine learning techniques can be more accurate than hand-coding rule-based (possibly conditional) techniques.
   In order to succeed, a good coordination methodology should always consider the following aspects:
   • Incorporate past knowledge (e.g. using LRA) to accelerate initial decisions for usual situations, derived from direct human expertise or from offline learned prediction models. This knowledge can be tailored for specific opponents;
   • Knowledge should be adaptable according to opponent behavior in real-time;
   • Use alternative techniques to complement and replace technologies based on communication and perception.

                ACKNOWLEDGMENT
   This work was financially supported by Polytechnic Institute of Viseu under a PROFAD scholarship.

                REFERENCES
[1] A. Agah and K. Tanie, ‘Robots Playing to Win: Evolutionary Soccer Strategies’, in IEEE ICRA, volume 1, pp. 632–637, Albuquerque, NM, USA, (1997). IEEE.
[2] H. Akiyama and I. Noda, ‘Multi-Agent Positioning Mechanism in the Dynamic Environment’, in RoboCup 2007: Robot Soccer World Cup XI, eds., U. Visser, F. Ribeiro, T. Ohashi, and F. Dellaert, volume 5001 of LNAI, 377–384, Springer, Berlin, (2008).
[3] T. Andou, ‘Refinement of Soccer Agents’ Positions using Reinforcement Learning’, in RoboCup-97: Robot Soccer World Cup I, ed., H. Kitano, volume 1395 of LNAI, 373–388, Springer-Verlag, Berlin, (1998).
[4] R. Bianchi, C. Ribeiro, and A. Costa, ‘Heuristic Selection of Actions in Multiagent Reinforcement Learning’, in IJCAI-07, pp. 690–696, Hyderabad, India, (2007). Morgan Kaufmann Publishers Inc.
[5] R. Bianchi, R. Ros, and R. Mantaras, ‘Improving Reinforcement Learning by Using Case Based Heuristics’, in Case-Based Reasoning Research and Development, eds., L. McGinty and D. Wilson, volume 5650 of LNAI, 75–89, Springer, Seattle, WA, (2009).
[6] L. Celiberto and J. Matsuura, Robotic Soccer: The Gateway for Powerful Robotic Applications, volume 2 of Proceedings of ICINCO-2006, IST, IC&C, Setubal, 2008.
[7] M. Chen, K. Dorer, E. Foroughi, F. Heintz, Z. Huang, S. Kapetanakis, K. Kostiadis, J. Kummeneje, J. Murray, I. Noda, O. Obst, P. Riley, T. Steffens, Y. Wang, and X. Yin, RoboCup Soccer Server Users Manual, For Soccer Server Version 7.07 and later, The RoboCup Federation, 2003.
[8] H. Dashti, N. Aghaeepour, S. Asadi, M. Bastani, Z. Delafkar, F. Disfani, S. Ghaderi, S. Kamali, S. Pashami, and A. Siahpirani. Dynamic Positioning based on Voronoi Cells (DPVC), July 2005.
[9] J. Dawei and W. Shiyuan, ‘Using the Simulated Annealing Algorithm for Multiagent Decision Making’, in RoboCup 2006: Robot Soccer World Cup X, eds., G. Lakemeyer, E. Sklar, D. Sorrenti, and T. Takahashi, volume 4434 of LNAI, 110–121, Springer, Berlin, (2007).
[10] H. Endert, T. Karbe, J. Krahmann, and F. Trollmann. Dainamite - Team Description, 2009.
[11] RoboCup Federation. RoboCup: Overview, 01-10-2010.
[12] R. Ferreira, L. Reis, and N. Lau, ‘Situation Based Communication for Coordination of Agents’, in Scientific Meeting of the Portuguese Robotics Open, eds., L. Reis, A. Moreira, E. Costa, P. Silva, and J. Almeida, pp. 39–44, Porto, (2004). FEUP Edições.
[13] T. Gabel and M. Riedmiller. Brainstormers 2D - Team Description, 2009.
[14] T. Gabel, M. Riedmiller, and F. Trost, ‘A Case Study on Improving Defense Behavior in Soccer Simulation 2D: The Neurohassle Approach’, in RoboCup 2008: Robot Soccer World Cup XII, eds., L. Iocchi, H. Matsubara, A. Weitzenfeld, and C. Zhou, volume 5399 of LNCS, 61–72, Springer, Berlin, (2009).
[15] H. Igarashi, K. Nakamura, and S. Ishihara, ‘Learning of Soccer Player Agents using a Policy Gradient Method: Coordination between Kicker and Receiver during Free Kicks’, in IJCNN, ed., X. He, H.and Xu, pp. 46–52, Hong Kong, (2008). IEEE.
[16] M. Isik, F. Stulp, G. Mayer, and H. Utz, ‘Coordination without Negotiation in Teams of Heterogeneous Robots’, in RoboCup 2006: Robot Soccer World Cup X, eds., G. Lakemeyer, E. Sklar, D. Sorrenti, and T. Takahashi, volume 4434 of LNAI, 355–362, Springer, Berlin, (2007).
[17] W. Jin, W. Tong, W. Xiao, and M. Xiangping, ‘Multi-Robot Decision Making based on Coordination Graphs’, in ICMA, pp. 2393–2398, (2009).
[18] H. Kim, H. Shim, M. Jung, and J. Kim. Action Selection Mechanism for Soccer Robot, 1997.
[19] J. Kok, M. Spaan, and N. Vlassis, ‘Multi-Robot Decision Making using Coordination Graphs’, in 11th ICAR, eds., A. Almeida and U. Nunes, pp. 1124–1129, Coimbra, Portugal, (2003).
[20] J. Kok, M. Spaan, and N. Vlassis, ‘Non-Communicative Multi-Robot Coordination in Dynamic Environments’, Robotics and Autonomous Systems, 50(2-3), 99–114, (2005).
[21] J. Kok and N. Vlassis, ‘Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs’, in RoboCup 2005: Robot Soccer World Cup IX, eds., A. Bredenfeld, A. Jacoff, I. Noda, and Y. Takahashi, volume 4020 of LNAI, 359–360, Springer, Berlin, (2005).
[22] V. Kyrylov, ‘Balancing Gains, Risks, Costs, and Real-Time Constraints in the Ball Passing Algorithm for the Robotic Soccer’, in RoboCup 2006: Robot Soccer World Cup X, eds., G. Lakemeyer, E. Sklar, D. Sorrenti, and T. Takahashi, volume 4434 of LNAI, 304–313, Springer, Berlin, (2007).
[23] V. Kyrylov and E. Hou, ‘While the Ball in the Digital Soccer is Rolling, where the Non-Player Characters should go in a Defensive Situation?’, in Future Play, eds., B. Kapralos, M. Katchabaw, and J. Rajnovich, pp. 90–96, Toronto, Canada, (2007). ACM.
[24] V. Kyrylov and E. Hou, ‘Pareto-Optimal Collaborative Defensive Player Positioning in Simulated Soccer’, in RoboCup 2009: Robot Soccer World Cup XIII, eds., J. Baltes, M. Lagoudakis, T. Naruse, and S. Shiry, volume 5949 of LNAI, Springer, Berlin, (2010).
[25] V. Kyrylov and S. Razykov, ‘Pareto-Optimal Offensive Player Positioning in Simulated Soccer’, in RoboCup 2007: Robot Soccer World Cup XI, eds., U. Visser, F. Ribeiro, T. Ohashi, and F. Dellaert, volume 5001 of LNAI, 228–237, Springer, Berlin, (2008).
[26] A. Lattner, A. Miene, U. Visser, and O. Herzog, ‘Sequential Pattern Mining for Situation and Behavior Prediction in Simulated Robotic Soccer’, in 9th RoboCup International Symposium, eds., A. Lattner, A. Miene, U. Visser, and O. Herzog, Osaka, Japan, (2005).
[27] X. Li, W. Chen, J. Guo, Z. Zhai, and Z. Huang, ‘A New Passing Strategy based on Q-Learning Algorithm in RoboCup’, in ICCSSE, volume 1, pp. 524–527. IEEE, (2008).
[28] R. Maclin, J. Shavlik, L. Torrey, T. Walker, and E. Wild, ‘Giving Advice about Preferred Actions to Reinforcement Learners via Knowledge-Based Kernel Regression’, in AAAI-05 and IAAI-05, eds., M. Veloso and S. Kambhampati, pp. 819–824, Pittsburgh, Pennsylvania, (2005). AAAI Press / The MIT Press.
[29] L. Mota and L. Reis, ‘Setplays: Achieving Coordination by the Appropriate use of Arbitrary Pre-Defined Flexible Plans and Inter-Robot Communication’, in ROBOCOMM-2007, pp. 1–7, Athens, (2007). IEEE Press.
[30] K. Nakamura and H. Igarashi, ‘Learning of Decision Making at Free Kicks using Policy Gradient Methods’, in Robotics and Mechatronics, (2005).
[31] R. Nakanishi, K. Murakami, and T. Naruse, ‘Dynamic Positioning Method Based on Dominant Region Diagram to Realize Successful Cooperative Play’, in RoboCup 2007: Robot Soccer World Cup XI, eds., U. Visser, F. Ribeiro, T. Ohashi, and F. Dellaert, volume 5001 of LNAI, 488–495, Springer, Berlin, (2008).
[32] A. Nie, A. Hönemann, A. Pegam, C. Rogowski, L. Hennig, M. Diedrich, P. Hügelmeyer, S. Buttinger, and T. Steffens, ‘ORCA - Osnabrueck RoboCup Agents Project’, Technical report, Institute of Cognitive Science, (2004).
[33] I. Noda, M. Asada, H. Matsubara, M. Veloso, and H. Kitano, ‘RoboCup as a Strategic Initiative to Advance Technologies’, in IEEE ICSMC, volume 6, pp. 692–697, Tokyo, Japan, (1999). IEEE Press.
[34] I. Noda, H. Matsubara, K. Hiraki, and I. Frank, ‘Soccer Server: A Tool for Research on Multi-Agent Systems’, Applied Artificial Intelligence, 12(2-3), 233–250, (1998).
[35] I. Noda and P. Stone, ‘The RoboCup Soccer Server and CMUnited Clients: Implemented Infrastructure for MAS research’, Autonomous Agents and Multi-Agent Systems, 7(1-2), 101–120, (2003).
[36] I. Noda, S. Suzuki, H. Matsubara, M. Asada, and H. Kitano. Overview of RoboCup-97, 1998.
[37] O. Obst and J. Boedecker, ‘Flexible Coordination of Multiagent Team Behavior using HTN Planning’, in RoboCup 2005: Robot Soccer World Cup IX, eds., I. Noda, A. Jacoff, A. Bredenfeld, and Y. Takahashi, 521–528, Springer, Berlin, (2006).
[38] E. Pagello, A. D’Angelo, F. Montesello, F. Garelli, and C. Ferrari, ‘Cooperative Behaviors in Multi-Robot Systems through Implicit Communication’, Robotics and Autonomous Systems, 29(1), 65–77, (1999).
[39] J. Penders, ‘Conflict-based Behaviour Emergence in Robot Teams’, in Conflicting Agents: Conflict Management in Multi-Agent Systems, Multiagent Systems, Artificial Societies, and Simulated Organizations: International Book Series, 169–202, Kluwer Academic Publishers, Norwell, (2001).
[40] F. Ramos and H. Ayanegui, ‘Discovering Tactical Behavior Patterns supported by Topological Structures in Soccer Agent Domains’, in AAMAS-2008, eds., L. Padgham, D. Parkes, J. Müller, and S. Parsons, volume 3, pp. 1421–1424, Estoril, Portugal, (2008). IFAAMAS.
[41] S. Razykov and V. Kyrylov, ‘While the Ball in the Digital Soccer is Rolling, where the Non-Player Characters should go if the Team is Attacking?’, in Future Play, Ontario, Canada, (2006). ACM.
[42] L. Reis, Coordination in Multi-Agent Systems: Applications in University Management and Robotic Soccer, Phd, 2003.
[43] L. Reis and N. Lau, ‘Coach UNILANG - A Standard Language for Coaching a (Robo)Soccer Team’, in RoboCup 2001: Robot Soccer World Cup V, eds., A. Birk, S. Coradeschi, and S. Tadokoro, volume 2377 of LNAI, 183–192, Springer, Berlin, (2002).
[44] L. Reis, N. Lau, and E. Oliveira. Situation Based Strategic Positioning for Coordinating a Team of Homogeneous Agents, 2001.
[45] R. Ros, J. Arcos, R. de Mantaras, and M. Veloso, ‘A Case-based Approach for Coordinated Action Selection in Robot Soccer’, Artificial Intelligence, 173(9-10), 1014–1039, (2009).
[46] M. Simões, B. Silva, A. Cerqueira, and L. Silva. Bahia2D - Team Description, 2009.
[47] F. Stolzenburg, J. Murray, and K. Sturm, ‘Multiagent Matching Algorithms with and without Coach’, Decision Systems, 15(2-3), 215–240, (2006).
[48] P. Stone, Layered Learning in Multi-Agent Systems, Phd, 1998.
[49] P. Stone and D. McAllester, ‘An Architecture for Action Selection in Robotic Soccer’, in AAMAS-06, pp. 316–323, Montreal, Quebec, Canada, (2001). ACM.
[50] P. Stone, P. Riley, and M. Veloso, ‘Defining and using Ideal Teammate and Opponent Agent Models’, in IAAI-00, (2000).
[51] P. Stone, R. Sutton, and G. Kuhlmann, ‘Reinforcement Learning for RoboCup Soccer Keepaway’, Adaptive Behavior, 13(3), 165–188, (2005).
[52] P. Stone and M. Veloso, ‘Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork’, Artificial Intelligence, 110(2), 241–273, (1999).
[53] P. Stone and M. Veloso. Team-Partitioned, Opaque-Transition Reinforcement Learning, 1999.
[54] F. Stulp, M. Isik, and M. Beetz, ‘Implicit Coordination in Robotic Teams using Learned Prediction Models’, in ICRA, IEEE ICRA, 1330–1335, IEEE, New York, (2006).
[55] U. Visser, C. Drucker, S. Hubner, E. Schmidt, and H. Weland, ‘Recognizing Formations in Opponent Teams’, in RoboCup 2000: Robot Soccer World Cup IV, eds., P. Stone, T. Balch, and G. Kraetzschmar, volume 2019 of LNAI, 391–396, Springer-Verlag, Berlin, (2001).
[56] X. Yuan and T. Yingzi, ‘Rational Passing Decision Based on Region for the Robotic Soccer’, in RoboCup 2007: Robot Soccer World Cup XI, eds., U. Visser, F. Ribeiro, T. Ohashi, and F. Dellaert, volume 5001 of LNAI, 238–245, Springer, Berlin, (2008).
[57] R. Zafarani and M. Yazdchi, ‘A Novel Action Selection Architecture in Soccer Simulation Environment using Neuro-Fuzzy and Bidirectional Neural Networks’, International Journal of Advanced Robotic Systems, 4(1), 93–101, (2007).