Multi-Agent Coordination for a Partially Observable
                                and Dynamic Robot Soccer Environment with Limited
                                Communication
                                Daniele Affinita1 , Flavio Volpi1,† , Valerio Spagnoli∗1,† , Vincenzo Suriani1 ,
                                Daniele Nardi1 and Domenico D. Bloisi2
                                1
                                    Department of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza University of Rome
                                2
                                    Faculty of Political Science and Sociopsychological Dynamics, UNINT University, Rome


                                              Abstract
                                              RoboCup represents an International testbed for advancing research in AI and robotics, focusing on a
                                              definite goal: developing a robot team that can win against the human world soccer champion team by
                                              the year 2050. To achieve this goal, autonomous humanoid robots’ coordination is crucial. This paper
                                              explores novel solutions within the RoboCup Standard Platform League (SPL), where a reduction in
                                              WiFi communication is imperative, leading to the development of new coordination paradigms. The
                                              SPL has experienced a substantial decrease in network packet rate, compelling the need for advanced
                                              coordination architectures to maintain optimal team functionality in dynamic environments. Inspired
                                              by market-based task assignment, we introduce a novel distributed coordination system to orchestrate
                                              autonomous robots’ actions efficiently in low communication scenarios. This approach has been tested
                                              with NAO robots during official RoboCup competitions and in the SimRobot simulator, demonstrating a
                                              notable reduction in task overlaps in limited communication settings.

                                              Keywords
                                              Distributed Robot Coordination, Multi-Agent Cooperation, World Modelling, RoboCup


                                1. Introduction
                                Robocup is the world’s largest robotics competition which aims to push the boundaries of
                                research covering a wide range of topics. Managing the coordination among a team of fully
                                autonomous humanoid robots is a key aspect of dealing with the RoboCup 2050’s challenge,
                                consisting of creating a team of fully autonomous humanoid robot soccer players able to win a
                                soccer game complying with the rules of FIFA against the winner of the World Cup.
                                   In the RoboCup Standard Platform League (SPL), the current trend is to rely less on WiFi
                                communication, in order to push the boundaries of the robot’s capabilities in managing the
                                distributed task assignment problem in challenging conditions. Novel approaches, like gesture-
                                based [1], have been developed and tested, but wireless communication is still the main com-
                                munication channel.
                                10th Italian Workshop on Artificial Intelligence and Robotics (AIRO 2023)
                                †
                                    These authors contributed equally.
                                Envelope-Open affinita.1885790@studenti.uniroma1.it (D. Affinita); volpi.1884040@studenti.uniroma1.it (F. Volpi);
                                spagnoli.1887715@studenti.uniroma1.it (V. Spagnoli∗ ); suriani@diag.uniroma1.it (V. Suriani);
                                nardi@diag.uniroma1.it (D. Nardi); domenico.bloisi@unint.eu (D. D. Bloisi)
                                Orcid 0000-0003-1199-8358 (V. Suriani); 0000-0001-6606-200X (D. Nardi); 0000-0003-0339-8651 (D. D. Bloisi)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Figure 1: Frame from the quarter-finals of the RoboCup SPL 2023 between SPQR and HTWK teams.
Above the names of the teams, the counters of the exchanged packets for each team are shown.


   In particular, in the last few years, the network packet rate has been reduced from the original
5 packets per second per robot to a 1.200 total amount of packets per team per match. According
to the rulebooks, from RoboCup 2019 to RoboCup 2022 the number of allowed packets per
team has been reduced by 84% [2]. In RoboCup 2023 the total number has been kept the same,
but the number of playing robots per team increased from 5 to 7 (see Fig. 1). This further
reduced the amount of packets per robot. Meanwhile, the size of the single packet has been
reduced to half of its size (now it is 128 bytes). This pushed teams to design new coordination
paradigms and architectures. To keep a team able to play a RoboCup match in a very dynamic
and partially observable environment such as the RoboCup competition, it is needed to model
the world representation, predict it when there are no updated data from the network and
limited perceptions, and subsequently assign the tasks to the involved agents.
   The main contribution of this work is the definition of a new distributed coordination
system, derived from the market-based task assignment for orchestrating the actions of multiple
autonomous robots, ensuring their efficient performance even in setups with low communication
rates. Our approach has been tested on real robots during competitions and in the SimRobot
simulator[3] to evaluate the efficacy of our contributions, demonstrating how this approach can
dramatically reduce the number of task overlaps in limited communication RoboCup matches.


2. Related Work
Coordinating a team of humanoid robots is a challenging task, especially in a very dynamic
environment with hard constraints in the communication modalities. The RoboCup competition
is one of the best testbeds for developing novel approaches, where a team of robots must
cooperate effectively to compete in soccer matches. Within this competition, different leagues
address the multi-agent coordination problem in distributed and centralized setups and using
fully observable and partially observable environments, depending on the league[4]. In SPL,
the coordination can only be distributed and asynchronous.
   An early approach for task allocation modeled as an asynchronous distributed system has
been presented in [5]. This system can either utilize each robot’s perception or employ a token-
passing mechanism to allocate tasks within the team. Lou et al. [6] propose an enhanced task
allocation algorithm based on an auction system. They categorize potential tasks into subgroups
and assign tasks to individual robots while ensuring precedence constraints are maintained. In
Middle-size League (MSL), [7] propose a task allocation strategy for the middle-size league of
soccer based on utility estimations. They determine a set of preferred positions for the team
based on the current situation and compute utility values to generate a reference pose set.
   In the 3D Simulation League, an advancement in robot coordination is introduced in[8]
involving a formation system algorithm. This algorithm computes a global world model shared
among agents and locally evaluated. After each evaluation, robots broadcast their results.
   Additionally, other solutions to the challenge of coordinating heterogeneous robots are
discussed in [9, 10]. These solutions rely on estimating the world state, mapping functions
between robots and tasks, or mapping functions between robots and roles.
   To take advantage of the auction based-mechanism [6] with the estimation of the mapping
functions between robots and roles[7, 10], and relying upon the local world model of the robot[9],
a unified approach, capable to manage also different playing contexts is presented in [11]. In
[12], to preserve the game capabilities of the robots, a dynamic sending approach is presented.
   To improve the placements of the robots on the field, some approaches rely on the Voronoi
schema [13, 2]. In contrast, our proposed method integrates these approaches by leveraging
both distributed world knowledge and task-role assignments, but increasing the autonomy
of the robots when no data are received from the teammates adding corrections on the robot
positioning using the Voronoi diagram, as elaborated upon in the following section.
   The proposed method creates a fully distributed market-based coordination system, inspired
by the one proposed in [11], that leverages both distributed world knowledge and task-role
assignment, and integrates a correction mechanism on the robot positioning using a Voronoi
diagram, which allows improving the robots’ autonomy in low-connection scenarios.


3. Proposed Method
Our main contribution is the proposal of a market-based, distributed approach for multi-agent
coordination, when there is a lack of information for an extended period. This specific topic has
been overlooked in previous works and research which mainly focus on a standard situation in
which it is always possible to share information among the agents. However, in a real-world
application, it may happen that robot communication is not always possible or is delayed,
especially when the communication medium is the network. Our methodology focuses on
addressing this particular situation, leveraging the prediction models to compensate for the
limited information exchange among agents.
   In order to represent the operative scenario, we consider 𝑀 tasks, denoted as 𝑇 = {𝜏1 , … , 𝜏𝑚 },
and 𝑁 robots, denoted as 𝑅 = {𝑟1 , … , 𝑟𝑛 }, where in general 𝑀 > 𝑁. Furthermore, we assume
that we possess knowledge of an optimal robot placement configuration depending on the
world state. In our study, a central theme that underscores the efficiency and effectiveness of
our approach is the execution of both task assignment and world modeling in a distributed
manner, without exchanging further information. In the context of the RoboCup domain, we
                        𝑂𝐿𝑀𝑡−1


                 N                                 𝐷𝑊 𝑀𝑡                    ctx
                 E          𝛿               𝑓
                                    𝑂𝐿𝑀𝑡                                  provider
                 T
                                       𝐿𝑀𝑡                                      𝐶𝑇 𝑋


                                𝐼           Ψ                              UEM

                                                        𝑉        Γ
                                           𝐿𝑀𝑡−1


                                     m tasks          𝐹 𝑖𝑙𝑡𝑒𝑟               Φ          <𝑟𝑖 , 𝑡𝑗 >
                                                                n tasks
Figure 2: The overall architecture of DWM and DTA. The input is represented by a network event. If
the event does not occur, prediction models probabilistically extend the previously estimated models.
Then all local models are merged into DWM, used to select the most valuable context and assign utility
values to each <robot, task> pair. Finally, the optimal configuration V is employed to match the number
of roles to the number of available robots. Roles are then assigned to maximize cumulative utilities.


consider robot roles as tasks; during the match, each robot must have a soccer role that defines
its subtasks and goals. The overall presented architecture is composed of several components,
aiming to guarantee the execution in a challenging environment such as a RoboCup match
where the teams can face unpredictable situations. The main components are represented by: a
distributed world modeling, a position provider based on the Voronoi diagram, and a distributed
task assignment procedure, as depicted in Fig. 2. The distributed world modeling is achieved by
fusing the information from all the robots, updated by a transition model that each robot adopts
to keep a coherent representation of the world even under low packet rates circumstances. To
easily propagate the information coming from the robot, the set of desirable positions is initially
chosen using a Voronoi-based position generator. At the end of the procedure, each robot is
capable of self-assigning a role and being aware of the teammates’ roles without an explicit
information exchange.

3.1. Distributed World Model
An essential prerequisite for an effective distributed task assignment algorithm is to have an
accurate representation of the world. The local model of the world (𝐿𝑀) contains several
components to represent the surrounding environment. The essential elements that contribute
to world modeling include the obstacle model which incorporates the estimated poses of all
robots and other rigid obstacles; a ball model, utilized for the estimation of the ball state
(position and velocity); and a lines detector, employed to identify soccer lines within the field.
Each component is derived from sensor data and is refined through the application of filtering
techniques to mitigate perception errors. The inputs that affect the models are the robot’s
perceptions referred to as 𝐼 and the events 𝑒 sent from other robots through a common network.
   We distinguish events that reflect the main situations in soccer matches. For example, an
event is triggered when a robot detects a whistle from the referee. Another example of event
triggering is when none of the team members have seen the ball for a while. In such a case, if a
robot finds the ball, the context changes, and an event is triggered to notify the other agents.
It is important to notice that these events are in general not triggered at a specific rate, but
they occur at irregular intervals, mirroring the dynamic nature of the real-world environment.
Consequently, there exists the possibility of extended temporal gaps during which no event is
sent through the network.
   To address this limitation, we employ a function denoted as 𝛿 to update the 𝐿𝑀 of other robots,
referred to as 𝑂𝐿𝑀𝑗 for robot j. Specifically, this function merges the 𝑂𝐿𝑀 of the previous step
with any received event, when available. In the absence of a received event, it uses a predictive
model to compute a probabilistic model of the world. This allows us to avoid sending the whole
𝐿𝑀 through the network obtaining a good estimation using only the available information.

                                     𝑂𝐿𝑀𝑗,𝑡 = 𝛿(𝑂𝐿𝑀𝑗,𝑡−1 , 𝑒)                                  (1)

At the same time, a function Ψ updates the robot’s local model 𝐿𝑀 by incorporating input data
received from the sensors into the previous local model.

                                       𝐿𝑀𝑡 = Ψ(𝐿𝑀𝑡−1 , 𝐼 )                                     (2)

Both the local model update functions 𝛿 and 𝜓 involve predictive models which compensate
for the absence of information, in case no network event is received. For example, a Gaussian
Mixture Model (GMM) is employed to model the obstacles, a Kalman Filter for the ball preceptor,
and odometry data for updating field elements and lines. Having an updated version of the Local
Model of every robot {𝑂𝐿𝑀1,𝑡 , … , 𝑂𝐿𝑀𝑛−1,𝑡 , 𝐿𝑀𝑡 }, it is possible to reconstruct the Distributed
World Model (𝐷𝑊 𝑀) with a merging function 𝑓 which fuses the set of local models.

                             𝐷𝑊 𝑀𝑡 = 𝑓 (𝑂𝐿𝑀1,𝑡 , … , 𝑂𝐿𝑀𝑛−1,𝑡 , 𝐿𝑀𝑡 )                          (3)

3.2. Distributed Task Assignment
In our study, our primary objective is to enhance team coordination and strategic decision-
making by adapting to the evolving configurations of the world. To achieve this, we introduce the
contexts to represent various scenarios. Specifically, we rely on a module, the Context Provider,
which uses the information within the DWM to dynamically select the most appropriate context
(CTX ) from a predefined set. The context selection relies on a priority queue. Each context is
linked to specific conditions. These contexts represent distinct situations in which a strategic
adjustment becomes necessary.
   At the same time, the information condensed in the DWM is used as input to function 𝑉
to generate a set of desirable positions representing the optimal robot configuration at that
moment. Notice that the configuration generated is role-independent, and each point within it
does not represent an assignment to a specific robot but rather signifies a collection of potential
waypoints. The points generated from 𝑉 have two purposes: filter 𝑁 out of 𝑀 tasks and further
refine the Utility Estimation Matrix (𝑈 𝐸𝑀).
   The Utility Estimation Matrix represents the main data structure used to take into account
the information from teammates and simulate task auctions locally. It is composed of N rows
for robots and M columns for tasks, where the entry (𝑖, 𝑗) contains a non-negative number
representing the utility. The utility is computed by considering several components that measure
the effectiveness of a given DWM with respect to a robot 𝑖 and a role 𝑗, so it quantifies how well
robot 𝑖 can perform the task 𝑗. The final goal is to maximize the sum of all the assignments. The
computation of 𝑈 𝐸𝑀 is also influenced by the context selected by the context provider, allowing
for adaptive role assignments based on the chosen strategy. The columns of the matrix are
filtered using a module that compares the target of the roles with the waypoints derived from 𝑉.
This filtering process transforms the matrix into an 𝑁 × 𝑁 square matrix, with an equal number
of roles and agents. Finally, a function Φ provides the pairs < 𝑟𝑖 , 𝑡𝑗 > from the filtered 𝑈 𝐸𝑀:

                                    Φ(𝑈 𝐸𝑀, 𝑡𝑎𝑠𝑘𝑠) →<
                                                   − 𝑟𝑖 , 𝑡𝑗 > ∀𝑖, 𝑗                                    (4)

   The roles in the matrix are ordered by importance, meaning that a role in position 𝑖 has more
priority than a role in position 𝑗 if 𝑖 < 𝑗. The assignment process starts with the role in position
0 and assigns it to the robot associated with the row that maximizes its utility. Subsequently,
we proceed to the next role in order of priority while considering the unassigned robots. This
process allows each robot to simulate the potential assignments of other robots. As the 𝐷𝑊 𝑀
is probabilistically identical for all agents, each robot will reach an identical set of assignments.

3.3. Voronoi Diagram
The function 𝑉 is a domain-specific optimal function that we assumed a priori, and which is
used in the selection of the best N tasks and for the refinement of the UEM. The selection of the
function is based on some precise aspects that are desired to maximize or minimize, according to
the environment. In a RoboCup soccer field, where the coexistence of many robots in a limited
space can create some issues in the evolution of the game, we are interested in maximizing
the distances with adversarial robots. For this purpose, the function we chose is the Voronoi
diagram (Fig.3), which guarantees some advantages in the spatial disposition of the agents.
   Given a set of 𝑛 points in the plane (called sites), the Voronoi diagram is the partition of the
plane in polygons based on the distance to them. In particular, it ensures every point inside
the same region is closer to its associated site than to the others. Formally, defined a metric
distance 𝑑, we call 𝑆 = {𝑠𝑖 |𝑖 = 1, ..., 𝑛} the set of sites and 𝑅 = {𝑅𝑖 |𝑖 = 1, ..., 𝑛} the set of Voronoi
regions, each one associated to the site 𝑠𝑖 . Thus, taken a point 𝑝 of the plane:

                                 𝑝 ∈ 𝑅𝑖 ⟺ 𝑑(𝑝, 𝑠𝑖 ) ≤ 𝑑(𝑝, 𝑠𝑗 ) ∀𝑗 ≠ 𝑖                                  (5)

Every point 𝑒 such that 𝑒 ∈ 𝑅𝑖 ∧ 𝑒 ∈ 𝑅𝑗 compose the Voronoi edge 𝐸𝑖𝑗 between the polygons 𝑅𝑖
and 𝑅𝑗 . So, the edge 𝐸𝑖𝑗 is constituted by all the points that have the same distances with the
sites 𝑠𝑖 and 𝑠𝑗 , i.e.:
                            𝐸𝑖𝑗 = {𝑒|𝑑(𝑒, 𝑠𝑖 ) = 𝑑(𝑒, 𝑠𝑗 )} with 𝑒 ∈ 𝑅𝑖 ∧ 𝑒 ∈ 𝑅𝑗              (6)
Every point 𝑣 that belongs to at least three different Voronoi regions is called Voronoi node:

                                       𝑣 = 𝑅𝑖 ∩ 𝑅𝑗 ∩ 𝑅𝑘 ∩ ... ∩ 𝑅𝑛                                      (7)

In our case study, among all possible methods to build the graph, we decided to consider and
construct it as the dual graph of the Delaunay triangulation, where the set of starting points
Figure 3: Voronoi graph in 2D and 3D field view. In the 2D view (left), blue points represent the
opponent robots and black connections depict the Delaunay Triangulation, while red points are the
Voronoi nodes and red links are the Voronoi edges. In the 3D view (right), just the Voronoi nodes and
edges are shown.


is composed of all positions of opponent robots. The final Voronoi nodes and edges represent
respectively the furthest points from the opponents and the optimal path to follow between
two adjacent nodes. In other words, Voronoi nodes constitute the optimal positions for the
team disposal. The filtering process for the N out of M tasks is done through the proximity
of the tasks to the nodes. In this way, we can ensure to pick the most suitable tasks for the
environment evolution. The refinement of the UEM is performed by applying offsets to the
tasks in their nearest node directions, displacing the task positions to the local optimal solution.


4. Experimental Results
The system has been tested qualitatively during the last official RoboCup competition1 , and
quantitatively in the SimRobot environment, simulating multiple matches.
  To assess the performance of our approach, we employ the metric of multiple role periods.
Specifically, we have computed for each role, in each match, the total duration during which
two or more robots assumed the same role simultaneously. Since the striker represents the most
dynamic role with the highest priority, it best reflects coordination performance.
  We performed three sets of experiments, comparing the following approaches:
      1. multi-agent fixed-rate coordination that does not utilize events and Voronoi.
      2. multi-agent event-based coordination without Voronoi schema correction.
      3. multi-agent event-based coordination with Voronoi schema: the presented approach that
         includes the events and the novel Voronoi correction in the task assignment mechanism.
  The results, displayed in Figure 4, show the cumulative role overlap duration. The x-axis
represents the roles, while the y-axis represents the cumulative time. The total simulation
duration is 60 minutes. The adoption of an event-based communication model allows for a
1
    https://2023.robocup.org/en/robocup-2023/
Figure 4: Role overlaps over time: for each role, the cumulative time (minutes) of role overlaps is shown.
This demonstrates the improvements of the proposed approach (green) w.r.t. the baseline (blue).


more adaptive approach to environment changes compared to a fixed-interval rate, enabling the
robots to communicate only when necessary. This is further improved using the Voronoi schema
which obtained the best results in terms of overlapping time between roles (orange-green bars
comparison). In fact, the Voronoi schema improves the coordination reducing role overlaps, by
distributing the tasks of each role far from the tasks of the other roles, preserving effectiveness.


5. Conclusions
In this study, we tacked the challenge of coordinating a team of fully autonomous humanoid
robots participating in the RoboCup competition, in a low communication setup. The recent
changes in SPL’s rules, such as reduced network packet rates and an increased number of
playing robots, prompted us to develop an innovative distributed coordination system based on
market-based task assignments.
   Our system allows robots to model the world locally, propagate world predictions when
network data is limited, and consequently efficiently assign tasks to team members. We adopted
a market-based approach in which every robot simulates an auction locally assigning the
available tasks to maximize the expected reward. We employed a Voronoi Graph to filter out
additional roles to match the number of tasks with the number of available robots. Additionally,
the Voronoi diagram has been also used for calculating a portion of the reward, contributing to
the differentiation of the total reward. To address limited communication, we utilized prediction
models to compensate for missing information from other agents, sending messages only when
specific events occur. Finally, we conducted extensive experiments, both in the real RoboCup
environment and the SimRobot simulator, to assess our approach’s performance.
   The results clearly indicate that our approach effectively reduces task overlaps in low-
communication scenarios, a critical factor in RoboCup matches. This research contributes
significantly to the robotics field and RoboCup competition, offering a practical solution to the
challenges posed by reduced communication rates in SPL.


References
 [1] V. Di Giambattista, M. Fawakherji, V. Suriani, D. D. Bloisi, D. Nardi, On field gesture-based
     robot-to-robot communication with nao soccer players, in: RoboCup 2019: Robot World
     Cup XXIII, Springer, Cham, 2019, pp. 367–375.
 [2] T. Röfer, T. Laue, A. Hasselbring, J. Lienhoop, Y. Meinken, P. Reichenberg, B-human
     2022–more team play with less communication, in: Robot World Cup, Springer, 2022, pp.
     287–299.
 [3] T. Röfer, T. Laue, A. Hasselbring, L. M. Monnerjahn, N. Matschull, L. Plecher, B-human
     2021–playing soccer out of the box, in: Robot World Cup, Springer, 2021, pp. 302–313.
 [4] E. Antonioni, V. Suriani, F. Riccio, D. Nardi, Game strategies for physical robot soccer
     players: A survey, IEEE Transactions on Games 13 (2021) 342–357. doi:10.1109/TG.2021.
     3075065 .
 [5] A. Farinelli, L. Iocchi, D. Nardi, V. A. Ziparo, Task assignment with dynamic perception and
     constrained tasks in a multi-robot system, in: Proceedings of the 2005 IEEE International
     Conference on Robotics and Automation, IEEE, 2005, pp. 1523–1528.
 [6] L. Luo, N. Chakraborty, K. Sycara, Multi-robot assignment algorithm for tasks with
     set precedence constraints, in: 2011 IEEE International Conference on Robotics and
     Automation, IEEE, 2011, pp. 2526–2533.
 [7] T. Weigel, J.-S. Gutmann, M. Dietl, A. Kleiner, B. Nebel, Cs freiburg: coordinating robots
     for successful soccer playing, IEEE Transactions on Robotics and Automation 18 (2002)
     685–699.
 [8] P. MacAlpine, F. Barrera, P. Stone, Positioning to win: A dynamic role assignment and
     formation positioning system, in: RoboCup 2012: Robot Soccer World Cup XVI 16,
     Springer, 2013, pp. 190–201.
 [9] S. Abeyruwan, A. Seekircher, U. Visser, Dynamic role assignment using general value
     functions, in: IEEE Humanoid Robots, HRS workshop, Osaka, Japan, 2012.
[10] D. Vail, M. Veloso, Multi-robot dynamic role assignment and coordination through shared
     potential fields, Multi-robot systems 2 (2003) 87–98.
[11] F. Riccio, E. Borzi, G. Gemignani, D. Nardi, Multi-robot search for a moving target:
     Integrating world modeling, task assignment and context, in: 2016 IEEE/RSJ International
     Conference on Intelligent Robots and Systems (IROS), 2016, pp. 1879–1886.
[12] T. Röfer, T. Laue, A. Hasselbring, J. Lienhoop, Y. Meinken, P. Reichenberg, B-human 2022
     – more team play with less communication, in: RoboCup 2022:, Springer, Cham, 2023, pp.
     287–299.
[13] M. Malmir, S. Boluki, S. Shiry Ghidary, Offensive positioning based on maximum weighted
     bipartite matching and voronoi diagram, in: R. A. C. Bianchi, H. L. Akin, S. Ramamoorthy,
     K. Sugiura (Eds.), RoboCup 2014: Robot World Cup XVIII, Springer, Cham, 2015, pp.
     562–570.