=Paper=
{{Paper
|id=Vol-3735/paper_04
|storemode=property
|title=Enhancing Aggregation in Locomotor Multi-Agent Systems: a Theoretical Framework
|pdfUrl=https://ceur-ws.org/Vol-3735/paper_04.pdf
|volume=Vol-3735
|authors=Paolo Pagliuca,Alessandra Vitanza
|dblpUrl=https://dblp.org/rec/conf/woa/PagliucaV24
}}
==Enhancing Aggregation in Locomotor Multi-Agent Systems: a Theoretical Framework==
Paolo Pagliuca1,∗,†, Alessandra Vitanza1,†
1 Institute of Cognitive Sciences and Technologies, National Research Council (CNR-ISTC), Rome, Italy

WOA 2024: 25th Workshop “From Objects to Agents”, July 8–10, 2024, Forte di Bard (AO), Italy
∗ Corresponding author. † The authors contributed equally.
paolo.pagliuca@istc.cnr.it (P. Pagliuca); alessandra.vitanza@istc.cnr.it (A. Vitanza)
ORCID: 0000-0002-3780-3347 (P. Pagliuca); 0000-0002-7760-8167 (A. Vitanza)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract
The synthesis of collective behaviors in Multi-Agent Systems is typically approached using various methods, with Evolutionary Algorithms being among the most prevalent. In these systems, agents engage in local interactions with their peers and collectively adopt strategies that manifest at a group level, resembling social behaviors seen in animal societies. We extended the AntBullet problem, which is part of the PyBullet simulation tool, to a collective scenario in which a group of five homogeneous robots must aggregate during locomotion. To evolve this behavior, we employed the OpenAI-ES algorithm alongside a multi-objective fitness function. Our findings indicate that while the robots developed successful locomotion behaviors, they did not exhibit aggregation. This discrepancy is attributed to design choices that unintentionally emphasized locomotion over aggregation capabilities. We discuss the dynamic interplay induced by the fitness function to validate our results and outline future directions. Ultimately, this work is a first attempt to establish a framework for analyzing collective behaviors using advanced algorithms within modern simulation environments.

Keywords: Multi-Agent Systems, aggregation, fitness function, OpenAI-ES, PyBullet

1. Introduction

In Multi-Agent Systems (MASs) [1], various approaches based on model-free machine learning techniques are employed to address the emergence of collective behaviors. Reinforcement Learning (RL) and Evolutionary Algorithms (EAs) are among the most widespread methods used for this purpose. Common characteristics of such systems include the exploitation of local interactions between agents to identify a common strategy beneficial at the group level, mimicking behaviors observed in social animals such as ants, bees, birds, and fish. Recently, in the simulation panorama, the PyBullet simulation tool [2] has become a standard for customizing environments and testing algorithms like Evolutionary Strategies (ESs) and Reinforcement Learning. Indeed, PyBullet offers a wide range of problems, from classic control tasks to Atari games and locomotion challenges.

In this work, we propose a novel framework to investigate collective behaviors in swarms, focusing on the aggregation problem — a common behavior observed in nature [3–6] — that is utilized for tasks such as foraging, collective motion, and defense against predators. Specifically, we extend the AntBullet problem, a standard benchmark in PyBullet, to a collective scenario involving a group of five homogeneous robots whose ultimate goal is to aggregate while locomoting. We define a fitness function that rewards agents for both capabilities and use the OpenAI-ES algorithm to evolve such behavior.
Our outcomes indicate that the robots evolve a successful locomotion behavior, but do not aggregate. In fact, the fitness function is designed in such a way that achieving a positive reward for aggregation conflicts with the locomotion reward, which is considerably easier to obtain. Consequently, agents behave as individuals living in the same environment but acting egoistically. However, this work represents a first step towards creating a setup that enables the analysis of collective behaviors using modern algorithms in advanced simulation environments.

After a brief introduction to the common techniques and tools adopted in the literature (Subsections 1.1 and 1.2), the remainder of the manuscript covers the following: an overview of related works to set the background (Section 2), followed by the presentation of the theoretical framework (Section 3) and a description of the experimental setup (Section 4). Subsequently, the obtained results are presented (Section 5), and finally, the discussion and conclusions are drawn (Section 6).

1.1. Model-free machine learning methods for MASs

Model-free machine learning methods are techniques in which the learning algorithm does not explicitly create or maintain a model of the underlying data distribution. Instead, these methods focus on learning directly from the data through trial and error, often utilizing feedback signals such as rewards or penalties. Reinforcement Learning (RL) is the prime example of this approach. In this sense, Evolutionary Strategies (ESs) fall under this category because they use a fitness function to guide the optimization process without explicitly modeling the underlying dynamics of the environment. Strictly speaking, Evolutionary Algorithms (EAs), including ESs, are better classified as model-free optimization methods rather than traditional model-free learning methods.

Model-free methods are instrumental in scenarios involving complex, highly dynamic, or poorly understood systems, situations that make it challenging to construct an accurate model. These methods excel in tasks such as robotic control, game playing, and autonomous decision-making in uncertain environments. Specifically, the decentralized approaches intrinsic to model-free learning methods are well-suited for MASs, where agents must adapt and coordinate with limited information. Indeed, these approaches provide a powerful framework for learning adaptive behaviors in MASs, fostering the emergence of collective intelligence and enabling agents to navigate dynamic and complex environments effectively. Applying model-free machine learning in MASs thus opens up opportunities for developing novel algorithms and techniques tailored to multi-agent settings.

1.2. Physics Engines and Simulation Platforms

Physics engines are essential tools for simulating robots and evaluating their performance across various tasks. Several physics engines and simulation platforms are used for simulating robot scenarios. Among these physics engines, we can mention: (i) ODE (Open Dynamics Engine) [7], one of the best-known open-source, high-performance libraries for simulating rigid-body dynamics. It focuses on real-time simulation and is capable of multi-agent simulation. Widely adopted in robotics, games, and simulation environments, it is commonly employed in research projects [8–10]. (ii) PhysX [11], a robust physics engine developed by NVIDIA, known for its accuracy and performance in simulating physical interactions.
It supports real-time simulations and GPU acceleration, making it popular in game development and high-fidelity simulations in robotics scenarios [12, 13]. Finally, (iii) Bullet Physics [14], the open-source physics engine chosen for this work. It supports collision detection and rigid and soft body dynamics, and is suitable for simulating complex robotic scenarios.

These physics engines provide a robust foundation for implementing simulation platforms, each offering unique features and capabilities useful for studying swarm robotics scenarios. Notable simulation platforms include:

• Gazebo [15] is a powerful open-source 3D robotics simulator widely used in research and development. It is a physics-realistic simulator originally built on ODE, although it now supports multiple physics engines (including Bullet and others) and integrates with ROS (Robot Operating System) [16]. Due to its features, it is ideal for simulating complex environments with multiple robots, including swarm robotics.
• CoppeliaSim (formerly V-REP) [17] is a versatile robot simulation software. It supports a wide range of robots and environments. The simulator offers four physics engines (i.e., Bullet Physics, ODE, Newton, and Vortex Dynamics) and is frequently used for swarm robotics due to its flexibility and ease of creating complex interactions and behaviors.
• Webots [18] is an open-source robot simulation software that allows the modeling, programming, and simulation of mobile robots, and is built on ODE. It is excellent for educational purposes and research in swarm robotics, supporting realistic simulations and prototyping.
• ARGoS (Autonomous Robots Go Swarming) [19] is a simulation framework specifically designed for swarm robotics. Its most important features are its high efficiency, capability of simulating large-scale robot groups, and customization of both the robots and the environment. It was primarily designed for research in swarm robotics to facilitate experiments and results acquisition.
• FARSA (Framework for Autonomous Robotics Simulation Applications) [20] is a simulation framework specifically designed for studying and developing autonomous robots, with a particular focus on swarm robotics. It is particularly suited for researchers who develop and test algorithms for collective intelligence and autonomous decision-making in robotic systems. The major features of FARSA are its high modularity and extensibility, which allow users to extend and customize the simulation framework to specific needs. FARSA supports the rapid creation of custom robots and simulates complex environments in which the robots operate. FARSA is supported by a community of researchers and developers, contributing to its continuous improvement and the availability of shared resources.

1.2.1. PyBullet and AntBullet problem

PyBullet [2] is a physics simulator built on top of the Bullet physics engine [14]. Developed primarily in C++, it was designed with a focus on robotics manipulation, enabling the simulation of articulated rigid bodies, physical joints, and constraints. It also integrates learning algorithms and supports a range of features including various sensors, gripper simulation, multi-agent simulation, and lightweight graphics rendering. Despite its advantages, PyBullet faces challenges such as long simulation times and complexity in setting up the environment. However, its versatility and open-source nature make it a valuable tool in robotics simulation and research.
PyBullet is easy to use with zero overhead when integrating a physics engine into a Python program. Moreover, it is tailored to encourage and facilitate the use of constraint-based descriptions to abstract physics for modern robotic learning algorithms. Therefore, it can be combined with machine-learning tools such as PyTorch [21] and SymPy [22], used in pure Python, or coupled with CUDA [23].

AntBullet [2] is an environment implemented in PyBullet (AntBulletEnv) in which a four-legged ant robot with 8 joints is simulated. It serves as a benchmark for reinforcement-learning simulations [24–28]. The goal is to reach a specified end position in the fewest steps possible. In the context of benchmarking, the AntBullet environment is used to evaluate and compare the performance of various algorithms. Generally, the goal is to optimize the movements of the Ant robot, enabling it to perform effectively in the simulation. Moreover, AntBullet requires only minimal modifications to be extended to swarm-based robotics.
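To make this concrete, the snippet below sketches how several Ant models shipped with pybullet_data can be placed in a single PyBullet world. It is only an illustration under stated assumptions (the MJCF file name, the cross-shaped offsets and the spawn height are ours), not the code used for the experiments reported in this paper.

```python
# Minimal sketch: instantiating several Ant models in one PyBullet world.
# Illustrative only -- file paths, offsets and parameters are assumptions,
# not the authors' experimental setup.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                        # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                   # ground plane

# Place five ants in a cross-shaped initial formation (cf. Fig. 1).
offsets = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2)]
ants = []
for dx, dy in offsets:
    # loadMJCF returns a tuple of body ids; here we assume the ant is the only body.
    ant = p.loadMJCF("mjcf/ant.xml")[0]
    p.resetBasePositionAndOrientation(ant, [dx, dy, 0.75], [0, 0, 0, 1])
    ants.append(ant)

for _ in range(240):                       # step the simulation for one second
    p.stepSimulation()
```

In practice, the experiments described below extend the existing AntBullet problem rather than building a scene from scratch; the sketch only illustrates how cheaply PyBullet accommodates multiple articulated robots in one simulation.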
2. Related works

The background of the present proposal draws direct inspiration from [26]. In this work, the authors describe an experiment comparing the efficacy of various neuro-evolutionary strategies for continuous control optimization. They discuss different algorithms, including OpenAI-ES [29], CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [30], sNES (Separable Natural Evolution Strategy) [31], and xNES (Exponential Natural Evolution Strategy) [31], analyzing their performance across a range of tasks. The experiment utilized a variety of benchmark problems, such as locomotion tasks, Atari games, the double-pole balancing problem [32] and a swarm foraging scenario introduced in [33]. The algorithms were evaluated based on their ability to maximize a predefined reward function. Algorithm performance was assessed by the total number of evaluations required to find a solution. The OpenAI-ES algorithm consistently outperformed the other algorithms across all tested problems, demonstrating robustness to changes in hyper-parameters. Results indicate that the success of this method is attributed to the Adam optimizer [34] rather than to the virtual batch normalization technique [29, 35].

Furthermore, the effectiveness of the OpenAI-ES algorithm in achieving collective decision-making for aggregation tasks is also demonstrated by the results of our recent study [36]. In that work, we assessed the efficacy of the method both quantitatively, through a performance analysis, and qualitatively, by evaluating the emergent behavior in different environmental multi-agent setups. Similar analyses were conducted in [37], where two evolutionary algorithms, CMA-ES and xNES, were compared in the context of a group of robots making collective decisions by means of aggregation behaviors. The aim of the study was to determine to what extent the performance and the distribution of the robots were affected by the environmental conditions (i.e., dimensions of the sites), and to evaluate the final aggregation of the swarm.

In particular, regarding the AntBullet problem [26], which represents the starting point of this study, the OpenAI-ES evolutionary strategy outperforms the PPO (Proximal Policy Optimization) reinforcement learning algorithm [38], despite encountering some issues with the reward function. Specifically, achieving effective behaviors and optimal performance across all replications requires a small adjustment in the reward function, typically a bonus or punishment of approximately ±0.01. Notably, the authors recommend that future comparisons between evolutionary and reinforcement learning algorithms should incorporate reward functions specifically tailored to each algorithm class. This emphasis underscores the importance of utilizing appropriate reward functions, as evidenced by the fact that reward functions designed for reinforcement learning may not necessarily be effective for evolutionary strategies and vice versa. Therefore, the study highlights the significant impact of the employed reward function on algorithm performance (for a review of the fitness/reward functions used for Evolutionary Algorithms, see [39]). In this context, Table 1 summarizes some interesting works by comparing the algorithms used, the type of task, the defined fitness function, and the homogeneity of the group. It serves to place our proposal in the context of the state-of-the-art.

Table 1: State-of-the-art. Unintroduced acronyms: SARSA (State-Action-Reward-State-Action), DQN_RLaR (Deep Q-Learning with Reinforcement Learning as a Rehearsal), RLA (Reinforcement Learning-based Aggregation), FLDDPG (Federated Learning Deep Deterministic Policy Gradient), STDP (Spike-Timing-Dependent Plasticity), MOEA/D (MultiObjective Evolutionary Algorithm based on Decomposition), GGA (Generational Genetic Algorithm).

Work | Approach | Algorithm | Task | Fitness | Homogeneous
Iima and Kuroe [40] | RL | SARSA [41] | Path planning | Mono | Yes
Nguyen and Banerjee [42] | RL | DQN_RLaR | Foraging | Mono | Yes
Sadeghi Amjadi et al. [43] | RL | RLA [43] | Aggregation | Mono | Yes
Na et al. [44] | RL | FLDDPG [44] | Navigation | Multi | Yes
Vitanza et al. [45] | RL | STDP [46] | Role Specialization | Mono | Yes
Ordaz-Rivas and Torres-Treviño [47] | EA | MOEA/D [48] | Localization | Multi | Yes
Trianni et al. [49] | EA | GGA [50] | Aggregation | Mono | Yes
Kengyel et al. [51] | EA | Wolfpack-inspired EA [52] | Aggregation | Mono | No
Pagliuca et al. [26] | EA | CMA-ES [30], OpenAI-ES [29], sNES [31], xNES [31] | Foraging | Mono | Yes
Pagliuca and Vitanza [36, 37] | EA | CMA-ES, OpenAI-ES, xNES | Aggregation | Multi | Yes
Pagliuca and Vitanza [53] | EA | GGA | Foraging, Escaping from Predator, Aggregation | Mono | No
Pagliuca and Vitanza (this work) | EA | OpenAI-ES | Aggregation | Multi | Yes

Algorithm 1: OpenAI-ES algorithm
1: Initialize: $\sigma \leftarrow 0.02$, $\lambda \leftarrow 20$, $\gamma$, $f$, $optimizer$, $\theta_0$
2: for $t \leftarrow 0$ to $\gamma$ do
3:   for $i \leftarrow 0$ to $\lambda$ do
4:     sample noise vector: $\epsilon_i \sim \mathcal{N}(0, \mathbb{I})$
5:     evaluate score: $s_i^{+} \leftarrow f(\theta_t + \sigma \cdot \epsilon_i)$
6:     evaluate score: $s_i^{-} \leftarrow f(\theta_t - \sigma \cdot \epsilon_i)$
7:   end for
8:   compute normalized ranks: $u \leftarrow ranks(s)$, with $u_i \in [-0.5, 0.5]$
9:   estimate gradient: $g_t \leftarrow \frac{1}{\lambda} \sum_{i=1}^{\lambda} u_i \cdot \epsilon_i$
10:  update parameters: $\theta_{t+1} \leftarrow \theta_t + optimizer(g_t)$
11: end for
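For readers who prefer code to pseudocode, the following is a minimal Python sketch of the update loop in Algorithm 1, using symmetric (mirrored) sampling, rank normalization and an Adam update. The evaluate callback, which should run one evaluation episode and return the fitness, and the Adam hyper-parameters are placeholders; the actual implementation used in [26, 29] includes further details (e.g., virtual batch normalization) not reproduced here.

```python
# Sketch of the OpenAI-ES loop of Algorithm 1 (mirrored sampling + rank-based
# gradient estimate + Adam ascent step). `evaluate` is a placeholder that should
# run one episode and return the fitness F for the given parameter vector.
import numpy as np

def rank_transform(scores):
    """Map raw scores to normalized ranks in [-0.5, 0.5]."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(len(scores))
    return ranks / (len(scores) - 1) - 0.5

def openai_es(evaluate, theta, generations, lam=20, sigma=0.02, lr=0.01):
    m, v = np.zeros_like(theta), np.zeros_like(theta)     # Adam moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, generations + 1):
        noise = np.random.randn(lam, theta.size)           # eps_i ~ N(0, I)
        scores = np.concatenate([
            [evaluate(theta + sigma * n) for n in noise],   # s_i^+
            [evaluate(theta - sigma * n) for n in noise],   # s_i^-
        ])
        u = rank_transform(scores)
        # gradient estimate: (1/lam) * sum_i (u_i^+ - u_i^-) * eps_i
        grad = (u[:lam] - u[lam:]) @ noise / lam
        # Adam update (ascent, since fitness is maximized)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        theta = theta + lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)
    return theta
```

With λ = 20 symmetric samples and σ = 0.02, the sketch matches the defaults reported in Algorithm 1 and in Table 3.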
3. Evolutionary Strategy for Aggregation

Intending to investigate the possibility of synthesizing a successful aggregation behavior in a swarm of AntBullet robots, we employed the OpenAI-ES algorithm [26, 29, 54] (for a description, see Alg. 1), which has been adopted to evolve aggregation in similar settings [36, 55]. Specifically, we considered a swarm of five homogeneous AntBullet robots, controlled through a feed-forward neural network with 28 inputs, 50 internal units and 8 outputs. The list of inputs and outputs of the controller is reported in Table 2.

Table 2: List of input and output data of the robot’s neural network controller. Concerning the inputs, the symbols are defined as follows: $z$ denotes the z coordinate (i.e., height) of the agent; $z_{init}$ indicates the initial z coordinate; $angle_{to\_target}$ represents the relative angle to the target location; $v_x$, $v_y$ and $v_z$ indicate the robot velocities along the three axes; $roll$ and $pitch$ are the roll and pitch of the robot; $(pos_j, vel_j)$ represents the position and velocity of the $j$-th joint; $foot\_contact_f$ denotes the contact flag of the $f$-th foot (i.e., 1 if the foot touches the ground, 0 otherwise). For the outputs, the symbol $m_j$ represents the motor value applied to the $j$-th joint.

Id | Input
0 | $z - z_{init}$
1 | $\sin(angle_{to\_target})$
2 | $\cos(angle_{to\_target})$
3 | $0.3 \times v_x$
4 | $0.3 \times v_y$
5 | $0.3 \times v_z$
6 | $roll$
7 | $pitch$
8–23 | $(pos_j, vel_j)$ for $j \in Num\_joints$
24–27 | $foot\_contact_f$ for $f \in$ [front_left/right_foot, back_left/right_foot]

Id | Output
0–7 | $m_j$ for $j \in Num\_joints$

As we pointed out in Section 1.2.1, the AntBullet robot aims to locomote towards a target direction. The fitness function used to evolve such behavior is defined in [26] and can be expressed by Eq. 1:

$F_{ant} = P + 0.01 + S + J$ (1)

where the components $P$, $S$ and $J$ are computed as follows:

$P = \lVert pos_{curr} - pos_{prev} \rVert$ (2)

$S = -0.01 \cdot \frac{1}{N_m} \sum_{j=1}^{N_m} m_j^2$ (3)

$J = -0.1 \cdot N_j$ (4)

In Eqs. 2–4, $pos_{curr}$ indicates the current position; $pos_{prev}$ denotes the previous position; $\lVert \cdot \rVert$ is the norm operator; $N_m$ represents the number of motors; $m_j$ is the value of the $j$-th motor; $N_j$ indicates the number of joints at limit.

Starting from $F_{ant}$, we extended the fitness function to deal with our collective scenario, in which the robots must aggregate while locomoting. In particular, we computed the components $P$, $S$ and $J$ for each of the five agents (i.e., $P_i$, $S_i$ and $J_i$ for the generic $i$-th agent), and we introduced a new component $D$ rewarding agents for the capability of staying at a target distance (set to 1.5 m) from the other mates. Given an agent $i$, the component can be defined as follows:

$D_i = e^{-100 \cdot \frac{1}{N-1} \sum_{j=1}^{N-1} \lvert dist_{target} - dist_{ij} \rvert}$ (5)

where $N$ is the number of agents (here $N = 5$), $dist_{target}$ represents the desired target distance between each pair of agents and $dist_{ij}$ denotes the distance between the agents $i$ and $j$. From Eq. 5, it is evident that robots very far from their mates will not receive any score. Similarly, collisions between peers (i.e., distance below $dist_{target}$) are discouraged since they prevent agents from locomoting. The $D$ component is intended to foster aggregation in the swarm during the motion. Accordingly, the following equation defines the resulting fitness function for the multi-agent scenario:

$F = \frac{1}{N} \sum_{i=1}^{N} \left( P_i + D_i + S_i + J_i \right)$ (6)

where the single components refer to the generic agent $i$. We remove the bonus of 0.01 to prevent local minima behaviors, such as staying still.
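A minimal sketch of how the per-step reward of Eqs. 2–6 can be computed is reported below. Variable names and the way positions, motor values and joint-limit counts are read from the simulator are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the per-step multi-agent fitness of Eq. 6.
# Positions, motor commands and joint-limit counts are assumed to be provided
# by the simulation loop; names are illustrative.
import numpy as np

DIST_TARGET = 1.5   # desired inter-agent distance (metres)

def step_reward(curr_pos, prev_pos, motors, joints_at_limit):
    """curr_pos, prev_pos: (N, 2) planar positions of the N agents;
    motors: (N, 8) motor values; joints_at_limit: (N,) counts."""
    n = curr_pos.shape[0]
    P = np.linalg.norm(curr_pos - prev_pos, axis=1)                   # Eq. 2
    S = -0.01 * np.mean(motors ** 2, axis=1)                          # Eq. 3
    J = -0.1 * joints_at_limit                                        # Eq. 4
    D = np.empty(n)
    for i in range(n):
        # distances of agent i to its N-1 mates (self-distance removed)
        dists = np.delete(np.linalg.norm(curr_pos - curr_pos[i], axis=1), i)
        D[i] = np.exp(-100.0 * np.mean(np.abs(DIST_TARGET - dists)))  # Eq. 5
    return np.mean(P + D + S + J)                                     # Eq. 6
```

The exponential in Eq. 5 makes $D_i$ extremely selective: with $dist_{target} = 1.5$ m, an average absolute deviation of 0.1 m from the target distance yields $D_i = e^{-10} \approx 4.5 \times 10^{-5}$, whereas a deviation of 0.01 m yields $D_i = e^{-1} \approx 0.37$. This steepness is consistent with the observation in Section 5 that $D$ contributes almost nothing unless the mutual distances stay very close to the target.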
4. Experimental Setup

We extended the AntBullet problem [2] to deal with a collective scenario involving a group of 5 robots. The starting locations of the agents remain almost constant, and the initial formation of the swarm is a cross, as shown in Fig. 1. A small amount of noise is added to the joints’ initial positions to make the problem stochastic (parameter input noise in Table 3).

Figure 1: The evolutionary task. The swarm consists of five AntBullet robots. Initially, the agents are placed to form a cross.

The experiment has been repeated 30 times. Evolution lasts 5 × 10^7 evaluation steps. The ability of the swarm to cope with the problem is estimated by evaluating the group in a single episode lasting up to 1000 steps. Moreover, the generalization capability of the swarm is computed over 3 post-evaluation episodes, each lasting a maximum of 1000 steps. An evaluation episode is prematurely stopped if at least one of the agents falls on the ground. The full list of evolutionary parameters is provided in Table 3.

Table 3: Evolutionary parameters. The symbols $\gamma$, $\lambda$ and $\sigma$ have the same meaning as indicated in Alg. 1. The $U$ symbol denotes the uniform distribution.

Parameter | Value
# of replications | 30
# of evaluation steps ($\gamma$) | 5 × 10^7
# of symmetric samples ($\lambda$) | 20
$\sigma$ | 0.02
# of robots | 5
# of episodes | 1
# of post-evaluation episodes | 3
# of episode steps | 1000
# of hidden neurons | 50
activation function (hidden neurons) | tanh
activation function (output neurons) | linear
input noise | $\eta_{in} \in U(-0.1, 0.1)$
output noise | $\eta_{out} \in U(-0.01, 0.01)$

5. Results and Analysis

In this section, we report the outcomes of our experiments. The average fitness ($AvgF$) obtained in 30 replications of the experiment is 1769.389, with a standard deviation ($StdF$) of 419.417. The best replication achieved a fitness score of 2551.119, while the worst replication obtained a fitness value of 862.517. Overall, 16 out of 30 replications (i.e., 53.33%) achieved an above-average fitness value (see Fig. 2).

Figure 2: Fitness distribution. Light blue bars represent the performance obtained in each of the 30 replications of the experiment. The horizontal line marks the average fitness achieved.

If we analyze the impact of the $P$, $D$, $S$ and $J$ components of the fitness function $F$, we observe that the score is mostly due to the first one (see Fig. 3). In fact, progressing toward a target ($P$) is easier and does not depend on the behavioral capabilities of the mates. Conversely, reducing the distance from the peers ($D$) — i.e. aggregating — implies moving towards other mates, thus decreasing the reward provided by the $P$ component. However, the $D$ component rewards agents only if the mutual distance with their peers is very close to the target distance $dist_{target}$ (see Eq. 5). This explains why it does not play any role in this context. Similarly, the $S$ component has almost no impact on $F$, that is, the OpenAI-ES algorithm synthesizes solutions that do not apply strong values to the motors. Finally, the $J$ component provides a punishment of around 0.25 per step during the evaluation episode. This means that the AntBullet robots typically push the joints of one of the four legs at their limits. This strategy is helpful to keep balance on the ground (see Fig. 4).

Figure 3: Fitness components ($P$, $D$, $S$ and $J$, see Eqs. 2–5) and overall fitness $F$ (see Eq. 6). These data have been collected from 30 replications of the experiment.

Figure 4: An example of the behavioral strategy of one AntBullet robot: the back right leg is almost fully extended in order to maintain stability on the ground.

Overall, our outcomes demonstrate that the fitness function $F$ makes the robots act egoistically, since the two objectives — locomote and aggregate — conflict with each other. This strategy is spontaneously discovered by the OpenAI-ES algorithm, which attempts to optimize $F$. A similar finding can be seen in [56], where no global incentive to cooperate or compete is given to the group, and the chosen fitness does not drive agents to a preferred solution. In this case, the evolutionary strategy evolves agents that implicitly include selfish sub-tasks if the coordination within a team proves complex. Indeed, in this way, they can adapt to situational circumstances and may change their propensity depending on the specific situation.
Moreover, the individualistic inclination of the agents can be further explained by considering that the group is formed by homogeneous robots, which share the same network controller and the same physical embodiment. However, in [53] the authors show how the group dynamics in heterogeneous MASs are influenced by the different skills of the agents, which are more adaptive and behave according to the particular mates they are evaluated with.

Figure 5: Examples of trajectories performed by the agents of the swarm. The behavioral strategies of the robots evolved in the replications that obtained a fitness score $F > AvgF + StdF$: S6 (a), S13 (b), S14 (c) and S20 (d) (see Fig. 3 for details) were analyzed. The robots usually move by maintaining the initial cross formation throughout the evaluation episode (e.g., (a), (b), (d)). In some cases, the pressure to stay closer emerges during the motion, with some of the agents moving close and even crossing their paths (e.g., (c)). Nevertheless, the overall mutual distance between the group’s members is not sufficient to get a score from the $D$ component (see Fig. 3). This explains why the $P$ component of the fitness function plays a more relevant role than the $D$ component.

Fig. 5 shows the behavioral strategies exhibited by the agents evolved in the replications where $F > AvgF + StdF$. Data are collected in a post-evaluation episode. As can be seen, the agents generally move towards the target regardless of the behavior of their mates. The pressure to aggregate does not emerge throughout the episode, although some of the robots may sometimes cross their paths (see Fig. 5, c). To summarize, the results reveal the importance of the fitness function for the swarm’s evolved behavior. Indeed, the agents successfully tackle the evolutionary problem (i.e., they manage to locomote towards a target destination), though the aggregation is not reached. Carefully designing the fitness function to address all the objectives is paramount for achieving a successful collective strategy.

6. Discussion and Conclusion

In this work, we proposed a theoretical framework to analyze collective behaviors in MASs by exploiting modern simulation environments and using state-of-the-art Evolutionary Algorithms. In particular, we extended the benchmark AntBullet problem to a collective scenario involving a swarm of five AntBullet robots, whose goal was to aggregate during locomotion. We designed a multi-objective fitness function rewarding agents for both capabilities. The results we achieved demonstrate that agents successfully developed a locomotion capability, but they did not aggregate.

Undoubtedly, the definition of a multi-objective fitness function presents several challenges because, in many cases, the objectives conflict with each other. Essentially, improving performance on one objective might degrade performance on another. This was specifically the case here, where the attempt to optimize the $D$ component might have degraded the $P$ component. This contrast made it difficult to find an optimal solution that satisfied all objectives simultaneously. Moreover, in this case, a problem with scaling may have occurred.
The different components might have had different scales during the trials, especially because, in dynamic environments, the importance of different objectives might change over time. This implied that one objective might have dominated the others, leading to biased results. This aligns with the recommendation from the aforementioned study [26], emphasizing the importance of utilizing appropriate reward functions tailored to each task. Thus, addressing these challenges often requires careful consideration of the specific characteristics of the problem at hand.

Regarding future research directions, several ideas emerge. Undoubtedly, future work should focus on refining the multi-objective fitness function to better balance conflicting sub-goals. Additionally, investigating alternative state-of-the-art evolutionary algorithms could produce more sophisticated strategies for achieving both locomotion and aggregation. Comparing these alternatives with OpenAI-ES, as done in some of our works, might provide insights into which algorithms are more suitable for multi-objective tasks in dynamic environments. Moreover, introducing communication mechanisms among the agents could potentially enhance the emergence of aggregation behavior. Understanding how communication affects the swarm’s ability to aggregate while maintaining effective locomotion could lead to more effective collective strategies. Lastly, as we discussed in Section 5, using a heterogeneous MAS similar to [53] might foster the emergence of more complex dynamics, in which the individual skills may play different roles and the resulting overall behavior cannot be predicted a priori. In conjunction with this, efficient communication requires strong collaboration, and robots need more intelligence to handle unknown situations. These demands are challenging to meet with current architectures and hardware. Creating communication networks between heterogeneous agents and designing efficient communication protocols are key challenges that need further exploration. Indeed, considering hybrid solutions within the context of Edge Intelligence platforms (EIP) or the Internet of Things (IoT), combined with evolutionary strategies for agent populations, could be a promising direction for future research.

Acknowledgement
A.V. acknowledges support from the PNRR MUR project PE0000013-FAIR - Future Artificial Intelligence Research.

References
[1] P. G. Balaji, D. Srinivasan, An Introduction to Multi-Agent Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 1–27.
[2] E. Coumans, Y. Bai, Pybullet, a python module for physics simulation for games, robotics and machine learning (2016).
[3] J.-L. Deneubourg, A. Lioni, C. Detrain, Dynamics of aggregation and emergence of cooperation, The Biological Bulletin 202 (2002) 262–267.
[4] S. Garnier, C. Jost, R. Jeanson, J. Gautrais, M. Asadpour, G. Caprari, G. Theraulaz, Aggregation behaviour as a source of collective decision in a group of cockroach-like-robots, in: European conference on artificial life, Springer, 2005, pp. 169–178.
[5] R. Jeanson, C. Rivault, J.-L. Deneubourg, S. Blanco, R. Fournier, C. Jost, G. Theraulaz, Self-organized aggregation in cockroaches, Animal behaviour 69 (2005) 169–180.
[6] B. Oldroyd, A. Smolenski, S. Lawler, A. Estoup, R. Crozier, Colony aggregations in Apis mellifera L., Apidologie 26 (1995) 119–130.
[7] R. Smith, Open dynamics engine, 2008. URL: http://www.ode.org/.
[8] V. Ondroušek, The Solution of 3D Indoor Simulation of Mobile Robots Using ODE, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 215–220. doi:10.1007/978-3-642-05022-0_37.
[9] P. Arena, L. Patané, P. S. Termini, A. Vitanza, R. Strauss, Software/hardware issues in modelling insect brain architecture, in: S. Jeschke, H. Liu, D. Schilberg (Eds.), Intelligent Robotics and Applications, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 46–55.
[10] A robotic simulation framework for cognitive systems, Spatial Temporal Patterns for Action-Oriented Perception in Roving Robots II: An Insect Brain Computational Model (2014) 153–176.
[11] NVIDIA, Physx library, 2008. URL: https://developer.nvidia.com/physx-sdk.
[12] J. Lächele, A. Franchi, H. H. Bülthoff, P. Robuffo Giordano, Swarmsimx: Real-time simulation environment for multi-robot systems, in: I. Noda, N. Ando, D. Brugali, J. J. Kuffner (Eds.), Simulation, Modeling, and Programming for Autonomous Robots, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 375–387.
[13] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, A. Garg, Orbit: A unified simulation framework for interactive robot learning environments, IEEE Robotics and Automation Letters 8 (2023) 3740–3747. doi:10.1109/LRA.2023.3270034.
[14] E. Coumans, Bullet physics simulation, in: ACM SIGGRAPH 2015 Courses, SIGGRAPH ’15, Association for Computing Machinery, New York, NY, USA, 2015. doi:10.1145/2776880.2792704.
[15] N. Koenig, A. Howard, Design and use paradigms for Gazebo, an open-source multi-robot simulator, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), volume 3, 2004, pp. 2149–2154.
[16] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng, et al., Ros: an open-source robot operating system, in: ICRA workshop on open source software, volume 3, Kobe, Japan, 2009, p. 5.
[17] E. Rohmer, S. P. N. Singh, M. Freese, Coppeliasim (formerly v-rep): a versatile and scalable robot simulation framework, in: Proc. of The International Conference on Intelligent Robots and Systems (IROS), 2013.
[18] Webots, Open-source Mobile Robot Simulation Software. URL: http://www.cyberbotics.com.
[19] C. Pinciroli, V. Trianni, R. O’Grady, G. Pini, A. Brutschy, M. Brambilla, N. Mathews, E. Ferrante, G. Di Caro, F. Ducatelle, M. Birattari, L. M. Gambardella, M. Dorigo, Argos: a modular, parallel, multi-engine simulator for multi-robot systems, Swarm intelligence 6 (2012) 271–295.
[20] G. Massera, T. Ferrauto, O. Gigliotta, S. Nolfi, FARSA: An Open Software Tool for Embodied Cognitive Science, in: Proceedings of the 12th European Conference on Artificial Life (ECAL 2013), 2013, pp. 538–545.
[21] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Pytorch: An imperative style, high-performance deep learning library, 2019. arXiv:1912.01703.
[22] A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, v. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, A.
Scopatz, Sympy: symbolic computing in python, PeerJ Computer Science 3 (2017) e103. doi:10.7717/peerj- cs.103 . [23] S. Cook, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, 1st ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2012. [24] R. Akrour, D. Tateo, J. Peters, Continuous action reinforcement learning from a mixture of interpretable experts, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2021) 6795–6806. [25] S. Dankwa, W. Zheng, Twin-delayed ddpg: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent, in: Proceedings of the 3rd international conference on vision, image and signal processing, 2019, pp. 1–5. [26] P. Pagliuca, N. Milano, S. Nolfi, Efficacy of modern neuro-evolutionary strategies for continuous control optimization, Frontiers in Robotics and AI 7 (2020) 98. [27] D. Reda, T. Tao, M. van de Panne, Learning to locomote: Understanding how environ- ment design matters for deep reinforcement learning, in: Proceedings of the 13th ACM SIGGRAPH Conference on Motion, Interaction and Games, 2020, pp. 1–10. [28] Q. Zhang, L. Zhang, Q. Ma, J. Xue, The lstm-per-td3 algorithm for deep reinforcement learning in continuous control tasks, in: 2023 China Automation Congress (CAC), IEEE, 2023, pp. 671–676. [29] T. Salimans, J. Ho, X. Chen, S. Sidor, I. Sutskever, Evolution strategies as a scalable alternative to reinforcement learning (2017). [30] N. Hansen, A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary computation 9 (2001) 159–195. [31] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber, Natural evolution strategies, The Jrnl of Machine Learning Research 15 (2014). [32] A. P. Wieland, Evolving neural network controllers for unstable systems, in: IJCNN- 91-Seattle International Joint Conference on Neural Networks, volume 2, IEEE, 1991, pp. 667–673. [33] P. Pagliuca, S. Nolfi, Robust optimization through neuroevolution, PLOS ONE 14 (2019) 1–27. [34] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv:1412.6980 (2014). [35] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, Advances in neural information processing systems 29 (2016). [36] P. Pagliuca, A. Vitanza, Self-organized aggregation in group of robots with OpenAI-ES, in: Int. Conf. on Soft Computing and Pattern Recognition, Springer, 2022, pp. 770–780. [37] P. Pagliuca, A. Vitanza, Evolving aggregation behaviors in swarms from an evolutionary algorithms point of view, in: Applications of Artificial Intelligence and Neural Systems to Data Science, Springer, 2023, pp. 317–328. [38] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017). [39] A. L. Nelson, G. J. Barlow, L. Doitsidis, Fitness functions in evolutionary robotics: A survey and analysis, Robotics and Autonomous Systems 57 (2009) 345–370. [40] H. Iima, Y. Kuroe, Swarm reinforcement learning algorithms based on sarsa method, in: 2008 SICE Annual Conference, IEEE, 2008, pp. 2045–2049. [41] R. S. Sutton, A. G. Barto, Temporal-difference learning (1998). [42] T. Nguyen, B. Banerjee, Reinforcement learning as a rehearsal for swarm foraging, Swarm Intelligence 16 (2022) 29–58. [43] A. Sadeghi Amjadi, C. Bilaloğlu, A. E. Turgut, S. Na, E. Şahin, T. Krajník, F. 
Arvin, Re- inforcement learning-based aggregation for robot swarms, Adaptive Behavior 32 (2024) 265–281. [44] S. Na, T. Rouček, J. Ulrich, J. Pikman, T. Krajník, B. Lennox, F. Arvin, Federated reinforce- ment learning for collective navigation of robotic swarms, IEEE Transactions on cognitive and developmental systems 15 (2023) 2122–2131. [45] A. Vitanza, L. Patané, P. Arena, Spiking neural controllers in multi-agent competitive systems for adaptive targeted motor learning, Journal of the Franklin Institute 352 (2015) 3122–3143. URL: https://www.sciencedirect.com/science/article/pii/S001600321500174X. doi:https://doi.org/10.1016/j.jfranklin.2015.04.014 , special Issue on Advances in Nonlinear Dynamics and Control. [46] S. Song, K. D. Miller, L. F. Abbott, Competitive hebbian learning through spike-timing- dependent synaptic plasticity, Nat Neurosci 3 (2000) 919–926. [47] E. Ordaz-Rivas, L. Torres-Treviño, Improving performance in swarm robots using multi- objective optimization, Mathematics and Computers in Simulation 223 (2024) 433–457. [48] Q. Zhang, H. Li, Moea/d: A multiobjective evolutionary algorithm based on decomposition, IEEE Transactions on evolutionary computation 11 (2007) 712–731. [49] V. Trianni, R. Groß, T. H. Labella, E. Şahin, M. Dorigo, Evolving aggregation behaviors in a swarm of robots, in: Advances in Artificial Life: 7th European Conference, ECAL 2003, Dortmund, Germany, September 14-17, 2003. Proceedings 7, Springer, 2003, pp. 865–874. [50] J. J. Grefenstette, et al., Genetic algorithms for changing environments, in: Ppsn, volume 2, Citeseer, 1992, pp. 137–144. [51] D. Kengyel, H. Hamann, P. Zahadat, G. Radspieler, F. Wotawa, T. Schmickl, Potential of heterogeneity in collective behaviors: A case study on heterogeneous swarms, in: PRIMA 2015: Principles and Practice of Multi-Agent Systems: 18th International Conference, Bertinoro, Italy, October 26-30, 2015, Proceedings 13, Springer, 2015, pp. 201–217. [52] P. Zahadat, T. Schmickl, Wolfpack-inspired evolutionary algorithm and a reaction-diffusion- based controller are used for pattern formation., in: GECCO, 2014, pp. 241–248. [53] P. Pagliuca, A. Vitanza, N-mates evaluation: a new method to improve the performance of genetic algorithms in heterogeneous multi-agent systems., Proceedings of the 24th Edition of the Workshop From Object to Agents (WOA23) 3579 (2023) 123–137. [54] P. Pagliuca, S. Nolfi, The dynamic of body and brain co-evolution, Adaptive Behavior 30 (2022) 245–255. [55] J. Rais Martínez, F. Aznar Gregori, Comparison of evolutionary strategies for reinforcement learning in a swarm aggregation behaviour, in: Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence, 2020, pp. 40–45. [56] P. Pagliuca, D. Inglese, A. Vitanza, Measuring emergent behaviors in a mixed competitive- cooperative environment, International Journal of Computer Information Systems and Industrial Management Applications 15 (2023) 69–86.