<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paolo Pagliuca</string-name>
          <email>paolo.pagliuca@istc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Trivisano</string-name>
          <email>g.trivisano@alumni.uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Vitanza</string-name>
          <email>alessandra.vitanza@istc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Cognitive Sciences and Technologies, National Research Council (CNR-ISTC)</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Bari Aldo Moro</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Multi-Agent Systems (MASs) are characterized by multiple agents interacting to solve tasks that may be difficult, or even impossible, for a single agent. While discovering solutions to problems with a single objective might be relatively straightforward, the picture changes when coping with Multi-Objective Optimization (MOO), where problems require the simultaneous optimization of multiple objectives that potentially conflict with each other. This is particularly relevant in MASs, since each agent's behavior affects the overall system performance. For example, the capability of a system, composed of many robots, to both locomote and aggregate simultaneously requires the definition of appropriate fitness measures and the usage of suitable algorithms. In this work, we investigate the conditions necessary to promote aggregation in a robotic MAS, with a particular focus on how conflicting objectives can hinder the learning of effective behaviors. Specifically, we designed a novel fitness function and tested it in a relatively simple aggregation scenario. Furthermore, we considered a recently introduced MOO problem, in which a MAS of five robots must develop the ability to aggregate while in motion. Our outcomes show that, despite the challenges in designing effective fitness functions, the proposed formulation successfully supports aggregation in the simpler scenario and enhances aggregation capabilities in the more complex one.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>Multi-Objective Optimization</kwd>
        <kwd>Evolutionary Algorithms</kwd>
        <kwd>OpenAI-ES</kwd>
        <kwd>Aggregation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Multi-Agent Systems (MASs) [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1–3</xref>
        ] are characterized by the presence of multiple intelligent entities,
termed “agents”, that coexist in a shared environment and can interact, thereby influencing one another.
As a result of these local interactions, MASs are capable of developing very complex behaviors and can
solve problems that are not feasible for a single agent. In particular, a MAS requires the presence of
at least two agents (i.e., a two-agent system or dyad). Thanks to the interaction between agents, the
whole system can give rise to sophisticated behaviors. Examples of complex strategies discovered in
robotic MASs include foraging [
        <xref ref-type="bibr" rid="ref10 ref4 ref5 ref6 ref7 ref8 ref9">4–11</xref>
        ], aggregation [12–18] and predator avoidance [19–23]. A subfield
of MASs is swarm robotics [24–26], where a group of agents (e.g., robots or drones) performs tasks not
feasible for a single element. The idea is to mimic biological systems such as colonies of ants, herds of
sheep, schools of fish or flocks of birds [27, 28].
      </p>
      <p>
        Generally, the development of the agents’ skills in a MAS is often achieved through Evolutionary
Algorithms (EAs) [29], i.e., optimization methods that are inspired by biological evolution and are
capable of providing solutions to a broad set of problems, such as classic control [30–32], robot navigation
[33–35], foraging [
        <xref ref-type="bibr" rid="ref4">4, 36</xref>
        ], function optimization [37, 38], or competitive co-evolution [39, 40].
      </p>
      <p>EAs have proven to be valuable tools to cope with Multi-Objective Optimization (MOO) problems
[41–44], in which multiple conflicting objectives must be optimized simultaneously [45]. In such
scenarios, finding solutions that satisfy all objectives is very difficult. Therefore, Pareto optimality
[46] is used to identify a set of valuable solutions from which the experimenter can select the best.
MOO in a MAS is a significantly complex scenario, where multiple agents must interact in order to
optimize different, often conflicting, goals. Finding solutions to this kind of problem is far from trivial.
In [47], the authors employed OpenAI-ES (OpenAI Evolutionary Strategy) [48] to evolve
two behaviors, locomotion and aggregation, in a swarm of five Ant Pybullet robots [49]. Although the
evolved agents exhibited good locomotion capability, they failed to develop any form of aggregation.
This underscores the difficulty of simultaneously optimizing multiple objectives, especially in MASs.
Moreover, the study emphasizes that the definition of the fitness function is paramount in these kinds
of problems.</p>
      <p>Building on these insights and taking inspiration from this previous study, this work explores how
these conflicting objectives interact to shape the evolution of distinct behaviors in a MAS. To this
end, we first introduce a new fitness function specifically tailored to promote aggregation and test
it in a simple robot aggregation scenario. Next, we adapt the fitness function originally proposed
in [47] by replacing only its aggregation reward component with our new formulation. Our results
show that (a) in the simple robotic aggregation scenario, the new fitness function efficiently evolves
effective aggregation behaviors; (b) in the AntBullet Swarm scenario, the adapted function shows
improvements in aggregation, although the swarm does not yet display fully coordinated aggregation.</p>
      <p>The rest of the paper starts with an overview of related work in the fields of MASs, EAs and MOO
(Section 2), with a specific focus on aggregation. Then, the problems addressed and the experimental
settings are described (Section 3). In Section 4, we present the quantitative and qualitative outcomes of
our experiments. Finally, Section 5 contains our final remarks and possible future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Aggregation is a process in which individuals gather in groups for specific purposes [50], such as
protection against predators [51] or speeding up foraging [52]. This phenomenon is frequently observed
in nature, both in microorganisms (e.g., Dictyostelium discoideum [53], Capsaspora owczarzaki [54]
and Brachionus calyciflorus [55]) and in complex animals (e.g., flocks of birds [56] and schools of
fish [
        <xref ref-type="bibr" rid="ref11 ref12">57, 58</xref>
        ]). Aggregation represents one of the fundamental collective behaviors, as it gives rise to
various cooperative behaviors [18], such as coordinated movement [
        <xref ref-type="bibr" rid="ref13">59</xref>
        ], self-assembly [
        <xref ref-type="bibr" rid="ref14">60</xref>
        ] or collective
transport [
        <xref ref-type="bibr" rid="ref15">61</xref>
        ]. A specific body of research focuses on self-organized aggregation [18], in which the
individuals of the group aggregate autonomously, without any central control. This kind of aggregation
allows the development of control systems that are robust to partial failure, flexible and scalable, and can
be performed using only local interactions between individuals. In biological systems, self-organized
aggregation manifests through either positive or negative feedback mechanisms [18]: the former can
occur as an attraction force towards a given signal source, while the latter acts as a regulatory or
repulsive mechanism between individuals.
      </p>
      <p>
        The effectiveness and evolutionary importance of this behavior have prompted researchers to replicate
it in simulated environments, following the principles of swarm robotics [25, 26]. Several studies
have used evolutionary algorithms [29], and specifically evolutionary strategies [
        <xref ref-type="bibr" rid="ref16 ref17">62, 63</xref>
        ], to develop
aggregation behaviors in MASs or robot swarms. Evolutionary algorithms, which translate the principles
of natural evolution into computational procedures, are promising solutions for the automatic learning of complex
control policies that are difficult to design manually from scratch. In [
        <xref ref-type="bibr" rid="ref18">64</xref>
        ], diferent algorithms, including
CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [
        <xref ref-type="bibr" rid="ref19">65</xref>
        ], GA (Genetic Algorithm) [
        <xref ref-type="bibr" rid="ref20">66</xref>
        ] and
OpenAI-ES [48], were compared on an aggregation task. The study evaluates convergence time, policy
quality, scalability and generalization for swarms of different sizes. In general, all the algorithms
proved to be effective in completing the task, with differences in the stability of the aggregated cluster
and the aggregation times, especially for smaller swarms (5, 10 and 20 robots). In [15], CMA-ES and
xNES (Exponential Natural Evolution Strategies) [
        <xref ref-type="bibr" rid="ref21">67</xref>
        ] were compared on a specific aggregation task,
highlighting the importance of communication. Similarly, a comparison of the different aggregation
behaviors evolved by CMA-ES, xNES and OpenAI-ES under different experimental conditions is proposed
in [16]. Finally, the paper [
        <xref ref-type="bibr" rid="ref22">68</xref>
        ] demonstrates the validity of an automatic approach based on evolutionary
optimization via PSO (Particle Swarm Optimization) [
        <xref ref-type="bibr" rid="ref23">69</xref>
        ] to design interpretable and scalable PFSM
(Probabilistic Finite State Machines) controllers for the fundamental task of aggregation in swarm
robotics. The use of this type of controller offers greater policy interpretability and potentially better
transferability to real-world robotic environments compared to the typically used neural network
controllers.
      </p>
      <p>
        The works discussed so far focus on the optimization of a single objective, known as Mono-Objective
Optimization. In such problems, there exists at least one optimal solution, and possibly multiple equivalent
ones. Instead, when the problem involves multiple and potentially conflicting objectives, as
previously mentioned, we speak of Multi-Objective Optimization (MOO) [45]. Compared to the previous
scenario, solving a MOO problem poses significant challenges, since the aim is to obtain the optimal
solution for all the objectives simultaneously, which is generally not feasible. In this case, the possible solutions,
called Pareto-optimal [46], are potentially infinite and lie on the Pareto front. The choice of one of these
solutions depends on the subjective preferences of a human decision maker. Besides the problem
presented in [47], another approach investigating the development of aggregation, in conjunction
with other tasks, is the one illustrated in [
        <xref ref-type="bibr" rid="ref24">70</xref>
        ], where a decentralized algorithm for robotic swarms
based on limited local interactions is introduced and applied to a MOO problem in which agents are
rewarded for their ability to aggregate and avoid obstacles. The approach, tested both in simulation and
in real-world settings, has proven to be effective in developing cohesive behavior and dynamic
obstacle avoidance.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Robot Aggregation problem</title>
        <p>
          The first scenario involves a group of 5 MarXbot robots [
          <xref ref-type="bibr" rid="ref25">71</xref>
          ] that are placed in a square-walled arena
of 5 × 5 m. The initial configuration of the robots is shown in Fig. 1: the central robot is located at
the center of a theoretical square, whose vertices represent the initial positions of the other robots,
forming a cross. The radius of each MarXbot robot is 8.5 cm. The central robot is labeled with index 1 and its
position is (x<sub>1</sub>, y<sub>1</sub>) = (2.5, 2.5). The initial locations of the other robots, whose indices are increased
in a counter-clockwise direction starting from the robot placed in the upper right corner, are determined
according to Eq. 1. Each robot is equipped with infrared sensors and an omni-directional camera, whose view
consists of 4 sectors of 90∘ each. The camera enables robots to detect colored objects in the
environment. Additionally, each robot can turn on/off both frontal and rear LEDs (red and blue, respectively)
in order to signal its presence to the other robots.</p>
        <p>The robots’ goal is to aggregate within the arena. To this end, we defined the fitness function as
reported in Eqs. 2 - 4:</p>
        <p>d̂<sub>i</sub> = (1 / (N − 1)) ∑<sub>j=1, j≠i</sub><sup>N</sup> |T<sub>d</sub> − d<sub>i,j</sub>| (2)</p>
        <p>R<sub>d,i</sub> = M × e<sup>−d̂<sub>i</sub>² / σ²</sup> (3)</p>
        <p>F = (1 / N) ∑<sub>i=1</sub><sup>N</sup> R<sub>d,i</sub> (4)</p>
        <p>In Eqs. 2 - 4, the symbol d<sub>i,j</sub> indicates the distance between the agents i and j (with i ≠ j), d̂<sub>i</sub> is the
resulting average distance for the i-th agent and N is the number of robots. The T<sub>d</sub> parameter
represents the target distance considered sufficient to solve the problem (in our experiments, we set
T<sub>d</sub> = 1.5 m). The symbols M and σ denote, respectively, the maximum value achieved by the function
R<sub>d</sub> when the target distance is reached and the standard deviation of the Gaussian function. As regards
the experiments reported here, we used the values M = 7 and σ = 8. Finally, the symbol R<sub>d</sub> represents
the distance reward (see Fig. 2). This function was chosen empirically after a preliminary investigation
phase (see Fig. 3), during which we observed that R<sub>d</sub> is characterized by a more gradual slope than
alternative functions, while still maintaining the desired maximization.</p>
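        <p>A minimal sketch of the distance reward and fitness, as reconstructed above, is given below. The function names and array layout are assumptions, as is the σ² (rather than 2σ²) denominator of the Gaussian; the default constants (T<sub>d</sub> = 1.5, M = 7, σ = 8) come from the text.</p>

```python
import numpy as np

def distance_reward(positions, target=1.5, M=7.0, sigma=8.0):
    """Per-robot distance reward (sketch of reconstructed Eqs. 2-3):
    a Gaussian of the average absolute deviation from the target distance."""
    p = np.asarray(positions, dtype=float)
    n = len(p)
    # pairwise Euclidean distances d_ij
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    dev = np.abs(target - d)
    np.fill_diagonal(dev, 0.0)          # exclude j == i from the average
    d_hat = dev.sum(axis=1) / (n - 1)   # Eq. 2
    return M * np.exp(-d_hat**2 / sigma**2)  # Eq. 3

def fitness(positions, **kw):
    """Swarm fitness: mean distance reward over robots (Eq. 4)."""
    return float(distance_reward(positions, **kw).mean())
```

        <p>For instance, two robots placed exactly at the target distance receive the maximum reward M, while dispersed robots receive less.</p>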
        <p>A feed-forward neural network controls each robot. The network has 12 inputs, one layer of 10
hidden neurons and 4 outputs. The hidden and output neurons have biases, and their activation function
is the tanh. The infrared sensors and the camera provide input data feeding the neural network, which
performs its computation and produces two outputs to control the wheel speeds, and two outputs to
control the frontal and rear LEDs.</p>
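        <p>The controller just described reduces to a plain two-layer forward pass. The sketch below assumes row-major weight matrices and is not the evorobotpy3 implementation; the input/hidden/output sizes and the tanh activations are those stated in the text.</p>

```python
import numpy as np

def controller(inputs, w1, b1, w2, b2):
    """Sketch of the described controller: 12 inputs (infrared sensors and
    camera), 10 tanh hidden neurons with biases, 4 tanh outputs
    (2 wheel speeds, 2 LED commands). Assumed shapes: w1 (10, 12),
    b1 (10,), w2 (4, 10), b2 (4,)."""
    hidden = np.tanh(w1 @ np.asarray(inputs, dtype=float) + b1)
    return np.tanh(w2 @ hidden + b2)
```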
      </sec>
      <sec id="sec-3-2">
        <title>3.2. AntBullet Swarm problem</title>
        <p>
          provided in Fig. 4.
we applied the following modifications:
The second scenario is notably more challenging and requires dealing with a MOO problem. In particular,
we consider the problem introduced in [47], which involves a MAS composed of 5 AntBullet robots
that must aggregate while locomoting in an unbounded environment. A snapshot of the problem is
The agents’ goal is to learn how to aggregate and locomote. Due to the pitfalls highlighted in [47],
• we split the evolutionary process into two phases and adopted an incremental evolution approach
[
          <xref ref-type="bibr" rid="ref26">72</xref>
          ]. During the first phase, agents learn only to locomote; in the second phase, instead, they
have to aggregate while locomoting;
i 4
D
        </p>
        <p>• we modified the aggregation reward function by using the one defined in Eq. 3;
• during the second phase, the reward for locomotion is notably reduced — but not removed — in
order to make the agents exploit the skills acquired in the first phase.</p>
        <p>
          This latter decision was made because existing studies [
          <xref ref-type="bibr" rid="ref27">34, 73</xref>
          ] have shown that training agents
to learn two different skills sequentially may lead to a “vanishing” phenomenon, where previously
acquired capabilities are forgotten.
        </p>
        <p>Therefore, with the aim of promoting the evolution of successful strategies, the fitness function has
been designed according to Eqs. 5 - 7:</p>
        <p>F = F<sub>1</sub> if t &lt; T/2; F = F<sub>2</sub> if t ≥ T/2 (5)</p>
        <p>F<sub>1</sub> = (1 / N) ∑<sub>i=1</sub><sup>N</sup> (P<sub>i</sub> + S<sub>i</sub> + J<sub>i</sub>) (6)</p>
        <p>F<sub>2</sub> = (1 / N) ∑<sub>i=1</sub><sup>N</sup> (0.1 × P<sub>i</sub> + S<sub>i</sub> + J<sub>i</sub> + R<sub>d,i</sub>) (7)</p>
        <p>The symbol T in Eq. 5 represents the total number of evaluation steps performed during evolution.
The symbols P<sub>i</sub>, S<sub>i</sub> and J<sub>i</sub> in Eqs. 6 - 7 indicate, respectively, the progress reward (i.e., how much the
agent i locomotes), the stall cost related to the magnitude of the actions performed by agent i, and the
number of joints extended at their limits. In Eq. 7, R<sub>d,i</sub> is the distance reward computed according to
Eq. 3 (differently from the previous scenario, here we set the target distance T<sub>d</sub> to 1.5 meters, coherently
with [47]). We modified this component to improve the performance of the aggregation shown in
[47], aiming to address the challenges of MOO more effectively. In fact, as pointed out by the authors,
the original fitness function (see [47], Eqs. 5-6) was ineffective due to the different magnitudes of
its components. Moreover, locomotion is easier to achieve than aggregation: a robot can
locomote regardless of the others, whereas aggregation involves the need to interact and coordinate
with others. Consequently, the outcomes presented in [47] show that agents, evolved with OpenAI-ES,
are characterized by good locomotion capabilities, but do not aggregate.</p>
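        <p>The phase switch can be sketched as follows. The per-robot arrays correspond to the terms of the reconstructed Eqs. 6 - 7; the averaging over robots and the signs of the cost terms are assumptions, and the function name is hypothetical.</p>

```python
import numpy as np

def swarm_fitness(step, total_steps, progress, stall, joints, dist_reward):
    """Phase-switching fitness (sketch of reconstructed Eqs. 5-7):
    per-robot arrays for the progress reward P_i, stall cost S_i,
    joints-at-limit term J_i and distance reward R_d,i."""
    progress = np.asarray(progress, dtype=float)
    if step >= total_steps / 2:
        # second phase: locomotion down-weighted, aggregation reward added (Eq. 7)
        return float(np.mean(0.1 * progress + stall + joints + dist_reward))
    # first phase: locomotion only (Eq. 6)
    return float(np.mean(progress + stall + joints))
```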
        <p>In addition to the fitness modification, we endowed the robots with an omni-directional camera
capable of detecting other robots within an 8-meter range. The camera view is split into 6 sectors of 60∘
each. This sensor has been designed to make each robot capable of perceiving the presence of nearby
mates, hence encouraging the evolution of more effective aggregation behaviors. The information
provided by the camera input for a generic robot i is defined as follows: if robot i detects a teammate j,
the corresponding sector s is activated and returns a value defined according to Eq. 8:</p>
        <p>c<sub>s</sub> = 1.0 − d<sub>i,j</sub> / D<sub>max</sub> (8)</p>
        <p>where the symbol d<sub>i,j</sub> indicates the distance between the robots i and j and D<sub>max</sub> is the maximum
detection distance (with D<sub>max</sub> = 8 m). A detailed list of the robot’s equipment is provided in Table 1.</p>
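        <p>A sketch of this sensor model is given below. The sector-indexing convention (bearing relative to the robot’s heading, counter-clockwise) is an assumption, as the text does not specify how sectors are ordered; Eq. 8 itself is applied as reconstructed above.</p>

```python
import math

def camera_sectors(robot_pos, robot_heading, mates, d_max=8.0, n_sectors=6):
    """Omni-directional camera input (Eq. 8): each 60-degree sector
    reports 1 - d/d_max for the nearest detected teammate, 0 otherwise."""
    sectors = [0.0] * n_sectors
    width = 2 * math.pi / n_sectors
    for mx, my in mates:
        dx, dy = mx - robot_pos[0], my - robot_pos[1]
        d = math.hypot(dx, dy)
        if d > d_max:
            continue  # teammate outside the 8-meter detection range
        # relative bearing folded into [0, 2*pi), then binned into a sector
        bearing = (math.atan2(dy, dx) - robot_heading) % (2 * math.pi)
        s = min(int(bearing // width), n_sectors - 1)
        sectors[s] = max(sectors[s], 1.0 - d / d_max)  # Eq. 8
    return sectors
```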
        <p>The robot controller is a feed-forward neural network with 34 input neurons, an internal layer
containing 20 hidden neurons, and 8 output neurons. All neurons have associated biases. The activation
function of hidden neurons is the tanh function, while the activation function of output neurons is the
linear function.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Experimental setting</title>
        <p>
          For both scenarios, the OpenAI Evolutionary Strategy (OpenAI-ES) [48] was employed, as it represents a
modern and quite sophisticated algorithm for evolving successful locomotion [
          <xref ref-type="bibr" rid="ref28 ref29">48, 74, 75</xref>
          ] and aggregation
[
          <xref ref-type="bibr" rid="ref18">14, 16, 64</xref>
          ] behaviors. OpenAI-ES works by initializing a single solution, called centroid, which encodes
the connection weights of the neural network controller determining the robot’s behaviors.
        </p>
        <p>The environment is unbounded, allowing the agents to move freely.</p>
        <p>Table 1 reports the robot’s equipment (i.e., sensors and motors): the agent’s height and initial height,
the relative angle between the robot and the target (for locomotion), the velocity components along the
three axes, the robot’s orientation, the position and velocity of each joint, a flag per foot indicating
whether the foot touches the ground, the inputs from the camera sectors, and the torque applied to
each joint.</p>
        <sec id="sec-3-3-2">
          <p>
            The centroid
is iteratively updated through an advanced process consisting of mirrored sampling [
            <xref ref-type="bibr" rid="ref30">76</xref>
            ], gradient
estimation, and optimization using the Adam optimizer [
            <xref ref-type="bibr" rid="ref31">77</xref>
            ]. The algorithm is illustrated in Fig. 5.
          </p>
          <p>
            In more detail, OpenAI-ES seeks to identify the most promising areas of the solution space (i.e., the
connection weights corresponding to better displayed behaviors), so that evolution is targeted toward
those regions, increasing the chance to discover efective strategies for the considered problem. To
this end, OpenAI-ES evaluates samples in both directions (i.e., mirrored sampling [
            <xref ref-type="bibr" rid="ref30">76</xref>
            ]) and ranks them
based on fitness. This process aims to reveal the existence of relationships between the weights and the
final performance. Lastly, OpenAI-ES performs a gradient estimation based on the fitness ranking and
updates the centroid using the Adam optimizer [
            <xref ref-type="bibr" rid="ref31">77</xref>
            ], which retains historical data through a pair of momentum vectors (mean and variance).
          </p>
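          <p>The loop just described can be condensed into a minimal sketch. The hyperparameters below are illustrative, not those of our experiments, and the rank-shaping details follow the general OpenAI-ES recipe rather than the evorobotpy3 implementation.</p>

```python
import numpy as np

def openai_es(fitness, dim, pop_size=20, sigma=0.1, lr=0.05, gens=200, seed=0):
    """Sketch of OpenAI-ES: mirrored sampling, rank-based fitness shaping,
    gradient estimation and an Adam update of the centroid."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)                  # the centroid
    m, v = np.zeros(dim), np.zeros(dim)    # Adam momentum vectors (mean, variance)
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, gens + 1):
        noise = rng.standard_normal((pop_size // 2, dim))
        eps_all = np.concatenate([noise, -noise])       # mirrored sampling
        fits = np.array([fitness(theta + sigma * e) for e in eps_all])
        # centered rank transform in [-0.5, 0.5]
        shaped = fits.argsort().argsort() / (len(fits) - 1) - 0.5
        grad = shaped @ eps_all / (len(fits) * sigma)   # gradient estimate
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad**2
        theta += lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
    return theta
```

          <p>For example, maximizing a simple quadratic fitness moves the centroid toward the optimum over generations.</p>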
          <p>
            The experiments were conducted using evorobotpy3 [
            <xref ref-type="bibr" rid="ref32">78</xref>
            ], a modern simulation tool that contains
the implementations of a variety of EAs and some predefined problems. Moreover, it is integrated
with libraries such as Gymnasium [
            <xref ref-type="bibr" rid="ref33">79</xref>
            ] and Pybullet [49], enabling users to easily create customized
simulations.
          </p>
          <p>A detailed list of the parameters used for both scenarios is provided in Table 2.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Robot Aggregation problem</title>
        <p>In the first scenario, we aim to test whether the distance reward R<sub>d</sub>, defined in Eq. 3, enables the
discovery of effective aggregation behaviors. By examining the performance of the MAS at the end of
the evolutionary process, we can see that the average fitness F is 1.190 (see Fig. 6), with a standard
deviation of 0.028. During the evolutionary process, the analysis of the fitness curve reveals that
OpenAI-ES rapidly improves performance in the initial steps and stabilizes at around 5 × 10<sup>8</sup> steps
(see Fig. 6). This implies that the algorithm quickly finds good solutions, whereas the refinement of
the discovered strategies requires a longer evolutionary process. If we analyze the final aggregation
achieved by the agents, we can observe that the robots manage to discover strategies that ultimately
lead to swarm aggregation (see Fig. 7). In particular, the MAS forms a more compact group, located
around the center of the arena. This underscores that the distance reward R<sub>d</sub> fosters the development of
effective aggregation behaviors.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. AntBullet Swarm problem</title>
        <p>Based on the considerations reported in the previous section, we investigated whether the fitness
function defined in Eq. 5 allows the OpenAI-ES algorithm to evolve strategies in which robots aggregate
during locomotion. As we already pointed out in Section 3, we divided the evolutionary process into two
separate phases and exploited incremental evolution to enhance agents’ performance and capabilities.
The mean fitness at the end of evolution is 4444.679 (standard deviation 230.541) and indicates that
OpenAI-ES discovers behavioral strategies in which agents exhibit a good locomotion behavior (see the
first half of the evolutionary process in Fig. 8). The average fitness obtained before the switch (indicated
by the vertical line in Fig. 8) is 1655.366, with a standard deviation of 832.893. These results align with
those reported in [47], although the different number of inputs and hidden neurons prevents a direct
comparison.</p>
        <p>It is worth noting that the ability to locomote is independent of other agents. However, being
able to locomote is crucial for the development of aggregation behaviors. In the second phase of the
evolutionary process, thanks to the use of the reward function defined in Eq. 3, the agents increase
their performance soon after the switch (see the second half of the evolutionary process in Fig. 8),
although the improvements in aggregation capability are quite limited. This underscores the difficulty
of designing adequate and effective reward functions for MOO problems.</p>
        <p>Furthermore, if we examine the final positions of the agents (Fig. 9), we can observe that the
aggregation reached by the MAS here is less effective than in the previous scenario. In fact, agents
fail to move close to one another. Sometimes, the majority of agents succeed in approaching each
other (Fig. 9-(a)), while in other cases the MAS disperses (Fig. 9-(b)). Finally, in most cases, the final
configuration of the MAS looks similar to the initial cross formation (Fig. 9-(c)). Overall, the modified
fitness function and the added camera sensor, useful to perceive the others, allow slight improvements
compared to the results reported in [47], where the authors underline the complete absence of such
a capability. Nonetheless, the function defined in Eq. 3 does not lead to further enhancements with
respect to the aggregation capability.</p>
        <p>To reinforce the aggregation analysis, we calculated the dispersion metric (see [82]), which assesses
swarm cohesion and is defined in Eq. 9. This metric was also employed in our previous studies [14, 15]:</p>
        <p>D = (1 / (4r<sup>2</sup>)) ∑<sub>i=1</sub><sup>N</sup> ||p<sub>i</sub> − p̄||<sup>2</sup> (9)</p>
        <p>Eq. 9 takes into account the final spatial arrangement of the whole group, where p<sub>i</sub> denotes the
position of agent i, p̄ indicates the center of gravity (COG) of the swarm, and r refers to the robot radius.
The notation || ⋅ || denotes the Euclidean norm.</p>
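        <p>The metric, as reconstructed above, can be computed in a few lines; the normalization by 4r² follows the reconstructed Eq. 9 and the function name is hypothetical.</p>

```python
import numpy as np

def dispersion(positions, radius):
    """Dispersion metric (sketch of reconstructed Eq. 9): sum of squared
    distances of the agents from the swarm's center of gravity, normalized
    by 4 * radius**2. Lower values indicate a more cohesive swarm."""
    p = np.asarray(positions, dtype=float)
    cog = p.mean(axis=0)  # center of gravity of the swarm (p-bar)
    return float(np.sum(np.linalg.norm(p - cog, axis=1) ** 2) / (4 * radius ** 2))
```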
        <p>Thus, to analyze the dynamics of the solutions, Fig. 10 illustrates how dispersion varies during the
evaluation of the swarm. As can be seen, the dispersion generally increases on average, reaching a final
value of 222.868. This implies that the swarm is dispersing, as the agents have a higher propensity to
locomote and are unable to aggregate properly. Examining the dispersion values achieved by the swarm
reported in Fig. 11, we observe that the group tends to increase its cohesion, with final dispersion values
of 85.621 (Fig. 11-(a)) and 97.025 (Fig. 11-(b)). Therefore, the MOO problem can be addressed more
effectively in the best cases, with the agents capable of achieving a higher level of aggregation.</p>
        <p>Interestingly, the adoption of the incremental evolution paradigm improves locomotion. Specifically,
some agents exhibit refined gaits that make use of all their legs to move (see behavior at https://youtu.
be/gW_LdjBOIbs). This allows overcoming the local minima reported in [47], where at least one leg
remained extended to maintain stability by avoiding falling. Another discovered locomotion strategy
resembles a horse’s gallop (see behavior at https://youtu.be/MRquB4HhEFo); indeed, moving quickly
clearly helps to maximize the progress component in Eqs. 6 - 7. However, this type of locomotion conflicts
with the objective of aggregation with others, as the rapid movement tends to reduce coordination
among agents.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In a Multi-Agent System (MAS), agents interact with other entities in order to solve problems that may
be very difficult, if not impossible, for a single agent to address. While solving problems consisting of a
single objective might be relatively trivial for a MAS, dealing with Multi-Objective Optimization (MOO)
is more challenging. It requires the design of appropriate fitness functions and/or the usage of suitable
methods in order to make agents able to evolve effective behaviors. In fact, MOO is characterized by
conflicting objectives that should be optimized simultaneously, which requires the identification of
compromise solutions that perform well across all objectives. For example, evolving both locomotion
and aggregation behaviors is particularly challenging, as demonstrated in [47].</p>
      <p>In this work, we delve into the analysis of methods that allow the evolution of aggregation in a
robotic MAS, particularly focusing on how conflicting objectives can interfere with the evolution
of collective behaviors. In more detail, we design a new function   (Eq. 3), specifically tailored to
promote aggregation, and we test it in two diferent scenarios. The first one is a Mono-Objective
Steps</p>
      <p>Optimization problem where a MAS of 5 MarXbot robots must develop an aggregation capability.
The second scenario involves a MOO problem, the AntBullet Swarm problem introduced in [47], in
which 5 AntBullet robots must evolve the ability to aggregate while locomoting. In order to foster
the development of aggregation behaviors in the latter scenario, we adopted an incremental evolution
framework by splitting evolution into two distinct phases. In the first phase, agents must evolve
only locomotion capabilities, which are a prerequisite for aggregating with others. In the second
phase, agents must evolve both aggregation and locomotion simultaneously. For this purpose, we
modified the fitness function defined in [47] by using the new function introduced here and endowing
agents with an omni-directional camera that allows them to perceive others. The results indicate
that the proposed function (Eq. 3) successfully promotes aggregation in the Mono-Objective Optimization
scenario. Moreover, it slightly improves the aggregation outcomes with respect to [47], although the
agents fail to discover behavioral strategies that optimize both objectives (i.e., aggregation and
locomotion). Finally, the modifications to the AntBullet Swarm problem allow the evolution of improved
locomotion capabilities, which resemble strategies observed in natural organisms.</p>
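      <p>The two-phase incremental scheme described above can be sketched as follows. This is an illustrative toy example: a simple (1+1)-ES with stand-in fitness functions replaces the OpenAI-ES and the simulator-based locomotion and aggregation objectives used in the experiments, and the equal-weight scalarization in phase 2 is a hypothetical choice made only to keep the sketch concrete.</p>

```python
import numpy as np

# Illustrative sketch of two-phase (incremental) evolution -- NOT the paper's
# implementation: a toy (1+1)-ES with stand-in objectives.

rng = np.random.default_rng(0)

def locomotion(theta):
    # toy stand-in objective: best at theta = 1
    return -np.sum((theta - 1.0) ** 2)

def aggregation(theta):
    # toy stand-in objective: best at theta = -1, so the objectives conflict
    return -np.sum((theta + 1.0) ** 2)

def evolve(fitness, theta, generations=300, sigma=0.1):
    """Elitist (1+1)-ES: keep a Gaussian mutation only if it is not worse."""
    best = fitness(theta)
    for _ in range(generations):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        f = fitness(candidate)
        if f >= best:
            theta, best = candidate, f
    return theta

# Phase 1: evolve locomotion alone (a prerequisite for aggregating with others).
theta1 = evolve(locomotion, np.zeros(4))

# Phase 2: evolve both objectives simultaneously (equal-weight scalarization,
# a hypothetical choice for this sketch).
def both(theta):
    return 0.5 * locomotion(theta) + 0.5 * aggregation(theta)

theta2 = evolve(both, theta1)
```

      <p>In the actual experiments the objectives are computed in simulation; the sketch above conveys only the two-phase control flow.</p>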
      <p>For future research, we plan to further investigate the design of functions that enable the evolution of
effective aggregation behavioral strategies in the AntBullet Swarm problem, as well as the adoption of
different frameworks, like ontogenetic approaches [83], to optimize the agent’s neural network controller.
In this respect, using learning methods, like back-propagation [84] or Spike-Timing-Dependent Plasticity
(STDP) [85, 86], could promote the differentiation and/or specialization of agents, ultimately leading
to better aggregation capabilities. In addition, employing groups of heterogeneous agents [22] may
be valuable for promoting diversity within the MAS. Moreover, further studies will focus on all aspects
that could affect the generalizability of learned behaviors. Investigating how performance varies in
response to changes in initial conditions, sensory inputs or parameters will reinforce the validity of our
approach.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work of A.V. was supported by the National Recovery and Resilience Plan (PNRR)-Ministry of
University and Research (MUR) Project through FAIR–Future Artificial Intelligence Research under
Grant PE0000013-CUP B53D22000980006.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[10] P. Pagliuca, M. Favia, S. Livi, A. Vitanza, Conceptualizing evolving interdependence in groups:
Insights from the analysis of two-agent systems, in: Proceedings of the 21st International Conference
on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), 2025.
[11] P. Pagliuca, M. Favia, S. Livi, A. Vitanza, Interdipendenza nei gruppi: esperimenti con robot sociali,</p>
      <p>Sistemi intelligenti 2 (2025) 335–355.
[12] E. Bahçeci, E. Şahin, Evolving aggregation behaviors for swarm robotic systems: A systematic
case study, in: Proc. IEEE Swarm Intelligence Symposium, 2005, pp. 333–340.
[13] M. Hiraga, Y. Wei, K. Ohkura, Evolving collective cognition for object identification in foraging
robotic swarms, Artificial Life and Robotics 26 (2021) 21–28.
[14] P. Pagliuca, A. Vitanza, Self-organized aggregation in group of robots with OpenAI-ES, in: Int.</p>
      <p>Conf. on Soft Computing and Pattern Recognition, Springer, 2022, pp. 770–780.
[15] P. Pagliuca, A. Vitanza, Evolving aggregation behaviors in swarms from an evolutionary algorithms
point of view, in: Applications of Artificial Intelligence and Neural Systems to Data Science,
Springer, 2023, pp. 317–328.
[16] P. Pagliuca, A. Vitanza, A comparative study of evolutionary strategies for aggregation tasks in
robot swarms: Macro- and micro-level behavioral analysis, IEEE Access 13 (2025) 72721–72735.
[17] D. H. Stolfi, G. Danoy, Evolutionary swarm formation: From simulations to real world robots,</p>
      <p>Engineering Applications of Artificial Intelligence 128 (2024) 107501.
[18] V. Trianni, R. Groß, T. H. Labella, E. Şahin, M. Dorigo, Evolving aggregation behaviors in a swarm
of robots, in: European Conference on Artificial Life, Springer, 2003, pp. 865–874.
[19] H. M. La, R. S. Lim, W. Sheng, J. Chen, Cooperative flocking and learning in multi-robot systems for
predator avoidance, in: 2013 IEEE International Conference on Cyber Technology in Automation,
Control and Intelligent Systems, IEEE, 2013, pp. 337–342.
[20] H. M. La, R. Lim, W. Sheng, Multirobot cooperative learning for predator avoidance, IEEE</p>
      <p>Transactions on Control Systems Technology 23 (2014) 52–63.
[21] J. Li, S. X. Yang, Intelligent collective escape of swarm robots based on a novel fish-inspired
self-adaptive approach with neurodynamic models, IEEE Transactions on Industrial Electronics
(2024).
[22] P. Pagliuca, A. Vitanza, N-mates evaluation: a new method to improve the performance of
genetic algorithms in heterogeneous multi-agent systems, Proceedings of the 24th Edition of the
Workshop From Object to Agents (WOA23) 3579 (2023) 123–137.
[23] P. Pagliuca, A. Vitanza, The role of n in the n-mates evaluation method: a quantitative analysis,
in: 2024 Artificial Life Conference (ALIFE 2024), MIT Press, 2024, pp. 812–814.
[24] M. Brambilla, E. Ferrante, M. Birattari, M. Dorigo, Swarm robotics: a review from the swarm
engineering perspective, Swarm Intelligence 7 (2013) 1–41.
[25] E. Şahin, Swarm robotics: From sources of inspiration to domains of application, in: International
workshop on swarm robotics, Springer, 2004, pp. 10–20.
[26] H. Hamann, Swarm robotics: A formal approach, volume 221, Springer, 2018.
[27] M. Dorigo, G. Theraulaz, V. Trianni, Swarm robotics: Past, present, and future [point of view],</p>
      <p>Proceedings of the IEEE 109 (2021) 1152–1165.
[28] N. Horsevad, H. L. Kwa, R. Bouffanais, Beyond bio-inspired robotics: how multi-robot systems
can support research on collective animal behavior, Frontiers in Robotics and AI 9 (2022) 865414.
[29] T. Bäck, Evolutionary algorithms in theory and practice: evolution strategies, evolutionary
programming, genetic algorithms, Oxford University Press, 1996.
[30] F. J. Gomez, R. Miikkulainen, Solving non-Markovian control tasks with neuroevolution, in:
Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), volume 99,
1999, pp. 1356–1361.
[31] C. Igel, Neuroevolution for reinforcement learning using evolution strategies, in: The 2003</p>
      <p>Congress on Evolutionary Computation, 2003. CEC’03., volume 4, IEEE, 2003, pp. 2588–2595.
[32] P. Pagliuca, N. Milano, S. Nolfi, Maximizing adaptive power in neuroevolution, PLoS ONE 13 (2018)
e0198788.
[33] C. Lamini, S. Benhlima, A. Elbekri, Genetic algorithm based approach for autonomous mobile
robot path planning, Procedia Computer Science 127 (2018) 180–189.
[34] P. Pagliuca, S. Nolfi, Integrating learning by experience and demonstration in autonomous robots,</p>
      <p>Adaptive Behavior 23 (2015) 300–314.
[35] A. Ram, G. Boone, R. Arkin, M. Pearce, Using genetic algorithms to learn reactive control
parameters for autonomous robotic navigation, Adaptive behavior 2 (1994) 277–305.
[36] P. Pagliuca, D. Y. Inglese, The importance of functionality over complexity: A preliminary study
on feed-forward neural networks, in: Advanced Neural Artificial Intelligence: Theories and
Applications, Springer, 2025, pp. 447–458.
[37] S. M. Elsayed, R. A. Sarker, D. L. Essam, A new genetic algorithm for solving optimization problems,</p>
      <p>Engineering Applications of Artificial Intelligence 27 (2014) 57–69.
[38] P. Pagliuca, Analysis of the exploration-exploitation dilemma in neutral problems with evolutionary
algorithms, Journal of Artificial Intelligence and Autonomous Intelligence 1 (2024) 8.
[39] S. Nolfi, D. Floreano, Coevolving predator and prey robots: Do “arms races” arise in artificial
evolution?, Artificial life 4 (1998) 311–335.
[40] S. Nolfi, P. Pagliuca, Global progress in competitive co-evolution: a systematic comparison of
alternative methods, Frontiers in Robotics and AI 11 (2025) 1470886.
[41] K. Deb, Multi-objective optimisation using evolutionary algorithms: an introduction, in:
Multiobjective evolutionary optimisation for product design and manufacturing, Springer, 2011, pp.
3–34.
[42] C. M. Fonseca, P. J. Fleming, An overview of evolutionary algorithms in multiobjective optimization,</p>
      <p>Evolutionary computation 3 (1995) 1–16.
[43] J. Horn, N. Nafpliotis, D. E. Goldberg, A niched Pareto genetic algorithm for multiobjective
optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE
World Congress on Computational Intelligence, IEEE, 1994, pp. 82–87.
[44] E. Zitzler, Evolutionary algorithms for multiobjective optimization: Methods and applications,
volume 63, Shaker Ithaca, 1999.
[45] K. Deb, K. Sindhya, J. Hakanen, Multi-objective optimization, in: Decision sciences, CRC Press,
2016, pp. 161–200.
[46] Y. Censor, Pareto optimality in multiobjective problems, Applied Mathematics and Optimization 4
(1977) 41–59.
[47] P. Pagliuca, A. Vitanza, Enhancing aggregation in locomotor multi-agent systems: a theoretical
framework, Proceedings of the 25th Edition of the Workshop From Object to Agents (WOA24)
3735 (2024) 42–57.
[48] T. Salimans, J. Ho, X. Chen, S. Sidor, I. Sutskever, Evolution strategies as a scalable alternative to
reinforcement learning, arXiv preprint arXiv:1703.03864 (2017).
[49] E. Coumans, Y. Bai, Pybullet, a python module for physics simulation for games, robotics and
machine learning, 2016.
[50] Z. Firat, E. Ferrante, Y. Gillet, E. Tuci, On self-organised aggregation dynamics in swarms of robots
with informed robots, Neural Computing and Applications 32 (2020) 13825–13841.
[51] J. Menezes, E. Rangel, B. Moura, Aggregation as an antipredator strategy in the rock-paper-scissors
model, Ecological Informatics 69 (2022) 101606.
[52] J. Nauta, P. Simoens, Y. Khaluf, Memory induced aggregation in collective foraging, in:
International conference on swarm intelligence, Springer, 2020, pp. 176–189.
[53] T. M. Konijn, K. B. Raper, Cell aggregation in Dictyostelium discoideum, Developmental Biology 3
(1961) 725–756.
[54] R. Q. Kidner, E. B. Goldstone, H. J. Rodefeld, L. P. Brokaw, A. M. Gonzalez, N. Ros-Rocher, J. P. Gerdt,
Exogenous lipid vesicles induce endocytosis-mediated cellular aggregation in a close unicellular
relative of animals, bioRxiv (2024) 2024–05.
[55] S.-H. Cheng, H.-Y. Zhang, M.-Y. Zhu, L. M. Zhou, G.-H. Yi, X.-W. He, J.-Y. Wu, J.-L. Sui, H. Wu, S.-J.</p>
      <p>Yan, et al., Observations of linear aggregation behavior in rotifers (brachionus calyciflorus), PLoS
One 16 (2021) e0256387.
[56] A. Cavagna, A. Cimarelli, I. Giardina, G. Parisi, R. Santagati, F. Stefanini, M. Viale, Scale-free
correlations in starling flocks, Proceedings of the National Academy of Sciences 107 (2010)
11865–11870.
[80] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks,
in: Proceedings of the thirteenth international conference on artificial intelligence and statistics,
JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.
[81] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal
covariate shift, arXiv preprint arXiv:1502.03167 (2015).
[82] M. Gauci, J. Chen, W. Li, T. J. Dodd, R. Groß, Self-organized aggregation without computation,</p>
      <p>The International Journal of Robotics Research 33 (2014) 1145–1161.
[83] D. Floreano, P. Husbands, S. Nolfi, Evolutionary robotics, Handbook of robotics (2008).
[84] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors,</p>
      <p>Nature 323 (1986) 533–536.
[85] S. Song, K. D. Miller, L. F. Abbott, Competitive hebbian learning through spike-timing-dependent
synaptic plasticity, Nature neuroscience 3 (2000) 919–926.
[86] A. Vitanza, L. Patané, P. Arena, Spiking neural controllers in multi-agent competitive systems for
adaptive targeted motor learning, Journal of the Franklin Institute 352 (2015) 3122–3143.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dorri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Kanhere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jurdak</surname>
          </string-name>
          <article-title>, Multi-agent systems: A survey</article-title>
          ,
          <source>IEEE Access 6</source>
          (
          <year>2018</year>
          )
          <fpage>28573</fpage>
          -
          <lpage>28593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Julian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Botti</surname>
          </string-name>
          ,
          <article-title>Multi-agent systems</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <fpage>1402</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Van der Hoek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          ,
          <article-title>Multi-agent systems</article-title>
          ,
          <source>Foundations of Artificial Intelligence</source>
          <volume>3</volume>
          (
          <year>2008</year>
          )
          <fpage>887</fpage>
          -
          <lpage>928</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Aldana-Franco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Montes-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nolfi</surname>
          </string-name>
          ,
          <article-title>The improvement of signal communication for a foraging task using evolutionary robotics</article-title>
          ,
          <source>Journal of Applied Research and Technology</source>
          <volume>22</volume>
          (
          <year>2024</year>
          )
          <fpage>90</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Calvez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hutzler</surname>
          </string-name>
          ,
          <article-title>Automatic tuning of agent-based models using genetic algorithms</article-title>
          , in: International workshop on multi-
          <source>agent systems and agent-based simulation</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hiraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ohkura</surname>
          </string-name>
          ,
          <article-title>Evolving collective cognition of robotic swarms in the foraging task with poison</article-title>
          ,
          <source>in: 2019 IEEE Congress on Evolutionary Computation (CEC)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>3205</fpage>
          -
          <lpage>3212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nolfi</surname>
          </string-name>
          ,
          <article-title>Robust optimization through neuroevolution</article-title>
          ,
          <source>PLOS ONE 14</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Inglese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitanza</surname>
          </string-name>
          ,
          <article-title>Measuring emergent behaviors in a mixed competitivecooperative environment</article-title>
          ,
          <source>International Journal of Computer Information Systems and Industrial Management Applications</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <fpage>69</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <article-title>Learning and evolution: factors influencing an effective combination</article-title>
          ,
          <source>AI</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>2393</fpage>
          -
          <lpage>2432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Favia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Livi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitanza</surname>
          </string-name>
          ,
          <article-title>Conceptualizing evolving interdependence in groups: Insights from the analysis of two-agent systems</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), 2025.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Partridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pitcher</surname>
          </string-name>
          ,
          <article-title>The sensory basis of fish schools: relative roles of lateral line and vision</article-title>
          ,
          <source>Journal of comparative physiology 135</source>
          (
          <year>1980</year>
          )
          <fpage>315</fpage>
          -
          <lpage>325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pitcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Partridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wardle</surname>
          </string-name>
          ,
          <article-title>A blind fish can school</article-title>
          ,
          <source>Science</source>
          <volume>194</volume>
          (
          <year>1976</year>
          )
          <fpage>963</fpage>
          -
          <lpage>965</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shibata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fukuda</surname>
          </string-name>
          ,
          <article-title>Coordinative behavior in evolutionary multi-agent system by genetic algorithm</article-title>
          ,
          <source>in: IEEE International Conference on Neural Networks, IEEE</source>
          ,
          <year>1993</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>R.</given-names>
            <surname>O'Grady</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Groß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorigo</surname>
          </string-name>
          ,
          <article-title>Self-assembly strategies in a group of autonomous mobile robots</article-title>
          ,
          <source>Autonomous Robots</source>
          <volume>28</volume>
          (
          <year>2010</year>
          )
          <fpage>439</fpage>
          -
          <lpage>455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>R.</given-names>
            <surname>Asad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hayakawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yasuda</surname>
          </string-name>
          ,
          <article-title>Evolutionary design of cooperative transport behavior for a heterogeneous robotic swarm</article-title>
          ,
          <source>Journal of Robotics and Mechatronics</source>
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>1007</fpage>
          -
          <lpage>1015</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rechenberg</surname>
          </string-name>
          ,
          <article-title>Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution</article-title>
          , Frommann-Holzboog, Stuttgart-Bad Cannstatt
          (
          <year>1973</year>
          )
          <fpage>47</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Schwefel</surname>
          </string-name>
          ,
          <article-title>Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie: mit einer vergleichenden Einführung in die Hill-Climbing-und Zufallsstrategie</article-title>
          , volume
          <volume>1</volume>
          , Springer,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>J. Rais</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aznar Gregori</surname>
          </string-name>
          ,
          <article-title>Comparison of evolutionary strategies for reinforcement learning in a swarm aggregation behaviour</article-title>
          ,
          <source>in: Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ostermeier</surname>
          </string-name>
          ,
          <article-title>Completely derandomized self-adaptation in evolution strategies</article-title>
          ,
          <source>Evolutionary computation 9</source>
          (
          <year>2001</year>
          )
          <fpage>159</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Holland</surname>
          </string-name>
          ,
          <source>Adaptation in natural and artificial systems</source>
          , University Michigan Press,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wierstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schaul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Glasmachers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Natural evolution strategies</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <year>2014</year>
          )
          <fpage>949</fpage>
          -
          <lpage>980</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katada</surname>
          </string-name>
          ,
          <article-title>Evolutionary design method of probabilistic finite state machine for swarm robots aggregation</article-title>
          ,
          <source>Artificial Life and Robotics</source>
          <volume>23</volume>
          (
          <year>2018</year>
          )
          <fpage>600</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [69]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <article-title>Particle swarm optimization</article-title>
          ,
          <source>in: Proceedings of ICNN'95-international conference on neural networks</source>
          , volume
          <volume>4</volume>
          , IEEE,
          <year>1995</year>
          , pp.
          <fpage>1942</fpage>
          -
          <lpage>1948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [70]
          <string-name>
            <given-names>A.</given-names>
            <surname>Leccese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gasparri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Priolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Oriolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ulivi</surname>
          </string-name>
          ,
          <article-title>A swarm aggregation algorithm based on local interaction with actuator saturations and integrated obstacle avoidance</article-title>
          ,
          <source>in: 2013 IEEE International Conference on Robotics and Automation</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>1865</fpage>
          -
          <lpage>1870</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [71]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bonani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Longchamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Magnenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rétornaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Burnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Roulet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vaussard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bleuler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mondada</surname>
          </string-name>
          ,
          <article-title>The marXbot, a miniature mobile robot opening new perspectives for the collective-robotic research</article-title>
          ,
          <source>in: 2010 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems</source>
          , IEEE,
          <year>2010</year>
          , pp.
          <fpage>4187</fpage>
          -
          <lpage>4193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [72]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miikkulainen</surname>
          </string-name>
          ,
          <article-title>Incremental evolution of complex general behavior</article-title>
          ,
          <source>Adaptive Behavior</source>
          <volume>5</volume>
          (
          <year>1997</year>
          )
          <fpage>317</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [73]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kober</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning in robotics: A survey</article-title>
          ,
          <source>The International Journal of Robotics Research</source>
          <volume>32</volume>
          (
          <year>2013</year>
          )
          <fpage>1238</fpage>
          -
          <lpage>1274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [74]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Milano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nolfi</surname>
          </string-name>
          ,
          <article-title>Efficacy of modern neuro-evolutionary strategies for continuous control optimization</article-title>
          ,
          <source>Frontiers in Robotics and AI</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>98</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [75]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nolfi</surname>
          </string-name>
          ,
          <article-title>The dynamic of body and brain co-evolution</article-title>
          ,
          <source>Adaptive Behavior</source>
          <volume>30</volume>
          (
          <year>2022</year>
          )
          <fpage>245</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [76]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brockhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Auger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hohm</surname>
          </string-name>
          ,
          <article-title>Mirrored sampling and sequential selection for evolution strategies</article-title>
          ,
          <source>in: International Conference on Parallel Problem Solving from Nature</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [77]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [78]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nolfi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitanza</surname>
          </string-name>
          ,
          <article-title>Evorobotpy3: a flexible and easy-to-use simulation tool for evolutionary robotics</article-title>
          ,
          <source>in: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2025 Companion)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [79]
          <string-name>
            <given-names>M.</given-names>
            <surname>Towers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kwiatkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Terry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. U.</given-names>
            <surname>Balis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Cola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deleu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goulão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kallinteris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krimmel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>KG</surname>
          </string-name>
          , et al.,
          <article-title>Gymnasium: A standard interface for reinforcement learning environments</article-title>
          ,
          <source>preprint arXiv:2407.17032</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>